This blog is a personal quest to learn/discover/enable the Oracle RDBMS as an analytical/scientific computing engine. While Oracle does not (yet) match the breadth and depth of a popular, best-of-breed platform like SAS for analytics, or a scientific computing suite like Matlab, it offers enough facilities in Oracle SQL and PL/SQL to build a competent business analytics platform, if not yet a comprehensive platform for scientific research.

The answer to the central question of "I am an Oracle database professional... Why should I know analytics?" is simply "To keep yourself relevant ten years from now." A more lucid, lengthy answer can be found in the books in the Readings section. After a 10-15 year journey through database design, E-R diagrams, certifications on every database technology, and managing terabyte warehouses, you are bound to come to a juncture in your professional life where you ask, "We store at least 7-10 years' worth of transactional data (as required by Sarbanes-Oxley). Is there anything I can learn about my customers/products/employees implicitly from all this stored data, beyond what I code into my applications?" At that point, you begin to become an analyst.

The central question of "Why In-Database Analytics?" deserves a dedicated post rather than a glib answer in this column. But the basic premise is this: The wider and deeper you deploy intelligence across a company, the smarter the company. Any data-driven intelligence that you compute at the source - i.e. where your enterprise data resides - becomes corporate intelligence consumable across the enterprise. The higher you go in your stack to compute this intelligence, the more insular this intelligence becomes to the rest of the company - SOA and other mid-tier technologies notwithstanding. So the proposition is simple - 'If you are an Oracle DBA/Developer/Analyst/CIO, explore the capabilities of the database as an analytical platform before you think about spending on tiered, best-of-breed products. Save that money for the best business analysis and data analysis talent that you can find'.

Over the years, customers and consultants have wanted to incorporate their own algorithms into the database. Oracle has been extensible since 8i, offering a framework to link your C-based math/analytics libraries into the database. But unlike Illustra (the competing object-relational database of the 8i era), Oracle (wisely) places a user application in its own process sandbox. It does not allow a user to link an "untrusted" C library directly into the Oracle binary, mainly to prevent a buggy or malicious C fragment from crashing a million-dollar production system.

The consequence is that your application has to marshal the data between your library and the database through an ExtProc interface. Obviously, this critical data pinch-point has not helped spawn a large collection of extensible third-party applications. However, internal to Oracle, the framework has been a great success, enabling the evolutionary development of products that manage unstructured data - Oracle's Spatial Option, Text, XML, Enterprise Search, PL/SQL table functions, and similar features directly or indirectly use this framework or its design concepts.

There is a secondary motivation to this blog. 47% of the world's corporate relational data is stored in Oracle. If even a fraction of this installed base begins to demand more foundational analytics infrastructure - high-performance matrix computation, concurrent processing, math libraries, and similar building blocks for scientific computing - this may nudge Oracle towards becoming a world-class analytics engine.

Returning to humbler pragmatics for the blog: rather than take the classic extensibility route, we'll code techniques using Oracle 10gR2 (and later) SQL and PL/SQL. PL/SQL has matured dramatically as a language over the past 5 years, with support for native compilation, IEEE floating point arithmetic, function/rowset caching, and improved scalability and performance. Where expedient or convenient, we will package open-source Java libraries as Java stored procedures.

If you have read this far, and have the slightest inkling of the breadth and depth of quantitative techniques out there, you will agree that the mission statement for this blog is downright audacious, or quixotic, or stupid, or pick your adjective... So to quickly bootstrap the blog, and give myself a minimal chance of success, I have afforded myself these concessions:

  • I will scope my efforts to coding a particular technique, with a best-effort attempt to find an appropriate example of its applicability. I will try to provide pointers to relevant articles/case studies from domain experts (CRM/Retail/Supply Chain etc). Comments and discussions will hopefully add/illuminate the various use cases.
  • I will refer to books, other blogs, and websites for various topics, and credit the authors - either as links, or by including their books/publications in the References section below.
  • Once I stabilize the blog content, I will welcome co-authors - there is a ton of material to cover. So if you are interested, send me an email.
Wish me luck, and I hope you find this site interesting and useful.

Best regards, Ram