Dev Essentials

Pre-requisites:
This blog is aimed at an Oracle RDBMS user with a beginner to intermediate level understanding of SQL and PL/SQL languages, and basic Oracle Concepts. If you are an analyst weaned on Excel or other GUI workbenches with some basic programming ability, but know next to nothing about databases, read this friendly intro (ignore the author's controversial opinions), kick the tires on a free edition of Oracle, and return here.

Database:
This blog is based on Oracle10gR2 and higher. Oracle gives away an awful lot of analytics features as part of its free (XE/Express) or low-cost (Standard) editions - but if a highlighted feature is available only in a particular higher-end edition, we will mention it.

Datasets:
We will use the data in the Sample Schema that ships with the Oracle database for most of our examples. Enable SELECT privileges on all the tables in the various sample schemas. When this data is inadequate to explain a concept, we will import data from other public sources.

Loading data:
For small datasets outside the sample schema, we will use INSERT statements. For larger tables, we'll use SQL*Loader and/or External Tables. Here is a quick primer on data load (Note: the extract_files_dir directory object should be created by the DBA using the CREATE DIRECTORY statement). For more details, consult Oracle docs.

DevEnv:
The baseline is Xemacs- SQL*Plus - Oracle10gR2 Enterprise Edition - RHEL on a 2-processor Core duo Dell COTS (common off-the-shelf) machine. Over time, we will try the free SQL*Developer tool for all development. Software downloads of the database and related products are available here.

Output, Visualization of results:
One major weakness of Oracle compared to best-of-breed statistical/scientific products is the charting and visualization capability in its disparate GUI products. Oracle has several of them - SQL*Developer, Application Express, Discoverer, Warehouse Builder, Data Miner, Enterprise Manager, and a whole slew of tools from the Siebel BI (a k a BI EE) and Hyperion acquisitions. These tools address specific areas of data or application management. As the #1 database company, it would be fantastic if Oracle makes a strategic investment to build modular, data visualization solutions across its product stack - integrating the best elements from it existing portfolio of GUI-based products, and third party tools like Spotfire, Tableau, or Dundas (now part of MSFT). But Visualization for BI is still in its nascent stages, so this may not happen soon. For purposes of this blog, we will embed screenshots from Excel, SQL*Dev, App Express, Data Miner, R and other Oracle GUIs as applicable. We will use Dia for diagramming.

Code Quality:
I will test every code fragment I publish here, but I am not the best SQL or PL/SQL developer around - so I welcome comments/corrections on better-expressed and/or more efficient queries for a particular problem. Consult AskTom or search for Celko's articles for SQL tips, and Steve Feuerstein's articles/books for PL/SQL coding tips.

No comments: