Oracle Database Expert: The Secret Sauce

The Secret Sauce

Codd’s secret was relational algebra, a collection of operations that could be used to combine tables. Just as you can combine numbers using the operations of addition, subtraction, multiplication, and division, you can combine tables using operations like selection, projection, union, difference, and join (more precisely, Cartesian join), listed in Table 1-1.

Why did Codd name this relational algebra? Codd based his theory on rigorous mathematical principles and used the esoteric mathematical term relation to denote what is loosely referred to as a table. I’m now ready to define what I mean by a relational database:

A relational database is a database in which: The data is perceived by the user as tables (and nothing but tables) and the operators available to the user for (for example) retrieval are operators that derive “new” tables from “old” ones.1

Examples of Relational Operations

Let’s use the five operations defined in Table 1-1 to answer this question: “Which employees have worked in all accounting positions—that is, those for which the job_id starts with the characters AC?” The current job of each employee is stored in the job_id column of the employees table. Any previous jobs are held in the job_history table. The list of job titles is held in the jobs table.

Given these three tables, you can execute the following steps to answer the business question that has

been posed:

1. This step uses only the employee_id column from the employees table. To do this, you need the projection operation. Following are the SQL command and its results. The employees table contains 107 rows, so this command also produces 107 rows. Only the first five rows are shown here:

select employee_id from employees

100

101

102

103

104

Note that certain formatting aspects of SQL statements, such as lowercase, uppercase, white space, and line feeds, are immaterial except in the case of string literals—that is, strings of characters enclosed within quote marks.

2. This step uses only the job_id column from the jobs table. To obtain this, you need the projection operation again. Following are the SQL command and its results. The jobs table contains 19 rows, so this command also produces 19 rows. Only the first five rows are shown here:

select job_id from jobs

AC_ACCOUNT

AC_MGR

AD_ASST

AD_PRES

AD_VP

3. Remember that the result of any relational operation is always another table. You need a subset of rows from the table created by the projection operation used in step 2. To obtain this, you need the selection operation, as shown in the following SQL command and its results. * is a wildcard that matches all the columns of a table. % is a wildcard that matches any combination of characters. Note that the “table” that is operated on is actually the SQL command from step 2. The result contains only two rows:

select *

from (select job_id from jobs)

where job_id like 'AC%'

AC_ACCOUNT

AC_MGR

You can streamline this SQL command as follows. This version expresses both the projection from step 2 and the selection from step 3 using a unified syntax. Read it carefully, and make sure you understand it:

select job_id from jobs

where job_id like 'AC%'

4. You need the job title of every employee; that is, you need the job_id column from the employees table. The employees table has 107 rows, so the resulting table also has 107 rows; five of them are shown here:

select employee_id, job_id

from employees

100 AD_PRES

101 AD_VP

102 AD_VP

103 IT_PROG

104 IT_PROG

5. Next, you need the employee_id and job_id columns from the job_history table. The jobs table contains 19 rows, so this command also produces 19 rows. Only the first five rows are shown here:

select employee_id, job_id

from job_history

101 AC_ACCOUNT

200 AC_ACCOUNT

101 AC_MGR

200 AD_ASST

102 IT_PROG

6. Remember that the current job of each employee is stored in the job_id column of the employees table. Any previous jobs are held in the job_history table. The complete job history of any employee is therefore the union of the tables created in step 4 and step 5:

select employee_id, job_id

from employees

union

select employee_id, job_id

from job_history

7. You need to join the tables created in step 1 and step 3. The resulting table contains all possible pairings of the 107 emp_id values in the employees table with the two job_id values of interest. There are 214 such pairings, a few of which are shown next:

select *

from

(select employee_id from employees),

(select job_id from jobs where job_id like 'AC%')

100 AC_ACCOUNT

101 AC_ACCOUNT

102 AC_ACCOUNT

103 AC_ACCOUNT

104 AC_ACCOUNT

You can streamline this SQL command as follows. This version expresses the projections from step 1 and step 2, the selection from step 3, as well as the join in the current step using a unified syntax. Read it carefully, and make sure you understand it. This pattern of combining multiple projection, join, and selection operations is the most important SQL pattern, so you should make sure you

understand it. Note that you prefix the table names to the column names. Such prefixes are required whenever there are ambiguities. In this case, there is a job_id column in the employees table in addition to the jobs table:

select employees.employee_id, jobs.job_id

from employees, jobs

where jobs.job_id like 'AC%'

8. From the table created in step 7, you need to subtract the rows in the table created in step 6! To do this, you need the difference operation, the appropriate SQL keyword being minus. The resulting table contains those pairings of employee_id and job_id that are not found in the job_history table. Here is the SQL command you need. The resulting table contains exactly 211 rows, a few of

which are shown:

select employees.employee_id, jobs.job_id

from employees, jobs

where jobs.job_id like 'AC%'

minus

select employee_id, job_id

from job_history

100 AC_ACCOUNT

100 AC_MGR

102 AC_ACCOUNT

102 AC_MGR

103 AC_ACCOUNT

9. Thus far, you’ve obtained pairings of employee_id and job_id that are not found in the employee’s job history—that is, the table constructed in step 6. Any employee who participates in such a pairing is not an employee of interest; that is, any employee who participates in such a pairing isn’t an employee who has worked in all positions for which the job_id starts with the characters AC. The

first column of this table therefore contains the employees in which you’re not interested. You need another projection operation:

select employee_id from

(

select employees.employee_id, jobs.job_id

from employees, jobs

where jobs.job_id like 'AC%'

minus

select employee_id, job_id

from job_history

)

100

102

103

10. You’ve identified the employees who don’t satisfy your criteria. All you have to

do is to eliminate them from the table created in step 1! Exactly one employee

satisfies your criteria:

select employee_id

from employees

minus

select employee_id from

(

select employees.employee_id, jobs.job_id

from employees, jobs

where jobs.job_id like 'AC%'

minus

(

select employee_id, job_id

from job_history

union

select employee_id, job_id

from job_history

)

101

You had to string together 10 operations to produce the final answer: 5 projection operations, 1 selection operation, 1 union operation, and 2 difference operations.

You can express the final answer in a self-documenting way using table expressions, as shown in

Listing 1-1. In practice, you wouldn’t go to such great lengths to restrict yourself to a single relational

operation in each step, because, as you saw in step 7, multiple projection, selection, and join operations can be expressed using a unified syntax. As I said earlier, this approach of combining multiple projection, join, and selection operations is the most important SQL pattern, so make sure you understand it.

Listing 1-1. Final Answer Using Table Expressions

WITH

-- Step 1

all_employees AS

( SELECT employee_id FROM employees

-- Step 2

all_jobs AS

( SELECT job_id FROM jobs

-- Step 3

selected_jobs AS

( SELECT * FROM all_jobs WHERE job_id LIKE 'AC%'

-- Step 4

selected_pairings AS

( SELECT * FROM all_employees, selected_jobs

-- Step 5

current_job_titles AS

( SELECT employee_id, job_id FROM employees

-- Step 6

previous_job_titles AS

( SELECT employee_id, job_id FROM job_history

-- Step 7

complete_job_history AS

( SELECT * FROM current_job_titles

UNION

SELECT * FROM previous_job_titles

-- Step 8

nonexistent_pairings AS

( SELECT * FROM selected_pairings

MINUS

SELECT * FROM complete_job_history

-- Step 9

undesired_employees AS

( SELECT employee_id FROM nonexistent_pairings

)

-- Step 10

SELECT * FROM all_employees

MINUS

SELECT * FROM undesired_employees

I resume the discussion of SQL in the next chapter. For now, note how formatting improves readability—the formatted version with vertical “rivers” and capitalized “reserved words” shown in

Listing 1-1 was produced using the formatting options in SQL Developer.

Oracle Database Expert

Sunday, 12 June 2016

The Secret Sauce

No comments:

Post a Comment