The Secret Sauce
Codd’s secret was relational algebra, a collection of operations that could be used to combine tables. Just as you can combine numbers using the operations of addition, subtraction, multiplication, and division, you can combine tables using operations like selection, projection, union, difference, and join (more precisely, Cartesian join), listed in Table 1-1.
Why did Codd name this relational algebra? Codd based his theory on rigorous mathematical principles and used the esoteric mathematical term relation to denote what is loosely referred to as a table. I’m now ready to define what I mean by a relational database:
A relational database is a database in which: The data is perceived by the user as tables (and nothing but tables) and the operators available to the user for (for example) retrieval are operators that derive “new” tables from “old” ones.1
Examples of Relational Operations
Let’s use the five operations defined in Table 1-1 to answer this question: “Which employees have worked in all accounting positions—that is, those for which the job_id starts with the characters AC?” The current job of each employee is stored in the job_id column of the employees table. Any previous jobs are held in the job_history table. The list of job titles is held in the jobs table.
Given these three tables, you can execute the following steps to answer the business question that has
been posed:
1. This step uses only the employee_id column from the employees table. To do this, you need the projection operation. Following are the SQL command and its results. The employees table contains 107 rows, so this command also produces 107 rows. Only the first five rows are shown here:
select employee_id from employees
100
101
102
103
104
Note that certain formatting aspects of SQL statements, such as lowercase, uppercase, white space, and line feeds, are immaterial except in the case of string literals—that is, strings of characters enclosed within quote marks.
2. This step uses only the job_id column from the jobs table. To obtain this, you need the projection operation again. Following are the SQL command and its results. The jobs table contains 19 rows, so this command also produces 19 rows. Only the first five rows are shown here:
select job_id from jobs
AC_ACCOUNT
AC_MGR
AD_ASST
AD_PRES
AD_VP
3. Remember that the result of any relational operation is always another table. You need a subset of rows from the table created by the projection operation used in step 2. To obtain this, you need the selection operation, as shown in the following SQL command and its results. * is a wildcard that matches all the columns of a table. % is a wildcard that matches any combination of characters. Note that the “table” that is operated on is actually the SQL command from step 2. The result contains only two rows:
select *
from (select job_id from jobs)
where job_id like 'AC%'
AC_ACCOUNT
AC_MGR
You can streamline this SQL command as follows. This version expresses both the projection from step 2 and the selection from step 3 using a unified syntax. Read it carefully, and make sure you understand it:
select job_id from jobs
where job_id like 'AC%'
4. You need the job title of every employee; that is, you need the job_id column from the employees table. The employees table has 107 rows, so the resulting table also has 107 rows; five of them are shown here:
select employee_id, job_id
from employees
100 AD_PRES
101 AD_VP
102 AD_VP
103 IT_PROG
104 IT_PROG
5. Next, you need the employee_id and job_id columns from the job_history table. The jobs table contains 19 rows, so this command also produces 19 rows. Only the first five rows are shown here:
select employee_id, job_id
from job_history
101 AC_ACCOUNT
200 AC_ACCOUNT
101 AC_MGR
200 AD_ASST
102 IT_PROG
6. Remember that the current job of each employee is stored in the job_id column of the employees table. Any previous jobs are held in the job_history table. The complete job history of any employee is therefore the union of the tables created in step 4 and step 5:
select employee_id, job_id
from employees
union
select employee_id, job_id
from job_history
7. You need to join the tables created in step 1 and step 3. The resulting table contains all possible pairings of the 107 emp_id values in the employees table with the two job_id values of interest. There are 214 such pairings, a few of which are shown next:
select *
from
(select employee_id from employees),
(select job_id from jobs where job_id like 'AC%')
100 AC_ACCOUNT
101 AC_ACCOUNT
102 AC_ACCOUNT
103 AC_ACCOUNT
104 AC_ACCOUNT
You can streamline this SQL command as follows. This version expresses the projections from step 1 and step 2, the selection from step 3, as well as the join in the current step using a unified syntax. Read it carefully, and make sure you understand it. This pattern of combining multiple projection, join, and selection operations is the most important SQL pattern, so you should make sure you
understand it. Note that you prefix the table names to the column names. Such prefixes are required whenever there are ambiguities. In this case, there is a job_id column in the employees table in addition to the jobs table:
select employees.employee_id, jobs.job_id
from employees, jobs
where jobs.job_id like 'AC%'
8. From the table created in step 7, you need to subtract the rows in the table created in step 6! To do this, you need the difference operation, the appropriate SQL keyword being minus. The resulting table contains those pairings of employee_id and job_id that are not found in the job_history table. Here is the SQL command you need. The resulting table contains exactly 211 rows, a few of
which are shown:
select employees.employee_id, jobs.job_id
from employees, jobs
where jobs.job_id like 'AC%'
minus
select employee_id, job_id
from job_history
100 AC_ACCOUNT
100 AC_MGR
102 AC_ACCOUNT
102 AC_MGR
103 AC_ACCOUNT
9. Thus far, you’ve obtained pairings of employee_id and job_id that are not found in the employee’s job history—that is, the table constructed in step 6. Any employee who participates in such a pairing is not an employee of interest; that is, any employee who participates in such a pairing isn’t an employee who has worked in all positions for which the job_id starts with the characters AC. The
first column of this table therefore contains the employees in which you’re not interested. You need another projection operation:
select employee_id from
(
select employees.employee_id, jobs.job_id
from employees, jobs
where jobs.job_id like 'AC%'
minus
select employee_id, job_id
from job_history
)
100
100
102
102
103
10. You’ve identified the employees who don’t satisfy your criteria. All you have to
do is to eliminate them from the table created in step 1! Exactly one employee
satisfies your criteria:
select employee_id
from employees
minus
select employee_id from
(
select employees.employee_id, jobs.job_id
from employees, jobs
where jobs.job_id like 'AC%'
minus
(
select employee_id, job_id
from job_history
union
select employee_id, job_id
from job_history
)
)
101
You had to string together 10 operations to produce the final answer: 5 projection operations, 1 selection operation, 1 union operation, and 2 difference operations.
You can express the final answer in a self-documenting way using table expressions, as shown in
Listing 1-1. In practice, you wouldn’t go to such great lengths to restrict yourself to a single relational
operation in each step, because, as you saw in step 7, multiple projection, selection, and join operations can be expressed using a unified syntax. As I said earlier, this approach of combining multiple projection, join, and selection operations is the most important SQL pattern, so make sure you understand it.
Listing 1-1. Final Answer Using Table Expressions
WITH
-- Step 1
all_employees AS
( SELECT employee_id FROM employees
),
-- Step 2
all_jobs AS
( SELECT job_id FROM jobs
),
-- Step 3
selected_jobs AS
( SELECT * FROM all_jobs WHERE job_id LIKE 'AC%'
),
-- Step 4
selected_pairings AS
( SELECT * FROM all_employees, selected_jobs
),
-- Step 5
current_job_titles AS
( SELECT employee_id, job_id FROM employees
),
-- Step 6
previous_job_titles AS
( SELECT employee_id, job_id FROM job_history
),
-- Step 7
complete_job_history AS
( SELECT * FROM current_job_titles
UNION
SELECT * FROM previous_job_titles
),
-- Step 8
nonexistent_pairings AS
( SELECT * FROM selected_pairings
MINUS
SELECT * FROM complete_job_history
),
-- Step 9
undesired_employees AS
( SELECT employee_id FROM nonexistent_pairings
)
-- Step 10
SELECT * FROM all_employees
MINUS
SELECT * FROM undesired_employees
I resume the discussion of SQL in the next chapter. For now, note how formatting improves readability—the formatted version with vertical “rivers” and capitalized “reserved words” shown in
Listing 1-1 was produced using the formatting options in SQL Developer.
No comments:
Post a Comment