Wednesday, 8 June 2016

What Is a Relational Database?

What Is a Relational Database?

Relational database theory was laid out by Codd in 1970 in a paper titled “A Relational Model for Data for Large Shared Data Banks.” His theory was meant as an alternative to the “programmer as navigator” paradigm that was prevalent in his day.

In pre-relational databases, records were chained together by pointers, as illustrated in Figure 1-6. Each chain has an owner and zero or more members. For example, all the employee records in a department could be chained to the corresponding department record in the departments table. In such a scheme, each employee record points to the next and previous records in the chain as well as to the department record. To list all the employees in a department, you would first navigate to the unique department record (typically using the direct-access technique known as hashing) and then follow the chain of employee records.

Records can participate in multiple chains; for example, all the employee records with the same job
can be chained to the corresponding job record in the jobs table. To list all the employees performing a particular job, you can navigate to the job record and then follow the chain of employee records.

This scheme was invented by Charles Bachman, who received the ACM Turing Award in 1973 for his achievement. In his Turing Award lecture, titled “The Programmer as Navigator,” Bachman enumerated seven ways in which you can navigate through such a database:

1. Records can be retrieved sequentially.
2. A specific record can be retrieved using its physical address if it’s available.
3. A specific record can be retrieved using a unique key. Either a unique index or hash addressing makes this possible.
4. Multiple records can be retrieved using a non-unique key. A non-unique index is necessary.
5. Starting from an owner record, all the records in a chain can be retrieved.
6. Starting from any member record in a chain, the prior or next record in the chain can be retrieved.
7. Starting at any member record in a chain, the owner of the chain can be retrieved.

Bachman noted, “Each of these access methods is interesting in itself, and all are very useful. However, it is the synergistic usage of the entire collection which gives the programmer great and expanded powers to come and go within a large database while accessing only those records of interest in responding to inquiries and updating the database in anticipation of future inquiries.”

In this scheme, you obviously need to know the access paths defined in the database. How else could
you list all the employees in a record or all the employees holding a particular job without retrieving every single employee record?

In 1979, Codd made the startling statement that programmers need not and should not have to be
concerned about the access paths defined in the database. The opening words of the first paper on the
relational model were, “Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation)” (“A Relational Model of Data for Large Shared Data Banks”).

Productivity and ease of use were the stated goals of the relational model. In “Normalized Data Base
Structure: A Brief Tutorial” (1971), Codd said,

What is less understandable is the trend toward more and more complexity in the data structures with which application programmers and terminal users directly interact. Surely, in the choice of logical data structures that a system is to support, there is one consideration of absolutely paramount importance—and that is the convenience of the majority of users. … To make formatted data bases readily accessible to users (especially casual users) who have little or no training in programming we must provide the simplest possible data structures and almost natural language. … What could be a simpler, more universally needed, and more universally understood data structure than a table?

As IBM researcher Donald Chamberlin recalled later (The 1995 SQL Reunion: People, Projects, and Politics),

[Codd] gave a seminar and a lot of us went to listen to him. This was as I say a revelation for me because Codd had a bunch of queries that were fairly complicated queries and since I’d been studying CODASYL, I could imagine how those queries would have been represented in CODASYL by programs that were five pages long that would navigate through this labyrinth of pointers and stuff. Codd would sort of write them down as one-liners. These would be queries like, “Find the employees who earn more than their managers.” He just whacked them out and you could sort of read them, and they weren’t complicated at all, and I said, “Wow.” This was kind of a conversion experience for me, that I understood what the relational thing was about after that.”

Donald Chamberlin and fellow IBM researcher Raymond Boyce went on to create the first relational
query language based on Codd’s proposals and described it in a short paper titled “SEQUEL: A Structured English Query Language” (1974). The acronym SEQUEL was later shortened to SQL because—as recounted by Chamberlin in The 1995 SQL Reunion: People, Projects, and Politics—SEQUEL was a trademarked name; this means the correct pronunciation of SQL is “sequel,” not “es-que-el.”

Codd emphasized the productivity benefits of the relational model in his acceptance speech for the
1981 Turing Award:

It is well known that the growth in demands from end users for new applications is outstripping the capability of data processing departments to implement the corresponding application programs. There are two complementary approaches to attacking this problem (and both approaches are needed): one is to put end users into direct touch with the information stored in computers; the other is to increase the productivity of data processing professionals in the development of application programs. It is less well known that a single technology, relational database management, provides a practical foundation to both approaches.

In fact, the ubiquitous data-access language SQL was intended for the use of non-programmers.
As explained by the creators of SQL in their 1974 paper, there is “a large class of users who, while they are not computer specialists, would be willing to learn to interact with a computer in a reasonably high-level, non-procedural query language. Examples of such users are accountants, engineers, architects, and urban planners. [emphasis added] It’s for this class of users that SEQUEL is intended.”

No comments:

Post a Comment