SQL

SQL (Structured Query Language) [22] has evolved as a high level, robust and compact DDL (Data Definition Language) and DML (Data Manipulation Language, for DDL and DML see Chapter 3). It was formerly known as SEQUEL which was developed by IBM in San Jose, CA, originally for use in the experimental database system called System R [22]. The language is now used by majority of database systems.

SQL standards are stated faster than implementations. There are three major dialects standardized by ANSI: SQL, SQL2 (also known as SQL-92) and SQL3 (known as SQL-99) [7]. In most cases a RDBMS (Relational Database Management System) complies with some subset of the standard. On the other hand, different ``out-of-standard'' features are often incorporated in RDMBS, extending - in some, vendor specific, way - their functionality.

The DML part of SQL is divided into the following categories [22,7]:

The most significant is Selection. There are three keywords: SELECT, FROM, WHERE, which comply with the following syntax:

  SELECT L
    FROM R
    WHERE C;

The above statement corresponds to the relational algebra expression (see Section 3.2):

\begin{displaymath}
\pi_L(\sigma_C(R)),
\end{displaymath}

where $\sigma$ denotes selection and $\pi$ is projection. It can be read as: ``Select tuples of R which satisfy condition C and list the values of attributes given by L for the selected objects''. Actually $R$ may stand for multiple relations; in such a case a Cartesian product will be considered. $L$ is a list of attributes and $C$ is a logical expression (for details on relations and attributes refer to Chapter 3). The selection query covers the projection, selection, and Cartesian product of relational algebra. A semicolon denotes the end-of-query. As a result a set of tuples is generated.

Extending the selection query, functionality of intersection, difference, and union can be achieved [22].

In SQL one query can be used as a part of another query. Such a query is called a subquery. Subqueries can have subqueries and so on, down to as many levels as necessary. There are three ways subqueries can be used:

  1. A subquery can return a single constant, and this constant can be compared with another value in a WHERE clause.
  2. A subquery can return a relation that can be used in various ways in a WHERE clause.
  3. A subquery can return a relation that appears in a FROM clause.

In SQL, there is an ability to partition the tuples of a relation into ``groups'', based on the values of tuples in one or more attributes. Then other columns of the relation may be aggregated by applying ``aggregation'' operators to those columns. The aggregation is performed separately for each group. This functionality is provided by GROUP BY clause and aggregation operators in SELECT clauses.

In addition there is a way to filter out the tuples which match certain values of the aggregation operators. It is provided by HAVING clause. There are five basic aggregation operators: SUM, AVG, MIN, MAX, and COUNT, which calculate the sum, average, minimum value, maximum value and number of tuples in the group.

There is an important difference between WHERE and HAVING clauses. WHERE applies to the tuples which are being selected, while HAVING applies to groups of tuples.

There are also other queries that do not return values but modify the contents of the database. They fall into three categories:

  1. Insert tuples into a relation.
  2. Delete some tuples from a relation.
  3. Update values of certain components of certain tuples.
The Insertion is provided by INSERT clause, which has the following syntax:

\begin{displaymath}
\mathtt{INSERT \; INTO} \; R(A_1,\ldots,A_n) \; \mathtt{VALUES} \; (v_1,\ldots,v_n);
\end{displaymath}

Where $R$ is a relation, $A_m$ ($m=1 \ldots n$) are chosen attributes, and $v_p$ ($p=1 \ldots n$) are values of the corresponding attributes. Instead of VALUES clause, there may be a subquery used:

\begin{displaymath}
\mathtt{INSERT \; INTO} \; R(A_1,\ldots,A_n) \; \mathtt{SELECT} \; B_1,\ldots,B_n \; \mathtt{FROM} \; P;
\end{displaymath}

As the result values of the attributes $B_1,\ldots,B_n$ from all tuples taken from $P$ become tuples in $R$. The SELECT subquery may have also other clauses which comply with the clause syntax like: WHERE, GROUP BY, HAVING etc., or other subqueries, as well.

Removing a tuple from some given relation is provided by DELETE clause, which is defined as:

\begin{displaymath}
\mathtt{DELETE \; FROM} \; R \; \mathtt{WHERE} \; \langle condition \rangle;
\end{displaymath}

It removes all tuples from the relation $R$ which meet given $condition$. The clause WHERE has exactly the same syntax as for SELECT clause.

Finally, updating some tuples is provided by UPDATE, which has the following syntax:

\begin{displaymath}
\mathtt{UPDATE} \; R \; \mathtt{SET} \; \langle new \, value...
...nment \rangle \; \mathtt{WHERE} \; \langle condition \rangle ;
\end{displaymath}

Where $R$ is a relation to update, $\langle new \, value \, assignment \rangle$ is a comma separated list of triples: an attribute, equality sign, and a formula which generates a new value. WHERE is a regular WHERE clause.

As it was mentioned at the beginning of this section, SQL provides also the DDL functionality. Thus, this portion of SQL involves describing the structure of information in the database. In order to declare a relation schema, which is usually called a table in terms of SQL, the principal atomic data types should be known. There are basic six categories [22]:

Regarding the Data Definition functionality there are three basic operations provided: Creation of a new relation is handled by CREATE TABLE clause, which is defined as follows:

\begin{displaymath}
\mathtt{CREATE \; TABLE} \; R \; ( \langle list \, of \, attributes \rangle );
\end{displaymath}

where $R$ is the name of the new relation, and $\langle list \, of \, attributes \rangle $ is a comma separated list of pairs: attribute name, and its data type. In order to remove a relation there is DROP clause, which has the following syntax:

\begin{displaymath}
\mathtt{DROP \; TABLE \;} R;
\end{displaymath}

It immediately removes relation $R$ from the database, along with all its tuples.

Modifying a relation schema is significantly more often then removing it. The modification means adding or removing some attributes. It is provided by ALTER TABLE statement which is defined as follows:

\begin{displaymath}
\mathtt{ALTER \; TABLE} \; R \; \mathtt{ADD} \; attrib \; type;
\end{displaymath}

or

\begin{displaymath}
\mathtt{ALTER \; TABLE} \; R \; \mathtt{DROP} \; attrib ;
\end{displaymath}

The first statement adds the attribute $attrib$ of $type$ to the relation $R$. The second statement removes the attribute $attrib$ from the relation $R$.

There is another class of SQL relations - they are called views. A regular relation actually exists in the database, while a view is defined by an expression much like a query. Views in turn, can be queried as regular relations, so as they existed physically, and in some cases they can even be modified. The simplest form of a view definition is:

\begin{displaymath}
\mathtt{CREATE \; VIEW} \; name \; \mathtt{AS} \; \langle view \, definition \rangle ;
\end{displaymath}

It creates a view named $name$ according to $\langle view \, definition \rangle$ which is a SELECT statement. Any reference to $name$ relation concerns the selected relations. If the names of the view attributes are omitted, the particular view will have attributes named after the ones returned by the SELECT statement. A state of view is created on demand and does not exist prior to the execution of a query which refers to it.

The syntax and functionality described above does not fully cover SQL. Some more details on this subject may be found in [7,22,6,23].

To summarize SQL covers the core Relational Model operations (see Chapter 3). All these operations require that the number of relations is stated a priori. So it is not possible to parameterize it. In general, problems which involve recursive processing are not addressable.

Moreover, the model indicates that all the data is atomic. It does not allow to introduce more complex data structures, which limits possible data processing capabilities2.1.

Igor Wojnicki 2005-11-07