Relational algebra operations

Operations in the Relational Data Model are defined by Relational Algebra. Relational Algebra is not based on the attributes, but rather on the order of values (components) in the tuple. There are five basic operations [7]:

  1. Union,
  2. Difference,
  3. Cartesian Product,
  4. Projection,
  5. Selection.

The Union of relations $R$ and $S$ is defined as:

R \cup S = \{ \langle x_1,\ldots,x_n \rangle : \langle x_1,\...
..._n \rangle \in R \vee \langle x_1,\ldots,x_n \rangle \in S \}.
\end{displaymath} (3.2)

It is the set of tuples that are in $R$ or $S$. This operation may be applied to the relations of the same arity only, so all the tuples in the result have the same number of components (values).

The Difference of relations $R$ and $S$, defined as:

R - S = \{ \langle x_1,\ldots,x_n \rangle \in R : \langle x_1,\ldots,x_n\rangle \notin S \}.
\end{displaymath} (3.3)

It is the set of tuples which are in $R$ but not in $S$. It is required that $R$ and $S$ have the same arity.

The Cartesian Product of relations $R$ and $S$ is defined as:

R \times S =
\{ \langle x_1,\ldots,x_n,y_1,\ldots,y_m \rang...
... \rangle \in R \wedge \langle y_1,\ldots,y_m \rangle \in S \}.
\end{displaymath} (3.4)

Assume that $R$ has arity $n$ and $S$ has arity $m$. $R \times S$ is defined as the set of all possible ($n+m$)-tuples which first $n$ components form a tuple in $R$ and last $m$ form a tuple in $S$. The result has the arity equal to $n+m$.

The Projection is denoted as $\pi$. Assuming that there is a relation $R$ of arity $k$, then:

\end{displaymath} (3.5)

It denotes the projection of $R$ onto components $i_1,i_2,\ldots,i_m$, where $i_j$ are distinct integers in range $1,\ldots,k$. That is the set of $m$-tuples $\langle a_1,a_2,\ldots,a_m \rangle$ such that there is some $k$-tuple $\langle b_1,b_2,\ldots,b_k \rangle$ in $R$ for which $a_j=b_{i_j}$ for $j=1,2,\ldots,m$.

For example $\pi_{4,2}(R)$ is computed by taking each tuple $t$ from $R$, and forming a 2-tuple set from fourth and second component of $t$. Instead of using locations, attribute names can be used as well. Assuming that $R(A,B,C,D)$ is the relation schema for the above relation. Then the above projection can also be denoted as $\pi_{D,B}(R)$. The resulting relation $Q$ can be described by the following relation schema: $Q(D,B)$.

The Selection is denoted as:

\end{displaymath} (3.6)

Where $R$ is a relation and $F$ is a formula involving: Then $\sigma_F(R)$ is the set of tuples $t$ in $R$ such that, when for all $i$ the $i$-th component of $t$ is substituted for any occurrences of $\$i$ in formula $F$, the formula $F$ becomes true. As for projection if a relation has named columns, then the formula in the selection can refer to columns by name instead of by number. The arity of $\sigma_F(R)$ is the same as the arity of $R$.

There are also some other algebraic operations defined as: Intersection, Quotient, Join, Natural Join or Semijoin. They are all derived from the basic ones.

The Intersection can be defined as:

R \cap S = R - (R - S).
\end{displaymath} (3.7)

Thus its functionality is provided by difference, that is why it is not considered to be recognized as a basic operation.

The Quotient of relations $R$ and $S$ is denoted as $R \div S$. Assuming that $R$ and $S$ are relations of arity $r$ and $s$ respectively, and $r>s$, and $S \neq \emptyset$. Then the quotient is a set of all ($r-s$)-tuples $\langle a_1,a_2,\ldots,a_{r-s} \rangle$, such that for all $s$-tuples $\langle a_{r-s+1},\ldots,a_r \rangle$ in $S$, the tuple $\langle a_1,a_2,\ldots,a_r \rangle$ is in $R$:

R \div S = \pi_{1,2,\ldots,r-s}(R) - \pi_{1,2,\ldots,r-s}((\pi_{1,2,\ldots,r-s}(R) \times S) - R).
\end{displaymath} (3.8)

The Join (also known as $\theta$-Join) of $R$ and $S$ on columns $i$ and $j$ is defined as:

R \Join_{i \theta j} S = \sigma_{\$i \theta \$(r+j)}(R \times S).
\end{displaymath} (3.9)

Where $\theta$ is a comparison operator which complies with the definition 3.6, and $r$ is the arity of $R$. In other words, these are all the tuples in the Cartesian product of $R$ and $S$ such that the $i$-th component of $R$ stands in relation $\theta$ to the $j$th component of $S$. If $\theta$ is '$=$' then the operation is called the $Equijoin$.

The Natural Join is a special case of the Join. It is applicable only when both $R$ and $S$ have columns that are named by attributes. Assuming that $\langle A_1,A_2,\ldots,A_k \rangle$ are all the attribute names used for both $R$ and $S$, then the Natural Join is defined as:

R \Join S = \pi_{i_1,i_2,\ldots,i_m} \sigma_{R.A_1 = S.A_1 \wedge \ldots \wedge R.A_k=S.A_k}(R \times S).
\end{displaymath} (3.10)

such that $i_1,\ldots,i_m$ is the list of all components of $R \times S$, in order, except the components $S.A_1,\ldots,S.A_k$.

The Semijoin of relation $R$ by relation $S$, written $R \,\rhd\!\!\!\!< S$, is the projection onto the attributes of $R$ of the natural join of $R$ and $S$:

R \,\rhd\!\!\!<S = \pi_R(R \Join S)
\end{displaymath} (3.11)

where $_R$ stands for the list of attributes of $R$.

Relational Algebra becomes a natural way to formulate queries. Having selection and projection, a question about a single relation can be asked. Adding Cartesian product, the query can concern more than one relation. Applying union or difference, several queries can be combined, to form a more sophisticated one.

Igor Wojnicki 2005-11-07