Datalog Basis

There is a logic based data model called Datalog. The mathematical model of Datalog is essentially that of the Relational Model [22,15]. Predicate symbols denote relations in terms of their mathematical definition, as a set of lists. A component in the tuple (a reference to the column) can be identified by its position only.

A Datalog program is built from atomic formulas. Such a formula consists of a predicate symbol with a list of arguments. The predicate symbol corresponds to a relation. The arguments refer to domains. For example, if $p(A,B,C)$ is a predicate, then $B$ refers to the second component of some tuple in the relation corresponding to $p$.

Arity of such a predicate is defined as number of its arguments, which corresponds to the list of domains for the Relational Model. In the above example, arity is equal $3$. A particular predicate is identified by a pair: its name and arity which is denoted as: $predicate\_name/arity$. For the above example it is: $p/3$.

Arguments of the atomic formula can be either variables or constants. Variables are denoted as names starting with an upper case letter, constants are numbers or names starting with a lower case. The formula is a relation of its predicate restricted by:

  1. Selecting for equality between a constant and the component or components in which that constant appears, and
  2. Selecting for equality between components that have the same variable.

For example:

\begin{displaymath}
customers(joe,Address,Balance)
\end{displaymath}

where $joe$ is a constant, and $Address$ and $Balance$ are variables. It denotes relational selection (see Definition 3.6):

\begin{displaymath}
\sigma_{\$1=joe}(CUSTOMERS)
\end{displaymath}

And the following formula:

\begin{displaymath}
includes(X,Item,X)
\end{displaymath}

where $X$, $Item$, and $X$ are variables, denotes:

\begin{displaymath}
\sigma_{\$1=\$3}(INCLUDES)
\end{displaymath}

There is one important distinction between the Relational Model and the Datalog Model. Datalog allows to define relations in two ways: as Extensional Knowledge, or Intensional Knowledge. The Relational Model is capable of expressing Extensional Knowledge only. Extensional Knowledge are bare facts, Intensional Knowledge are rules, which define new facts using relationships among existing facts. Knowledge in Datalog is expressed by a subset of Horn Clauses.

Igor Wojnicki 2005-11-07