Browsing a tree structure

The first problem regards finding relationships in a tree structure. This is the class C1 problem (see Section 2.2). There is a content hierarchy described as a tree, containing the set of nodes (the example is based on [16]). It is visualized in Figure 12.1.

The tree is represented by the root node: Library, and its child nodes: News, Sports, etc. The nodes are kept in the RDBMS as subject relation in the form given in Table 12.1. Parent_id and Item_id columns represent the parent-child relationship using Item_id of the node as Parent_id of its children. The root node: Library, has no real parent (it is denoted by -1 as the Parent_id value). Each node can be considered a child, as well as a parent, which implies an inherent cyclical nature of the data.

**Figure 12.1:** A tree structure.
$\includegraphics[width=0.5\textwidth]{pic/tree}$

Parent_id	Item_id	Item_Name
-1	1	Library
1	2	News
2	3	World News
2	4	Politics
2	5	Business
2	6	Science
2	7	Technology
1	8	Sports
8	9	Local
8	10	Collegiate
8	11	Professional
9	12	Soccer
9	15	Football
10	13	Soccer
10	16	Football
11	14	Soccer
11	17	Football

Finding all the descendants (children and children's children, and so on), or all the ascendants (parents and parents' parents, and so on) of a given node may be solved recursively. This problem is defined formally in Section 2.3.

A Jelly View can be constructed which will traverse such a tree structure. The External Matching states a relationship between the Jelly View name (actually the name and arity is considered here) and the Prolog predicate which will be used as a goal to generate the contents of the view. For this purpose appropriate records in the External Matching relations have to be added (all relations mentioned below refer to Figure 11.1 and Table 11.2). Furthermore, the Internal Matching and Logic Program have to be stated. For complete SQL code refer to Figure 12.3.

**Figure 12.2:** Browsing a tree, Prolog Program.
$\begin{figure}\begin{verbatim}find(Parent,Child):-tree(Parent,Child,_). find(Parent,Child):-tree(Parent,C1,_), find(C1,Child).\end{verbatim} \end{figure}$

**Figure 12.3:** Browsing a tree, SQL statements.
$\begin{figure}\begin{verbatim}-- ReDaReS data -- Ext.matching INSERT INTO rat... ...VALUES (1,1,10); INSERT INTO ratpcl VALUES (1,3,20);\end{verbatim} \end{figure}$

The first query adds a record which establishes a relationship between the Jelly View: find and the Logic Program predicate: find (section: Ext.Matching of Figure 12.3). This predicate will be used as the goal to provide inferred records. The next two entries add information about the Jelly View attributes: their names and types. These entries will allow to create a relation which is named find and has two integer arguments: parent_id and child_id.

Next query enables access to the subject relation from the Logic Program (section Int.Matching of Figure 12.3); it establishes the Internal Matching for the Jelly View. As a result any reference to predicate tree/3, in the Logic Program, will be translated into a reference to the relation subject.

The decomposed Logic Program, which provides find predicate is added in section Logic program. The original program, in Prolog Language, is given in Figure 12.2.

The last section (Supporting) adds records creating many-to-many relationships among the External Matching, Internal Matching and Logic Program relations.

If the user issues any SQL query which addresses find Jelly View, the system takes over. It initiates the inference engine and provides inferred data. Additional arguments can be passed to the Jelly View i.e. to prevent combinatorial explosion of inferred data. The Jelly View behaves as a function which is used as a data-source (a relation). An argument is just a SQL statement which should return a single column data.

In order to find all successors of the node number 8 (for data reference see Figure 12.1 and Table 12.1) the following query should be issued (note that the restricting child_id number is given as a SQL statement: ``SELECT 8''; the reply for this query is passed to the inference engine):

child_id	item_name
9	Local
10	Collegiate
11	Professional
12	Soccer
15	Football
13	Soccer
16	Football
14	Soccer
17	Football

The inference engine is not bounded by any arguments in this case . It generates all possible solutions and then the RDBMS selects the proper ones. Instead of 9 records, the inference engine generates 53 records. It is not a combinatorial explosion yet, however, for more complex trees definitely it will be.

Exactly the same Jelly View is capable of handling queries about the predecessors. A query: ``find all predecessors of the node number 15'' is as follows:

This time the inference engine is bounded by 15 passed as the second argument to the Jelly View. It will find all predecessors of the node number 15. The system replies:

As it was mentioned before, the argument of a Jelly View has to be a valid SQL SELECT statement, returning a single column. A multi row value can be used as well, for example: ``find all predecessors of nodes 10 or 4'' can be expressed as follows:

In order to check the overall performance of the system, the same Jelly View is used to browse a larger tree structure. The contents of the subject relation is exchanged with a new tree. It has three children at each node and it is twelve level deep. The number of nodes is given as a Sum of Geometric Series. Having the following series:

To have a tree structure with three children at each node and twelve level deep, the following assumptions have to be stated:

, so number of nodes in the tree is equal:

parent_id	child_id
-1	265719
0	265719
3	265719
12	265719
39	265719
120	265719
363	265719
1092	265719
3279	265719
9840	265719
29523	265719
88572	265719

The response is given in 13.5 seconds. Assuming that the tree structure has 265720 nodes, it is a satisfying result. The experiment was conducted using Mobile Intel Celeron CPU 1.50GHz, Linux 2.4.21, PostgreSQL 7.2.1, Unix ODBC 2.1.1, SWI-Prolog 5.0.10. The following timings have been achieved:

Where real is the total execution time, user is the CPU time allocated to the process, sys is the OS overhead (interprocess communication, I/O, library calls etc.). Furthermore:

is the RDBMS overhead^12.1.

The database overhead becomes significant if the system generates larger amounts of records. It is expected, because the database is fed with all records generated by the inference engine. The records have to be inserted into a temporary table before the user query can be completed. So, the system throughput is limited by the RDBMS throughput.

Timings for different size of the output data are gathered in Table 12.2. In this case the query is: ``find all successors of the given node''. The given node is denoted as parent_id, the number of rows that are returned as a reply is denoted as # records. The query for the first row of the Table 12.2 is:

`parent_id`	`# records`	`real`	`user`	`sys`
29523	12	9.014	6.250	0.610
3279	120	9.098	6.180	0.660
363	1092	9.678	6.400	0.680
39	9840	14.750	7.640	1.080
3	88572	1:01.346	18.770	4.020
0	265720	2:44.109	43.880	10.850

Graph traversal problems can also be easily solved. Example Prolog programs of such applications are given in [4].