Data Structures 1. Multi-value Functions 2. Lists and Streams 3. Descriptors 4. Trees

1 SEQUENCES

In the preceding part the basic language facilities have been covered. Based on this knowledge we are now able to examine examples of specific applications and to discuss advanced programming techniques.

1.1 What are sequences?

Sequences are uni-directional lists. Uni-directional lists are widely used in many functional and logical programming languages, because of their interesting properties.

We will use the term sequence to distinguish uni-directional lists from the bi-directional lists, described in Chapter 8. For both kinds of lists, each element, except the last one, has a successor. That means that it is easy to go from one element to the next. In a bi-directional list the predecessor operation is also defined: that makes it equally simple to move from an element to its preceding element. However, for a uni-directional list the predecessor function is not available: the list can only be traversed in forward direction.

These properties are not only important for the ways a list can be traversed, but also for other operations. For example, adding an element to the front of a list is for both kinds of lists easy, but adding an element to the end of a uni-directional list is not so easy and, furthermore, it is time-consuming, while it is simple and efficient for a bi-directional list. The same applies to insertion and deletion of an element somewhere in the list.

On the other hand, uni-directional lists have a number of interesting properties which make them suitable for certain tasks. In many cases, definitions of complex algorithms using uni-directional lists are often very short and compact. Another property is that many operations on uni-directional lists are mathematical functions. They are not changing the input lists nor have other side-effects; they treat sequences as immutable objects. The result of such a function is determined solely by the values of its arguments, so, a function called with the same arguments will always yield the same result. This makes reasoning about programs easier.

It depends on the application which of the two kinds of lists should be used. In some cases, we may even need a combination of both kinds of lists. As usual, it is always a question of trade-off where to apply what.

1.2 Representation of Sequences

A sequence is an ordered set of elements. A sequence can be represented as a binary tree of terms. A sequence itself is also a term, which may be composed of other terms, and so on. That means that all the properties and operations defined on terms are also applicable to sequences. Sequences are in fact made of a special selection of terms.

All elements of a given sequence may have the same type: one may have a sequence of integers, a sequence of reals, a sequence of symbols, a sequence of sequences, or a sequence of any other type of data items. The elements of a sequence may also be of different types. For simplicity, we will assume that elements are terms, although other types are also possible.

We start with a declaration that the elements of a sequence and the sequence itself are all terms. That means:

```type element = term;
type sequence = term;```

The nodes of a sequence are represented as terms defined by the following term-signature:

`element & sequence => sequence;`

The end of a sequence is marked by a nil value. The constant nil is defined as follows:

`nil = null (sequence);`

The null (sequence) expression represents a null value of the given type. A null value means that there is no reference to an object.

This is all we need to define sequences. Note that we used only existing language elements.

Let us now use an example. We assume that the following symbols are defined:

`Albert, John, Peter, Harry => element; `

We can now create a sequence in the following way:

`Men = Albert & (John & (Peter & (Harry & nil))); `

The reason parentheses are used is because we need a specific evaluation order to obtain a sequence. This becomes clear if we represent the result as a tree (see Figure 1-1).

```       Harry  nil
\ /
Peter  &
\ /
John  &
\ /
Albert  &
\ /
&
/```

Figure 1-1: Example of a sequence tree

This is a normal representation of a term tree as we have seen before. If we are now turning the tree 45 degrees clockwise and are also showing the direction of the branches we get the following picture (see Figure 1-2). The result is a sequence of names in a uni-directional list.

```       Albert  John    Peter   Harry
/       /       /       /
----> & ----> & ----> & ----> & ----> nil```

Figure 1-2: Example of a sequence

In general, a sequence consists of a number of nodes. The functor of each node is the & operator. One branch of a node refers to an element of the sequence. The other branch refers to the next node of the sequence. The last node of the sequence is a leaf node represented by the nil value.

After having described how sequences are represented we may now define some operations on them.

The first element of a sequence is also called the head of a sequence. To obtain the head of a sequence the following definition can be used:

```head (sequence) -> element;
head (X & Y) = X;```

Suppose we are applying this to our sequence of Men:

`head (Men)?`

This is equivalent to:

`head (Albert & (John & (Peter & (Harry & nil))))?`

and because of the definition of head this query will return:

`Albert`

The head definition in its current form is only defined for non-empty sequences. It is undefined for empty sequences and for other kinds of terms. A function which is not defined for all possible arguments, is called a partial function. A function which is defined for all the possible arguments, is called a total function.

In general, partial functions will fail if they are executed with undefined argument values. In Section 13.4, "Pattern Matching", we discussed several examples of partial functions and we used the optional quantifier to cope with undefined arguments. In this chapter, we will use the implicit single quantifier for partial functions to show that such an approach is perfectly usable, as long as the users of such a function are aware of its limitations.

Exercise:

• 1.1 Are the following functions partial or total functions?
```abs (real) -> real;
sqrt (real)-> real;
cos (real) -> real;
ln (real)  -> real;```
• 1.2 Define a total function for head.

Another interesting operation on sequences is to obtain the tail of a sequence. This is defined for non-empty sequences as:

```tail (sequence) -> sequence;
tail (X & Y) = Y;```

The tail function is also a partial function. We use this function in the following example:

`tail (Men)?`

It results is:

`John & (Peter & (Harry & nil)); `

With a combination of head and tail operations it is possible to obtain also the second, third, etc., element of a sequence. For example, the second element of a sequence S is obtained by:

`head (tail (S))`

Exercise:

• 1.3 Give an expression which obtains the fourth element of a sequence. Apply this to the example of Men and verify the answer.

In general, the following equation holds for a non-empty sequence S:

S = head (S) & tail (S)

This equation describes the relationship between the three operations we discussed so far. It says that the head of a sequence S concatenated with the tail of the sequence S is equal to the original sequence S.

1.4 The length of a sequence

Sometimes it is useful to know how many elements there are in a sequence. Therefore we will define a length function which counts the number of elements. Although it is a simple function, it has a number of characteristics in common with many other functions defined on sequences. Because of that we will discuss three versions of the length function in detail.

One method to find the length of a sequence is given by the following description:

• if a sequence is empty its length is zero.
• if a sequence is not empty then its length is one plus the length of its tail.

This kind of formulation is typical for many operations on sequences. It consists of (at least) two parts. The first part says what the operation means if a sequence is empty. The second part defines what the meaning of the operation for the head and the tail of a sequence. Often, the same operation is recursively applied to the tail of the sequence.

Such a formulation can immediately be translated into a corresponding definition. In this case it becomes:

```length (sequence) -> integer;
length (S) = if S == nil then 0 else 1 + length (tail (S));```

If we apply this to our example of Men we get the following evaluation:

```length (Men) = length (Albert & (John & (Peter & (Harry & nil))))
= 1 + length (John & (Peter & (Harry & nil)))
= 1 + (1 + length (Peter & (Harry & nil)))
= 1 + (1 + (1 + length (Harry & nil)))
= 1 + (1 + (1 + (1 + length (nil))))
= 1 + (1 + (1 + (1 + 0)))
= 1 + (1 + (1 + 1))
= 1 + (1 + 2)
= 1 + 3
= 4```

During the evaluation of such a recursive definition the real computation is postponed until the end of the sequence has been encountered, which is similar to what we have seen earlier with the recursive evaluation of the factorial function in "Language Description", Chapter 3.

Our length definition can also be written in another way if we make use of pattern matching:

```length (sequence) -> integer;
length (=nil) = 0;
length (Head & Tail) = 1 + length (Tail);```

This definition is equivalent to the first definition. The length of a sequence is determined by both definitions in exactly the same way. The difference is only a question of notation.

Note that the ordering of rules is important in this version. The order of execution should be the same as in the first version: first a test if the sequence is an empty sequence before pattern matching may occur.

Exercise:

• 1.4 Is the second version of length a partial or a total function?

A recursive formulation of a definition often leads to a succinct and simple description. However, depending on the implementation, the postponed evaluation of the computation will often require a certain price in terms of memory occupation for the intermediate and unevaluated results, and will also require additional execution time for the recursive calls of the definition.

These kinds of considerations often lead to another approach for these kinds of problems. Instead of a recursive formulation an iterative method is used, where the necessary computation is immediately performed. An iterative solution for our length operation is given in the following:

```length (sequence) -> integer;
length (S) = [ counter:=0; Tail:=S;
repeat [ if Tail == nil then ready;
counter:=counter + 1;
Tail:=argument (Tail, 2) ];
counter ];```

This definition contains a loop which starts at the beginning of a sequence and maintains a running counter of the number of elements until the end of the sequence it met.

Exercise:

• 1.5 Is the third version of length a partial or a total function?

In following examples of operations on sequences we will mainly use the second approach of formulating a solution because it better reflects in many cases the verbal description how to solve a given problem.

1.5 Membership

Another interesting definition is the member function. This function tests if an argument is an element of a sequence. Its definition is:

```member (element, sequence) -> boolean;
member (X, =nil)        = false;
member (X, X & Tail)    = true;
member (X, Head & Tail) = member (X, Tail);```

This definition is based on the following considerations:

• If the end of a sequence is reached, then X is not a member of the sequence.
• If X is equal to the head of a sequence then X is a member of the sequence.
• If both previous conditions are not true, search the tail of the sequence for membership.

Note that also here the order of the definition rules is important!

Let us try to use this definition in the following example:

`member (Peter, Men)?`

which is equivalent to:

`member (Peter, Albert & (John & (Peter & (Harry & nil))))?`

There are three rules for member. The first rule applies if the sequence is an empty sequence. This is not the case in our example. The second rule applies if the first input term and the head of the sequence are equal. The third rule is applied in all other cases. Because Peter is unequal to Albert, the second rule cannot be applied; therefore the last rule of the member definition is chosen. This rule says that the member function must be called again, but now for the tail of the sequence. This results in the recursive call:

`member (Peter, John & (Peter & (Harry & nil)))?`

And because Peter is unequal to John, again the last rule applies. It results in another recursive call:

`member (Peter, Peter & (Harry & nil))?`

Now, Peter is equal to Peter, so the second member rule applies: the result of the member function is true.

Exercise:

• 1.6 Trace the execution of:
`member (Harry, Albert & (John & (Peter & nil)))?`
• 1.7 Make an iterative version of the member function.

1.6 Deleting an element from a sequence

The deletion of an element from a sequence is defined by:

```delete (element, sequence) -> sequence;
delete (X, =nil)        = nil;
delete (X, X & Tail)    = Tail;

This definition is based on the following considerations:

• If the end of a sequence is reached, then X is not a member of the sequence, so nothing is deleted.
• If X is equal to the head of a sequence then X is a member of the sequence and the tail of the sequence is also the tail of the new sequence.
• If both previous conditions are not true, try to delete X from the tail of the sequence and build a new term based on the result.

As a result, the delete definition first searches for the member of the sequence which should be deleted. If that member has been found a new sequence is built in front of the tail of that member.

Suppose we want to delete Peter from the sequence of Men. The evaluation of the delete operation is performed in the following way:

```Trio = delete (Peter, Men)
= delete (Peter, Albert & (John & (Peter & (Harry & nil))))
= Albert & delete (Peter, John & (Peter & (Harry & nil)))
= Albert & (John & delete (Peter, Peter & (Harry & nil)))
= Albert & (John & (Harry & nil))```

Note that the sequence of Men as well as the new sequence of Trio are still both valid, as is shown by the following picture (see Figure 1-3).

```                               Peter   Harry
/       /
Men  =  ----> & ----> & ----> & ----> & ----> nil
\       \             /
Albert  John         /
/       /           /
Trio =  ----> & ----> & --------- /```

Figure 1-3: Example of two related sequences

The result of the delete operation is that there are now two sequences: Men and Trio. They have a common tail, consisting of Harry & nil. In addition, they have two common members, Albert and John.

1.7 Adding an item to a sequence

Adding an item to the front of a sequence is easy. If E is the new element and S is the sequence then the resulting sequence is:

E & S

So, we actually need no new operation for adding a new element in front of a sequence. Note that the initial sequence is not disturbed by this operation: the new sequence as well as the old sequence are still both valid.

Adding an item at the end of a sequence is not so easy. It is done by the following definition:

```add (sequence, element) -> sequence;
add (=nil, X)        = X & nil;

This definition is based on the following considerations:

• At the end of the sequence a new node for X is created as the last element of the new sequence.
• If the sequence is not empty then the element X will be added at the end of the tail and the result is used to compose a new term.

As a result the add definition first searches for the end of the sequence, then creates a new end node based on the element to be added and then copies the original sequence in front of it.

The evaluation of this definition is illustrated by the following example:

```add (John & (Peter & (Harry & nil)), Albert) =
John & add (Peter & (Harry & nil), Albert)   =
John & (Peter & add (Harry & nil, Albert))   =
John & (Peter & (Harry & add (nil, Albert))  =
John & (Peter & (Harry & (Albert & nil))) ```

Note that, also here, the initial sequence is not disturbed by this operation: the new sequence as well as the old sequence are both valid.

Exercise:

• 1.8 Define a number of operations so that a sequence can be used as a stack.
• 1.9 Define a number of operations so that a sequence can be used as a queue.

1.8 Concatenating two sequences

The concatenation of two sequences is similar to adding an element at the end of the list:

```concat (sequence, sequence) -> sequence;
concat (=nil, S)  = S;
concat (H & T, S) = H & concat (T, S);```

The concat definition first searches for the end of the first sequence, and then copies the first sequence in front of the second sequence.

An example of the evaluation of this definition is as follows:

```concat ( Albert & (John & nil), Peter & (Harry & nil))
= Albert & concat ( John & nil, Peter & (Harry & nil))
= Albert & (John & concat ( nil, Peter & (Harry & nil)))
= Albert & (John & (Peter & (Harry & nil)))```

Note that, also here, the input sequences are not affected by this operation: the result sequence as well as the input sequences remain intact.

Exercise:

• 1.10 Draw a picture of the relationships between the input sequences and the output sequence. Show also the common elements.

1.9 Reversing a sequence

For reversing the order of elements in a sequence two definitions are required:

```reverse (sequence) -> sequence;
reverse (S) = shunt (S,nil);```
```shunt (input = sequence, output = sequence) -> sequence;
shunt (=nil, Output)        = Output;

The reverse definition makes use of an auxiliary definition called shunt. The shunt definition uses an input sequence and an output sequence. It moves the head of the input sequence to the tail of the output sequence. The result is an output sequence with the elements of the input sequence in reverse order.

An example of the evaluation of these definitions is as follows:

```reverse (Albert & (John & (Peter & (Harry & nil))))    =
shunt (Albert & (John & (Peter & (Harry & nil))), nil) =
shunt (John & (Peter & (Harry & nil)), Albert & nil)   =
shunt (Peter & (Harry & nil), John & (Albert & nil))   =
shunt (Harry & nil, Peter & (John & (Albert & nil)))   =
shunt (nil, Harry & (Peter & (John & (Albert & nil)))) =
Harry & (Peter & (John & (Albert & nil)))```

Also here the original input sequence is not disturbed.

1.10 Sequences and Streams

Sequences can be converted to streams. A definition which will create a stream of elements from a sequence is the following:

```Items (sequence) -> multi (element);
Items (=nil) = no (element);

This definition is based on the following considerations:

• If the end of a sequence is encountered, no element is returned.
• If the previous condition is not true the head of a sequence will be the head of the stream followed by the items of the tail.

An example of the evaluation of this definition is as follows:

```Items (Men) = Items (Albert & (John & (Peter & (Harry & nil))))
= (Albert, Items (John & (Peter & (Harry & nil))))
= (Albert, John, Items (Peter & (Harry & nil)))
= (Albert, John, Peter, Items (Harry & nil))
= (Albert, John, Peter, Harry, Items (nil))
= (Albert, John, Peter, Harry)```

Exercise:

• 1.11 Make an iterative version of the Items function.

1.11 Sequences and Lists

A sequence S can be easily converted to a bi-directional list L, by using the previous definition:

`L = { Items (S) };`

The opposite, converting a list to a sequence, is performed by the following definition:

```Sequence (list(entity1)) -> sequence;
Sequence (List) = [ T:= nil; T:= term:prioritems(List) & T; T];```

In this definition a list is first converted into a stream by means of the prioritems(L) operation and then converted to a sequence.

The working of this definition is illustrated by the following example. Suppose:

`L0 = { 12, 23, 45 }; `

During the evaluation of Sequence(L0) the successive values of T are:

```T:=nil;
T:=45 & nil;
T:=23 & (45 & nil);
T:=12 & (23 & (45 & nil));```

1.12 Sorting sequences

Sorting is the process of re-arranging the elements of a given collection in a specific order. Many sorting algorithms have been designed for arrays, lists, trees, files, and so on. In this section we will limit ourselves to some specific sorting methods for sequences. For a thorough treatment of sorting, the interested reader is referred to a number of authors, described in the References.

A sequence can only be sorted if there is an ordering relation between the elements in the list. For the purpose of this discussion we will assume that there is an ordering relation X < Y for the elements X and Y. The ordering relation X < Y may mean X less than Y, but may also have a different meaning, depending on its definition. For our examples, we will assume that such a relation is defined for integers, real, and text in the following way:

```       term < term        -> boolean;
integer: I1 < integer: I2 = I1 < I2;
real: R1 < real: R2    = R1 < R2;
text: T1 < text: T2    = T1 < T2;```

Note, that this is a partial function which can be extended with other data types for which the relation T1 < T2 is defined.

We will develop two methods of sorting, based on different ideas for sorting sequences.

The first method is based on insertion sort, which works as follows: One element, typically the first, is removed from the sequence which must be sorted. The rest of the sequence is sorted recursively. After that, the removed element is inserted in the sorted sequence, preserving the sorted order of the sequence. Two definitions are required for defining insertion sort:

```insertsort (sequence)    -> sequence;
insertsort (=nil)        = nil;
```insert (element, sequence) -> sequence;
insert (E, =nil)           = E & nil;
insert (E, Head & Tail)    = if E < Head then E & ( Head & Tail)
else Head & insert (E, Tail);```

The first definition, insertsort, removes the first element from the input sequence and sorts the tail recursively. After the tail has been sorted, the first element will be inserted in the sorted sequence by means of the second definition.

The second definition, insert, scans the sorted sequence to find the appropriate place for insertion, depending on the ordering relation.

Exercise:

• 1.12 The working of insertsort is best demonstrated by an example. Evaluate the following call:
`insertsort( 34 & (67 & (45 & ( 23 & nil))))`

The average time that insertsort requires for sorting a sequence of length n grows proportionally to n * n. For long sequences a better sorting method is provided by another approach.

This second method is based on quick sort, which works as follows: One element, called the pivot, is removed from the sequence which must be sorted. The rest of the sequence is divided into two sub-sequences. All elements of the first sub-sequence are less than the pivot, all elements of the second sub-sequence are greater than or equal to the pivot. These two sub-sequences are sorted recursively. Finally, the sorted sub-sequences and the pivot are concatenated in the proper order.

The definition of quick sort is:

```quicksort (sequence) -> sequence;
quicksort (=nil) = nil;
quicksort (Pivot & Tail) =
[ Smaller:=nil; Larger:=nil;
[ E = Items(Tail);
if E < Pivot then Smaller:=E & Smaller
else Larger :=E & Larger ];
concat( quicksort (Smaller), Pivot & quicksort (Larger)) ];```

The second rule of the quicksort definition decomposes a sequence into a Head and a Tail. The Head is used as the pivot.

The definition body starts with setting up the two sub-sequences, one for the elements which are Smaller than the pivot, the other for the elements which are equal to or Larger than the pivot.

The next step is to read all elements of the Tail with the help of the Items definition as defined earlier. Each element is added to the Smaller or to the Larger sub-sequence, depending on whether it is smaller or larger than the pivot.

After all elements of the Tail have been processed, the Smaller and the Larger sub-sequences are sorted. Finally, the resulting sequences and the pivot are concatenated into a resulting sequence with the help of the concat operation as defined earlier.

Exercise:

• 1.13 Evaluate the following call:
`quicksort ( 34 & (67 & (45 & ( 23 & nil))))`

The average time that quicksort requires for sorting a sequence of length n is of the order n log n if the two sub-sequences are approximately of equal lengths. However, if splitting always results in one sub-sequence far bigger than the other, then the time needed is in the order of n * n.

1.13 A Component for Sequences

Based on the preceding functions and operations we can make a selection for a component for sequences (see Figure 1-4).

```component Sequences;
type element = term;
type sequence = term;```
`  entity & sequence => sequence;`
```  head (sequence)            -> element;
tail (sequence)            -> sequence;
length (sequence)          -> integer;
member (element, sequence) -> boolean;
delete (element, sequence) -> sequence;
concat (sequence, sequence)-> sequence;
reverse (sequence)         -> sequence;
Items (sequence)           -> multi (element);
Sequence (list (entity1))  -> sequence;
begin
nil = null (sequence);```
`  head ( X & Y) = X;`
`  tail ( X & Y) = Y;`
`  length (S) = if S == nil then 0 else 1 + length (tail (S));`
```  member (X, =nil)        = false;
member (X, X & Tail)    = true;
member (X, Head & Tail) = member (X, Tail);```
```  delete (X, =nil)        = nil;
delete (X, X & Tail)    = Tail;
```  add (=nil, X)        = X & nil;
```  concat (=nil, S)  = S;
concat (H & T, S) = H & concat (T,S);```
`  reverse (S) = shunt (S, nil);`
```  shunt (input = sequence, output = sequence) -> sequence;
shunt (=nil, Output)        = Output;
```  Items (=nil)         = no (element);
`  Sequence (L) = [ T := nil; T:= term:prioritems(L) & T; T];`
`end component Sequences;`

Figure 1-4: A component for sequences.

Note that the term-signature:

`element & sequence => sequence;`

has been replaced by the more general:

`entity1 & sequence => sequence;`

All other definitions are the same.

1.14 Summary

• Sequences are uni-directional lists.
• In uni-directional lists every element, except the last one, has a successor.
• In bi-directional lists every element, except the first and the last ones, has a successor and a predecessor.
• A sequence can be represented as a binary tree of terms. That means that all the properties and operations defined on terms are also applicable to sequences.
• A sequence consists of a number of nodes. The functor of each node is the & operator. One branch of a node refers to an element of the sequence. The other branch refers to the next node of the sequence. The last node of the sequence is a leaf node represented by the nil value.
• Elements of a given sequence may all be of the same type or of different types.
• Several kinds of operations can be defined on sequences. Examples are given of head and tail operations, adding and deleting elements, sorting, and so on.
• Functions defined on list may be partial or total functions.
• Operations on sequences may use recursion or iteration for processing a sequence.
• Pattern matching and recursion often lead to succinct and compact definitions.
• Sequences can be converted to streams, and visa versa.
• Sequences can be converted to lists, and visa versa.
• Two sorting methods, insertion sort and quick sort, have been discussed.

1.15 References

Many functional languages and logical languages are using uni-directional lists.

Their use is described for Lisp by McCarthy(1960), Winston and Horn(1984), Steel(1984), and many others.

For other functional languages by Henderson(1980), by Darlington, Henderson, and Turner (1982), and by Bird and Wadler (1988).

For Prolog their use is described by Bratko (1990), Clocksin and Mellish (1984), Sterling and Shapiro (1986), and many others.

Sorting algorithms have been discussed by Knuth(1975), Wirth(1976), Ado, Hopcroft and Ullman(1974, 1983), Baase(1978), and Gonnet(1984). Part 2: Data Structures Chapter 1: Sequences