|
1
SEQUENCES
1.1 What are
sequences?
1.2
Representation of Sequences
1.3 Head and
tail operations
1.4 The
length of a sequence
1.5 Membership
1.6
Deleting an element from a sequence
1.7
Adding an item to a sequence
1.8
Concatenating two sequences
1.9 Reversing a
sequence
1.10 Sequences
and Streams
1.11 Sequences
and Lists
1.12 Sorting
sequences
1.13 A
Component for Sequences
1.14 Summary
1.15 References
In the preceding part the basic language
facilities have been covered. Based on this knowledge we are now
able to examine examples of specific applications and to discuss
advanced programming techniques.
In this chapter we will start with a discussion
about sequences.
1.1 What
are sequences?
Sequences are uni-directional lists.
Uni-directional lists are widely used in many functional and
logical programming languages, because of their interesting
properties.
We will use the term sequence to
distinguish uni-directional lists from the bi-directional
lists, described in Chapter 8. For both kinds of lists, each
element, except the last one, has a successor. That means that it
is easy to go from one element to the next. In a bi-directional
list the predecessor operation is also defined: that makes it
equally simple to move from an element to its preceding element.
However, for a uni-directional list the predecessor function is
not available: the list can only be traversed in forward
direction.
These properties are not only important for the
ways a list can be traversed, but also for other operations. For
example, adding an element to the front of a list is for both
kinds of lists easy, but adding an element to the end of a
uni-directional list is not so easy and, furthermore, it is
time-consuming, while it is simple and efficient for a
bi-directional list. The same applies to insertion and deletion
of an element somewhere in the list.
On the other hand, uni-directional lists have a
number of interesting properties which make them suitable for
certain tasks. In many cases, definitions of complex algorithms
using uni-directional lists are often very short and compact.
Another property is that many operations on uni-directional lists
are mathematical functions. They are not changing the input lists
nor have other side-effects; they treat sequences as immutable
objects. The result of such a function is determined solely by
the values of its arguments, so, a function called with the same
arguments will always yield the same result. This makes reasoning
about programs easier.
It depends on the application which of the two
kinds of lists should be used. In some cases, we may even need a
combination of both kinds of lists. As usual, it is always a
question of trade-off where to apply what.
1.2
Representation of Sequences
A sequence is an ordered set of elements. A
sequence can be represented as a binary tree of terms. A sequence
itself is also a term, which may be composed of other terms, and
so on. That means that all the properties and operations defined
on terms are also applicable to sequences. Sequences are in fact
made of a special selection of terms.
All elements of a given sequence may have the
same type: one may have a sequence of integers, a sequence of
reals, a sequence of symbols, a sequence of sequences, or a
sequence of any other type of data items. The elements of a
sequence may also be of different types. For simplicity, we will
assume that elements are terms, although other types are also
possible.
We start with a declaration that the elements
of a sequence and the sequence itself are all terms. That means:
type element = term;
type sequence = term;
The nodes of a sequence are represented as
terms defined by the following term-signature:
element & sequence => sequence;
The end of a sequence is marked by a nil
value. The constant nil is defined as follows:
nil = null (sequence);
The null (sequence) expression
represents a null value of the given type. A null value
means that there is no reference to an object.
This is all we need to define sequences. Note
that we used only existing language elements.
Let us now use an example. We assume that the
following symbols are defined:
Albert, John, Peter, Harry => element;
We can now create a sequence in the following
way:
Men = Albert & (John & (Peter & (Harry & nil)));
The reason parentheses are used is because we
need a specific evaluation order to obtain a sequence. This
becomes clear if we represent the result as a tree (see Figure
1-1).
Harry nil
\ /
Peter &
\ /
John &
\ /
Albert &
\ /
&
/
Figure 1-1: Example of a sequence
tree
This is a normal representation of a term tree
as we have seen before. If we are now turning the tree 45 degrees
clockwise and are also showing the direction of the branches we
get the following picture (see Figure 1-2). The result is a
sequence of names in a uni-directional list.
Albert John Peter Harry
/ / / /
----> & ----> & ----> & ----> & ----> nil
Figure 1-2: Example of a sequence
In general, a sequence consists of a number of
nodes. The functor of each node is the & operator. One
branch of a node refers to an element of the sequence. The other
branch refers to the next node of the sequence. The last node of
the sequence is a leaf node represented by the nil value.
After having described how sequences are
represented we may now define some operations on them.
1.3
Head and tail operations
The first element of a sequence is also called
the head of a sequence. To obtain the head of a sequence
the following definition can be used:
head (sequence) -> element;
head (X & Y) = X;
Suppose we are applying this to our sequence of
Men:
head (Men)?
This is equivalent to:
head (Albert & (John & (Peter & (Harry & nil))))?
and because of the definition of head
this query will return:
Albert
The head definition in its current form
is only defined for non-empty sequences. It is undefined for
empty sequences and for other kinds of terms. A function which is
not defined for all possible arguments, is called a partial
function. A function which is defined for all the possible
arguments, is called a total function.
In general, partial functions will fail if they
are executed with undefined argument values. In Section 13.4,
"Pattern Matching", we discussed several examples of
partial functions and we used the optional quantifier to
cope with undefined arguments. In this chapter, we will use the
implicit single quantifier for partial functions to show
that such an approach is perfectly usable, as long as the users
of such a function are aware of its limitations.
Exercise:
- 1.1 Are the following functions
partial or total functions?
abs (real) -> real;
sqrt (real)-> real;
cos (real) -> real;
ln (real) -> real;
- 1.2 Define a total function for head.
Another interesting operation on sequences is
to obtain the tail of a sequence. This is defined for
non-empty sequences as:
tail (sequence) -> sequence;
tail (X & Y) = Y;
The tail function is also a partial
function. We use this function in the following example:
tail (Men)?
It results is:
John & (Peter & (Harry & nil));
With a combination of head and tail
operations it is possible to obtain also the second, third, etc.,
element of a sequence. For example, the second element of a
sequence S is obtained by:
head (tail (S))
Exercise:
- 1.3 Give an expression which
obtains the fourth element of a sequence. Apply this to
the example of Men and verify the answer.
In general, the following equation holds for a
non-empty sequence S:
S = head (S) & tail (S)
This equation describes the relationship
between the three operations we discussed so far. It says that
the head of a sequence S concatenated with the tail of the
sequence S is equal to the original sequence S.
1.4
The length of a sequence
Sometimes it is useful to know how many
elements there are in a sequence. Therefore we will define a length
function which counts the number of elements. Although it is a
simple function, it has a number of characteristics in common
with many other functions defined on sequences. Because of that
we will discuss three versions of the length function in detail.
One method to find the length of a sequence is
given by the following description:
- if a sequence is empty its length is zero.
- if a sequence is not empty then its length
is one plus the length of its tail.
This kind of formulation is typical for many
operations on sequences. It consists of (at least) two parts. The
first part says what the operation means if a sequence is empty.
The second part defines what the meaning of the operation for the
head and the tail of a sequence. Often, the same operation is
recursively applied to the tail of the sequence.
Such a formulation can immediately be
translated into a corresponding definition. In this case it
becomes:
length (sequence) -> integer;
length (S) = if S == nil then 0 else 1 + length (tail (S));
If we apply this to our example of Men
we get the following evaluation:
length (Men) = length (Albert & (John & (Peter & (Harry & nil))))
= 1 + length (John & (Peter & (Harry & nil)))
= 1 + (1 + length (Peter & (Harry & nil)))
= 1 + (1 + (1 + length (Harry & nil)))
= 1 + (1 + (1 + (1 + length (nil))))
= 1 + (1 + (1 + (1 + 0)))
= 1 + (1 + (1 + 1))
= 1 + (1 + 2)
= 1 + 3
= 4
During the evaluation of such a recursive
definition the real computation is postponed until the end of the
sequence has been encountered, which is similar to what we have
seen earlier with the recursive evaluation of the factorial
function in "Language Description", Chapter 3.
Our length definition can also be written in
another way if we make use of pattern matching:
length (sequence) -> integer;
length (=nil) = 0;
length (Head & Tail) = 1 + length (Tail);
This definition is equivalent to the first
definition. The length of a sequence is determined by both
definitions in exactly the same way. The difference is only a
question of notation.
Note that the ordering of rules is important in
this version. The order of execution should be the same as in the
first version: first a test if the sequence is an empty sequence
before pattern matching may occur.
Exercise:
- 1.4 Is the second version of length
a partial or a total function?
A recursive formulation of a definition often
leads to a succinct and simple description. However, depending on
the implementation, the postponed evaluation of the computation
will often require a certain price in terms of memory occupation
for the intermediate and unevaluated results, and will also
require additional execution time for the recursive calls of the
definition.
These kinds of considerations often lead to
another approach for these kinds of problems. Instead of a
recursive formulation an iterative method is used, where
the necessary computation is immediately performed. An iterative
solution for our length operation is given in the following:
length (sequence) -> integer;
length (S) = [ counter:=0; Tail:=S;
repeat [ if Tail == nil then ready;
counter:=counter + 1;
Tail:=argument (Tail, 2) ];
counter ];
This definition contains a loop which starts at
the beginning of a sequence and maintains a running counter of
the number of elements until the end of the sequence it met.
Exercise:
- 1.5 Is the third version of length
a partial or a total function?
In following examples of operations on
sequences we will mainly use the second approach of formulating a
solution because it better reflects in many cases the verbal
description how to solve a given problem.
1.5 Membership
Another interesting definition is the member
function. This function tests if an argument is an element of a
sequence. Its definition is:
member (element, sequence) -> boolean;
member (X, =nil) = false;
member (X, X & Tail) = true;
member (X, Head & Tail) = member (X, Tail);
This definition is based on the following
considerations:
- If the end of a sequence is reached, then
X is not a member of the sequence.
- If X is equal to the head of a sequence
then X is a member of the sequence.
- If both previous conditions are not true,
search the tail of the sequence for membership.
Note that also here the order of the definition
rules is important!
Let us try to use this definition in the
following example:
member (Peter, Men)?
which is equivalent to:
member (Peter, Albert & (John & (Peter & (Harry & nil))))?
There are three rules for member. The first
rule applies if the sequence is an empty sequence. This is not
the case in our example. The second rule applies if the first
input term and the head of the sequence are equal. The third rule
is applied in all other cases. Because Peter is unequal to
Albert, the second rule cannot be applied; therefore the last
rule of the member definition is chosen. This rule says that the member
function must be called again, but now for the tail of the
sequence. This results in the recursive call:
member (Peter, John & (Peter & (Harry & nil)))?
And because Peter is unequal to John, again the
last rule applies. It results in another recursive call:
member (Peter, Peter & (Harry & nil))?
Now, Peter is equal to Peter, so the second
member rule applies: the result of the member function is true.
Exercise:
- 1.6 Trace the execution of:
member (Harry, Albert & (John & (Peter & nil)))?
- 1.7 Make an iterative version of
the member function.
1.6 Deleting an element from a sequence
The deletion of an element from a sequence is
defined by:
delete (element, sequence) -> sequence;
delete (X, =nil) = nil;
delete (X, X & Tail) = Tail;
delete (X, Head & Tail) = Head & delete (X, Tail);
This definition is based on the following
considerations:
- If the end of a sequence is reached, then
X is not a member of the sequence, so nothing is deleted.
- If X is equal to the head of a sequence
then X is a member of the sequence and the tail of the
sequence is also the tail of the new sequence.
- If both previous conditions are not true,
try to delete X from the tail of the sequence and build a
new term based on the result.
As a result, the delete definition first
searches for the member of the sequence which should be deleted.
If that member has been found a new sequence is built in front of
the tail of that member.
Suppose we want to delete Peter from the
sequence of Men. The evaluation of the delete
operation is performed in the following way:
Trio = delete (Peter, Men)
= delete (Peter, Albert & (John & (Peter & (Harry & nil))))
= Albert & delete (Peter, John & (Peter & (Harry & nil)))
= Albert & (John & delete (Peter, Peter & (Harry & nil)))
= Albert & (John & (Harry & nil))
Note that the sequence of Men as well as
the new sequence of Trio are still both valid, as is shown
by the following picture (see Figure 1-3).
Peter Harry
/ /
Men = ----> & ----> & ----> & ----> & ----> nil
\ \ /
Albert John /
/ / /
Trio = ----> & ----> & --------- /
Figure 1-3: Example of two related
sequences
The result of the delete operation is that
there are now two sequences: Men and Trio. They
have a common tail, consisting of Harry & nil. In
addition, they have two common members, Albert and John.
1.7
Adding an item to a sequence
Adding an item to the front of a sequence is
easy. If E is the new element and S is the sequence then the
resulting sequence is:
E & S
So, we actually need no new operation for
adding a new element in front of a sequence. Note that the
initial sequence is not disturbed by this operation: the new
sequence as well as the old sequence are still both valid.
Adding an item at the end of a sequence is not
so easy. It is done by the following definition:
add (sequence, element) -> sequence;
add (=nil, X) = X & nil;
add (Head & Tail, X) = Head & add (Tail, X);
This definition is based on the following
considerations:
- At the end of the sequence a new node for
X is created as the last element of the new sequence.
- If the sequence is not empty then the
element X will be added at the end of the tail and the
result is used to compose a new term.
As a result the add definition first
searches for the end of the sequence, then creates a new end node
based on the element to be added and then copies the original
sequence in front of it.
The evaluation of this definition is
illustrated by the following example:
add (John & (Peter & (Harry & nil)), Albert) =
John & add (Peter & (Harry & nil), Albert) =
John & (Peter & add (Harry & nil, Albert)) =
John & (Peter & (Harry & add (nil, Albert)) =
John & (Peter & (Harry & (Albert & nil)))
Note that, also here, the initial sequence is
not disturbed by this operation: the new sequence as well as the
old sequence are both valid.
Exercise:
- 1.8 Define a number of operations
so that a sequence can be used as a stack.
- 1.9 Define a number of operations
so that a sequence can be used as a queue.
1.8
Concatenating two sequences
The concatenation of two sequences is similar
to adding an element at the end of the list:
concat (sequence, sequence) -> sequence;
concat (=nil, S) = S;
concat (H & T, S) = H & concat (T, S);
The concat definition first searches for
the end of the first sequence, and then copies the first sequence
in front of the second sequence.
An example of the evaluation of this definition
is as follows:
concat ( Albert & (John & nil), Peter & (Harry & nil))
= Albert & concat ( John & nil, Peter & (Harry & nil))
= Albert & (John & concat ( nil, Peter & (Harry & nil)))
= Albert & (John & (Peter & (Harry & nil)))
Note that, also here, the input sequences are
not affected by this operation: the result sequence as well as
the input sequences remain intact.
Exercise:
- 1.10 Draw a picture of the
relationships between the input sequences and the output
sequence. Show also the common elements.
1.9
Reversing a sequence
For reversing the order of elements in a
sequence two definitions are required:
reverse (sequence) -> sequence;
reverse (S) = shunt (S,nil); shunt (input = sequence, output = sequence) -> sequence;
shunt (=nil, Output) = Output;
shunt (Head & Tail, Output) = shunt (Tail, Head & Output);
The reverse definition makes use of an
auxiliary definition called shunt. The shunt definition
uses an input sequence and an output sequence. It moves the head
of the input sequence to the tail of the output sequence. The
result is an output sequence with the elements of the input
sequence in reverse order.
An example of the evaluation of these
definitions is as follows:
reverse (Albert & (John & (Peter & (Harry & nil)))) =
shunt (Albert & (John & (Peter & (Harry & nil))), nil) =
shunt (John & (Peter & (Harry & nil)), Albert & nil) =
shunt (Peter & (Harry & nil), John & (Albert & nil)) =
shunt (Harry & nil, Peter & (John & (Albert & nil))) =
shunt (nil, Harry & (Peter & (John & (Albert & nil)))) =
Harry & (Peter & (John & (Albert & nil)))
Also here the original input sequence is not
disturbed.
1.10
Sequences and Streams
Sequences can be converted to streams. A
definition which will create a stream of elements from a sequence
is the following:
Items (sequence) -> multi (element);
Items (=nil) = no (element);
Items ( Head & Tail) = ( Head, Items (Tail) );
This definition is based on the following
considerations:
- If the end of a sequence is encountered,
no element is returned.
- If the previous condition is not true the
head of a sequence will be the head of the stream
followed by the items of the tail.
An example of the evaluation of this definition
is as follows:
Items (Men) = Items (Albert & (John & (Peter & (Harry & nil))))
= (Albert, Items (John & (Peter & (Harry & nil))))
= (Albert, John, Items (Peter & (Harry & nil)))
= (Albert, John, Peter, Items (Harry & nil))
= (Albert, John, Peter, Harry, Items (nil))
= (Albert, John, Peter, Harry)
Exercise:
- 1.11 Make an iterative version of
the Items function.
1.11
Sequences and Lists
A sequence S can be easily converted to a
bi-directional list L, by using the previous definition:
The opposite, converting a list to a sequence,
is performed by the following definition:
Sequence (list(entity1)) -> sequence;
Sequence (List) = [ T:= nil; T:= term:prioritems(List) & T; T];
In this definition a list is first converted
into a stream by means of the prioritems(L) operation and
then converted to a sequence.
The working of this definition is illustrated
by the following example. Suppose:
During the evaluation of Sequence(L0)
the successive values of T are:
T:=nil;
T:=45 & nil;
T:=23 & (45 & nil);
T:=12 & (23 & (45 & nil));
1.12
Sorting sequences
Sorting is the process of re-arranging the
elements of a given collection in a specific order. Many sorting
algorithms have been designed for arrays, lists, trees, files,
and so on. In this section we will limit ourselves to some
specific sorting methods for sequences. For a thorough treatment
of sorting, the interested reader is referred to a number of
authors, described in the References.
A sequence can only be sorted if there is an
ordering relation between the elements in the list. For the
purpose of this discussion we will assume that there is an
ordering relation X < Y for the elements X and Y. The ordering
relation X < Y may mean X less than Y, but may also
have a different meaning, depending on its definition. For our
examples, we will assume that such a relation is defined for
integers, real, and text in the following way:
term < term -> boolean;
integer: I1 < integer: I2 = I1 < I2;
real: R1 < real: R2 = R1 < R2;
text: T1 < text: T2 = T1 < T2;
Note, that this is a partial function which can
be extended with other data types for which the relation T1 <
T2 is defined.
We will develop two methods of sorting, based
on different ideas for sorting sequences.
The first method is based on insertion sort,
which works as follows: One element, typically the first, is
removed from the sequence which must be sorted. The rest of the
sequence is sorted recursively. After that, the removed element
is inserted in the sorted sequence, preserving the sorted order
of the sequence. Two definitions are required for defining
insertion sort:
insertsort (sequence) -> sequence;
insertsort (=nil) = nil;
insertsort (Head & Tail) = insert (Head, insertsort (Tail)); insert (element, sequence) -> sequence;
insert (E, =nil) = E & nil;
insert (E, Head & Tail) = if E < Head then E & ( Head & Tail)
else Head & insert (E, Tail);
The first definition, insertsort,
removes the first element from the input sequence and sorts the
tail recursively. After the tail has been sorted, the first
element will be inserted in the sorted sequence by means of the
second definition.
The second definition, insert, scans the
sorted sequence to find the appropriate place for insertion,
depending on the ordering relation.
Exercise:
- 1.12 The working of insertsort is
best demonstrated by an example. Evaluate the following
call:
insertsort( 34 & (67 & (45 & ( 23 & nil))))
The average time that insertsort
requires for sorting a sequence of length n grows proportionally
to n * n. For long sequences a better sorting method is provided
by another approach.
This second method is based on quick sort,
which works as follows: One element, called the pivot, is
removed from the sequence which must be sorted. The rest of the
sequence is divided into two sub-sequences. All elements of the
first sub-sequence are less than the pivot, all elements of the
second sub-sequence are greater than or equal to the pivot. These
two sub-sequences are sorted recursively. Finally, the sorted
sub-sequences and the pivot are concatenated in the proper order.
The definition of quick sort is:
quicksort (sequence) -> sequence;
quicksort (=nil) = nil;
quicksort (Pivot & Tail) =
[ Smaller:=nil; Larger:=nil;
[ E = Items(Tail);
if E < Pivot then Smaller:=E & Smaller
else Larger :=E & Larger ];
concat( quicksort (Smaller), Pivot & quicksort (Larger)) ];
The second rule of the quicksort definition
decomposes a sequence into a Head and a Tail. The Head
is used as the pivot.
The definition body starts with setting up the
two sub-sequences, one for the elements which are Smaller
than the pivot, the other for the elements which are equal to or Larger
than the pivot.
The next step is to read all elements of the Tail
with the help of the Items definition as defined earlier.
Each element is added to the Smaller or to the Larger
sub-sequence, depending on whether it is smaller or larger than
the pivot.
After all elements of the Tail have been
processed, the Smaller and the Larger sub-sequences
are sorted. Finally, the resulting sequences and the pivot are
concatenated into a resulting sequence with the help of the concat
operation as defined earlier.
Exercise:
- 1.13 Evaluate the following call:
quicksort ( 34 & (67 & (45 & ( 23 & nil))))
The average time that quicksort requires
for sorting a sequence of length n is of the order n log n if the
two sub-sequences are approximately of equal lengths. However, if
splitting always results in one sub-sequence far bigger than the
other, then the time needed is in the order of n * n.
1.13
A Component for Sequences
Based on the preceding functions and operations
we can make a selection for a component for sequences (see Figure
1-4).
component Sequences;
type element = term;
type sequence = term; entity & sequence => sequence; head (sequence) -> element;
tail (sequence) -> sequence;
length (sequence) -> integer;
member (element, sequence) -> boolean;
delete (element, sequence) -> sequence;
add (sequence, element) -> sequence;
concat (sequence, sequence)-> sequence;
reverse (sequence) -> sequence;
Items (sequence) -> multi (element);
Sequence (list (entity1)) -> sequence;
begin
nil = null (sequence); head ( X & Y) = X; tail ( X & Y) = Y; length (S) = if S == nil then 0 else 1 + length (tail (S)); member (X, =nil) = false;
member (X, X & Tail) = true;
member (X, Head & Tail) = member (X, Tail); delete (X, =nil) = nil;
delete (X, X & Tail) = Tail;
delete (X, Head & Tail) = Head & delete (X, Tail); add (=nil, X) = X & nil;
add (Head & Tail, X) = Head & add (Tail, X); concat (=nil, S) = S;
concat (H & T, S) = H & concat (T,S); reverse (S) = shunt (S, nil); shunt (input = sequence, output = sequence) -> sequence;
shunt (=nil, Output) = Output;
shunt (Head & Tail, Output) = shunt (Tail, Head & Output); Items (=nil) = no (element);
Items ( Head & Tail) = ( Head, Items (Tail) ); Sequence (L) = [ T := nil; T:= term:prioritems(L) & T; T]; end component Sequences;
Figure 1-4: A component
for sequences.
Note that the term-signature:
element & sequence => sequence;
has been replaced by the more general:
entity1 & sequence => sequence;
All other definitions are the same.
1.14 Summary
- Sequences are uni-directional lists.
- In uni-directional lists every element,
except the last one, has a successor.
- In bi-directional lists every element,
except the first and the last ones, has a successor and a
predecessor.
- A sequence can be represented as a binary
tree of terms. That means that all the properties and
operations defined on terms are also applicable to
sequences.
- A sequence consists of a number of nodes.
The functor of each node is the & operator.
One branch of a node refers to an element of the
sequence. The other branch refers to the next node of the
sequence. The last node of the sequence is a leaf node
represented by the nil value.
- Elements of a given sequence may all be of
the same type or of different types.
- Several kinds of operations can be defined
on sequences. Examples are given of head and tail
operations, adding and deleting elements, sorting, and so
on.
- Functions defined on list may be partial
or total functions.
- Operations on sequences may use recursion
or iteration for processing a sequence.
- Pattern matching and recursion often lead
to succinct and compact definitions.
- Sequences can be converted to streams, and
visa versa.
- Sequences can be converted to lists, and
visa versa.
- Two sorting methods, insertion sort and
quick sort, have been discussed.
1.15 References
Many functional languages and logical languages
are using uni-directional lists.
Their use is described for Lisp by
McCarthy(1960), Winston and Horn(1984), Steel(1984), and many
others.
For other functional languages by
Henderson(1980), by Darlington, Henderson, and Turner (1982), and
by Bird and Wadler (1988).
For Prolog their use is described by Bratko
(1990), Clocksin and Mellish (1984), Sterling and Shapiro (1986),
and many others.
Sorting algorithms have been discussed by
Knuth(1975), Wirth(1976), Ado, Hopcroft and Ullman(1974, 1983),
Baase(1978), and Gonnet(1984).
 |
|
Part 2: Data Structures
|
|
Chapter 1: Sequences |
|