
2 BASIC DATA TYPES AND EXPRESSIONS

• 2.1 Basic Data Types: 2.1.1 Integer Type, 2.1.2 Real Type, 2.1.3 Boolean Type, 2.1.4 Character Type
• 2.2 Expressions: 2.2.1 Regular Expressions, 2.2.2 Operators, 2.2.3 Value Conversions

Programming in Elisa consists of building definitions. Definitions are the basic building blocks of a program. Related definitions may be grouped together in software components. The primary task of a programmer is to construct definitions and to assemble them in components.

Expressions are important constituents of definitions. They are also used in queries. This chapter first describes the four basic data types and their predefined operations, and then explains how expressions are built from them: regular expressions, operator precedence, and value conversions.

2.1 Basic Data Types

The basic data types are predefined in the language. There are four basic data types: boolean, integer, real, and character. Each type has its own set of values and associated operations, as discussed in the following subsections.

2.1.1 Integer Type

The integer type is a basic data type which represents integer values. Integers are whole numbers which may be positive or negative: 0 is an integer, as are 789 and -34. The allowed values form a subset of the mathematical integers; the range of this subset depends on the implementation.

The following integer operations and functions are predefined in the language:

```
type integer;
integer == integer   -> boolean;
integer <> integer   -> boolean;
integer <  integer   -> boolean;
integer <= integer   -> boolean;
integer >  integer   -> boolean;
integer >= integer   -> boolean;
integer +  integer   -> integer;
integer -  integer   -> integer;
integer *  integer   -> integer;
integer /  integer   -> integer;
integer ** integer   -> integer;
        +  integer   -> integer;
        -  integer   -> integer;
abs(integer)         -> integer;
mod(integer,integer) -> integer;
```

The infix operations +, -, *, /, and ** are defined for integer operands. They represent the familiar operations of addition, subtraction, multiplication, division, and exponentiation. In all cases the result is again an integer, except when the result is undefined. For example, division by zero is undefined and therefore illegal.

The operations + and - may also be used as prefix operations as in +234 or -28.

The infix operations ==, <>, <, <=, >, >= are likewise defined for integer operands. They are used for comparison and they represent the familiar operations of equal to, not equal to, less than, less than or equal to, greater than, and greater than or equal to. The result is a boolean value. For example, the result of the expression " 3 < 5 " will be true.

In addition, two functions are defined on integers, abs and mod, both of which return an integer:

• The abs(I) function computes the absolute value of I. For example, abs(-5) returns 5.
• The mod(I, J) function computes the remainder of the division I/J. For example, mod(9, 5) returns 4. The function is illegal if J is zero.

We now have a set of operations and functions on integer values which can be used in many kinds of expressions.
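To make the integer rules concrete, here is a small Python model of abs, / and mod, including the illegal zero-divisor cases. It is not Elisa code: the names `elisa_abs`, `elisa_div`, `elisa_mod` and `IllegalOperation` are hypothetical. This section does not say how / and mod behave for negative operands, so the sketch assumes truncation toward zero; treat that as an assumption.

```python
class IllegalOperation(Exception):
    """Raised where the text says an operation is illegal (undefined)."""


def elisa_abs(i: int) -> int:
    # abs(I): the absolute value of I, e.g. abs(-5) gives 5.
    return -i if i < 0 else i


def elisa_div(i: int, j: int) -> int:
    # I / J on integers gives an integer; division by zero is illegal.
    # ASSUMPTION: the quotient is truncated toward zero (not stated in the text).
    if j == 0:
        raise IllegalOperation("integer division by zero")
    q = abs(i) // abs(j)
    return q if (i >= 0) == (j > 0) else -q


def elisa_mod(i: int, j: int) -> int:
    # mod(I, J): the remainder of the division I / J; illegal if J is zero.
    # ASSUMPTION: the remainder is consistent with elisa_div above.
    if j == 0:
        raise IllegalOperation("mod with a zero divisor")
    return i - j * elisa_div(i, j)


if __name__ == "__main__":
    print(elisa_abs(-5))    # 5
    print(elisa_mod(9, 5))  # 4
```

The division-by-zero checks mirror the text directly; only the rounding direction for negative operands is guessed.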

2.1.2 Real Type

The real type is a basic data type representing real numeric values. The allowed values are a subset of the real numbers and may be positive or negative. Real values are written with a decimal point and an optional exponent part, as we have seen with real literals. So 0.0 is a real value, as are 3.14159, -45.67 and 1.0E-5.

The following real operations and functions are predefined in the language:

```
type real;
real == real    -> boolean;
real <> real    -> boolean;
real <  real    -> boolean;
real <= real    -> boolean;
real >  real    -> boolean;
real >= real    -> boolean;
real +  real    -> real;
real -  real    -> real;
real *  real    -> real;
real /  real    -> real;
real ** integer -> real;
     +  real    -> real;
     -  real    -> real;
abs(real)       -> real;
sqrt(real)      -> real;
real(integer)   -> real;
integer(real)   -> integer;
```

The infix operations +, -, *, / are defined for real operands. They represent the familiar operations of addition, subtraction, multiplication, and division. The result is in all cases a real number, except when the result is not defined.

The exponentiation operation ** is only defined for integer exponents as in " 5.0 ** 3 " and " X ** -4 ".

The operations + and - may also be used as prefix operations as in +2.5 or -7.93.

The infix operations ==, <>, <, <=, >, >= are likewise defined for real operands. They are used for comparison and they represent the familiar operations of equal to, not equal to, less than, less than or equal to, greater than, and greater than or equal to. The result is a boolean value. For example, the result of the expression " 3.5 < 3.9 " will be true.

In addition, a number of functions are defined for reals. All of them return real numbers except the integer conversion function:

• The abs(X) function computes the absolute value of X.
• The sqrt(X) function computes the square root of X. It is an error if X is negative.
• The real(I) function converts the integer I to the corresponding real value. For example, real(47) returns the real value 47.0.
• The integer(X) function returns the integer part of X. For example, integer(23.79) returns the integer value 23.

With this set of operations and functions defined for real values we can express computations on real values.
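The restrictions on reals (integer-only exponents, no square root of a negative value, explicit conversions) can be modeled in a few lines of Python. This is a sketch, not Elisa: the function names are hypothetical, and the behavior of integer(X) for negative X is not given in the text, so the model assumes the fraction is simply dropped.

```python
import math


class IllegalOperation(Exception):
    """Raised where the text says an operation is an error."""


def real_pow(x: float, n: int) -> float:
    # real ** integer: the exponent must be an integer, as in 5.0 ** 3.
    if isinstance(n, bool) or not isinstance(n, int):
        raise TypeError("the exponent of ** must be an integer")
    return x ** n


def real_sqrt(x: float) -> float:
    # sqrt(X): it is an error if X is negative.
    if x < 0:
        raise IllegalOperation("sqrt of a negative value")
    return math.sqrt(x)


def to_real(i: int) -> float:
    # real(I): converts an integer to the corresponding real, e.g. 47 -> 47.0.
    return float(i)


def to_integer(x: float) -> int:
    # integer(X): the integer part of X, e.g. 23.79 -> 23.
    # ASSUMPTION: for negative X the fraction is dropped (truncation toward zero).
    return math.trunc(x)


if __name__ == "__main__":
    print(real_pow(5.0, 3))   # 125.0
    print(to_real(47))        # 47.0
    print(to_integer(23.79))  # 23
```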

2.1.3 Boolean Type

The boolean type is a basic data type which represents boolean values. A boolean entity can only have one of two possible values: false and true.

There are five boolean operators predefined in the language:

```
type boolean;
         ~ boolean -> boolean;
boolean  & boolean -> boolean;
boolean  | boolean -> boolean;
boolean == boolean -> boolean;
boolean <> boolean -> boolean;
```

The operations ~, &, |, ==, and <> have the following meanings:

The ~ operation is the not operation. It has one boolean operand. If the operand is true then the result is false; if the operand is false the result is true.

Let us assume that we want to record some weather conditions in two boolean variables, called Sunny and Warm. Each of them is either true or false. The expression "~ Sunny" is then true if Sunny is false, and vice versa.

The & operation is the and operation. It has two boolean operands. If both operands are true then the result is also true, otherwise the result is false. For example, the expression "Sunny & Warm" will be true if Sunny and Warm are both true.

The | operation is the or operation. It also has two boolean operands. If both operands are false then the result is also false, otherwise the result is true. For example, the expression "Sunny | Warm" will be true if Sunny is true or Warm is true or both are true.

The == operation is a comparison operation with two boolean operands. If the operands are equal then the result is true, otherwise the result is false. For example, the expression "Sunny == Warm" will be true if Sunny and Warm are both true, or Sunny and Warm are both false.

The <> operation is also a comparison operation with two boolean operands. If the operands are not equal then the result is true, otherwise the result is false. For example, the expression "Sunny <> Warm" will be true if Sunny is true and Warm is false, or Sunny is false and Warm is true.

With this set of operations defined for boolean values we may evaluate boolean expressions.
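The five operator descriptions can be checked against a complete truth table for the Sunny and Warm example. In this Python sketch, `not`, `and`, `or`, `==` and `!=` stand in for Elisa's ~, &, |, == and <>; the function and dictionary names are illustrative only. The table makes visible that == on booleans is logical equivalence and <> is exclusive or.

```python
from itertools import product

# Python stand-ins for the Elisa infix boolean operators (~ is handled below).
INFIX = {
    "&": lambda a, b: a and b,
    "|": lambda a, b: a or b,
    "==": lambda a, b: a == b,
    "<>": lambda a, b: a != b,
}


def truth_table():
    """One row per (Sunny, Warm) combination:
    (Sunny, Warm, ~Sunny, Sunny & Warm, Sunny | Warm, Sunny == Warm, Sunny <> Warm)."""
    rows = []
    for sunny, warm in product([False, True], repeat=2):
        row = (sunny, warm, not sunny) + tuple(f(sunny, warm) for f in INFIX.values())
        rows.append(row)
    return rows


if __name__ == "__main__":
    header = ["Sunny", "Warm", "~ Sunny"] + [f"Sunny {op} Warm" for op in INFIX]
    print(" | ".join(header))
    for row in truth_table():
        print(" | ".join(str(value) for value in row))
```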

2.1.4 Character Type

The character type is a basic data type representing character values. Allowed values are the characters of the basic character set. The standard ASCII character set is a subset of the basic character set. The following character operations and functions are predefined in the language:

```
type character;
character == character    -> boolean;
character <> character    -> boolean;
character <  character    -> boolean;
character <= character    -> boolean;
character >  character    -> boolean;
character >= character    -> boolean;
character(integer)        -> character;
integer(character)        -> integer;
```

The infix operations ==, <>, <, <=, >, >= are defined for character operands. They are used for comparison and they represent the familiar operations of equal to, not equal to, less than, less than or equal to, greater than, and greater than or equal to. The result is a boolean value. For example, the result of the expression " 'A' < 'B' " will be true.

In addition, two conversion functions have been defined for characters:

• The character(I) function converts the integer value I to the corresponding character value. For example, character(66) returns the character value 'B'.
• The integer(C) function returns the integer value of the character C. For example, integer('A') returns the integer value 65.

With this set of operations and functions defined for character values we may evaluate character expressions.
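Since the only predefined character operations are comparisons and the two conversions, any character arithmetic has to pass through integers. The Python sketch below models this with `chr` and `ord`, assuming the ASCII code values used in the examples above (65 for 'A', 66 for 'B'); `next_character` is a hypothetical helper, not a predefined Elisa function.

```python
def character(i: int) -> str:
    # character(I): integer -> character, e.g. character(66) gives 'B'.
    return chr(i)


def integer(c: str) -> int:
    # integer(C): character -> integer, e.g. integer('A') gives 65.
    return ord(c)


def next_character(c: str) -> str:
    # Hypothetical helper: no "character + integer" operation is predefined,
    # so the arithmetic is done on the integer value and converted back.
    return character(integer(c) + 1)


if __name__ == "__main__":
    print(character(66), integer("A"), next_character("A"))  # B 65 B
    print("A" < "B")  # True: Python compares the character codes
```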

2.2 Expressions

In the preceding section we learned which operations are defined for integers, reals, booleans, and characters, and how we can use them in simple expressions. However, we did not discuss how these simple expressions can be combined to form compound expressions. That is the subject of this section.

Expressions are language constructs for computing values. Expressions are built from lexical elements such as identifiers, literals, and delimiters. However, not all combinations of lexical elements are valid expressions.

In general, two criteria determine whether a language construct is valid. First, are the lexical elements of the construct written in the right order; in other words, is the construct written according to its syntax?

If so, the second criterion applies: does the construct have a semantic meaning in the language?

Based on their syntax there are two kinds of expressions: regular expressions and irregular expressions, also called special expressions.

In this section we mainly describe the syntactic aspects of regular expressions; the semantic aspects will be discussed in later chapters. Special expressions are discussed in Chapter 6.

2.2.1 Regular Expressions

In mathematics we are used to writing expressions like

3 * a + b * c

where + and * are operators, and 3, a, b, and c are operands. In particular, + and * are said to be infix operators because they appear between the two operands.

Prefix operators may also be used in an expression, as in

- 2 * p + q

where the - operator is a prefix operator because it appears before an operand. The + and - operators may be used both as prefix operators and as infix operators, depending on the number of operands involved.

Parentheses may be used to specify the order of evaluation, as in

(a + b) * (c - d)

which specifies that the + and - operators are applied before the * operator.

There are only a limited number of syntactic rules for building regular expressions. Here are some of them:

· a literal is an expression

· an identifier is an expression

· if E is an expression then <prefix-operator> E is also an expression

· if E1 and E2 are expressions then E1 <infix-operator> E2 is also an expression

· if E is an expression then (E) is also an expression

By applying these rules repeatedly, it is possible to build more and more complex expressions.

The same rules can also be used to verify whether a given language construct is a syntactically valid expression, by decomposing it into sub-expressions according to the rules.

Later on, we will add some other rules for building expressions.
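The five rules listed above can be turned into a small recognizer that decides whether a string is a syntactically valid regular expression. The Python sketch below is illustrative only: the lexical rules for identifiers and literals are simplified assumptions, the operator sets are taken from the precedence table of the next subsection (the "=" prefix operator is left out), and function calls such as abs(-5) are rejected because they are not covered by the rules given so far. Operator precedence does not matter for recognition, so a flat "operand (infix operand)*" loop accepts exactly the strings the rules can build.

```python
import re

# ASSUMPTION: simplified lexical rules (identifiers, integer and real
# literals, one-character literals in quotes).
TOKEN = re.compile(r"""
    \s*(?:
        (?P<real>\d+\.\d+(?:[eE][+-]?\d+)?)
      | (?P<int>\d+)
      | (?P<char>'[^']')
      | (?P<ident>[A-Za-z][A-Za-z0-9_]*)
      | (?P<op>\*\*|\.\.|==|<>|<=|>=|[-+*/<>&|~()])
    )""", re.VERBOSE)

PREFIX = {"~", "+", "-"}
INFIX = {"**", "..", "*", "/", "+", "-", "==", "<>", "<", "<=", ">", ">=", "&", "|"}


def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


def _operand(tokens, i):
    # Rule "<prefix-operator> E": skip any number of prefix operators.
    while i < len(tokens) and tokens[i][0] == "op" and tokens[i][1] in PREFIX:
        i += 1
    if i >= len(tokens):
        raise SyntaxError("expression expected")
    kind, text = tokens[i]
    if kind in ("int", "real", "char", "ident"):  # literal or identifier
        return i + 1
    if text == "(":                                # rule "(E)"
        i = _expr(tokens, i + 1)
        if i < len(tokens) and tokens[i] == ("op", ")"):
            return i + 1
        raise SyntaxError("missing )")
    raise SyntaxError(f"unexpected {text!r}")


def _expr(tokens, i):
    # Rule "E1 <infix-operator> E2", applied repeatedly.
    i = _operand(tokens, i)
    while i < len(tokens) and tokens[i][0] == "op" and tokens[i][1] in INFIX:
        i = _operand(tokens, i + 1)
    return i


def is_expression(text):
    """True if text can be built with the five rules listed above."""
    try:
        tokens = tokenize(text)
        return _expr(tokens, 0) == len(tokens)
    except SyntaxError:
        return False
```

For example, is_expression("(a + b) * (c - d)") is true, while is_expression("a + * b") is false because * cannot start an operand.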

2.2.2 Operators

When several operators appear together in an expression, certain rules of precedence are needed to specify the order of evaluation. For example, if we write

a + b * c

the following two interpretations are possible:

(a + b) * c

or

a + (b * c)

depending on the precedence levels associated with the + and * operators. As is customary, the * operator in Elisa has a higher precedence level than the + operator, so the second interpretation holds.

In general, if no specific ordering has been imposed by parentheses, the order of evaluation of an expression is based upon the built-in precedence levels of the prefix and infix operators:

• The four prefix operators ~, +, -, and = (which will be discussed later) have the highest precedence level and are applied from right to left when several prefix operations follow each other. This means that --3 is equivalent to -(-3), and that " ~ a <> b " is interpreted as " (~ a) <> b ".
• Infix operators have predefined precedence levels. Infix operators with the highest precedence level are applied first. Infix operators with the same precedence level are applied from left to right. The infix operators with their precedence level are defined in the following table (see Figure 2-5).
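
| Operator symbols | Precedence level |
|---|---|
| `**`, `..` | 6 (highest) |
| `*`, `/` | 5 |
| `+`, `-` | 4 |
| `==`, `<>`, `<`, `<=`, `>`, `>=` | 3 |
| `&` | 2 |
| `\|` | 1 (lowest) |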

Figure 2-5: Operator Precedence Table

Consequently:

2 + 3 * 4 means 2 + (3 * 4)

2 * 3 * 4 means (2 * 3) * 4

2 ** 3 ** 4 means (2 ** 3) ** 4

2 < 3 & 4 + 5 <= 6 means (2 < 3) & ((4 + 5) <= 6)

With the precedence rules of prefix and infix operators and the additional rule that parentheses may be used to impose a specific order of evaluation we may construct all kinds of regular expressions.
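The precedence rules can be checked mechanically with a precedence-climbing parser that inserts explicit parentheses. The Python sketch below encodes Figure 2-5, treats all infix operators as left-associative, and lets prefix operators bind most tightly, applied right to left. It is an illustration under simplified lexical assumptions, not Elisa's own parser; the "=" prefix operator is left out, and `Parser`, `show` and `parenthesize` are hypothetical names.

```python
import re

# Precedence levels from Figure 2-5; a higher level binds more tightly.
PRECEDENCE = {
    "**": 6, "..": 6,
    "*": 5, "/": 5,
    "+": 4, "-": 4,
    "==": 3, "<>": 3, "<": 3, "<=": 3, ">": 3, ">=": 3,
    "&": 2,
    "|": 1,
}
PREFIX = {"~", "+", "-"}  # the "=" prefix operator is left out here

TOKEN = re.compile(r"\d+\.\d+(?:[eE][+-]?\d+)?|\d+|[A-Za-z]\w*|\*\*|\.\.|==|<>|<=|>=|[-+*/<>&|~()]")


def tokenize(text):
    tokens = TOKEN.findall(text)
    if "".join(tokens) != re.sub(r"\s+", "", text):
        raise SyntaxError("unrecognized characters in expression")
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse(self):
        tree = self.expression(1)
        if self.peek() is not None:
            raise SyntaxError(f"unexpected {self.peek()!r}")
        return tree

    def expression(self, min_level):
        # Precedence climbing. Parsing the right operand at level + 1 makes
        # operators of equal level group from left to right.
        left = self.operand()
        while True:
            op = self.peek()
            level = PRECEDENCE.get(op)
            if level is None or level < min_level:
                return left
            self.advance()
            left = (op, left, self.expression(level + 1))

    def operand(self):
        token = self.advance()
        if token in PREFIX:
            # Prefix operators bind most tightly and apply right to left.
            return (token, self.operand())
        if token == "(":
            inner = self.expression(1)
            if self.advance() != ")":
                raise SyntaxError("missing )")
            return inner
        if token is None or token == ")" or token in PRECEDENCE:
            raise SyntaxError(f"operand expected, found {token!r}")
        return token


def show(tree):
    # Put explicit parentheses around every operation.
    if isinstance(tree, str):
        return tree
    if len(tree) == 2:
        return f"({tree[0]}{show(tree[1])})"
    op, left, right = tree
    return f"({show(left)} {op} {show(right)})"


def parenthesize(text):
    return show(Parser(text).parse())


if __name__ == "__main__":
    print(parenthesize("2 < 3 & 4 + 5 <= 6"))  # ((2 < 3) & ((4 + 5) <= 6))
```

Apart from the outermost pair, the output matches the "Consequently" examples above; for instance, parenthesize("2 ** 3 ** 4") gives ((2 ** 3) ** 4).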

Exercises:

2.1 Remove all superfluous pairs of parentheses:

(2 - ((3 * 4) / 5))

(( 3 - 7 ) * ( 8 + 9))

((4.0 / 5.0) - (6.0 / 7.0))

( ~( ~( -( + 3 )) < 5) & ((19 / 7) == (5 * 3)))

3 ** ( 4 ** 2)

2.2.3 Value Conversions

In an expression, integer and real arithmetic may be intermixed, provided that proper value conversion operations are used. For example, the expression " 2.5 + 3 " is illegal, because " real + integer " is not defined in the language: neither the predefined integer type specification nor the real type specification has an entry like " real + integer -> real ". If values of different types have to be mixed in an expression, value conversion operations should be used, as in " 2.5 + real(3) ". (In the following chapter we will discuss how users can define mixed-type arithmetic.)

A value conversion operation converts a value from one type to another. In the preceding sections we already mentioned the following predefined value conversion operations:

```
real(integer)      -> real;
integer(real)      -> integer;
integer(character) -> integer;
character(integer) -> character;
```
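Why " 2.5 + 3 " is rejected while " 2.5 + real(3) " is accepted comes down to a table lookup: an infix operation is legal only if some predefined signature matches its operand types exactly. The Python sketch below models that lookup using the signatures listed in this chapter; the dictionaries and the functions `infix_type` and `convert` are hypothetical, and the model ignores user-defined operations (which the next chapter introduces).

```python
# Hypothetical model of the predefined infix signatures: only exact matches
# exist, so there is no implicit integer-to-real promotion.
ARITH_OPS = ["+", "-", "*", "/"]
COMPARE_OPS = ["==", "<>", "<", "<=", ">", ">="]

SIGNATURES = {}
for t in ("integer", "real"):
    for op in ARITH_OPS:
        SIGNATURES[(op, t, t)] = t
    for op in COMPARE_OPS:
        SIGNATURES[(op, t, t)] = "boolean"
for op in COMPARE_OPS:
    SIGNATURES[(op, "character", "character")] = "boolean"
SIGNATURES[("**", "integer", "integer")] = "integer"
SIGNATURES[("**", "real", "integer")] = "real"
for op in ("&", "|", "==", "<>"):
    SIGNATURES[(op, "boolean", "boolean")] = "boolean"

# The four predefined value conversion operations: (name, argument) -> result.
CONVERSIONS = {
    ("real", "integer"): "real",
    ("integer", "real"): "integer",
    ("integer", "character"): "integer",
    ("character", "integer"): "character",
}


def infix_type(op, left, right):
    """Result type of `left op right`, or TypeError if no signature matches."""
    try:
        return SIGNATURES[(op, left, right)]
    except KeyError:
        raise TypeError(f"{left} {op} {right} is not defined") from None


def convert(name, arg):
    """Result type of the conversion call name(arg)."""
    try:
        return CONVERSIONS[(name, arg)]
    except KeyError:
        raise TypeError(f"{name}({arg}) is not defined") from None
```

In this model, infix_type("+", "real", "integer") raises TypeError("real + integer is not defined"), whereas infix_type("+", "real", convert("real", "integer")) returns "real", mirroring " 2.5 + real(3) ".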

Exercises:

2.2 Correct the illegal expressions:

2 * 3 - 4.0

6 + real(integer(2) /3 )

2 + 4.5 < 7.6 & 4 < 7

4.5 ** 4 ** real (3.5 + 2)

character (integer ('P'))

integer (character ('M'))

Part 1: Language Description, Chapter 2: Basic Data Types and Expressions