|
|
|
| An input string is a sentence if the string can be divided into two parts and the first part is a noun-phrase and the second part is a verb-phrase. |
The main problem is how to make the right division because there are many ways to cut a sentence into two pieces. The obvious one for our example sentence can be expressed in a diagram:
the boy throws a stone
<--------------------> <---------------------->
<noun_phrase> <verb_phrase>
How is this division obtained? The general procedure is to try to recognize a <noun_phrase> at the beginning of a sentence and then try to find a <verb_phrase> in what is left. If a first attempt is not successful, then backtracking is needed for another trial. In general, a number of different attempts may be required before the right division is found. At the end of this, we should have used up exactly the words of the sentence, no more and no less. And because noun phrases and verb phrases are also composed of smaller sub-phrases we have a recurring task of decomposing phrases into proper sub-phrases.
As we have done earlier, we assume that each grammar rule is represented by a corresponding definition. Thus, there is a definition for sentence, for noun_phrase, for verb_phrase, and so on. We further assume that all those definitions are similar in nature and are based on the same principles. That means that if we have found a scheme for a sentence definition, we can also use that for other grammar rules. Each definition has an input parameter, which gives the beginning of a phrase and an output parameter which returns the end of a phrase. Furthermore a definition returns the corresponding parse tree.
So, let us start with a version of a definition for parsing sentences:
sentence (Phrase, var (RestPhrase) ) -> optional (ParseTree);
sentence (Start, End) = [ Middle := Start;
NP = noun_phrase (Start, var (Middle) );
VP = verb_phrase (Middle, var (End) );
Sentence (NP, VP)];
The first line is a specification which says that a sentence has an input parameter which is a phrase, and a var parameter which will be used as an output parameter to return the position of the rest of the phrase which does not belong to the sentence. The result specification contains the optional quantifier. It specifies that if the input phrase is a sentence the corresponding parse tree will be returned, otherwise backtracking will occur.
The body of the definition rule consists of four elements:
It first defines a local variable Middle, which will be used later on; its initial value will be the start position of the input phrase.
Then the definition of the noun_phrase will be called. The noun_phrase starts at the same position as the sentence and stops somewhere in the middle of the sentence. If the sentence does not start with a noun_phrase then backtracking occurs and no ParseTree will be returned. Otherwise, the end of the noun phrase is assigned to the local variable Middle and the result of the noun_phrase is named NP.
After that the definition of the verb_phrase will be called. The verb_phrase starts where the noun_phrase was ended at the Middle position. In case of a successful match it will return the end of the verb phrase, and thus the end of the sentence, via the End parameter to the caller. If there is no verb phrase discovered, backtracking occurs to the preceding NP definition and a new alternative for a noun phrase will be tried, if defined. If the definition of the noun phrase has also an optional result-specification no alternative for the noun phrase will be available. In that case, backtracking will occur and no ParseTree will be returned.
If the noun phrase and the verb phrase are both matching then a corresponding node of the ParseTree will be returned. This node is a term with the following specification:
Sentence (ParseTree, ParseTree) => ParseTree;
The corresponding type definitions are:
type Phrase = sequence; type RestPhrase = Phrase; type ParseTree = term;
These type definitions assume that a phrase is a sequence of tokens as produced by the lexical analyzer.
And this makes the first version of the sentence definition complete. Because this version may be simplified, another version can be defined. This second version is derived from the first version by doing some substitutions for NP and VP, without loss of functionality:
sentence (Phrase, var (RestPhrase) ) -> optional (ParseTree);
sentence (Start, End) = [ Middle := Start;
Sentence (noun_phrase (Start, var (Middle) ),
verb_phrase (Middle, var (End) )) ];
Let us now start with the second grammar rule:
<noun_phrase> ::= <determiner> <noun> | <noun>
This rule can be read as follows:
| A phrase is a noun_phrase if, either the phrase can be divided into two parts and the first part is a determiner and the second part is a noun, or, the whole phrase is a noun. |
The grammar rule says that there are two alternatives for a noun_phrase: the first alternative is a determiner followed by a noun, the second alternative is a single noun. We discussed already in the beginning of this chapter the question how to represent syntactic alternatives in a definition. As there, we will include the alternatives in a sequence-expression, as shown in the following example:
noun_phrase (Phrase, var (RestPhrase)) -> optional (ParseTree);
noun_phrase (Start, End) =
[ Middle:= Start;
( NounPhrase (determiner (Start, var (Middle)),
noun (Middle, var (End))),
NounPhrase (null (ParseTree),
noun (Start, var (End)))) ];
The definition body starts with the first alternative in a serial-expression. If the input string is a determiner followed by a noun, the corresponding node of the ParseTree will be returned. This node is a phrase structure with the following term specification:
NounPhrase (ParseTree, ParseTree) => ParseTree;
If the first alternative does not succeed then the second alternative will be tried. If the input string is a single noun then the corresponding node of the ParseTree will be returned.
If none of the attempts succeed backtracking occurs and no ParseTree is returned.
Our third grammar rule has a similar structure:
<verb_phrase> ::= <verb> <noun_phrase> | <verb>
This rule can be read as follows:
| A phrase is a verb_phrase if, either the phrase can be divided into two parts and the first part is a verb and the second part is a noun_phrase, or, the whole phrase is a verb. |
This grammar rule makes use of the already existing grammar rule for noun_phrases. The corresponding definition is:
verb_phrase (Phrase, var (RestPhrase)) -> optional (ParseTree);
verb_phrase (Start, End) =
[ Middle:= Start;
( VerbPhrase (verb (Start, var (Middle)),
noun_phrase (Middle, var (End))),
VerbPhrase (verb (Start, var (End)),
null (ParseTree))) ];
This definition is similar to the noun_phrase definition. The corresponding node of the ParseTree has the following specification:
VerbPhrase (ParseTree, ParseTree) => ParseTree;
In a grammar rule the order of the alternatives, and therefore in the sequence-expression of the corresponding definition, is of importance. For example, if we change the order of the two alternatives in the verb_phrase in the following way:
<verb_phrase> ::= <verb> | <verb> <noun_phrase>
and we change also the order of the corresponding alternatives in the verb_phrase definition then the <verb> will be satisfied without the following <noun_phrase>. That means that parsing of the sentence:
the boy throws a stone
stops after the verb "throws", leaving the rest of the sentence unparsed. The general rule for ordering alternatives in a grammar rule is to specify the most encompassing cases first.
Associated with incorporating alternatives in definitions is the specification of the result of the definition. In all examples so far, we used the optional quantifier. This means that even when more than one alternative could be applied, only the first matching alternative will be returned, which is what we usually will expect. However, sometimes there are cases where we want to see all the matching alternatives. For example, we may be interested in all possible parsings of ambiguous phrases. In that case the quantifier multi should be used.
As we have seen, words are belonging to special categories , such as determiner, noun, verb, and so on. In real applications the words are often belonging to an application domain and are stored, together with other characteristics in a lexicon or dictionary. For our exercises, that will not be necessary. We will show how words can be defined by definitions as if they were stored in a dictionary.
Our first example is the definition of a limited number of determiners:
determiner (Phrase, var (RestPhrase) ) -> optional (ParseTree);
determiner (=nil, Rest) = no (ParseTree);
determiner ("a" & Tail , Rest) = [ Rest:= Tail; Determiner ("a")];
determiner ("the" & Tail , Rest) = [ Rest:= Tail; Determiner ("the")];
determiner ("all" & Tail , Rest) = [ Rest:= Tail; Determiner ("all")];
determiner ("some" & Tail , Rest) = [ Rest:= Tail; Determiner ("some")];
Each definition rule tests if the head of the list is a specific determiner. If so, then the rest of the list will be given back via the output parameter and the corresponding node of the ParseTree will be returned by using the following specification:
Determiner (text) => ParseTree;
Our next example is the definition of a limited number of nouns:
noun (Phrase, var (RestPhrase) ) -> optional (ParseTree);
noun (=nil, Rest) = no (ParseTree);
noun ("man" & Tail, Rest) = [ Rest:= Tail; Noun ("man") ];
noun ("apple" & Tail, Rest) = [ Rest:= Tail; Noun ("apple") ];
noun ("boy" & Tail, Rest) = [ Rest:= Tail; Noun ("boy") ];
noun ("stone" & Tail, Rest) = [ Rest:= Tail; Noun ("stone") ];
noun ("Jack" & Tail, Rest) = [ Rest:= Tail; Noun ("Jack") ];
This definition is similar to the definition of determiner. The Noun node is specified by:
Noun (text) => ParseTree;
Our final example is the definition of a limited number of verbs:
verb (Phrase, var (RestPhrase) ) -> optional (ParseTree);
verb (=nil, Rest) = no (ParseTree);
verb ("sings" & Tail, Rest) = [ Rest:= Tail; Verb ("sings") ];
verb ("eats" & Tail, Rest) = [ Rest:= Tail; Verb ("eats") ];
verb ("throws" & Tail, Rest) = [ Rest:= Tail; Verb ("throws") ];
This definition is similar to the two preceding definitions. The Verb node is specified by:
Verb (text) => ParseTree;
We have now defined all the elements of a set of very limited grammar rules. The next step is to glue inputs and outputs of lexical analysis and syntax analysis together in one parsing definition:
Parse (text) -> optional (ParseTree);
Parse (Text) = [ End:= null (sequence);
sentence (tokenize (Input (Text)), var (End) ) ];
With these building blocks we are now able to construct a component for syntax analysis (see Figure 6-4):
![]()
component SyntaxAnalysis;
type ParseTree = term;
Parse(text) -> optional (ParseTree);
begin
type Phrase = sequence;
type RestPhrase = Phrase;
nil = null (ParseTree);Sentence (ParseTree, ParseTree) => ParseTree; NounPhrase (ParseTree, ParseTree) => ParseTree; VerbPhrase (ParseTree, ParseTree) => ParseTree; Determiner (text) => ParseTree; Noun(text) => ParseTree; Verb(text) => ParseTree;
Parse (text) -> optional (ParseTree);
Parse (Text) = [ End:= null (sequence);
sentence (tokenize (Input (Text)), var (End) ) ]; sentence (Phrase, var (RestPhrase) ) -> optional (ParseTree);
sentence (Start, End) =
[ Middle := Start;
Sentence (noun_phrase (Start, var (Middle) ),
verb_phrase (Middle, var (End) )) ]; noun_phrase (Phrase, var (RestPhrase)) -> optional (ParseTree);
noun_phrase (Start, End) =
[ Middle:= Start;
( NounPhrase (determiner (Start, var (Middle)),
noun (Middle, var (End))),
NounPhrase (null (ParseTree),
noun (Start, var (End)))) ]; verb_phrase (Phrase, var (RestPhrase)) -> optional (ParseTree);
verb_phrase (Start, End) =
[ Middle:= Start;
( VerbPhrase (verb (Start, var (Middle)),
noun_phrase (Middle, var (End))),
VerbPhrase (verb (Start, var (End)),
null (ParseTree))) ]; determiner (Phrase, var (RestPhrase) ) -> optional (ParseTree);
determiner (=nil, Rest) = no (ParseTree);
determiner ("a" & Tail , Rest) = [ Rest:= Tail; Determiner ("a")];
determiner ("the" & Tail , Rest) = [ Rest:= Tail; Determiner ("the")];
determiner ("all" & Tail , Rest) = [ Rest:= Tail; Determiner ("all")];
determiner ("some" & Tail , Rest) = [ Rest:= Tail; Determiner ("some")]; noun (Phrase, var (RestPhrase) ) -> optional (ParseTree);
noun (=nil, Rest) = no (ParseTree);
noun ("man" & Tail, Rest) = [ Rest:= Tail; Noun ("man") ];
noun ("apple" & Tail, Rest) = [ Rest:= Tail; Noun ("apple") ];
noun ("boy" & Tail, Rest) = [ Rest:= Tail; Noun ("boy") ];
noun ("stone" & Tail, Rest) = [ Rest:= Tail; Noun ("stone") ];
noun ("Jack" & Tail, Rest) = [ Rest:= Tail; Noun ("Jack") ]; verb (Phrase, var (RestPhrase) ) -> optional (ParseTree);
verb (=nil, Rest) = no (ParseTree);
verb ("sings" & Tail, Rest) = [ Rest:= Tail; Verb ("sings") ];
verb ("eats" & Tail, Rest) = [ Rest:= Tail; Verb ("eats") ];
verb ("throws" & Tail, Rest) = [ Rest:= Tail; Verb ("throws") ];end component SyntaxAnalysis;
![]()
Figure 6-4: Example of a component for syntactic analysis
We can now use the components in a session:
use TextInput, LexicalAnalysis, SyntaxAnalysis;
and we request the system to:
Parse ("the boy throws a stone")?
The parsing of the input text results into a tree structure corresponding to the following parse tree:
Sentence ( NounPhrase ( Determiner ("the"), Noun ("boy")),
VerbPhrase ( Verb ("throws"), NounPhrase ( Determiner ("a"),
Noun ("stone"))))
Compare this with the parse tree of Figure 6-3.
If we continue the session with:
Parse ("boy throws")?
the resulting parse tree is:
Sentence ( NounPhrase ( [ ], Noun ("boy") ),
VerbPhrase ( Verb ("throws"), [ ] ) )
Exercise:
| 6.5 Assume that we have defined the syntax of arithmetic expressions by means of the following rules: |
<expr> ::= <term> | <expr> + <term> | <expr> - <term>
<term> ::= <factor> | <term> * <factor> | <term> / <factor>
<factor> ::= <identifier> | <number> | (<expr>)
Develop a component for the syntax analysis of these arithmetic expressions.
Semantic analysis is concerned with the meaning of sentences in a language. Once sentences are parsed by means of syntax analysis the next step is to understand the sentences.
A sentence may be grammatically correct but still lack meaning. For example, the sentence "The flower drove 250 beats per hour" is grammatically correct. However, a flower cannot drive and its speed is not measured in beats per hour. The semantic rules for a language differentiate between meaningful sentences and gibberish. During semantic analysis, these semantic rules are applied.
Defining the semantic rules of natural languages is an extremely difficult problem that is the subject of on-going research. Solutions to the problem of formalizing the syntax and semantics of a language have only been successfully applied to simple subsets of natural languages. Those subsets not only limit the grammar to simple sentences, but they also make assumptions about the kinds of sentences the user will use. Typical examples are found in the area of data base applications where dialogues are limited to the domain covered by the data base.
In this section we will show how semantic analysis may help to understand sentences in a specific domain. The application domain we have chosen is the problem of finding optimal routes between cities as described in the preceding chapter. We will use already existing components and we will develop a natural language interface based on a limited number of sentences. The system will accept two kinds of sentences: questions and assertions.
Users can ask for routes between two cities. Examples of specific questions include:
What is the distance between Amsterdam and Nice?
Distance Berlin and Vienna?
What is the best route from Rome to Paris?
What are the routes from Paris to Vienna?
Users can also enter new information about cities and their distances. Examples of such assertions are:
The distance between Amsterdam and Brussels is 220 kilometers.
Distance Brussels and Paris is 297 km.
The sentences which will be accepted by our system are limited to a number of specific patterns. Examples of such patterns are:
What is the distance between X and Y?
Distance X and Y?
What is the best route from X to Y?
What are the routes from X to Y?
The distance between X and Y is Z kilometers.
Distance X and Y is Z km.
A pattern serves as a model for a variety of sentences. A pattern may contain a number of variables which specify where input sentences may deviate from the pattern. In our examples, X, Y, and Z are pattern variables. During pattern matching where an input sentence is matched against one of the pattern sentences, the pattern variables are representing the variable parts of the input sentence. Note that the patterns we use here are not the same as the patterns used in parameter expressions for pattern matching of terms, although both kinds of pattern serve the same purpose. In this context we use the word "pattern" to describe a number of possible sentences.
To illustrate the basic principles of pattern matching for sentences we will use in our system a very simple procedure. For a successful match, an input sentence has to match word for word and token for token with the pattern sentence. The only exceptions are pattern variables which match any token. In all other cases, such as deviations of letters in upper and lower case, or differences in punctuation symbols, or in the total number of tokens will lead to an unsuccessful match. If there is no pattern sentence found which matches the input sentence, then the input sentence cannot be recognized by the system.
Associated with each pattern is a semantic action of the form:
Pattern 1 => Action 1
Pattern 2 => Action 2
..................................
Pattern n => Action n
If an input sentence matches a pattern then the corresponding semantic action will be executed. The semantic action depends on the recognized pattern.
In our program, a pattern is represented by a text string and the associated semantic action is coded as a text string of two characters. All patterns are collected in a pattern list:
PatternList =
{ pattern ("Q1", "What is the distance between X and Y?"),
pattern ("Q1", "Distance X and Y?"),
pattern ("Q2", "What is the best route from X to Y?"),
pattern ("Q3", "What are the routes from X to Y?"),
pattern ("A1", "The distance between X and Y is Z kilometers."),
pattern ("A1", "Distance X and Y is Z km.")};
Mark that for some patterns the same semantic actions are specified. In the code for semantic actions we used the letter Q for questions and the letter A for assertions, although this distinction is not essential.
Patterns are converted to sequences of tokens as specified by the following definition:
pattern (text, text) -> optional (sequence); pattern (Label, Pattern) = Label & tokenize (Input (Pattern));
The tokenize procedure is defined by the Lexical Analysis component. The label for the semantic action is placed in front of the pattern sequence.
Input sentences are also converted to sequences of tokens. After an input sequence has been converted, the resulting sequence of tokens will be used to search for a matching pattern. The matching procedure is defined by:
Match (Input = sequence, Pattern = sequence) -> optional (sequence);
Match (= nil, = nil) = nil;
Match (= nil, X) = no (sequence);
Match (X, = nil) = no(sequence);
Match (H1 & T1, H1 & T2) = Match (T1, T2);
Match (H1 & T1, H2 & T2) = if Variable (H2) then H1 & Match (T1, T2)
else no (sequence);
Match (X , Y ) = no (sequence);
This procedure matches the input and pattern sequences, token for token. For a successful match all tokens must be equal, with the exception of input tokens which are matching pattern variables, as defined by the one but last definition rule. In that case the input token will be part of an output sequence destined for further processing. In all other cases, the pattern sequence is not matching with the input sequence, no output sequence will be returned, and another pattern will be tried for matching.
In our program, a pattern variable is represented by a single, capital letter, as defined by:
Variable (term) -> boolean; Variable (text: T) = [ C = T[1]; size (T) == 1 and 'A' <= C and C <= 'Z'];
If there is a match between the input sequence and a pattern sequence, the matching pattern has been found, and no further matching is necessary; the output sequence can be used in the corresponding semantic action. Take for example the following input sentence:
What is the best route from Amsterdam to Vienna?
This matches with the third pattern:
What is the best route from X to Y?
The resulting output sequence will be:
Amsterdam & (Vienna & nil)
The corresponding semantic action is coded as "Q2".
The meaning of a sentence is defined by the semantic action code and the output sequence. In our program, all semantic actions are defined by one definition:
Action(Code = term, sequence) -> optional (term);
Action(Q, =nil) = no (term);
Action("Q1", X & (Y & =nil)) = Distance (SearchBestFirst(Map, Node(X), Node(Y)));
Action("Q2", X & (Y & =nil)) = Route (SearchBestFirst(Map, Node(X), Node(Y)));
Action("Q3", X & (Y & =nil)) = { Route (SearchBestFirst(Map,Node(X), Node(Y))) };
Action("A1", X & (Y & (Z & =nil))) = [ add (Map, arc(Node(X), Node(Y), Cost(Z))); term: "accepted"];
This definition defines the meaning of three questions and one assertion. The first question asks for the distance of two cities. To answer that question the SearchBestFirst function is called for the two cities X and Y, as defined in the preceding chapter. The Node converts a token into a city node. The Distance function computes the distance from the output of SearchBestFirst. The answer will be returned to the user.
The second and third kinds of questions are similar in nature. The Route function computes the route between two cities.
The last definition rule defines the semantic action for an assertion. It adds new information about the distance between two cities to the Map and returns the word "accepted" to the user.
The Route function needs some explanation. It makes some cosmetic changes to the output as created by SearchBestFirst. For example, if we ask for the route between Amsterdam and Rome SearchBestFirst returns as one of the answers:
cost(Rome, 2152) & (cost(Nice, 1429) & (cost(Paris, 517) & (cost(Amsterdam, 0) & [ ])))
However, we like to return the inverse list of cities, starting with Amsterdam, and without the cost information. This requires some additional programming. The Route definition is:
Route (sequence) -> term;
Route ( S) = term:{ prioritems ( {State (Items (S) )} )};
The Items function returns a stream of nodes from sequence S and State returns the city information from a node. The prioritems function is used to reverse the order of the nodes.
Based on these definitions we can now construct a component for semantic analysis of our application domain (see Figure 6-5).
![]()
component SemanticAnalysis; UserInput (text) -> term; begin nil = null (sequence); term & sequence => sequence; text & sequence => sequence;
Map = Graph: alist (Arc);
PatternList =
{ pattern ("Q1", "What is the distance between X and Y?"),
pattern ("Q1", "Distance X and Y?"),
pattern ("Q2", "What is the best route from X to Y?"),
pattern ("Q3", "What are the routes from X to Y?"),
pattern ("A1", "The distance between X and Y is Z kilometers."),
pattern ("A1", "Distance X and Y is Z km.")};pattern (text, text) -> optional (sequence); pattern (Label, Pattern) = Label & tokenize (Input (Pattern));
UserInput (text) -> term;
UserInput (input) = [ Result:= nil;
[ Result:= Process (input) ];
if Result <> nil then Result
else term: "Sentence not understood!"]; Process (text) -> optional (term);
Process (input) = [ Input = tokenize (Input (input));
Pattern = items (PatternList);
Action (Head (Pattern), Match (Input, Tail (Pattern))) ]; Match (Input = sequence, Pattern = sequence) -> optional (sequence);
Match (= nil, = nil) = nil;
Match (= nil, X) = no (sequence);
Match (X, = nil) = no(sequence);
Match (H1 & T1, H1 & T2) = Match (T1, T2);
Match (H1 & T1, H2 & T2) = if Variable (H2) then H1 & Match (T1, T2)
else no (sequence);
Match (X , Y ) = no (sequence);Variable (term) -> boolean; Variable (text: T) = [ C = T[1]; size (T) == 1 and 'A' <= C and C <= 'Z'];
Action(Code = term, sequence) -> optional (term);
Action(Q, =nil) = no (term);
Action("Q1", X & (Y & =nil)) = Distance (SearchBestFirst(Map, Node(X), Node(Y)));
Action("Q2", X & (Y & =nil)) = Route (SearchBestFirst(Map, Node(X), Node(Y)));
Action("Q3", X & (Y & =nil)) = { Route (SearchBestFirst(Map,Node(X), Node(Y))) };
Action("A1", X & (Y & (Z & =nil))) = [ add (Map, arc(Node(X), Node(Y), Cost(Z))); term: "accepted"]; Distance (sequence) -> term;
Distance ( cost ( State, integer: Cost) & Z) =
term: "The distance is " & text (Cost, -1) & " kilometers.";Node (term) -> Node; Node (text: T) = Node: symbol (T);
Route (sequence) -> term;
Route (S) = term: { prioritems( {State (Items (S))} )};Cost (term) -> Cost; Cost (text: T) = Cost: integer (T);
Items (sequence) -> multi (term); Items ( =nil) = no (term); Items ( Head & Tail) = (Head, Items (Tail));
State (term) -> State; State( cost (State, Cost)) = State;
Head (sequence) -> optional (term); Head (Head & Tail) = Head;
Tail (sequence) -> optional (sequence); Tail (Head & Tail) = Tail;
end component SemanticAnalysis;
![]()
Figure 6-5: Example of a component for semantic analysis
We may now use this component for any collection of cities we are interested in. Let us start with a session:
use TextInput; use LexicalAnalysis; use SearchBestFirst; use SemanticAnalysis;
Because the SemanticAnalysis component starts with an empty map, we first have to fill the map with information. Let us use the information about the distances between cities as shown in Figure 20-3. We enter a number of assertions:
UserInput ("Distance Amsterdam and Berlin is 669 km.")?
UserInput ("Distance Berlin and Vienna is 648 km.")?
UserInput ("Distance Vienna and Rome is 1150 km.")?
UserInput ("Distance Amsterdam and Paris is 517 km.")?
UserInput ("Distance Paris and Vienna is 1271 km.")?
UserInput ("Distance Paris and Nice is 912 km.")?
UserInput ("Distance Nice and Vienna is 1130 km.")?
UserInput ("Distance Nice and Rome is 723 km.")?
We may now enter a dialogue with the system and ask questions, such as:
UserInput ("What is the distance between Amsterdam and Rome?")?
"The distance is 2152 kilometers."UserInput ("Distance Amsterdam and Nice?")?
"The distance is 1429 kilometers."UserInput ("What is the best route from Amsterdam to Nice?")?
{ Amsterdam,
Paris,
Nice}
These examples illustrate how pattern matching can be used to recognize specific input sentences and how the corresponding semantic actions may create the necessary response. It is easy to envision for a given application domain all kinds of pattern sentences, in full and in abbreviated forms, in native and in foreign languages, which are useful for a number of users.
If we compare pattern matching with syntax analysis as described in previous sections, we may say that a pattern also represents a syntax rule. As with a syntax rule it describes the fixed parts and variable parts of a sentence. Compare this with a syntax rule for an if statement as used in programming languages:
if B then S1 else S2
Such a syntax rule is also a pattern with fixed and variable parts. In this pattern, B, S1, and S2 may be considered as pattern variables, although we prefer to call them syntactic units because they may represent compound structures. On the other hand, we may also may make patterns as
The distance between X and Y is Z kilometers.
more precise by using a different notation:
The distance between <City1> and <City2> is <Number> kilometers.
The syntax of <City1>, <City2>, and <Number> can now be defined in normal syntax rules. In this way pattern variables can represent compound syntactic structures. This means that pattern matching can be a quite sophisticated process because it may involve syntax analysis as described earlier.
Exercises:
6.6 One of the drawbacks of the current pattern matching procedure is its inflexibility in spelling. Design a pattern matching procedure which is independent of upper and lower case letters.
6.7 Another disadvantage of the current pattern matching procedure is that words in the input sentence may match with pattern variables, which are meant to be numbers. Design a pattern matching procedure which will accept only numbers where numbers are required.
| Language processing has a wide range of applications. Artificial languages as well as natural languages can all be treated according to the same principles. Analysis of languages can be described as three steps: lexical analysis, syntax analysis, and semantic analysis. |
| A grammar can be used in two ways. It can be used to generate all possible legal sentences and it can also be used to recognize a given sentence. |
| Grammar notations can be used to translate syntax definitions easily into corresponding definitions. |
| The main task of a lexical analyzer is to read the characters from an input stream and produce as output a sequence of tokens that can be used as input for following steps of language analysis. A token consists of one or more characters and is the smallest meaningful linguistic unit of the language, such as a word, a number, or any other simple or compound symbol of the language. |
| Components for standard text input, lexical analysis, and syntax analysis have been defined. |
| Syntax analysis involves the parsing and analysis of the syntax of the input text according to the grammar rules of the language. |
| Semantic analysis is concerned with the meaning of sentences in a language. Once sentences are parsed by means of syntax analysis the next step is to understand the sentences. |
| Pattern matching is a useful technique for handling a limited number of sentences in a specific application domain. |
| A pattern serves as a model for a variety of sentences. A pattern may contain a number of variables which specify where input sentences may deviate from the pattern. |
Different strategies for language processing are discussed by several authors. Winston(1984) gives a general introduction to natural language understanding. Winston and Horn(1984) describe language processing based on Augmented Transition Trees (ATTs). Charniak and McDermott(1985) give an introduction to natural language parsing and are discussing natural language processing based on Augmented Transition Nets (ATNs).
Applications of Definite Clause Grammars (DFGs) as used in Prolog, are discussed by Clocksin and Mellish(1984), Marcus(1986), Walker(1987), and Bratko(1990).
| Part 2: Data Structures | Chapter 6: Language Processing |
![]()