- Up - | Next >> |
So far we've seen chart parsers whose chart contains constituents already found by the parser. One can also use the chart to store "parsing-hypotheses" that is hypotheses about what to look for next. These parsing hypotheses are represented using so-called "dotted rules" that is, rule of the following form:
X --> Ys . Zs
where X is a category and Ys, Zs are (possibly empty) sequences of categories. The intuitive interpretation of this rule is:
I've found the sequence Ys and if the sequence Zs can also be found
then an X will have been found..
Whereas in a passive chart parser, the chart records constituents that have been parsed, in an active chart parser, the chart records parsing hypotheses. Pratically, this means that edges in the chart instead of being labelled with categories, are labelled with dotted rules. There is a special kind of dotted rules however: dotted rules whose dot is right at the end of the rhs e.g.
S --> NP VP.
In effect such a rule indicate a "complete" constituent. Nothing more is looked for, all the constituents on the rhs of the rule have been found and therefore an S has been parsed. Edges which are labelled with such dotted rules are called "passive edges"; other edges are called "active edges" (they still "look for something").
Given these preliminaries, we can now describe the various components of an active chart parser as follows:
An initialisation of the chart
A "fundamental rule" that combines an active edge with a passive edge.
A control strategy (either top-down or bottom-up).
A search strategy (either breadth-first or depth-first)
As this description indicates, active chart parsers can be of different types e.g. bottom-up and depth-first, top-down and breadth first etc. Here we concentrate on bottom-up, depth-first chart parsers. For such parser, the initialisation consists in adding to the chart the passive edges licensed by the words in the input string. For instance, given the sentence "the cat runs", the chart will be initialised to:
edge(1,2,Det --> the.)
edge(2,3,N --> cat.)
edge(3,4,V --> runs.)
The fundamental rule combines an active edge with a passive edge as follows:
Active edge: edge(i,j,X --> Ys.C Zs)
Passive edge: edge(j,k, C --> Ws.)
-------------------------
Resulting edge: edge(i,k,X --> Ys C.Zs)
The control strategy specifies how rules are invoked (in this case bottom-up) and more specifically how to create active edges:
Passive edge: edge(j,k, C --> Ys.)
Rule: X --> C Xs
-------------------------
Resulting active edge: edge(j,k,X --> C. Xs)
If the chart contains a constituent of category C from position j to position k and if the grammar contains a rule with left-corner is C and general format X --> C Xs, then an active edge is produced which stretches from position j to position k and looks for Xs to form an X.
Finally the search strategy is defined by type of agenda we use: if we use a stack, the parser proceeds depth-first through the search space; if we use a queue, it proceeds breadth-first.
Given these basic components an active chart parsing algorithm can be defined as follows:
The top loop is as for any chart parser with agenda: initialise the agenda, process the agenda until it is empty and look the chart up for edges stretching from beginning to end and labelled with the S category.
To process the agenda:
Retrieve an edge from the agenda,
Add this edge to the chart, and active edges it implies to the agenda.
Apply the fundamental rule to this edge as many times as is permitted by the edges in the chart; add the resulting edges to the agenda.
Create as many active edges on the basis of this edge as are permitted by the selection strategy ; add the resulting edges to the agenda.
Keep going till the agenda is empty. then stop.
- Up - | Next >> |