20.1.2 Word2Node


Procedure Word2Node creates a new lexical node, which it returns in Node. It takes as arguments I, the position of the word in the input sentence; W, the atom representing the word; N, the number of words in the sentence; and Topo, a record containing information on topological fields.

<DG: Word2Node>= >>:
   proc {Word2Node I W N Topo Node}
      L = {Dictionary.get Lex.lexicon W}
   in
      Node = node(index   : I
                  word    : W
                  entries : L
                  field   : Topo
                  <DG Word2Node: features>
                 )
      <DG Word2Node: constraints>
   end

We equip the lexical node with feature entryIndex to indicate which entry in list L is selected.

<DG Word2Node: features>= >>: entryIndex : {FD.int 1#{Length L}}

Further, we give it features cat (category) and agr (agreement), whose values must be licensed by the selected entry:

<DG Word2Node: features>= << >>:
   cat : {MakeCat}
   agr : {MakeAgr}

<DG: Word2Node>= << >>:
   CatRange = 0#Lex.cat.card-1
   fun {MakeCat} {FD.int CatRange} end
   AgrRange = 0#Lex.agr.card-1
   fun {MakeAgr} {FD.int AgrRange} end

<DG Word2Node: constraints>= >>:
   {FS.include Node.cat
    {Select {Map L fun {$ E} E.cats end} Node.entryIndex}}
   {FS.include Node.agr
    {Select {Map L fun {$ E} E.agrs end} Node.entryIndex}}
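As a minimal sketch of what this licensing constraint amounts to, the following fragment (meant to be fed to the Mozart OPI) fixes the entry index to a constant, so that the selection reduces to plain list indexing. The category codes and the two entry sets are hypothetical values chosen for illustration, and Nth merely stands in for the Select constraint used above, which also works while the index is still undetermined.

```oz
declare
% Hypothetical encoded category sets, one per lexical entry
CatSets    = [{FS.value.make [0 2]} {FS.value.make [1]}]
EntryIndex = 1                 % fixed here; an FD variable in Word2Node
Cat        = {FD.int 0#3}      % category of the node, initially 0..3
% The node's category must be a member of the selected entry's set
{FS.include Cat {Nth CatSets EntryIndex}}
% Inspect the remaining domain of Cat
{Browse {FD.reflect.domList Cat}}
```

In Word2Node itself the index is a finite domain variable, so the choice of entry and the choice of category constrain each other.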

Feature marks is a set that contains zu if the 'zu' particle is morphologically part of the word (e.g. einzukaufen), vpref if the separable prefix is not separated, haben if the auxiliary 'haben' is desired, and sein if the auxiliary 'sein' is desired. Feature aux is also a set of marks; it is used only on auxiliary verbs, to indicate which auxiliary the word is (i.e. either haben or sein).

<DG Word2Node: features>= << >>:
   marks : {MakeMarks}
   aux   : {MakeMarks}

<DG: Word2Node>= << >>:
   MarkRange = 0#Lex.mark.card-1
   fun {MakeMarks} {FS.var.upperBound MarkRange} end

They must also correspond to the selected entry:

<DG Word2Node: constraints>= << >>:
   Node.marks = {Select {Map L fun {$ E} E.marks end} Node.entryIndex}
   Node.aux   = {Select {Map L fun {$ E} E.aux end} Node.entryIndex}

Feature vpref is a set that is either empty or contains the expected separated verb prefix (e.g. 'ein' as in "ich kaufe etwas ein", i.e. "I buy something").

<DG Word2Node: features>= << >>: vpref : {MakeVpref}

<DG: Word2Node>= << >>:
   VprefRange = 0#Lex.vpref.card-1
   fun {MakeVpref} {FS.var.upperBound VprefRange} end

It must correspond to the selected entry:

<DG Word2Node: constraints>= << >>: Node.vpref={Select {Map L fun {$ E} E.vpref end} Node.entryIndex}

A lexical entry specifies required complements in feature comps_req and optional complements in feature comps_opt. The set of complement roles of the lexical node is represented by feature comps and is bounded by the required complements at the lower end and the union of the required complements and the optional complements at the upper end.

<DG Word2Node: features>= << >>: comps : {MakeComps}

<DG: Word2Node>= << >>:
   CompRange = 0#{Length Lex.comps}-1
   fun {MakeComps} {FS.var.upperBound CompRange} end

<DG Word2Node: constraints>= << >>:
   local
      Lo = {Select {Map L fun {$ E} E.comps_req end} Node.entryIndex}
      Hi = {Select {Map L fun {$ E} {FS.union E.comps_req E.comps_opt} end}
            Node.entryIndex}
   in
      {FS.subset Lo Node.comps}
      {FS.subset Node.comps Hi}
   end
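Read declaratively, the two subset constraints bound the valency from below and above. Writing $e$ for the value of entryIndex, and using $\mathit{req}_e$ and $\mathit{opt}_e$ as shorthand (introduced here, not part of the code) for the comps_req and comps_opt sets of entry $e$:

$$\mathit{req}_e \;\subseteq\; \mathit{comps} \;\subseteq\; \mathit{req}_e \cup \mathit{opt}_e$$

For a hypothetical entry with $\mathit{req}_e = \{\mathit{subject}\}$ and $\mathit{opt}_e = \{\mathit{object}\}$, the admissible values of comps are exactly $\{\mathit{subject}\}$ and $\{\mathit{subject}, \mathit{object}\}$.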

Feature role is a record that maps each possible role $\rho$ to a set of indices of lexical nodes denoting the immediate daughters of type $\rho$. For example, Node.role.adj denotes the set of adjectives of this node.

<DG Word2Node: features>= << >>: role : {MakeRoleRecord N}

<DG: Word2Node>= <<:
   Roles = Lex.role.values
   Comps = Lex.comps
   fun {MakeRoleRecord N}
      {List.toRecord o
       {Map Roles fun {$ R} R#{FS.var.upperBound 1#N} end}}
   end

The daughter set of a complement role has cardinality at most 1. It is exactly 1 precisely when the role is licensed by the node's valency Node.comps.

<DG Word2Node: constraints>= << >>:
   {ForAll Comps
    proc {$ C}
       {FS.card Node.role.C} =
       {FS.reified.include Lex.role.val2int.C Node.comps}
    end}
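Because the reified inclusion yields a 0/1 value, the constraint can be summarised with an Iverson bracket. The notation $\mathit{role}_c(w)$ and $\mathit{comps}(w)$ is shorthand introduced here for Node.role.C and Node.comps of a node $w$:

$$|\mathit{role}_c(w)| \;=\; [\, c \in \mathit{comps}(w) \,] \;\in\; \{0, 1\} \qquad \text{for every complement role } c$$

So a node whose valency contains $c$ has exactly one daughter in role $c$, and a node whose valency does not contain $c$ has none.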

The set of daughters of the node is formed by the union of all its role sets.

<DG Word2Node: features>= << >>: daughters : _

<DG Word2Node: constraints>= << >>: {FS.unionN Node.role Node.daughters}

The only node without a mother is the root. Feature mother is therefore a set whose cardinality is 0 for the root and 1 for every other node:

<DG Word2Node: features>= << >>: mother : {FS.var.upperBound 1#N}

<DG Word2Node: constraints>= << >>: (Node.field.root\=:Node.index)={FS.card Node.mother}
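The reified disequality on the left evaluates to 0 or 1, so the constraint can be read as follows, where $\mathit{root}$ and $\mathit{mother}(w)$ are shorthand introduced here for Node.field.root and Node.mother:

$$|\mathit{mother}(w)| \;=\; [\, w \neq \mathit{root} \,]$$

As a hypothetical example, in a three-word sentence whose root is word 2, this gives $|\mathit{mother}(2)| = 0$ and $|\mathit{mother}(1)| = |\mathit{mother}(3)| = 1$.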

The node has both a strict yield yieldS and a full yield yield. The latter is the disjoint union of the strict yield and the singleton containing the node itself, which enforces that the node cannot appear in its own strict yield and thereby rules out loops.

<DG Word2Node: features>= << >>:
   yieldS : {FS.var.upperBound 1#N}
   yield  : {FS.var.upperBound 1#N}

<DG Word2Node: constraints>= << >>: Node.yield = {FS.partition [{FS.value.make [Node.index]} Node.yieldS]}
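The partition constraint can be tried in isolation. The fragment below is a minimal sketch for the Mozart OPI, assuming a hypothetical five-word sentence and a node at position 3; the variable names are chosen here for illustration and only mirror the features above.

```oz
declare
N      = 5                          % hypothetical sentence length
Index  = 3                          % hypothetical position of the node
YieldS = {FS.var.upperBound 1#N}    % strict yield
Yield  = {FS.var.upperBound 1#N}    % full yield
% Yield is the disjoint union of the singleton {Index} and YieldS
Yield = {FS.partition [{FS.value.make [Index]} YieldS]}
% Inspect what is known so far: elements certainly in Yield,
% and elements still possible for YieldS
{Browse {FS.reflect.lowerBoundList Yield}#{FS.reflect.upperBoundList YieldS}}
```

Since the two parts of a partition must be disjoint, any attempt to add Index to YieldS is inconsistent with this constraint.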

We also introduce fieldIndex to indicate in which field the word occurs. (This feature is not used so far!)

<DG Word2Node: features>= << >>: fieldIndex : {FD.int 1#4}

<DG Word2Node: constraints>= << >>: {FS.include Node.index {Select Topo.list Node.fieldIndex}}

Now we enforce the condition that the Vorfeld contains a unique constituent. More precisely, we distinguish a root Topo.vfr of the Vorfeld and require that one of the following holds: (1) the word is not in the Vorfeld, (2) it is the root of the Vorfeld, or (3) it is in the Vorfeld but is not its root, and its mother is also in the Vorfeld.

<DG Word2Node>=:
   thread
      or {FS.exclude Node.index Topo.vf}
      [] Node.index = Topo.vfr
      [] Node.index\=:Topo.vfr
         {FS.include Node.index Topo.vf}
         {FS.subset Node.mother Topo.vf}
      end
   end
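The three alternatives of the disjunction correspond to the following formula, where $w$ stands for Node.index and $\mathit{vf}$, $\mathit{vfr}$ and $\mathit{mother}(w)$ are shorthand introduced here for Topo.vf, Topo.vfr and Node.mother:

$$w \notin \mathit{vf} \;\;\lor\;\; w = \mathit{vfr} \;\;\lor\;\; \bigl(\, w \neq \mathit{vfr} \;\land\; w \in \mathit{vf} \;\land\; \mathit{mother}(w) \subseteq \mathit{vf} \,\bigr)$$

Since mother is a set of at most one node, the subset condition says that every Vorfeld word other than $\mathit{vfr}$ has its mother, if any, in the Vorfeld as well.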

If the word occurs in the Nachfeld, then its mother cannot be the root (is that right?):

<DG Word2Node: constraints>= << >>:
   local B1 B2 in
      [B1 B2] ::: 0#1
      {FS.reified.include Node.index Topo.nf B1}    % in NF
      {FS.reified.include Topo.root Node.mother B2} % mother is root
      B2=<:(B1=:0)
   end
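The encoding with two 0/1 variables can be spelled out as follows, with $w$ for Node.index and $\mathit{nf}$, $\mathit{root}$ and $\mathit{mother}(w)$ as shorthand introduced here for Topo.nf, Topo.root and Node.mother:

$$B_1 = [\, w \in \mathit{nf} \,], \qquad B_2 = [\, \mathit{root} \in \mathit{mother}(w) \,], \qquad B_2 \;\le\; [\, B_1 = 0 \,]$$

Of the four combinations of $B_1$ and $B_2$, the inequality excludes only $B_1 = B_2 = 1$, so it is equivalent to $\lnot (B_1 \land B_2)$: a word in the Nachfeld does not have the root as its mother.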

A word cannot at the same time take a 'zu' particle complement and have a 'zu' particle in its morphological form. We equip the node with a haszu feature to indicate whether it has either; since haszu ranges over 0#1, it cannot have both.

<DG Word2Node: features>= << >>: haszu : {FD.int 0#1}

<DG Word2Node: constraints>= <<: Node.haszu =: {FS.reified.include MARK_ZU Node.marks} +{FS.card Node.role.zu}
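The following fragment is a minimal sketch of this constraint outside of Word2Node, intended for the Mozart OPI. The encoding 0 for the zu mark, the mark range 0#3 and the sentence length 5 are assumptions made for illustration; in the program the encoding comes from Lex.mark.encode.

```oz
declare
MarkZu = 0                          % hypothetical encoding of mark zu
Marks  = {FS.var.upperBound 0#3}    % morphological marks of the word
ZuRole = {FS.var.upperBound 1#5}    % daughters filling the zu role
HasZu  = {FD.int 0#1}
% HasZu counts both sources of a zu particle, and can be at most 1
HasZu =: {FS.reified.include MarkZu Marks} + {FS.card ZuRole}
% Suppose the word form itself carries zu
{FS.include MarkZu Marks}
% Inspect which daughters are still possible for the zu role
{Browse {FS.reflect.upperBoundList ZuRole}}
```

With the mark present, the sum can only stay within 0#1 if the cardinality of ZuRole is 0, so the node cannot also take a zu daughter.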

<DG: Encoded Values>= >>: MARK_ZU = {Lex.mark.encode zu}