
GeneratorImplementationNotes

WoodleyPackard edited this page Jul 12, 2011 · 8 revisions

This page contains a core dump of WoodleyPackard's brain after working on a DELPH-IN compatible generator.

Generation is the process of constructing a surface realization of the meaning contained in an input MRS structure.

The basic algorithm for generation is quite similar to chart parsing. A number of semantically contentful lexemes and rules are "activated" (inserted into the chart) by virtue of a match between the MRS predicate[s] supplied in that particular sign and one or more MRS predicates in the input semantics. Additionally, a number of semantically vacuous signs are inserted into the chart according to the "trigger rules". After chart initialization, lexical and syntactic rules are allowed to run to exhaustion. Finally, all generated edges are checked for compatibility with the input semantics, and the compatible ones are output as the result.
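The loop described above can be sketched in a few lines. This is a toy illustration, not the code of any DELPH-IN generator: `Edge`, `generate`, the `can_combine` callback, and the predicate names are all hypothetical. Edges carry a plain word tuple instead of a feature structure, trigger rules for semantically vacuous signs are left out, and "compatible with the input semantics" is approximated by "covers every input EP". Two edges are only combined when the sets of EPs they cover are disjoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    words: tuple   # toy surface string, standing in for a full sign
    covered: int   # bitmask of the input EPs this edge dominates


def generate(input_preds, lexicon, can_combine):
    """Toy generation loop: activate, combine to exhaustion, filter."""
    full = (1 << len(input_preds)) - 1
    # Chart initialization: activate each lexeme whose predicate
    # matches a predicate in the input semantics.
    agenda = [Edge((word,), 1 << i)
              for i, pred in enumerate(input_preds)
              for word, word_pred in lexicon.items()
              if word_pred == pred]
    chart = set()
    # Let the (single, binary, toy) rule run to exhaustion.
    while agenda:
        edge = agenda.pop()
        if edge in chart:
            continue
        chart.add(edge)
        for other in list(chart):
            for left, right in ((edge, other), (other, edge)):
                if left.covered & right.covered:
                    continue  # the two edges share an input EP
                if can_combine(left, right):
                    agenda.append(Edge(left.words + right.words,
                                       left.covered | right.covered))
    # Final check: keep only edges that realize every input EP.
    return sorted(" ".join(e.words) for e in chart if e.covered == full)


# Illustrative lexicon; "dog" is never activated because its predicate
# does not occur in the input.
lexicon = {"the": "_the_q", "cat": "_cat_n_1",
           "dog": "_dog_n_1", "sleeps": "_sleep_v_1"}
allowed = {("the", "cat"), ("cat", "sleeps")}
result = generate(["_the_q", "_cat_n_1", "_sleep_v_1"], lexicon,
                  lambda l, r: (l.words[-1], r.words[0]) in allowed)
```

In a real generator the `can_combine` test is unification against the grammar's rules, and the final filter is a full MRS comparison rather than a coverage check.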

Some notes for would-be generator developers follow. They are written from the viewpoint of generating with the ERG.

  • There is in effect only one cell in the chart. The adjacency criterion for combinability of two edges is that the sets of input EPs that they dominate are disjoint.
  • To rule out the construction of edges in which the "wrong" MRS variable combines with a predicate, we use a technique dubbed "skolemization". This means setting the INSTLOC property of the TFS representation of each MRS variable to a unique string, such as the variable's name. The effect is to block many combinations of input edges that should not combine (e.g. paraphrasing "The black cat ate a small mouse." as "The black mouse ate a small cat."). These combinations would be rejected in the post-generation MRS compatibility test anyway, but generating them takes up a lot of time.
  • Along the same lines, applying a "cheap scope" to the input MRS also blocks many unwanted edges (e.g. paraphrasing "I think the cat thinks the dog is asleep." as "The cat thinks I think the dog is asleep."). An easy way to do this is to use the same skolem constant (INSTLOC property) on the handles on the high and low side of the QEQs.
  • "Index accessibility filtering" is a technique to block the generation of edges that "seal off" MRS variables that still need to be combined with more EPs. For example, we would like to avoid spending time generating "I see a cat chasing a mouse." when the correct result is "I see a cat chasing a frightened mouse.", because there is no way to form an edge that spans all of the input EPs from "I see a cat chasing a mouse." (in particular, the EP corresponding to "frightened" will be unrealized). Ideally we would avoid generating anything containing "a mouse". In practice, this technique lets us avoid going beyond "chasing a mouse", which is a large step in the right direction.
  • At least for the ERG, a significant portion of the lexicon and rule inventory is considered informal, i.e. suitable for parsing but not for generation. Unfortunately, this distinction is not made explicit in the grammar proper. The canonical list of signs unsuitable for generation resides in the file "lkb/globals.lsp", in the parameters *duplicate-lex-ids* and *gen-ignore-rules*.
  • Generating from non-native predications (e.g. the year 1973, or my name) is a bit tricky. The ERG pulls this trick off with the parameter *generic-lexical-entries*, which is a list of generic lexemes to try when an EP in the input MRS does not match anything in the semantic index (i.e. regular lexemes and rules).
  • The LKB generator uses a head-driven rule instantiation ordering, which is different from the key-driven ordering used for parsing. This may make some difference in efficiency.
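As a rough illustration of the index accessibility filter from the list above, here is a minimal sketch under simplifying assumptions. The function name, the EP encoding as `(predicate, variables)` pairs, and the `accessible` set are all hypothetical; in a real generator the accessible variables would be read off the edge's feature structure. The idea shown is only this: prune an edge if some variable is used both by an EP the edge covers and by an EP it does not cover, yet is no longer exposed by the edge.

```python
def seals_off_needed_variable(covered, accessible, eps):
    """Return True if the edge should be pruned.

    covered:    set of indices of the input EPs the edge dominates
    accessible: set of variable names the edge still exposes (assumed given)
    eps:        list of (predicate, set_of_variable_names) for the input MRS
    """
    inside, outside = set(), set()
    for i, (_pred, variables) in enumerate(eps):
        (inside if i in covered else outside).update(variables)
    # Variables shared with unrealized EPs must still be reachable.
    return bool((inside & outside) - accessible)


# Simplified EPs for "... chasing a frightened mouse" (names illustrative).
eps = [
    ("_a_q", {"x2"}),
    ("_mouse_n_1", {"x2"}),
    ("_frightened_a_1", {"e3", "x2"}),
    ("_chase_v_1", {"e1", "x1", "x2"}),
]

# "a mouse" still exposes x2, so this filter lets it through.
np_blocked = seals_off_needed_variable({0, 1}, {"x2"}, eps)
# "chasing a mouse" no longer exposes x2, but "frightened" needs it.
vp_blocked = seals_off_needed_variable({0, 1, 3}, {"e1", "x1"}, eps)
```

This mirrors the behaviour described in the list: the noun phrase "a mouse" survives, while "chasing a mouse" is cut off before anything larger is built on it.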