DelphinTutorial_Grammars

EmilyBender edited this page Jul 24, 2013 · 12 revisions

DELPH-IN Grammars

The GrammarCatalogue page gives an overview of the existing DELPH-IN grammars.

Overview

A DELPH-IN grammar consists of a set of grammar entities that jointly license a set of strings and, for each string, linguistic structures representing morphological, syntactic and semantic information.

The grammar entities bear declarative constraints which encode well-formedness restrictions as well as compositional semantics. There are three major categories of grammar entities:

  • lexical entries giving form/meaning information about particular lemmas

  • lexical rules which build words from lemmas and can add syntactic, semantic, and orthographic information

  • phrase structure rules which build larger constituents from words

There are also two further categories of grammar entities:

  • root symbols which any spanning edge must unify with to be output as a parse

  • node labels used in the display of abbreviated trees
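
The role of root symbols can be illustrated with a minimal sketch. It assumes a heavily simplified, untyped model in which feature structures are nested Python dicts with string atoms and no reentrancy; real DELPH-IN processors unify typed feature structures. The feature names (`HEAD`, `SUBJ`, `COMPS`, `TENSE`), the `ROOT` structure, and the helpers `unify` and `is_parse` are hypothetical and not taken from any particular grammar.

```python
def unify(a, b):
    """Unify two simplified feature structures.

    Returns the merged structure, or None if the two are incompatible.
    Dicts are unified feature by feature; atoms must be identical.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # shallow copy, so the inputs are not mutated
        for feature, value in b.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None
                result[feature] = merged
            else:
                result[feature] = value
        return result
    return a if a == b else None


# Hypothetical root symbol: a verbal projection with nothing left to combine.
ROOT = {"HEAD": "verb", "SUBJ": "empty", "COMPS": "empty"}


def is_parse(edge, roots=(ROOT,)):
    """True if a spanning edge unifies with at least one root symbol."""
    return any(unify(root, edge) is not None for root in roots)


# Two hypothetical edges that both span the whole input.
sentence_edge = {"HEAD": "verb", "SUBJ": "empty", "COMPS": "empty", "TENSE": "past"}
np_edge = {"HEAD": "noun", "SUBJ": "empty", "COMPS": "empty"}

print(is_parse(sentence_edge))
print(is_parse(np_edge))
```

In this sketch the nominal edge is filtered out only because its `HEAD` value clashes with the root symbol; extra information on an edge (here `TENSE`) does not block unification.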

The constraints on the grammar entities are largely inherited from the type hierarchy. The type hierarchies in DELPH-IN grammars are singly-rooted, allow (and make extensive use of) multiple inheritance, and conform to the closed world assumption (no new types are created at runtime). Most of the work in developing a DELPH-IN grammar involves creating and maintaining the type hierarchy.
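
The three properties of the type hierarchy named above (single root, multiple inheritance, closed world) can be sketched as follows. This is a toy model under stated assumptions, not how any DELPH-IN processor implements types: the class `TypeHierarchy`, its methods, and all type names (`*top*`, `sign`, `verb-lex`, and so on) are made up for illustration, and the constraints that real types carry are omitted.

```python
class TypeHierarchy:
    """A toy singly-rooted type hierarchy with multiple inheritance."""

    def __init__(self, root="*top*"):
        self.root = root
        self.parents = {root: ()}  # the root is the only type without parents
        self.frozen = False

    def define(self, name, *parents):
        """Add a type below one or more already defined supertypes."""
        if self.frozen:
            raise RuntimeError("closed world: no new types at runtime")
        if name in self.parents:
            raise ValueError(f"type {name!r} already defined")
        if not parents:
            raise ValueError("every non-root type needs at least one parent")
        for parent in parents:
            if parent not in self.parents:
                raise ValueError(f"unknown parent type {parent!r}")
        self.parents[name] = tuple(parents)

    def freeze(self):
        """Close the hierarchy once the grammar has been loaded."""
        self.frozen = True

    def ancestors(self, name):
        """Return the type itself plus all of its supertypes."""
        seen = set()
        stack = [name]
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents[current])
        return seen

    def subsumes(self, general, specific):
        """True if `general` is `specific` or one of its supertypes."""
        return general in self.ancestors(specific)


hierarchy = TypeHierarchy()
hierarchy.define("sign", "*top*")
hierarchy.define("word", "sign")
hierarchy.define("verb-lex", "word")
hierarchy.define("transitive-lex", "word")
# Multiple inheritance: one type with two immediate supertypes.
hierarchy.define("transitive-verb-lex", "verb-lex", "transitive-lex")
hierarchy.freeze()

print(sorted(hierarchy.ancestors("transitive-verb-lex")))
```

Because every non-root type must name an existing parent, all types trace back to the single root, and after `freeze()` any further call to `define` raises an error, which mirrors the closed world assumption.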

Though the type hierarchy is formally one large system, notionally it can be seen as involving three main classes of types:

  • lexical types inherited by lexical entries

  • construction types inherited by phrase structure rules and lexical rules

  • ancillary types used in the definition of both of the above

Linguistic Characterization

DELPH-IN grammars are couched within the framework of HPSG. This framework is characterized by what Sag, Wasow and Bender (2003) call constraint-based lexicalism. It is a mono-stratal theory, in which input strings are assigned structures (or rejected) in a single layer of processing (no mapping of trees to trees); all constraints are declarative; and lexical entries contain rich information (largely inherited from lexical types). In contrast to classical HPSG's sparse set of grammatical schemata (and consistent with much recent theoretical work), DELPH-IN grammars posit a rich collection of constructions (phrase structure rules).

Structures

The structures licensed by DELPH-IN grammars are attribute-value matrices, which tend to be very large. The LKB (see LKB) provides support for interactively exploring these structures. In general, the most commonly used display formats are CFG-like trees, whose node labels abbreviate the feature structures at the nodes, and the various formats for displaying the MRS associated with a node (usually the root).

Treebanks

Mature DELPH-IN grammars have associated treebanks, i.e., collections of text for which one analysis per sentence has been selected by hand in the Redwoods style.

The result of processing an input string in the analysis direction with a DELPH-IN grammar and one of the associated processors, if successful, is a parse forest giving all or a subset of the analyses licensed by the grammar. The treebanking tools (ItsdbTreebanking and the new full-forest treebanker) calculate discriminants (syntactic or semantic), each of which divides the parse forest in two. Annotators select trees by choosing among these discriminants. Treebanks are important for training parse selection models as well as for documenting grammar coverage in regression testing.
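
The idea of discriminant-based selection can be sketched in a few lines. The model is an assumption made for illustration: each analysis is reduced to a set of property labels, and a discriminant is any property that holds of some but not all remaining analyses. The actual treebanking tools compute discriminants from the derivations and semantics in the forest; the function names `discriminants` and `decide`, the tree names, and the property labels below are all hypothetical.

```python
def discriminants(forest):
    """Properties that hold of some, but not all, analyses in the forest."""
    all_props = set().union(*forest.values())
    return sorted(
        prop
        for prop in all_props
        if 0 < sum(prop in props for props in forest.values()) < len(forest)
    )


def decide(forest, prop, keep):
    """Apply one annotator decision.

    keep=True retains the analyses that have `prop`;
    keep=False retains the analyses that lack it.
    """
    return {
        name: props
        for name, props in forest.items()
        if (prop in props) == keep
    }


# A hypothetical forest with three analyses of one sentence.
forest = {
    "tree-1": {"pp-attaches-to-vp", "saw-as-verb"},
    "tree-2": {"pp-attaches-to-np", "saw-as-verb"},
    "tree-3": {"pp-attaches-to-np", "saw-as-noun"},
}

print(discriminants(forest))

# Each decision splits the forest in two and keeps one side.
remaining = decide(forest, "pp-attaches-to-np", keep=True)
remaining = decide(remaining, "saw-as-verb", keep=True)
print(list(remaining))
```

After each decision the discriminants are recomputed over the smaller forest, so properties shared by every remaining analysis drop out, and a few decisions can be enough to isolate a single tree.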

Creating new grammars
