DelphinTutorial_Grammars

EmilyBender edited this page Jul 30, 2013 · 12 revisions

DELPH-IN Grammars

The GrammarCatalogue page gives an overview of the existing DELPH-IN grammars, including the ERG and many other grammars ranging in size from broad-coverage to experimental.

Overview

A DELPH-IN grammar consists of a set of grammar entities that jointly license a set of strings and, for each string, linguistic structures representing morphological, syntactic, and semantic information.

The grammar entities bear declarative constraints which encode well-formedness restrictions as well as compositional semantics. There are three major categories of grammar entities:

  • lexical entries giving form/meaning information about particular lemmas

  • lexical rules which build words from lemmas and can add syntactic, semantic, and orthographic information

  • phrase structure rules which build larger constituents from words

In addition, there are two further categories of grammar entities:

  • root symbols which any spanning edge must unify with to be output as a parse

  • node labels used in the display of abbreviated trees

The constraints on the grammar entities are largely inherited from the type hierarchy. The type hierarchies in DELPH-IN grammars are singly-rooted, allow (and make extensive use of) multiple inheritance, and conform to the closed world assumption (no new types are created at runtime). Most of the work in developing a DELPH-IN grammar involves creating and maintaining the type hierarchy.
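
The three properties of the type hierarchy named above (a single root, multiple inheritance, a closed world) can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the type names and the `HIERARCHY` table are made up, and real grammars define their types in TDL with constraints attached, which this sketch leaves out.

```python
# Hypothetical toy hierarchy: each type maps to its immediate supertypes.
# The type names are illustrative and not taken from any DELPH-IN grammar.
HIERARCHY = {
    "*top*": [],
    "sign": ["*top*"],
    "word": ["sign"],
    "phrase": ["sign"],
    "verbal": ["*top*"],
    "nominal": ["*top*"],
    "verb-word": ["word", "verbal"],    # multiple inheritance
    "noun-word": ["word", "nominal"],   # multiple inheritance
}

def ancestors(t):
    """Return t together with all of its supertypes."""
    if t not in HIERARCHY:
        # Closed world: only types declared in the grammar exist.
        raise KeyError(f"unknown type: {t}")
    seen = {t}
    stack = [t]
    while stack:
        for parent in HIERARCHY[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def subsumes(general, specific):
    """True if `general` is `specific` itself or one of its supertypes."""
    return general in ancestors(specific)

def roots():
    """Types without supertypes; a singly-rooted hierarchy has exactly one."""
    return [t for t, parents in HIERARCHY.items() if not parents]

print(roots())
print(subsumes("sign", "verb-word"), subsumes("verbal", "verb-word"))
print(subsumes("nominal", "verb-word"))
```

Because `verb-word` lists two supertypes, it inherits from both the `word` branch and the `verbal` branch, which is the pattern the paragraph above describes as multiple inheritance.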

Though the type hierarchy is formally one large system, notionally it can be seen as involving three main classes of types:

  • lexical types inherited by lexical entries

  • construction types inherited by phrase structure rules and lexical rules

  • ancillary types used in the definition of both of the above

Linguistic Characterization

DELPH-IN grammars are couched within the framework of HPSG. This framework is characterized by what Sag, Wasow and Bender (2003) call constraint-based lexicalism. It is a mono-stratal theory, in which input strings are assigned structures (or rejected) in a single layer of processing (no mapping of trees to trees); all constraints are declarative; and lexical entries contain rich information (largely inherited from lexical types). In contrast to classical HPSG's sparse set of grammatical schemata (and consistent with much recent theoretical work), DELPH-IN grammars posit a rich collection of constructions (phrase structure rules).

Structures

The structures licensed by DELPH-IN grammars are attribute-value matrices, which tend to be very large. The LKB (see the LKB page) provides support for interactively exploring these structures. In general, the most commonly used display formats are CFG-like trees, with node labels abbreviating the feature structures at the nodes, and the various formats for displaying the MRS associated with a node (usually the root).

Treebanks

Mature DELPH-IN grammars have associated treebanks, i.e., collections of text for which one analysis per sentence has been selected by hand in the Redwoods style.

The result of processing an input string in the analysis direction with a DELPH-IN grammar and one of the associated processors, if successful, is a parse forest giving all or a subset of the analyses licensed by the grammar. The treebanking tools (ItsdbTreebanking and the new full-forest treebanker) calculate discriminants (syntactic or semantic), each of which divides the parse forest in two. Annotators select trees by choosing among these discriminants. Treebanks are important for training parse selection models as well as for documenting grammar coverage in regression testing.
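
The idea of discriminant-based selection can be sketched as follows. This is a minimal Python illustration and not the algorithm used by the treebanking tools: each analysis is reduced to a set of invented property labels, a discriminant is taken to be any property that holds of some but not all remaining analyses, and the names `FOREST`, `discriminants`, and `decide` are assumptions made for the example.

```python
# Hypothetical parse forest: each analysis is summarized by the set of
# properties that hold of it. The labels are invented for illustration.
FOREST = {
    "tree1": {"PP attaches to VP", "word 2 is a verb"},
    "tree2": {"PP attaches to NP", "word 2 is a verb"},
    "tree3": {"PP attaches to NP", "word 2 is a noun"},
}

def discriminants(forest):
    """Properties that hold of some, but not all, remaining analyses."""
    all_props = set().union(*forest.values())
    return sorted(
        p for p in all_props
        if 0 < sum(p in props for props in forest.values()) < len(forest)
    )

def decide(forest, prop, accept):
    """Keep analyses that have `prop` (accept=True) or lack it (accept=False)."""
    return {
        name: props
        for name, props in forest.items()
        if (prop in props) == accept
    }

print(discriminants(FOREST))

# An annotator accepts one discriminant and rejects another.
remaining = decide(FOREST, "word 2 is a verb", True)
remaining = decide(remaining, "PP attaches to VP", False)
print(list(remaining))
```

Each decision splits the remaining analyses into those kept and those discarded, so a handful of decisions is usually enough to isolate a single tree even when the forest is large.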

Documentation

  • Lexical type documentation (TODO)

  • SEM-I: A grammar that is to be used in generation should have a SEM-I (semantic interface) declaring the predicates accepted by the grammar, their arity, and the types of values expected in each argument position.
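
To make the role of a SEM-I more concrete, here is a small Python sketch of the kind of information it declares and how that information could be used to check a predication before generation. The sketch does not follow the actual SEM-I file format: the `SEMI` table, the predicate names, the role names, and the variable types (`e` for events, `x` for individuals) are all assumptions chosen for illustration, and optional arguments and underspecified types are ignored.

```python
# Hypothetical SEM-I fragment: predicate -> expected type of each argument role.
# The arity of a predicate is the number of roles it declares.
SEMI = {
    "_bark_v_1": {"ARG0": "e", "ARG1": "x"},
    "_dog_n_1": {"ARG0": "x"},
    "_give_v_1": {"ARG0": "e", "ARG1": "x", "ARG2": "x", "ARG3": "x"},
}

def check_predication(pred, args):
    """Return a list of problems; an empty list means the predication fits."""
    if pred not in SEMI:
        return [f"unknown predicate: {pred}"]
    expected = SEMI[pred]
    problems = []
    if len(args) != len(expected):
        problems.append(
            f"{pred}: expected {len(expected)} arguments, got {len(args)}"
        )
    for role, var_type in args.items():
        if role not in expected:
            problems.append(f"{pred}: unexpected role {role}")
        elif expected[role] != var_type:
            problems.append(
                f"{pred}: {role} should be of type {expected[role]}, got {var_type}"
            )
    return problems

print(check_predication("_bark_v_1", {"ARG0": "e", "ARG1": "x"}))
print(check_predication("_bark_v_1", {"ARG0": "e"}))
print(check_predication("_meow_v_1", {"ARG0": "e"}))
```

A check of this kind lets an application that builds semantic input for generation find out early that it is using a predicate, or an argument pattern, that the grammar does not accept.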

Creating new grammars

The Grammar Matrix provides support for creating additional DELPH-IN-style grammars. The Matrix consists of a core grammar shared by all Matrix-derived grammars and a set of libraries of analyses of cross-linguistically variable phenomena. These libraries are accessed through the customization system, which elicits a high-level linguistic description from the user and outputs a working DELPH-IN-style grammar fragment. The analyses provided by the customization system and best practices for accessing them are documented in the pages under MatrixDocTop.

Grammar writers beginning new grammars and interested in systematic exploration of the analysis space are encouraged to explore the CLIMB methodology (see ClimbTop).
