LogonModeling

ErikVelldal edited this page Jun 13, 2008 · 25 revisions

Overview

This page contains various code examples showing how to estimate and apply statistical models within LOGON. For more detailed information on feature types, estimation parameters, or the experimentation environment in general, see Velldal '08 (pdf: http://www.velldal.net/erik/pubs/Velldal08.pdf).

Discriminative Modeling

We assume here that the system and the grammar are already loaded.

Set the feature parameters. The defaults correspond to:

(let ((*feature-grandparenting* 4)
      (*feature-active-edges-p* t)
      (*feature-ngram-size* 4)
      (*feature-ngram-back-off-p* t)
      (*feature-ngram-tag* :type)
      (*feature-use-preterminal-types-p* t)
      (*feature-lexicalization-p* t)
      (*feature-constituent-weight* 2)
      (*feature-lm-p* 10)
      (*feature-frequency-threshold* nil))
  ;; forms that cache features, train, or rank go here
  )

Create a feature cache for the (virtual) profile jhpstg.g (we typically use the .g suffix to denote a generation treebank):

(operate-on-profiles (list "jhpstg.g") :task :fc)

Feature caching is intended as a one-time operation: it extracts all the features from the treebank and stores them in a (BDB) database, named fc.bdb, within the respective profile directories. When running experiments later, we simply look up the features in the database, which saves the cost of extraction. A symbol table named fc.mlm (created within the jhpstg.g profile for the example above) records the mapping from symbolic feature representations to numerical indexes (as used for model estimation and DB storage). The symbol table is only referenced when exporting a model or applying it to new data (see the example below).
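
The idea behind such a symbol table can be sketched in a few lines of plain Common Lisp. This is only an illustration and not the LOGON code behind fc.mlm: the names feature-table, feature-code, and code-feature, as well as the list-shaped features, are hypothetical.

```lisp
;; Hypothetical sketch of a two-way feature <-> index mapping.
;; None of these names are part of LOGON.
(defstruct feature-table
  ;; symbolic feature -> integer index
  (codes (make-hash-table :test #'equal))
  ;; integer index -> symbolic feature
  (features (make-array 0 :adjustable t :fill-pointer 0)))

(defun feature-code (table feature)
  "Return the index of FEATURE, assigning the next free index if it is new."
  (let ((codes (feature-table-codes table)))
    (or (gethash feature codes)
        ;; VECTOR-PUSH-EXTEND returns the index of the element it adds.
        (setf (gethash feature codes)
              (vector-push-extend feature (feature-table-features table))))))

(defun code-feature (table code)
  "Return the symbolic feature stored under the integer CODE."
  (aref (feature-table-features table) code))
```

Looking up the same symbolic feature twice yields the same index, so a model only ever needs to store integers; the reverse lookup is what makes it possible to export a model in readable form.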

Example of how to run a single experiment using 5-fold cross-validation:

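;; gold is not defined on this page; we assume it names the source treebank.
(setq gold "jhpstg.g")
(setq test "jhpstg-test-profile")
(tsdb :create test :skeleton "jhpstg")
(rank-profile gold
              test
              :nfold 5)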
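
For readers unfamiliar with n-fold cross-validation, the following is a minimal, self-contained sketch of how items can be divided into folds. How rank-profile actually assigns items to folds is not described on this page; the round-robin assignment and the function name nfold-splits are assumptions made for illustration only.

```lisp
;; Hypothetical sketch of n-fold partitioning; not taken from LOGON.
(defun nfold-splits (items nfold)
  "Return a list of NFOLD (training . held-out) pairs over ITEMS.
The item at position i is held out in fold (mod i NFOLD) and used for
training in all other folds."
  (loop for fold below nfold
        collect (loop for item in items
                      for i from 0
                      if (= (mod i nfold) fold)
                        collect item into held-out
                      else
                        collect item into training
                      finally (return (cons training held-out)))))
```

With :nfold 5, each item is thus used for testing exactly once and for training in the four remaining folds.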

Running a batch of 10-fold maxent experiments on "jhpstg.g", iterating over different configurations of features and estimation parameters (batch-experiment performs an exhaustive "grid-search" over all lists of specified parameter values):

(batch-experiment :type :mem
                  :variance '(nil 1000 100 10 1 1.0e-1 1.0e-2)
                  :absolute-tolerance 1.0e-10
                  :source "jhpstg.g"
                  :skeleton "jhpstg"
                  :random-sample-size nil
                  :ngram-size '(0 1 2 3)
                  :active-edges-p nil
                  :grandparenting '(0 1 2 3)
                  :lm-p 10
                  :counts-relevant 1
                  :nfold 10
                  :compact t)

The following gives a brief explanation of the various keyword arguments. The :variance parameter governs the Gaussian prior on feature weights, and :absolute-tolerance governs the convergence threshold. Specifying a non-nil (integer) value n for :random-sample-size means that only a random selection of at most n non-preferred candidates for each item is included in the training data. :counts-relevant governs a frequency-based cutoff on feature values. The keywords :ngram-size, :active-edges-p, and :grandparenting allow iteration over feature parameters. Note that specifying :lm-p 10 means that the value of the language model feature is divided by 10; this is basically a hack to avoid numerical problems during estimation. To leave out the LM feature, call with :lm-p nil instead. Specifying :type :mem means that we are training a conditional maximum entropy model (log-linear model). The value of :type can also be :svm if you have SVMlight installed. The boolean-valued :compact governs the naming convention used when creating target profiles.
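
To get a feeling for the size of such a batch, the exhaustive grid can be enumerated with a small stand-alone function. This is a sketch under the assumption that every list-valued argument contributes one dimension to the grid; the function grid is hypothetical and is not how batch-experiment is implemented.

```lisp
;; Hypothetical sketch of exhaustive grid enumeration; not taken from LOGON.
(defun grid (&rest parameters)
  "PARAMETERS is a plist mapping keywords to lists of candidate values.
Return one plist per combination of values (the Cartesian product)."
  (if (null parameters)
      (list nil)                        ; exactly one empty configuration
      (destructuring-bind (key choices &rest more) parameters
        (let ((tails (apply #'grid more)))
          (loop for choice in choices
                nconc (loop for tail in tails
                            collect (list* key choice tail)))))))

;; The three list-valued arguments of the batch above:
(length (grid :variance '(nil 1000 100 10 1 1.0e-1 1.0e-2)
              :ngram-size '(0 1 2 3)
              :grandparenting '(0 1 2 3)))
;; => 112
```

The batch above therefore covers 7 x 4 x 4 = 112 configurations, each evaluated with 10-fold cross-validation, which is worth keeping in mind before adding further value lists.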

Example of how to estimate and export a maxent model:

(let ((*feature-grandparenting* 3)
      (*feature-ngram-size* 3)
      (*feature-lm-p* nil)
      (*maxent-variance* 8e-4)
      (*feature-frequency-threshold* (make-counts :relevant 1)))
  (train "jhpstg.g" "jhpstg.g.mem" :fcp nil :type :mem))
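
For reference, the role of the variance can be written out explicitly. What follows is the textbook formulation of a conditional log-linear model with a zero-mean Gaussian prior, not a transcription of the LOGON code; the symbols (s for an input item, r for a candidate realization, Y(s) for the candidates of s, f_i for features, \lambda_i for weights, \sigma^2 for the variance) are notation introduced here, and the sketch assumes a single preferred candidate r_j per training item s_j.

```latex
% Conditional log-linear (maximum entropy) model over the candidates of s
p_\lambda(r \mid s) =
  \frac{\exp\bigl(\sum_i \lambda_i f_i(s, r)\bigr)}
       {\sum_{r' \in Y(s)} \exp\bigl(\sum_i \lambda_i f_i(s, r')\bigr)}

% Penalized log-likelihood maximized during estimation
\mathcal{L}(\lambda) =
  \sum_j \log p_\lambda(r_j \mid s_j) \;-\; \sum_i \frac{\lambda_i^2}{2\sigma^2}
```

Under this formulation, a smaller variance penalizes large weights more strongly, while a very large variance approaches unregularized estimation.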

Applying the model trained above to the generation treebank rondane.g:

(tsdb :create "rondane-test-profile"
      :skeleton "rondane")

;; model is assumed to name the model file exported by the call to train above.
(operate-on-profiles (list "rondane.g")
                     :model (read-model model)
                     :target "rondane-test-profile"
                     :task :rank)