Lenz Furrer edited this page May 24, 2021 · 9 revisions

Brat and BioNLP

Brat is a browser-based tool for creating and viewing annotations over plain-text documents. It uses a stand-off representation for the annotations, which is a mixture of tab- and space-separated values and extends a format introduced at a BioNLP shared task.

bconv supports two variants: brat (tailored to the restrictions of the annotation tool) and the simpler bionlp version. Both variants support an att parameter which refers to the annotation attribute of the "type" slot in a T line. It defines a key in Entity.metadata and defaults to "type".

Example

T1	Chemical 0 9	Lidocaine
T2	Disease 18 34	cardiac asystole
T3	Chemical 90 99	lidocaine
T4	Disease 142 152	depression
T5	Disease 331 347	bradyarrhythmias
T6	Chemical 409 418	lidocaine
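
As a rough illustration of how these lines are laid out, the following sketch splits a T line into its parts. It is not bconv's own parser: the names `TextBound` and `parse_t_line` are made up for this example, and the function assumes a single contiguous span.

```python
from typing import NamedTuple


class TextBound(NamedTuple):
    """One text-bound annotation (hypothetical helper type, not part of bconv)."""
    id: str
    type: str
    start: int
    end: int
    text: str


def parse_t_line(line: str) -> TextBound:
    """Split a T line into ID, type, offsets and surface text.

    Assumes a single contiguous span (no semicolon-separated offsets).
    """
    # The three main fields are separated by tabs ...
    id_, type_and_span, text = line.rstrip("\n").split("\t")
    # ... while type and offsets share the middle field, separated by spaces.
    type_, start, end = type_and_span.split(" ")
    return TextBound(id_, type_, int(start), int(end), text)


example = "T2\tDisease 18 34\tcardiac asystole"
entity = parse_t_line(example)
print(entity)
```

The difference between the two offsets equals the length of the surface text, which makes for a cheap sanity check when reading such files.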

Sources

The Brat flavor of the stand-off format is described in the official Brat documentation.

Notes

  • Document structure: The stand-off annotations point to a plain-text document by means of character offsets only. Plain text supports almost no document structuring, in particular no document boundaries.
  • Metadata: No metadata are supported (a future version of bconv might use Brat's note annotations for encoding metadata).
  • Entity annotations: Basic information about text-bound annotations is written in a T line: the span offsets and, typically, the entity type. With the brat format, co-located annotations are collapsed, such that only one T line is written. With the bionlp format, multiple T lines are written for co-located annotations. In addition to T lines, brat also supports N and A lines, corresponding to annotations for normalisation (concept ID) and arbitrary attributes, respectively.
  • Offsets: Offsets are counted in terms of Unicode codepoints, starting at the beginning of the document.
  • Discontinuous spans: For annotations with multiple spans, pairs of start/end offsets are joined with a semicolon separator, e.g. 4 7;12 16.
  • Relations/events: Two different notations are supported: R lines for binary relations with a relation type and E lines for events with an explicit event trigger. When serialising, bconv chooses R lines for relations with exactly two members and a non-empty type entry in the metadata, E lines otherwise. All members are formatted as role:ref-ID pairs. Relation metadata are ignored except for the type in R lines. Note that the Brat specs identify the first argument of events as the event trigger; bconv does not distinguish different argument types, however, and simply lists all members in definition order.
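
The notes on offsets, discontinuous spans and relation lines can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not bconv's implementation: the function names are hypothetical, members are modelled as plain (role, ref-ID) pairs, and relation metadata as a dict.

```python
def format_spans(spans):
    """Join (start, end) pairs with semicolons, e.g. '4 7;12 16'."""
    return ";".join(f"{start} {end}" for start, end in spans)


def parse_spans(field):
    """Inverse of format_spans: '4 7;12 16' -> [(4, 7), (12, 16)]."""
    return [tuple(int(n) for n in pair.split(" ")) for pair in field.split(";")]


def relation_line_kind(members, metadata):
    """Pick 'R' or 'E' following the rule described in the notes.

    R: exactly two members and a non-empty "type" entry in the metadata.
    E: everything else.
    """
    if len(members) == 2 and metadata.get("type"):
        return "R"
    return "E"


# Python string indices count Unicode codepoints, so slicing a str with the
# offsets yields the annotated text directly (unlike UTF-8 byte offsets).
text = "na\u00efve T-cell"
start, end = 6, 12
print(text[start:end])
print(len(text), len(text.encode("utf8")))
```

When a document contains non-ASCII characters, as in this example, the codepoint count and the UTF-8 byte count differ, so tools that compute byte offsets need a conversion step before their output lines up with this format.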

Exporters

BratFormatter

Properties

fmt                   brat
supports text         no
supports annotations  yes
stream type           text

Options

name            type           default  purpose
att             str            "type"   Entity.metadata key of the attribute value (used in the "type" slot of a T line)
cui             str            None     Entity.metadata key of the concept ID, if any (used in a separate N line)
extra           Sequence[str]  ()       Entity.metadata keys for additional attributes (used in a separate A line each)
avoid_gaps      str            None     suppress discontinuous spans
avoid_overlaps  str            None     suppress annotation collisions
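
To give an idea of how att, cui and extra relate to T, N and A lines, here is a small sketch that builds such lines for a single entity. It follows the general line syntax of Brat's stand-off format and is an assumption-laden illustration, not bconv's code: the function `brat_lines`, the metadata keys "cui" and "source", and the concept ID "EX:0001" are all invented for this example, and the exact lines written by bconv may differ.

```python
def brat_lines(index, start, end, text, metadata, att="type", cui=None, extra=()):
    """Build T, N and A lines for one entity (illustrative only, not bconv code)."""
    t_id = f"T{index}"
    # T line: ID <tab> type start end <tab> surface text
    lines = [f"{t_id}\t{metadata[att]} {start} {end}\t{text}"]
    # N line: links the T annotation to a concept ID, if one is present.
    if cui is not None and metadata.get(cui):
        lines.append(f"N{index}\tReference {t_id} {metadata[cui]}\t{text}")
    # A lines: one per additional attribute found in the metadata.
    # (IDs are numbered per entity here; a real file needs document-wide unique IDs.)
    for n, key in enumerate(extra, start=1):
        if key in metadata:
            lines.append(f"A{n}\t{key} {t_id} {metadata[key]}")
    return lines


# Hypothetical metadata; "EX:0001" is a dummy concept ID.
meta = {"type": "Chemical", "cui": "EX:0001", "source": "demo"}
lines = brat_lines(1, 0, 9, "Lidocaine", meta, cui="cui", extra=("source",))
print("\n".join(lines))
```

With the defaults (cui=None and an empty extra), only the T line is produced.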

BioNLPFormatter

Properties

fmt                   bionlp
supports text         no
supports annotations  yes
stream type           text

Options

name            type  default  purpose
att             str   "type"   Entity.metadata key of the attribute value (used in the "type" slot of a T line)
avoid_gaps      str   None     suppress discontinuous spans
avoid_overlaps  str   None     suppress annotation collisions