Brat
Lenz Furrer edited this page May 24, 2021 · 9 revisions
Brat is a browser-based tool for creating and viewing annotations over plain-text documents. It uses a stand-off representation for the annotations, which is a mixture of tab- and space-separated values and extends a format introduced at a BioNLP shared task.
`bconv` supports two variants: `brat` (tailored to the restrictions of the annotation tool) and the simpler `bionlp` version.
Both variants support an `att` parameter, which refers to the annotation attribute of the "type" slot in a T line. It defines a key in `Entity.metadata` and defaults to `"type"`.
T1 Chemical 0 9 Lidocaine
T2 Disease 18 34 cardiac asystole
T3 Chemical 90 99 lidocaine
T4 Disease 142 152 depression
T5 Disease 331 347 bradyarrhythmias
T6 Chemical 409 418 lidocaine
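To make the T-line layout concrete, here is a minimal sketch of reading such lines back into a dictionary whose metadata key is chosen by an `att`-like argument. The helper `parse_t_line` is hypothetical and not part of `bconv`. It assumes the usual Brat layout, in which ID, type-plus-offsets, and text are separated by tabs (the tabs may not be visible in the sample above).

```python
def parse_t_line(line, att="type"):
    """Parse one T line into a plain dict (hypothetical helper, not bconv API)."""
    # Assumed layout: ID <TAB> label start end[;start end...] <TAB> text
    ann_id, slot, text = line.rstrip("\n").split("\t", 2)
    label, offsets = slot.split(" ", 1)
    # Discontinuous annotations carry several "start end" pairs joined by ";".
    spans = [tuple(int(n) for n in pair.split()) for pair in offsets.split(";")]
    # The label from the "type" slot is stored under the key named by `att`.
    return {"id": ann_id, "spans": spans, "text": text, "metadata": {att: label}}


example = "T1\tChemical 0 9\tLidocaine\nT2\tDisease 18 34\tcardiac asystole"
entities = [parse_t_line(line) for line in example.splitlines()]
```

Passing a different `att` value (for example `att="category"`) only changes the metadata key under which the label is stored, not the line format.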
The Brat flavor of the stand-off format is documented here.
- Document structure: The stand-off annotations point to a plain-text document by means of character offsets only. Plain text supports almost no document structuring, in particular no document boundaries.
- Metadata: No metadata are supported (a future version of `bconv` might use Brat's note annotations for encoding metadata).
- Entity annotations: Basic information of text-bound annotations is written in a T line (span offsets and, typically, the entity type). With the `brat` format, co-located annotations are collapsed, such that only one T line is written. With the `bionlp` format, multiple T lines are written for co-located annotations. In addition to T lines, `brat` also supports N and A lines, corresponding to annotations for normalisation (concept ID) and arbitrary attributes, respectively.
- Offsets: Offsets are counted in terms of Unicode codepoints, starting at the beginning of the document.
- Discontinuous spans: For annotations with multiple spans, pairs of start/end offsets are joined with a semicolon separator, e.g. `4 7;12 16`.
- Relations/events: Two different notations are supported: R lines for binary relations with a relation type and E lines for events with an explicit event trigger. When serialising, `bconv` chooses R lines for relations with exactly two members and a non-empty `type` entry in the metadata, E lines otherwise. All members are formatted as `role:ref-ID` pairs. Relation metadata are ignored except for the `type` in R lines. Note that the Brat specs identify the first argument of events as the event trigger; `bconv` does not distinguish different argument types, however, and simply lists all members in definition order.
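The offset, span, and relation rules in the list above can be sketched in a few lines of Python. This is an illustration of the rules as described, not the code `bconv` runs: the helpers `format_spans` and `format_relation` are hypothetical, and the exact ID numbering and E-line layout are assumptions.

```python
def format_spans(spans):
    """Join (start, end) pairs with ';', as used for discontinuous spans."""
    return ";".join(f"{start} {end}" for start, end in spans)


def format_relation(number, members, metadata):
    """Pick R or E notation for a relation (hypothetical helper).

    `members` is a list of (ref_id, role) pairs in definition order.
    """
    args = " ".join(f"{role}:{ref_id}" for ref_id, role in members)
    rel_type = metadata.get("type")
    # R line: exactly two members and a non-empty type; E line otherwise.
    if len(members) == 2 and rel_type:
        return f"R{number}\t{rel_type} {args}"
    return f"E{number}\t{args}"


# Python strings are indexed by Unicode codepoint, so str offsets can be
# used directly, even when the text contains non-BMP characters.
text = "😀 Lidocaine"
start = text.index("Lidocaine")
span = format_spans([(start, start + len("Lidocaine"))])

binary = format_relation(1, [("T1", "cause"), ("T2", "effect")], {"type": "induces"})
untyped = format_relation(2, [("T1", "cause"), ("T2", "effect")], {})
```

In this sketch, `binary` becomes an R line because it has two members and a type, while `untyped` falls back to an E line because its metadata has no `type` entry.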
fmt | brat |
---|---|
supports text | no |
supports annotations | yes |
stream type | text |
name | type | default | purpose |
---|---|---|---|
att | str | "type" | Entity.metadata key of the attribute value (used in the "type" slot of a T line) |
cui | str | None | Entity.metadata key of the concept ID, if any (used in a separate N line) |
extra | Sequence[str] | () | Entity.metadata keys for additional attributes (used in a separate A line each) |
avoid_gaps | str | None | suppress discontinuous spans |
avoid_overlaps | str | None | suppress annotation collisions |
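As a rough illustration of how the `att`, `cui`, and `extra` options map metadata keys to T, N, and A lines, consider the sketch below. The function `serialise` and the entity dictionaries are hypothetical stand-ins, not the `bconv` API. The line layouts follow the Brat stand-off documentation; whether `bconv` emits exactly these strings is an assumption, and the concept ID `EX:0001` is made up.

```python
from itertools import count


def serialise(entities, att="type", cui=None, extra=()):
    """Yield T, N and A lines for entity dicts (sketch, not bconv's code)."""
    n_ids, a_ids = count(1), count(1)
    for t_num, ent in enumerate(entities, 1):
        t_id = f"T{t_num}"
        meta = ent["metadata"]
        offsets = ";".join(f"{s} {e}" for s, e in ent["spans"])
        # T line: the "type" slot is filled from the metadata key `att`.
        yield f"{t_id}\t{meta[att]} {offsets}\t{ent['text']}"
        # N line: only written if a concept-ID key is configured and present.
        if cui is not None and meta.get(cui):
            yield f"N{next(n_ids)}\tReference {t_id} {meta[cui]}\t{ent['text']}"
        # A lines: one per additional attribute key found in the metadata.
        for key in extra:
            if key in meta:
                yield f"A{next(a_ids)}\t{key} {t_id} {meta[key]}"


entities = [{
    "spans": [(0, 9)],
    "text": "Lidocaine",
    "metadata": {"type": "Chemical", "cui": "EX:0001", "pref": "lidocaine"},
}]
lines = list(serialise(entities, cui="cui", extra=["pref"]))
```

With the defaults (`cui=None`, `extra=()`), only the T line is produced, which mirrors the simpler option set of the `bionlp` variant.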
fmt | bionlp |
---|---|
supports text | no |
supports annotations | yes |
stream type | text |
name | type | default | purpose |
---|---|---|---|
att | str | "type" | Entity.metadata key of the attribute value (used in the "type" slot of a T line) |
avoid_gaps | str | None | suppress discontinuous spans |
avoid_overlaps | str | None | suppress annotation collisions |