Corpus Linguistics

Corpus Annotation [2]

Outline

  1. Upper-level Annotation
  2. Annotation Quality

Good learning materials

Recap: Levels of Annotation

More on Levels of Annotation

Units (crossing the sentence boundary) reflect the communicative function of the sentence

Topic-Focus Articulation (TFA)

TFA: Example

Prague Dependency Treebank / [source: Schulte im Walde & Zinsmeister, 2006]

Rhetorical Structure

Rhetorical Structure: Example

rstWeb - Browser Annotation of RST

Discourse Connectives

Discourse Connectives: Subordinating conjunctions

Because [the drought reduced U.S. stockpiles], [they have more than enough storage space for their new crop], and that permits them to wait for prices to rise.

Discourse Connectives: Coordinating conjunctions

[William Gates and Paul Allen in 1975 developed an early language-housekeeper system for PCs], and [Gates became an industry billionaire six years after IBM adapted one of these versions in 1981].

Opinion annotation

Emotional Chunks (Hsieh and Lu)

Exercise with LOPEtator

Outline

  1. Upper-level Annotation
  2. Annotation Quality

Annotation Quality

Crucial issue: are the annotations correct?

Validity vs. Reliability

(Artstein and Poesio, 2008)

Cases

In all cases, reliability is measured by calculating coefficients of agreement.

Rare Case

In some rare cases, there exists a "correct" annotation (gold standard).

\[ \text{Recall} = \frac{\text{number of correct annotations found}}{\text{number of correct annotations expected}} \]

\[ \text{Precision} = \frac{\text{number of correct annotations found}}{\text{total number of annotations found}} \]

\[ F_1 = 2 \cdot \frac{P \cdot R}{P + R} \]

where \(P\) is precision and \(R\) is recall.
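The gold-standard measures can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides. It assumes each annotation is a hashable item such as a `(token_index, label)` pair, and that a found annotation counts as correct only if it matches a gold annotation exactly. The function name `precision_recall_f1` and the toy data are hypothetical.

```python
def precision_recall_f1(found, gold):
    """Compare found annotations against a gold standard.

    Assumption: annotations are hashable items, e.g. (token_index, label),
    and only exact matches count as correct.
    """
    found, gold = set(found), set(gold)
    correct = len(found & gold)  # correct annotations found

    precision = correct / len(found) if found else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data (made up for illustration): four gold annotations, three found.
gold = {(0, "NOUN"), (1, "VERB"), (2, "NOUN"), (3, "ADJ")}
found = {(0, "NOUN"), (1, "VERB"), (2, "VERB")}

p, r, f = precision_recall_f1(found, gold)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```

Exact matching is the strictest choice. Span-based tasks sometimes give credit for partial overlap, which would change how `correct` is counted.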

What if no gold standard exists?

Agreement between annotators is then measured with chance-corrected coefficients: \(S\), \(\pi\), and \(\kappa\).
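For two annotators, all three coefficients share the form \((A_o - A_e) / (1 - A_e)\), where \(A_o\) is the observed agreement and \(A_e\) the agreement expected by chance. They differ only in how \(A_e\) is estimated. The sketch below is a minimal illustration under that two-annotator assumption, with one categorical label per item. The function name `agreement_coefficients` and the toy labels are hypothetical and do not come from the slides.

```python
from collections import Counter


def agreement_coefficients(labels_a, labels_b, categories=None):
    """Return S, pi and kappa for two annotators (categorical labels).

    Assumption: labels_a[i] and labels_b[i] are the two annotators'
    labels for the same item i.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")

    n = len(labels_a)
    if categories is None:
        categories = set(labels_a) | set(labels_b)
    cats = sorted(set(categories))

    # Observed agreement: share of items with identical labels.
    a_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = {
        # S: all categories equally likely.
        "S": 1 / len(cats),
        # pi: one label distribution pooled over both annotators.
        "pi": sum(((count_a[c] + count_b[c]) / (2 * n)) ** 2 for c in cats),
        # kappa: a separate label distribution per annotator.
        "kappa": sum((count_a[c] / n) * (count_b[c] / n) for c in cats),
    }

    def chance_corrected(a_e):
        # Undefined when chance agreement is already 1 (e.g. one category).
        return float("nan") if a_e == 1 else (a_o - a_e) / (1 - a_e)

    return {name: chance_corrected(a_e) for name, a_e in expected.items()}


# Toy labels (made up for illustration): 10 items, two annotators.
ann_a = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "pos", "neg", "pos"]
ann_b = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos", "neg", "neg"]

scores = agreement_coefficients(ann_a, ann_b)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

If the annotation scheme has categories that neither annotator happened to use, pass them through `categories`. Otherwise \(S\) is computed over too few categories and comes out lower than it should.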

Homework (20151120)