Corpus Linguistics

Corpus Annotation [1]

Outline

  1. Advanced corpus query
  2. Corpus Annotation
  3. Annotation tools: Basics

What you've learned so far

Concgram revisited

An interesting idea (to be argued about)

A bit more about Corpus Query

Exercise

see gitbook

Corpus Annotation

Annotation vs. markup

Types of annotation

Corpus Annotation: Why

Annotation schemes

An annotation scheme should contain at least:

Corpus Annotation: Tools

GATE

An infrastructure for developing and deploying software components that process human language. GATE helps scientists and developers in three ways:

  1. by specifying an architecture, or organisational structure, for language processing software;
  2. by providing a framework, or class library, that implements the architecture and can be used to embed language processing capabilities in diverse applications;
  3. by providing a development environment, built on top of the framework, made up of convenient graphical tools for developing components.

GATE

Walk through basic learning modules

Lab: Annotation Practice

Loading, Setting and Viewing

Homework (20151030)

[Ref] 1. People's Daily (人民日報) 2. United Daily News (聯合報)

Comparable corpus

A comparable corpus can be defined as a corpus containing components that are collected using the same sampling frame and similar balance and representativeness (cf. McEnery, 2003: 450), e.g. the same proportions of the texts of the same genres in the same domains in a range of different languages in the same sampling period.
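
One way to make the "same proportions of the same genres" criterion concrete is to compare the genre distribution of two components. The sketch below is only an illustration: the function names, the genre labels, and the 0.05 tolerance are assumptions made for this example, not part of the definition above, and a real check would also cover domain and sampling period.

```python
from collections import Counter


def genre_proportions(genres):
    """Return each genre's share of the texts in one corpus component."""
    counts = Counter(genres)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {genre: n / total for genre, n in counts.items()}


def is_comparable(component_a, component_b, tolerance=0.05):
    """Check that every genre's share differs by at most `tolerance`.

    `tolerance` is an arbitrary threshold chosen for illustration.
    """
    props_a = genre_proportions(component_a)
    props_b = genre_proportions(component_b)
    all_genres = set(props_a) | set(props_b)
    return all(
        abs(props_a.get(g, 0.0) - props_b.get(g, 0.0)) <= tolerance
        for g in all_genres
    )


# Hypothetical genre labels, one per text, for three components.
component_a = ["news"] * 6 + ["editorial"] * 3 + ["sports"] * 1
component_b = ["news"] * 12 + ["editorial"] * 6 + ["sports"] * 2
component_skewed = ["news"] * 9 + ["sports"] * 1

balanced_pair = is_comparable(component_a, component_b)
skewed_pair = is_comparable(component_a, component_skewed)
```

Here `component_a` and `component_b` differ in size but share the same genre mix, so they pass the check, while `component_skewed` over-represents news and lacks editorials, so it does not.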