Corpus Linguistics

Lecturer: Shu-Kai Hsieh shukaihsieh@ntu.edu.tw

Graduate Institute of Linguistics, National Taiwan University

Empirical methods have revolutionized nearly all subfields of linguistics in recent years. Since the 1990s, language resources, corpora in particular, have played a vital role in virtually all empirically oriented linguistic studies.

This course offers an introduction to corpus linguistics for graduate students in linguistics, covering the tools, techniques, and analytical methods needed for corpus-based studies and corpus annotation projects. Major existing corpora will be examined for a better understanding of their linguistic uses. To keep students up to date with the latest developments in corpus linguistics, the rationales and methods for using the web as a corpus will also be a main focus. The lab sessions cover the fundamentals of programming in Python with NLTK, a popular language and toolkit for many corpus-linguistic applications. The course is taught in English and assumes only minimal experience with computers; no programming skills or knowledge are required.

Goals of this course include:

Schedule

Syllabus.pdf

| Week | Date | Topic | Lab |
|------|-------|-------|-----|
| 1 | 09/18 | Orientation | |
| 2 | 09/25 | Introduction to Corpus Linguistics | BYU practice |
| 3 | 10/02 | Corpus-based analytical tools | Antconc Family |
| 4 | 10/09 | National Holiday (cancelled) | |
| 5 | 10/16 | Corpus-based analytical tools | Word Sketch Engine |
| 6 | 10/23 | Corpus annotation (I) | Word Sketch Engine |
| 7 | 10/30 | Corpus annotation (II) | Annotation exercise |
| 8 | 11/06 | Corpus-based analysis | Annotation exercise |
| 9 | 11/13 | Corpus data collection | Corpus Query Languages |
| 10 | 11/20 | Corpus data preprocessing | Corpus Query Languages |
| 11 | 11/27 | Corpus data preprocessing | WAC: bootCAT toolkit / Lopotator |
| 12 | 12/04 | Basic corpus statistics (I) | WAC: pre-processing |
| 13 | 12/11 | Basic corpus statistics (II)* | WAC: pre-processing |
| 14 | 12/18 | Corpus data and Brain data | WAC: index and search |
| 15 | 12/25 | Methodological issues in corpus linguistics: Evaluation and comparison | WAC: index and search |
| 16 | 01/01 | National Holiday (cancelled) | |
| 17 | 01/08 | Class project workshop | |
| 18 | 01/14 | Term paper due | |

* term project proposal

[Teaching assistants] 
Chan-Chia Hsu (Mike) <chanchiah@gmail.com>
Yu-Yun Chang (Taco) <yuyun.unita@gmail.com>

[Course webpage] The course webpage and lecture scripts will be made publicly available at <http://lope.linguistics.ntu.edu.tw/courses/corpusling2015> and <http://loperntu.github.io/corpling>

Lecture scripts

Class reader

presentation order