
Lab 6

Objectives: searching corpora with AntConc; using regular expressions
Reference:

Overview

AntConc is a corpus search and concordancing program that is freely available for three OS platforms: Windows, Mac, and Linux. It offers many corpus-processing functions, including concordancing, collocation search, keyword comparison across corpora, and generation of lists such as word frequency lists, keyword lists, and n-gram lists. It also accepts regular expressions in search queries.
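To make the list types above concrete, here is a minimal Python sketch of what a token/type count, a word frequency list, and an n-gram list compute. It is an illustration only, not AntConc's implementation: the `tokenize` and `ngrams` helpers and the sample text are hypothetical, and AntConc's own token definition may differ from the simple one assumed here.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumption: a token is a lowercased run of letters, optionally with
    # an internal apostrophe (e.g. "don't"). AntConc's settings may differ.
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def ngrams(tokens, n):
    # Slide a window of size n over the token list.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sample = "Thou shalt not steal. Thou shalt not kill."
tokens = tokenize(sample)

freq = Counter(tokens)                # word frequency list
bigrams = Counter(ngrams(tokens, 2))  # 2-gram list

print("tokens:", len(tokens), "types:", len(freq))
print(freq.most_common(3))
print(bigrams.most_common(2))
```

Tokens are running words, while types are distinct word forms, so the type count is simply the number of entries in the frequency list.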

AntConc

  1. What is AntConc?

  2. What can we do with AntConc?
    - Conduct corpus-wide basic statistical analyses:
    • Total # of word tokens and types (6)
    • A word frequency list (6)
    • A keyword list (7) (requires a reference corpus)
    • N-gram lists (4) (this panel toggles between N-grams and Clusters; check the "N-Grams" box)
    - For a particular word or phrase, find:
    • Concordance lines (1)
      Concordance plot (2)
      File view: shows the word in its original context (3)
    • Clusters (4)
    • Collocates (5)

    - Using a lemma list: use the Someya Lemma List ("no hyphenated word" version) to fold inflected variants into a single lemma (e.g., try, tries, trying -> TRY)
    • Tool Preferences -> Word List -> Lemma List Options. Load the lemma file, and don't forget to press the "LOAD" button
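The effect of a lemma list on a word frequency list can be sketched in Python as follows. This is only an illustration under stated assumptions: the line format `lemma -> form1,form2,...` is assumed for the example and has not been checked against the actual Someya file, and `load_lemma_map` and `lemma_counts` are hypothetical helper names.

```python
from collections import Counter

# Assumed line format (not verified against the real lemma file):
#   lemma -> form1,form2,...
LEMMA_LINES = [
    "try -> tries,trying,tried",
    "have -> has,had,having",
]

def load_lemma_map(lines):
    # Build a lookup from each inflected form (and the lemma itself) to its lemma.
    form_to_lemma = {}
    for line in lines:
        lemma, _, forms = line.partition("->")
        lemma = lemma.strip().lower()
        if not lemma:
            continue
        form_to_lemma[lemma] = lemma
        for form in forms.split(","):
            form = form.strip().lower()
            if form:
                form_to_lemma[form] = lemma
    return form_to_lemma

def lemma_counts(tokens, form_to_lemma):
    # Count tokens by lemma; known lemmas are shown in capitals (TRY, HAVE),
    # and words missing from the list are counted as themselves.
    counts = Counter()
    for tok in tokens:
        tok = tok.lower()
        lemma = form_to_lemma.get(tok)
        counts[lemma.upper() if lemma else tok] += 1
    return counts

lemma_map = load_lemma_map(LEMMA_LINES)
counts = lemma_counts(["try", "tries", "trying", "has", "sister"], lemma_map)
print(counts)
```

With a lemma list loaded, try, tries, and trying are counted together under TRY instead of as three separate entries.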

Processing Gutenberg Corpus with AntConc

  1. Load our Gutenberg corpus into AntConc. Process the corpus for the following:
    • Total # of word tokens and types
    • A word frequency list
    • A keyword list (NOTE: use "abc corpus" as the reference corpus)
    • 2-gram, 3-gram, 4-gram lists (NOTE: set Min. Cluster Frequency for a large job!)
  2. Now we will try exploring the corpus based on particular search terms.
    • What are the concordance lines for sister? How about sister(s)? What does the Concordance Plot look like?
    • Look up concordances for: HAVE been, HAVE never been, HAVE ever been. (HAVE indicates a lemma form: it covers have, has, had, having.)
    • Look up concordances for: it BE ... that, it BE ... to
    • Look up concordances for: a BE verb followed by an adverb, followed by a past participle verb (NOTE: There is no way to precisely specify POS; we will need to make do with common POS suffixes.)
    • Look up frequent clusters involving thou, minimum size of 2 and maximum size of 4.
    • Look up frequent clusters involving HAVE, minimum size of 2 and maximum size of 4.
    • Look up frequent clusters involving HAVE to, minimum size of 3 and maximum size of 4.
    • What are the top collocates of shalt? Look at 2 words left and right. (NOTE: set Min. Collocate Frequency for a large processing job!)
    • What are the top collocates of husband? First try it with "Sort by Freq", and then with "Sort by Stat". For the latter option to work, the "Collocation measure" option needs to be set under Tool Preferences -> Collocates. (WARNING: Calculating MI scores can take a while, especially on a large corpus!)
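Several of the searches above can be approximated with regular expressions. The sketch below tests some candidate patterns in Python before trying them in AntConc. These are one possible set of patterns, not the only answers: the `HAVE` and `BE` form lists, the 1-to-3-word gap standing in for "...", and the -ly / -ed / -en suffix heuristics are all assumptions, and AntConc's regex syntax may differ in details from Python's `re` module.

```python
import re

# Assumed lemma expansions, written out by hand.
HAVE = r"(?:have|has|had|having)"
BE = r"(?:be|am|is|are|was|were|been|being)"

PATTERNS = {
    # "sister" with an optional plural -s
    "sister(s)": r"\bsisters?\b",
    # HAVE been, HAVE never been, HAVE ever been
    "HAVE been": rf"\b{HAVE}\s+(?:n?ever\s+)?been\b",
    # it BE ... that / to, with "..." assumed to be 1 to 3 words
    "it BE ...": rf"\bit\s+{BE}\s+(?:\w+\s+){{1,3}}(?:that|to)\b",
    # BE + adverb (-ly) + past participle (-ed / -en); a suffix heuristic
    # that will miss irregular forms and catch some non-participles
    "BE adv pp": rf"\b{BE}\s+\w+ly\s+\w+(?:ed|en)\b",
}

def find_all(pattern, text):
    # Return every matching string, ignoring case.
    return [m.group(0) for m in re.finditer(pattern, text, re.IGNORECASE)]

print(find_all(PATTERNS["sister(s)"], "My sisters and my sister came."))
print(find_all(PATTERNS["HAVE been"], "She has never been there; they had been away."))
print(find_all(PATTERNS["it BE ..."], "It was clear that he left."))
print(find_all(PATTERNS["BE adv pp"], "They were greatly pleased."))
```

Because the suffix-based pattern only guesses at part of speech, expect both false hits and misses, and inspect the concordance lines rather than trusting the raw counts.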