Corpus Linguistics Resources: Tools and More

Organizations

Consortiums & Associations
  • [link] European Language Resources Association (ELRA)
  • [link] Linguistic Data Consortium (LDC)
  • [link] International Computer Archive of Modern and Medieval English (ICAME)
  • [link] The Text Encoding Initiative (TEI)
  • [link] Open Language Archives Community (OLAC)
  • [link] BNC Consortium
Academic Research Centers & Project Homes
  • [link] University Centre for Computer Corpus Research on Language (UCREL) at Lancaster University
  • [link] TalkBank
  • [link] Centre for English Corpus Linguistics, Université catholique de Louvain (Belgium)
  • [link] The English Lexicon Project at Washington University in St. Louis
  • [link] MRC Psycholinguistic Database Project
  • [link] FrameNet Project home at Berkeley

Corpus Software and Tools

General-Purpose Corpus Analysis and Search Software:
  • [link] WordSmith Tools (by Mike Scott; PC, Intel-based Mac via CrossOver; license purchase required)
  • [link] MonoConc Pro (PC only; license purchase required)
  • [link] AntConc (by Laurence Anthony; PC, Mac, Linux; FREEWARE)
  • [link] Xaira (XML-based corpus search program; PC, Mac, Linux; GNU General Public License)
  • [link] CorpusSearch 2 (for corpora in Penn Treebank format; written in Java; all platforms; free)
  • [link] Tgrep & TGrep2
NLP Projects and Tools
  • [home][demo][tagset] CLAWS part-of-speech tagger for English
  • [link] The Penn Treebank POS Tagset
  • [link] TnT Statistical POS Tagger
  • [home][demo] UIUC Cognitive Computation Group POS Tagger
  • [home][demo] The Stanford Parser
  • [home][demo] CMU Link Grammar Parser
  • [home][book] Natural Language Toolkit
Statistical Analysis Tools:
  • R: a free software environment for statistical computing and graphics
    • [link] R Project home
    • [book][bootcamp] Quantitative Corpus Linguistics with R by Stefan Gries
    • [link] R function index
Python:
  • [link] Beginning Python: Instant Hacking by Magnus Lie Hetland
  • [home][Ch.1][Ch.2][Ch.3][Ch.4] Linguist's Guide to Python by Ron Zacharski
  • [link] Processing Corpora with Python and NLTK

Other Corpus Resource Pages on the Web

Corpus Resource Index Pages:
  • [link] Corpus Linguistics (1996) by Tony McEnery and Andrew Wilson, Sections 1-4
  • [link] Bookmarks for Corpus-based Linguists, by David Lee
  • [corpora survey][tools] Corpus Based Language Studies companion page
  • [link] Computational Methods in Linguistic Research by Bill Poser
English Word Lists and Statistics:
  • [link] BNC database and word frequency lists by Adam Kilgarriff
  • [link] Phrases in English (PIE) home: N-gram statistics based on BNC
  • [link] ESL : Vocabulary : Lists
  • [link] Word frequency lists based on: FLOB, Frown, LOB, Brown.
  • [link] Moby Lexicon Project by Grady Ward
  • [link] Kevin's Word List Page: Ispell and 12Dicts packages
  • [link] Word Lists: lists of family and first names
  • [link] Ogden's Basic English Word List
  • [link] WordNet: a large lexical database of English, at Princeton