Na-Rae Han's home page
Took the course in the past? Earlier course pages are available for 2017, 2016, 2014, and 2013. (E-mail Na-Rae for the password.)

LING 1330/2330 Computational Linguistics

Fall 2018, University of Pittsburgh

Meetings: Tue & Thu 4:30pm - 5:45pm   Classroom: 363 Cathedral of Learning

Description

This is a course designed to introduce students who have been exposed to linguistics to real-world applications of computational linguistics. The students will first learn the fundamentals of how computers are used to represent and process textual and spoken information. They will then be introduced to the challenges of real-world language engineering problems and learn how they are handled with the latest language technologies. The topics include: spell-checking, machine translation, part-of-speech tagging, parsing, document classification, and corpus building and exploration. Students will be given hands-on training on the basics of text processing using Python and will have a chance to work with NLTK, a popular natural language processing application suite. This course is designed specifically for students in the humanities; computer science majors (who are not linguists) are encouraged to take CS 1671 or CS 1571 instead.

Prerequisites

LING 1000 "Introduction to Linguistics" is the only prerequisite for this course. Prior knowledge of Python or other programming languages is not required but is highly recommended. CS 0008 "Introduction to Computer Programming with Python" or the brand-new course CS 0012 "Introduction to Computing for the Humanities" will give you good preparation.


   Attention: Future Students (Fall 2019)


Starting Fall 2019, the course will add CS 0008 as a new prerequisite. The prerequisites are therefore:
  1. LING 1000 "Introduction to Linguistics" and
  2. CS 0008 "Introduction to Computer Programming with Python" (grade B or above)
Having Python programming as a prerequisite will allow us to explore more computational linguistics topics, and at a less rushed pace. Linguistics majors and grad students will very much remain the target audience of this course: in fact, not having to learn Python during the course will free up valuable class time to focus more on linguistic motivations.

Some additional considerations:

Substitution: CS 0008 can be replaced by CS 0012 "Introduction to Computing for the Humanities" or any similar CS course that uses Python as the programming language of instruction.

Students with formal CS training (majors, minors, etc.) can have the CS 0008 prerequisite waived if they meet both of the following qualifications:

  1. CS 0401 "Intermediate Programming Using Java" (grade B or above)
  2. Proof of basic Python competency (Python code samples due at enrollment, or proof of online course completion due one week before the semester's start)

Online course substitution: Under limited circumstances, the instructor may allow a sequence of two Coursera courses as a stand-in for CS 0008: "Getting Started with Python" and "Python Data Structures". They are offered year-round and require a paid subscription; proof of successful completion must be submitted. Note that this option is not meant for students who have never taken a college-level programming course.

Please send any questions/inquiries to naraehan@pitt.edu.

Students are required to bring their own laptop to class. It should be running one of the following operating systems: Windows 10 (7 & 8 are also fine), Mac OS X, or Linux (any distribution). Mobile or cloud-based machines such as Android/Apple tablets or Chromebooks are not suitable.

Instructors

Who | Pitt email | Office hours | Location
Na-Rae Han | naraehan | Mon 11am-1pm & Wed 11am-noon | G17 CL
Katherine Kairis (TA) | kak275 | Mon & Wed 3-5pm | 2832 CL (linguistics grad/undergrad office)
Daniel Zheng (TA) | daniel.zheng | Mon & Wed 7-9pm | 2832 CL (linguistics grad/undergrad office)
Note: We are also available to meet by appointment.

Textbooks

[1] Language and Computers. Markus Dickinson et al. Wiley-Blackwell. 2012.
[2] Python tutorial: Python 3 Notes
[3] Natural Language Processing with Python. (updated edition based on Python 3 and NLTK 3) Steven Bird et al. O'Reilly Media.

Course Organization

Each meeting will comprise two parts: lecture and lab. In the first half of the class, topics presented in textbook [1], Language and Computers, will be covered in a lecture-and-discussion format. In the second half, students will get hands-on training on the basics of text processing using Python and the Natural Language Toolkit (NLTK). Friday recitations (optional) will focus on the programming aspect: additional Python exercises, reviews of upcoming homework, and individual help will be offered.

Assignment Schedule


  1. As a rule, there will always be some form of assignment between classes. There are two types: homework assignments and programming exercises. They are administered via CourseWeb and are due before the beginning of the next class.
  2. Homework Assignments (40-60 points): These are assigned on most Thursdays. There will be around 11. These will comprise questions on lecture topics as well as programming problems.
  3. Programming Exercises (20 points): They are designed to help you learn and practice the programming aspect of the course. As long as you are keeping up with the course contents, you should be able to complete them in 1-2 hours, possibly more if you are new to programming. These will be given when there is no homework assignment, mostly on Tuesdays.
  4. Readings and Previews: In addition to the homework/exercise assignment, you will have book chapters and Python tutorials for the upcoming class to study beforehand.
  5. I will make every effort to post new assignments one week in advance. However, I might need to make some adjustments depending on the progress we make in the classroom. Therefore, any assignment other than the immediately upcoming one should be considered a DRAFT until it is finalized, which will happen 30 minutes after class.
  6. The detailed assignment schedule can be found on the Class Schedule page.

Exams, Requirements, Grading and Policies

Please read the Course Policies page.