Took the course in the past? Click HERE for 2013, HERE for 2014, and HERE for 2016. (E-mail Na-Rae for password.)

LING 1330/2330 Introduction to Computational Linguistics

Spring 2017, University of Pittsburgh

Meetings: Tue & Thu 11am - 12:15pm   Classroom: 5405 Posvar Hall
Description:
This is a course designed to introduce students who have been exposed to linguistics to real-world applications of computational linguistics. The students will first learn the fundamentals of how computers are used to represent and process textual and spoken information. They will then be introduced to the challenges of real-world language engineering problems and learn how they are handled with the latest language technologies. The topics include: spell-checking, machine translation, part-of-speech tagging, parsing, document classification, and corpus building and exploration. Students will be given hands-on training on the basics of text processing using Python and will have a chance to work with NLTK, a popular natural language processing application suite. This course is designed specifically for students in the humanities; computer science majors (who are not linguists) are encouraged to take CS 1671 or CS 1571 instead.

Prerequisites:

Notes for future students


This course will be offered again in Fall 2018. After that, it is expected to take place every fall going forward. Interested students are encouraged to learn Python in the meantime -- please see below.

LING 1000 Introduction to Linguistics is the only prerequisite for this course. Prior knowledge of Python or another programming language is not required but is highly recommended. CS 0008 "Introduction to Computer Programming with Python" or CS 0155 "Data Witchcraft" will give you good preparation.

Students with no prior programming experience have taken this course with great success, but they typically put in more hours than their experienced peers (which is understandable). For this reason, students who are new to programming are encouraged to sign up for an additional 1-credit recitation. It will be run as LING 1901 Independent Study with the instructor, on an S/NC basis. Three recitation sessions are being planned, all on Friday: (1) 10am -- 11:15am, (2) 11:30am -- 12:45pm, (3) 1pm -- 2:15pm. Please leave one of these slots open in your schedule.

Students are required to bring their own laptop to class. It should be running one of the following operating systems: Windows 10 (7 & 8 are also fine), Mac OS-X, or Linux (any distribution). Mobile or cloud-based machines such as Android/Apple tablets or Chromebooks are not suitable.

Instructors:
Na-Rae Han           Pitt ID & Google ID: naraehan.
                    Office hours: Mon 3-5pm, Wed 3-4pm and by appointment, CL G17.
Reed Armstrong (TA)   Pitt ID: rma42.
                  "Generally available" hours (please email beforehand): Mon 1-3pm and Wed 1-3pm.
                  At Cup and Chaucer (ground floor cafe) in Hillman Library.

Textbooks:
[1] Language and Computers. Markus Dickinson et al. Wiley-Blackwell. 2012.
[2] Python tutorial: Python 3 Notes
[3] Natural Language Processing with Python. (updated edition based on Python 3 and NLTK 3) Steven Bird et al. O'Reilly Media.

Course Organization:
Each meeting will comprise two parts: lecture and lab. In the first half of the class, topics presented in the textbook [1] Language and Computers will be covered in a lecture-and-discussion format. In the second half, students will get hands-on training on the basics of text processing using Python and the Natural Language Toolkit (NLTK). Friday recitations (optional) will focus on the programming aspect, offering additional Python exercises, reviews of upcoming homework, and individual help.

Assignment Schedule

  1. As a rule, there will always be some form of assignment between classes. There are two types: homework assignments and programming exercises. They are administered via CourseWeb and are due before the beginning of the next class.
  2. Homework Assignments (40-60 points): These are assigned on most Thursdays. There will be 10 or 11. These will comprise questions on lecture topics as well as programming problems.
  3. Programming Exercises (20 points): They are designed to help you learn and practice the programming aspect of the course. As long as you are keeping up with the course contents, you should be able to complete them in 1-2 hours, possibly more if you are new to programming. These will be given when there is no homework assignment, mostly on Tuesdays.
  4. Readings and Previews: In addition to the homework/exercise assignment, you will have book chapters and Python tutorials for the upcoming class to study beforehand.
  5. I will make every effort to post new assignments one week in advance. However, I might need to make some adjustments depending on the progress we make in the classroom. Therefore, any non-immediate assignment should be considered a DRAFT until it is finalized, which will happen by 1pm after class.
  6. The detailed assignment schedule is found on the Class Schedule page.

Exams, Requirements, Grading and Policies
Please read the Course Policies page.