Fall 2012 (2131)
All available information about this course may be found at: http://courseweb.pitt.edu
Overview. Data mining has become an increasingly popular statistical method for knowledge discovery. With roots in classical scaling techniques and artificial intelligence, data mining provides a set of tools for the discovery and visualization of patterns in data warehouses. Data mining has been used in diverse applications from the discovery of quasars to the analysis of web logs. This course will look at various data mining tools to see how they work and what kinds of knowledge can be discovered. In addition, we will try to separate the hype found in some popular accounts of data mining from the reality of the methods.
The course will provide an important foundation for further study in diverse areas, such as information retrieval, cognitive science, and marketing. The techniques discussed also underlie many modern data mining methods. The course will count as one of the two required foundations courses in the MSIS program.
Please note that a certain degree of mathematical and statistical fluency is required. It will be assumed that students have completed IS 2020, or the equivalent. If you have not taken IS 2020, please contact the instructor by email for permission to take this course.
Email. All email about the class should include "INFSCI 2160" in the subject line to be read and processed by the instructor and/or GSA.
In addition, we will be using the Weka 3.6 machine learning workbench available for free from the authors at http://www.cs.waikato.ac.nz/~ml/weka/ for Windows, Mac or Linux OS, as well as IBM SPSS, which costs $5 from http://technology.pitt.edu/software/for-students-software.html.
Additional links related to the class can be found at the following sites:
Evaluation. Evaluation will consist of weekly homework assignments, a midterm exam and a final exam. There will be 6 homework assignments over the course of the term, each counting 20 points. Each homework will be posted one week before it is due. Homework will consist of a combination of thought questions and data analysis. The lowest homework grade will be dropped, resulting in a possible 100 points for homework. Late homework will lose 2 points for each day that it is late. No homework will be accepted more than 4 days after the due date. You may work with others on the homework, but the final work should be your own and in your own words.
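As an informal illustration of the homework arithmetic above, the following Python sketch totals the homework points under one reading of the policy. It is not an official grading tool. The function name homework_total is hypothetical, and three points are assumptions not stated in the policy: lateness is counted in whole days, a penalized score never falls below zero, and the lowest grade is dropped after late penalties are applied.

```python
def homework_total(scores, days_late):
    """Total homework points under one reading of the policy above.

    scores: raw points (0-20) for each of the 6 assignments
    days_late: whole days late for each assignment (0 = on time)
    """
    points_per_day = 2   # late penalty stated in the policy
    max_days_late = 4    # later submissions are not accepted

    adjusted = []
    for raw, late in zip(scores, days_late):
        if late > max_days_late:
            adjusted.append(0)  # not accepted, so no credit
        else:
            # Assumption: a penalized score never drops below zero.
            adjusted.append(max(0, raw - points_per_day * late))

    # Assumption: the lowest grade is dropped after penalties are applied.
    return sum(adjusted) - min(adjusted)


# Third homework is 1 day late (15 - 2 = 13); sixth is 5 days late (0).
print(homework_total([20, 18, 15, 20, 17, 19], [0, 0, 1, 0, 0, 5]))  # prints 88
```

With six perfect, on-time assignments the function returns 100, which matches the maximum stated above.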
The midterm and final exams will each be worth 100 points. Each exam will be 90 minutes long and will be held in class. The exams will be open-book/open-notes and the final will be non-cumulative.
Special circumstances. If you have a disability for which you are or may be requesting an accommodation, you are encouraged to contact both your instructor and the Office of Disability Resources and Services, 216 William Pitt Union, (412-648-7890/TTY:412-383-7355) as early as possible in the term. DRS will verify your disability and determine reasonable accommodations for this course. In addition, you should be aware that my office is up a short flight of stairs. If this is problematic, I am happy to arrange a meeting in an accessible location at any time.
No class meeting; 29 Aug 2012
Assignment: Please install the Weka 3.6 machine learning workbench, available for free from the authors at http://www.cs.waikato.ac.nz/~ml/weka/index.html for Windows, Mac or Linux OS.
Week 1: Introduction; 5 Sep 2012; Drop/Add ends 7 Sep 2012
Week 2: Concepts, Instances & Attributes; 12 Sep 2012
No class meeting; 19 Sep 2012
Week 3: Knowledge Representation; 26 Sep 2012
Chapter 3; Homework #1 due
Week 4: Decision Trees; 3 Oct 2012
Sections 4.1-4.3 and 6.1; Homework #2 due
Week 5: Association Rules; 10 Oct 2012
Sections 4.4-4.5, 6.2-6.3; Homework #3 due
Week 6: Validation Methods; 17 Oct 2012
Midterm; 24 Oct 2012
Week 7: Cluster Analysis; 31 Oct 2012
Chapter 4.8, 6.8; Reading: Milligan & Hirtle
Week 8: Latent Semantic Indexing; 7 Nov 2012
Reading: Deerwester, et al.; Homework #4 due
Week 9: Text Mining; 14 Nov 2012
Chapter 9.1-9.5; Homework #5 due
Thanksgiving break; No class meeting; 21 Nov 2012
Week 10: Web Mining; 28 Nov 2012
Chapter 9.6-9.9; Homework #6 due
Week 11: Applications and Review; 5 Dec 2012
Final Exam; 12 Dec 2012