INFSCI 2160 Data Mining Fall 2009 ( 2101 )
All available information about this course may be found at: http://courseweb.pitt.edu
Overview. Data mining has become an increasingly popular statistical method for knowledge discovery. With roots in classical scaling techniques and artificial intelligence, data mining provides a set of tools for the discovery and visualization of patterns in data warehouses. Data mining has been used in diverse applications from the discovery of quasars to the analysis of web logs. This course will look at various data mining tools to see how they work and what kinds of knowledge can be discovered. In addition, we will try to separate out the hype which exists in some popular accounts of data mining from the reality of the methods.
The course will provide an important foundation for further study in diverse areas, such as information retrieval, cognitive science, and marketing. The techniques discussed are also the foundation of many modern data mining techniques. The course will count as one of the two required foundations courses in the MSIS program.
Please note that a certain degree of mathematical and statistical fluency is required. It will be assumed that students have completed IS 2020, or the equivalent. If you have not taken this course, then please contact the instructor, via email, for permission to take the course.
Email. All email about the class should include "IS 2160" in the subject line so that it is read and processed by the instructor and/or GSA.
Materials. The primary text for the term is
In addition, we will be using the Weka 3.4 machine learning workbench, available for free from the authors at http://www.cs.waikato.ac.nz/~ml/weka/index.html for Windows, Mac, or Linux. On occasion, we might use other programs or packages, such as S-plus or SPSS, during the semester. Additional links related to the class can be found at the following sites:
Evaluation. Evaluation will consist of weekly homework assignments, a midterm exam and a final exam. There will be 6 homework assignments over the course of the term, each counting 20 points. Each homework will be posted one week before it is due. Homework will consist of a combination of thought questions and data analysis. The lowest homework grade will be dropped, resulting in a possible 100 points for homework. Late homework will lose 1 point for each day that it is late. No homework will be accepted more than 7 days after the due date. You may work with others on the homework, but the final work should be your own and in your own words.
The midterm and final exams will each be worth 100 points. Each exam will be 90 minutes long and will be held in class. The exams will be open-book/open-notes, and the final will be non-cumulative.
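As an illustration of the point arithmetic described above, here is a minimal sketch in Python. It is not an official grade calculator. The function names are hypothetical, and two details are assumptions the syllabus does not state: a late penalty never pushes a score below zero, and the lowest grade is dropped after late penalties are applied.

```python
def homework_total(scores, days_late):
    """Sum homework points after late penalties, dropping the lowest grade.

    scores    -- raw points (0 to 20) for each homework
    days_late -- number of days each homework was submitted past its due date
    """
    adjusted = []
    for raw, late in zip(scores, days_late):
        if late > 7:
            # Not accepted more than 7 days after the due date.
            adjusted.append(0)
        else:
            # Lose 1 point per day late; flooring at 0 is an assumption.
            adjusted.append(max(raw - late, 0))
    # Dropping the lowest adjusted grade (after penalties) is an assumption.
    return sum(adjusted) - min(adjusted)


def course_total(scores, days_late, midterm, final):
    """Homework (up to 100) plus midterm (up to 100) plus final (up to 100)."""
    return homework_total(scores, days_late) + midterm + final


if __name__ == "__main__":
    scores = [20, 18, 15, 20, 19, 17]
    days_late = [0, 2, 0, 8, 0, 0]
    print(homework_total(scores, days_late))
    print(course_total(scores, days_late, midterm=85, final=90))
```

Under these assumptions, six on-time homeworks with full marks give the 100 homework points mentioned above, and the maximum for the course is 300 points.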
Special circumstances. If you have a disability for which you are or may be requesting an accommodation, you are encouraged to contact both your instructor and the Office of Disability Resources and Services, 216 William Pitt Union (412-648-7890 / TTY: 412-383-7355), as early as possible in the term. DRS will verify your disability and determine reasonable accommodations for this course. In addition, you should be aware that my office is up a short flight of stairs. If this is problematic, I am happy to arrange a meeting in an accessible location at any time.
Weekly Schedule
Introduction; 9/2
Data Mining Primitives; 9/9
Concepts, Instances & Attributes; 9/16
Knowledge Representation; 9/30
Validation Methods and Review; 10/21
Cluster Analysis I; 11/4
Cluster Analysis II; 11/11
Latent Semantic Analysis; 11/18
Applications and Review; 12/9
Final Exam; 12/16