Data and Web Mining
Instructor: Salvatore Orlando
Goals
Data Mining comprises a set of techniques and methods for extracting novel knowledge from large databases, knowledge that can then be used to support decision-making processes.
Data Mining is one of the main activities in the complex process of Knowledge Discovery in Databases (KDD).
The course covers the fundamentals of this subject, focusing on the most important algorithmic techniques.
The course also uses the Web as a case study, exploring how useful knowledge can be extracted by mining its hyperlink structure, its contents, and its usage logs.
Contents
- Introduction to Data Mining. Concepts and overview of the KDD process. Applications.
- Data Mining techniques and algorithms: clustering, classification, association rules.
- Data cleaning and visualization.
- Web mining: mining of contents, hyperlink structure, and usage logs.
- Web Search
Recommended Reading List
- Lecture slides and notes
- P.-N. Tan, M. Steinbach, V. Kumar. Introduction to Data Mining. Pearson Addison-Wesley.
- Bing Liu. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data. Springer-Verlag, 2006.
Other books
- J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2001.
- M. H. Dunham. Data Mining: Introductory and Advanced Topics. Prentice Hall.
- Toby Segaran. Programming Collective Intelligence: Building Smart Web 2.0 Applications. O'Reilly, 2007.
Assessments
Written exam (60%), talk on a scholarly paper (40%).
For an example of a written exam, click here.
Teaching Methods
Class lectures. Discussion of scholarly papers.
Mailing list
Students can subscribe to (or unsubscribe from) the course mailing list at:
http://listserver.dsi.unive.it/wws/subrequest/datamining. After subscribing, students can send email to the following address: datamining@dsi.unive.it. A confirmation email is required to complete message delivery.
Slides
- Introduction
- Data
- Association Mining
- Association Mining - 2
- Classification
- Classification: Alternative techniques
- Clustering
- Web Search and Information Retrieval
- Link Analysis
- Web Usage Mining
Student Seminars
The following papers are available for preparing the seminars:
Before preparing your talk (max 20 min), review the suggested methods for reading scientific papers, which are illustrated in these two documents: doc1 and doc2.
Be sure to cover the following items in your talk: (1) general and specific subject of the paper, (2) hypothesis, (3) research methodology, (4) results, and (5) summary of key points.
Written exams
MSc Thesis Topics (Under Construction)