Lecture Notes 📜
Here is a preliminary outline of the module structure. I will develop most of the ideas on the blackboard.
You are encouraged to take notes during the lectures.
Topic | Resources |
---|---|
Principal Component Analysis (PCA) | PCA [lec notes]; PCA [slides]; Demonstration of PCA and SOM (self-organizing map) [slides]; Covariance Matrix Example [notes]; Boston Housing Dataset Demo (MATLAB code) [zip]; PCA Quiz Questions [pdf]; Quiz Answers [pdf] |
Document Mining | Doc Mining [lec notes]; Doc Mining Quiz Questions [pdf]; Quiz Answers [pdf] |
Clustering, Topographic Maps | Clustering, Topographic Maps [lec notes]; Topographic Maps of Vectorial Data [slides] |
Classification | Classification [lec notes]; Density Modeling [slides]; SVM Tutorial [pdf]; Support Vector Machines (MIT OpenCourseWare) [video]; Logistic Regression Tutorial [pdf]; Logistic Regression, An Introduction [video]; Perceptron [wiki]; Perceptron [demo] |
PageRank | PageRank [lec notes]; PageRank [slides] |
Suggested Reading 📖
- Introductory probability theory and statistics
  - Notes on Probability, Statistics and Stochastic Processes by Cosma Shalizi
  - See also The Matrix Cookbook by Kaare Brandt Petersen and Michael Syskind Pedersen
- Principal Component Analysis
  - The Elements of Statistical Learning: Data Mining, Inference, and Prediction: Section 14.5
  - Pattern Recognition and Machine Learning: Section 12.1
  - Principles of Data Mining: Section 3.6
  - Tutorial by Lindsay Smith: a gentle introduction to PCA using elementary vector and matrix algebra
- Text/Document Mining, Latent Semantic Analysis (LSA)
  - Principles of Data Mining: Sections 5.3.3, 14.3
  - Web page devoted to LSI
  - Latent Semantic Analysis on Wikipedia
  - S. Deerwester et al.: Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), pp. 391-407. 1990.
  - F.R. Lopez, H. Jimenez-Salazar, D. Pinto: A Competitive Term Selection Method for Information Retrieval. Computational Linguistics and Intelligent Text Processing, Lecture Notes in Computer Science Vol 4394, pp. 468-475. Springer, 2007.
  - T. Hofmann: Unsupervised Learning by Probabilistic Latent Semantic Analysis. Machine Learning, 42(1-2), pp. 177-196. 2001.
  - Y. Gong, X. Liu: Generic text summarization using relevance measure and latent semantic analysis. Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 19-25. 2001.
- Clustering, SOM
  - The Elements of Statistical Learning: Sections 14.3.4, 14.3.6, 14.3.9, 14.4
  - Principles of Data Mining: Sections 9.3, 9.4
- Classification
  - The Elements of Statistical Learning: Sections 2.3.2, 2.4, 2.6.2, 2.9
  - Pattern Recognition and Machine Learning: Sections 4.1.1-3
  - Principles of Data Mining: Section 10
- Searching the Web
  - M. Bianchini, M. Gori, F. Scarselli: Inside PageRank. ACM Transactions on Internet Technology, 5(1), pp. 92-128. 2005.
  - T.H. Haveliwala: Topic-sensitive PageRank: a context-sensitive ranking algorithm for Web search. IEEE Transactions on Knowledge and Data Engineering, 15(4), pp. 784-796. 2003.
  - S. Kamvar et al.: Extrapolation methods for accelerating PageRank computations. Proceedings of the 12th international conference on World Wide Web, pp. 261-270. 2003.
  - S. Kamvar et al.: Exploiting the Block Structure of the Web for Computing PageRank. Technical report, Stanford University, 2003.
  - A. Broder et al.: Efficient PageRank approximation via graph aggregation. Information Retrieval, 9(2), pp. 123-138. 2006.
Demonstrations ⚗️
Assignment 📝
Recommended Books 📚
Title | Author(s) | Publisher, Date | Comments | Link |
---|---|---|---|---|
The Elements of Statistical Learning: Data Mining, Inference, and Prediction | T. Hastie, R. Tibshirani, J. Friedman | Springer, 2009 | Comprehensive coverage of many state-of-the-art statistical learning techniques; very helpful for understanding the essence of Data Mining. Highly recommended for mathematically minded students. | Springer link |
Principles of Data Mining | D.J. Hand, H. Mannila, P. Smyth | MIT Press, 2003 | A nice gentle introduction to many areas of Data Mining. | |
Pattern Recognition and Machine Learning | Christopher Bishop | Springer, 2006 | You may need some sections of this book, particularly those on linear techniques (such as PCA) and generalisation. | SUSTech library |
Last updated: 2018/07/24 (Marked with green background)