Welcome#

Welcome to our Knowledge Discovery and Data Mining (KDD) course. In this course, we’ll delve into the world of data mining, uncovering valuable insights from vast datasets. You will explore techniques for identifying meaningful patterns, correlations, and trends, and apply them to real-world and synthetic data. Topics cover all stages of knowledge discovery, from association rules to cluster analysis, classification, and regression. Through hands-on coding, students will implement essential data mining algorithms and use existing tools to build practical skills.

Course Information#

Instructor: Dr. Yong Zhuang

Class Schedule#

Section 01#

  • Class Time: Tuesday 6:00 pm - 8:50 pm

  • Room: Pew Campus | DeVos Center | Room 210A

  • Midterm: October 7 (Tuesday), 6:00 pm - 7:15 pm

  • Final Exam: December 9 (Tuesday), 6:00 pm - 7:50 pm

Section 02#

  • Class Time: Monday, Wednesday 4:30 pm - 5:45 pm

  • Room: Pew Campus | DeVos Center | Room 210A

  • Midterm: October 6 (Monday), 4:30 pm - 5:45 pm

  • Final Exam: December 10 (Wednesday), 4:00 pm - 5:50 pm

Reference Books#

There is no main textbook for the class. However, you may use materials from the following books as a reference. Lecture slides and additional reading materials will be provided on the class website.

  • Data Mining: Concepts and Techniques (4th Edition) by Jiawei Han, Jian Pei, and Hanghang Tong. Publication Date: 2023. (Available for free through the GVSU library.)

Tentative Schedule#

  • August 31 - September 1, 2025 Labor Day Recess: No classes!

  • October 19-21, 2025 Fall Break: No classes!

  • November 26-30, 2025 Thanksgiving Recess: No classes!

  • To run the sample Jupyter Notebook code, click the rocket icon at the top of the page to open the notebook in Google Colab for interactive use.

Week

Content

Reading

1. 08/25

Syllabus
What is Data Mining: slides
Data Mining Tasks: slides
Introduction to Python: code
Quiz 1

resources

2. 09/01

Descriptive Statistics: slides | code
Data Visualization: slides
Introduction to Numpy: code
Introduction to Pandas: code
Quiz 2

resources

3. 09/08

Data Cleaning & Integration: slides | code
Data Compression & Sampling: slides | code
Data Transformation: slides | code
Quiz 3

resources

4. 09/15

Similarity and Distance Measures: slides
Homework 1

resources

5. 09/22

Feature Analysis: Relationships: slides
Data Transformation II: slides

resources

6. 09/29

Midterm Topics and Practice
Non-linear Relationships

resources

7. 10/06

Midterm Exam

8. 10/13

Feature Extraction, Feature Selection, Markov Blanket

resources

9. 10/20

Fall Break (no class for Section 01); Section 02 TBD

resources

10. 10/27

Decision Tree

resources

11. 11/03

Classifier Evaluation, Model Selection, Bayesian Classification

resources

12. 11/10

Linear/Logistic Regression, Perceptron, Lazy Learning, Clustering

resources

13. 11/17

Neural Network, CNN

resources

14. 11/24

RNN, Attention, Transformer

resources

15. 12/01

Project Presentation, Final Exam Topics and Practice

resources

16. 12/08

Final Exam