Welcome

Welcome to our Knowledge Discovery and Data Mining (KDD) course. In this course, we’ll delve into the world of data mining, uncovering valuable insights from vast datasets. You will explore techniques for identifying meaningful patterns, correlations, and trends, and apply them to real-world and synthetic data. Topics span the knowledge discovery process, including association rules, cluster analysis, classification, and regression. Through hands-on coding, students will implement essential data mining algorithms and use existing tools to build practical skills.

Course Information

Instructor: Dr. Yong Zhuang

Class Schedule

Section 01

  • Class Time: Tuesday 6:00 pm - 8:50 pm

  • Room: Pew Campus | DeVos Center | Room 210A

  • Midterm: October 7 (Tuesday), 6:00 pm - 7:15 pm

  • Final Exam: December 9 (Tuesday), 6:00 pm - 7:50 pm

Section 02

  • Class Time: Monday, Wednesday 4:30 pm - 5:45 pm

  • Room: Pew Campus | DeVos Center | Room 210A

  • Midterm: October 6 (Monday), 4:30 pm - 5:45 pm

  • Final Exam: December 10 (Wednesday), 4:00 pm - 5:50 pm

Reference Books

There is no main textbook for the class. However, you may use materials from the following books as a reference. Lecture slides and additional reading materials will be provided on the class website.

  • Data Mining: Concepts and Techniques (4th Edition) by Jiawei Han, Jian Pei, and Hanghang Tong. Publication Date: 2023. (Available free through the GVSU library.)

Tentative Schedule

  • August 31 - September 1, 2025 Labor Day Recess: No classes!

  • October 19-21, 2025 Fall Break: No classes!

  • November 26-30, 2025 Thanksgiving Recess: No classes!

  • To run the sample Jupyter Notebook code, click the rocket icon at the top of the page; this opens the notebook in Google Colab for interactive use.

Week

Content

Reading

1. 08/25

Syllabus
What is Data Mining: slides
Data Mining Tasks: slides
Introduction to Python: code
Quiz 1

resources

2. 09/01

Descriptive Statistics: slides | code
Data Visualization: slides
Introduction to Numpy: code
Introduction to Pandas: code
Quiz 2

resources

3. 09/08

Data Cleaning & Integration: slides | code
Data Compression & Sampling: slides | code
Data Transformation: slides | code
Quiz 3

resources

4. 09/15

Similarity and Distance Measures: slides
Homework 1

resources

5. 09/22

Feature Analysis: Relationships: slides
Data Transformation II: slides

resources

6. 09/29

Midterm Topics and Practice (Quiz 4)
Non-linear Relationships

resources

7. 10/06

Midterm Exam | Homework 1 & Midterm Exam Review

8. 10/13

Feature Extraction: slides | code
Feature Selection: slides | code
Markov Blanket: slides

resources

9. 10/20

Fall Break (no class for Section 01), Homework 2, ARIMA

resources

10. 10/27

Supervised/Unsupervised Learning: slides
Decision Tree: slides

resources

11. 11/03

Bayesian Classification: slides
Ensemble Methods: slides
Classifier Evaluation, Model Selection: slides
Quiz 5

resources

12. 11/10

Clustering: slides | code
Linear Regression, Logistic Regression, and Perceptron: slides
Lazy Learning: slides

resources

13. 11/17

Neural Network: slides
CNN: slides

resources

14. 11/24

RNN: slides | video
Attention: slides | video
Transformer: slides | video | code

resources

15. 12/01

Project Presentation, Final Exam: topics | practice

resources

16. 12/08

Final Exam