Homework 2
Overview
Objectives
In this assignment, you will:
Implement a simplified version of the Incremental Association Markov Blanket (IAMB) algorithm based on provided pseudocode.
Apply IAMB for feature selection on a sample dataset (Pima Indians Diabetes Dataset).
Evaluate and compare model performance with and without IAMB-based feature selection using Gaussian Naive Bayes.
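To make the first objective concrete, here is a minimal sketch of the two IAMB phases (growing, then shrinking). It is not the pseudocode provided with the assignment. It assumes discrete features, uses empirical conditional mutual information as the association measure, and treats a fixed `threshold` as the independence test. The names `cond_mutual_info`, `iamb`, and `threshold`, as well as the synthetic data, are illustrative assumptions. Continuous columns such as those in the diabetes data would need to be discretized first, or a different conditional independence test used.

```python
import numpy as np

def cond_mutual_info(x, y, z_cols):
    """Empirical conditional mutual information I(x; y | z) in nats (discrete data)."""
    n = len(x)
    if z_cols.shape[1] == 0:
        z = np.zeros(n, dtype=int)  # empty conditioning set: a single stratum
    else:
        # Encode each distinct row of the conditioning set as one integer label.
        _, z = np.unique(z_cols, axis=0, return_inverse=True)
        z = z.ravel()
    cmi = 0.0
    for z_val in np.unique(z):
        mask = z == z_val
        xs, ys = x[mask], y[mask]
        p_z = mask.sum() / n
        for xv in np.unique(xs):
            for yv in np.unique(ys):
                p_xy = np.mean((xs == xv) & (ys == yv))
                if p_xy > 0:
                    p_x, p_y = np.mean(xs == xv), np.mean(ys == yv)
                    cmi += p_z * p_xy * np.log(p_xy / (p_x * p_y))
    return cmi

def iamb(X, y, threshold=0.02):
    """Return sorted column indices of an estimated Markov blanket of y."""
    blanket = []
    # Growing phase: repeatedly add the feature most associated with y
    # given the current blanket, until no candidate exceeds the threshold.
    while True:
        candidates = [j for j in range(X.shape[1]) if j not in blanket]
        if not candidates:
            break
        scores = [cond_mutual_info(X[:, j], y, X[:, blanket]) for j in candidates]
        best = int(np.argmax(scores))
        if scores[best] <= threshold:
            break
        blanket.append(candidates[best])
    # Shrinking phase: drop features that look independent of y
    # given the rest of the blanket (false positives from the growing phase).
    for j in list(blanket):
        rest = [k for k in blanket if k != j]
        if cond_mutual_info(X[:, j], y, X[:, rest]) <= threshold:
            blanket.remove(j)
    return sorted(blanket)

# Synthetic check (assumed data, not the homework dataset):
# y depends on columns 0 and 1; column 2 is a noisy copy of column 0;
# column 3 is pure noise.
rng = np.random.default_rng(0)
n = 2000
a = rng.integers(0, 2, n)
b = rng.integers(0, 2, n)
c = np.where(rng.random(n) < 0.2, 1 - a, a)
noise = rng.integers(0, 2, n)
y = np.where(rng.random(n) < 0.1, 1 - (a | b), a | b)
X = np.column_stack([a, b, c, noise])

selected = iamb(X, y)
print(selected)
```

On this synthetic data the sketch should keep columns 0 and 1 and reject both the redundant copy and the noise column. The threshold value is a tuning choice here; a statistical test (for example a G-test) is a common alternative.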
About the Dataset
The Pima Indians Diabetes Dataset, originally from the National Institute of Diabetes and Digestive and Kidney Diseases, contains data on 768 women from a population near Phoenix, Arizona. The outcome tested is Diabetes, with 268 positive and 500 negative cases. There are eight predictor variables:
Pregnancies: Number of times pregnant
OGTT: Oral glucose tolerance test result (two-hour plasma glucose concentration after 75g anhydrous glucose)
Blood Pressure: Diastolic blood pressure (mmHg)
Skin Thickness: Triceps skin fold thickness (mm)
Insulin: 2-hour serum insulin (μU/mL)
BMI: Body Mass Index (kg/m²)
Age: Age in years
Pedigree Diabetes Function: Likelihood of diabetes based on family history
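With the eight predictors above as columns, the comparison named in the objectives (Gaussian Naive Bayes with and without feature selection) can be sketched as follows. This is a NumPy-only illustration under stated assumptions: the data are synthetic stand-ins with the same shape as the dataset (768 rows, 8 columns), the subset `["OGTT", "BMI"]` is a hypothetical placeholder rather than an actual IAMB result, and the helpers `fit_gnb`, `predict_gnb`, and `holdout_accuracy` are names invented for this example. In the notebook, a library implementation such as scikit-learn's `GaussianNB` would normally take their place.

```python
import numpy as np

FEATURES = ["Pregnancies", "OGTT", "Blood Pressure", "Skin Thickness",
            "Insulin", "BMI", "Age", "Pedigree Diabetes Function"]

def fit_gnb(X, y):
    """Estimate per-class feature means, variances, and log priors."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + 1e-9
    log_priors = np.log(np.array([np.mean(y == c) for c in classes]))
    return classes, means, variances, log_priors

def predict_gnb(model, X):
    """Pick the class with the highest Gaussian log-likelihood plus log prior."""
    classes, means, variances, log_priors = model
    diff = X[:, None, :] - means[None, :, :]          # (samples, classes, features)
    log_lik = -0.5 * np.sum(
        np.log(2 * np.pi * variances)[None] + diff ** 2 / variances[None], axis=2
    )
    return classes[np.argmax(log_lik + log_priors, axis=1)]

def holdout_accuracy(X, y, columns, seed=0, test_fraction=0.3):
    """Accuracy on a random hold-out split using only the given column indices."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    n_test = int(len(y) * test_fraction)
    test_idx, train_idx = order[:n_test], order[n_test:]
    model = fit_gnb(X[np.ix_(train_idx, columns)], y[train_idx])
    pred = predict_gnb(model, X[np.ix_(test_idx, columns)])
    return float(np.mean(pred == y[test_idx]))

# Synthetic stand-in data (assumption): the label depends only on the
# columns at the positions of "OGTT" and "BMI", plus noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(768, len(FEATURES)))
y = (X[:, 1] + X[:, 5] + 0.5 * rng.normal(size=768) > 0).astype(int)

# Hypothetical selection; in the homework this would come from IAMB.
selected_names = ["OGTT", "BMI"]
selected_cols = [FEATURES.index(name) for name in selected_names]

acc_all = holdout_accuracy(X, y, list(range(len(FEATURES))))
acc_selected = holdout_accuracy(X, y, selected_cols)
print(f"all features: {acc_all:.3f}, selected features: {acc_selected:.3f}")
```

Using the same split (same `seed`) for both runs keeps the comparison fair; a cross-validated estimate would be more stable than a single hold-out split.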
Access and Submission
Access the assignment through this link and start your work by clicking on the Colab notebook in the red box, as shown below.
Important: After completing the notebook, click “Turn in” on Google Classroom to submit your work. Notebooks that are not turned in will not be graded.