Homework 2

Overview

Objectives

In this assignment, you will:

  • Implement a simplified version of the Incremental Association Markov Blanket (IAMB) algorithm based on the provided pseudocode.

  • Apply IAMB for feature selection on a sample dataset (the Pima Indians Diabetes Dataset).

  • Evaluate and compare model performance with and without IAMB-based feature selection using Gaussian Naive Bayes.
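As a rough illustration of the two IAMB phases named above (growing, then shrinking), here is a minimal sketch. It is not the pseudocode provided with the assignment. It assumes a Fisher z-test on partial correlations as the conditional independence test, which treats the variables as roughly Gaussian. The function names `fisher_z` and `iamb`, the `alpha` parameter, and the synthetic data are all hypothetical and chosen only for this example.

```python
import math
from statistics import NormalDist

import numpy as np


def fisher_z(data, x, y, cond):
    """|z| statistic for the partial correlation of columns x and y given cond."""
    cols = [x, y] + list(cond)
    corr = np.corrcoef(data[:, cols], rowvar=False)
    prec = np.linalg.pinv(corr)  # inverse correlation (precision) matrix
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    r = float(np.clip(r, -0.999999, 0.999999))
    dof = data.shape[0] - len(cond) - 3
    return abs(math.atanh(r)) * math.sqrt(dof)


def iamb(data, target, alpha=0.01):
    """Return sorted column indices estimated to form the Markov blanket of target."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    candidates = [c for c in range(data.shape[1]) if c != target]
    blanket = []

    # Growing phase: repeatedly admit the candidate most strongly associated
    # with the target given the current blanket, while it is still dependent.
    while True:
        remaining = [c for c in candidates if c not in blanket]
        if not remaining:
            break
        scores = {c: fisher_z(data, c, target, blanket) for c in remaining}
        best = max(scores, key=scores.get)
        if scores[best] <= z_crit:
            break
        blanket.append(best)

    # Shrinking phase: drop members that are independent of the target
    # given the rest of the blanket (false positives from the growing phase).
    for c in list(blanket):
        rest = [b for b in blanket if b != c]
        if fisher_z(data, c, target, rest) <= z_crit:
            blanket.remove(c)

    return sorted(blanket)


# Synthetic linear-Gaussian data (an assumption for this demo only).
# By construction, the target's Markov blanket is parent, child, and spouse.
rng = np.random.default_rng(0)
n = 5000
ancestor = rng.normal(size=n)
parent = ancestor + rng.normal(size=n)
target_var = parent + rng.normal(size=n)
spouse = rng.normal(size=n)
child = target_var + spouse + rng.normal(size=n)
noise = rng.normal(size=n)

# Column order: 0 ancestor, 1 parent, 2 target, 3 child, 4 spouse, 5 noise
data = np.column_stack([ancestor, parent, target_var, child, spouse, noise])
mb = iamb(data, target=2, alpha=0.001)
print(mb)
```

The returned column indices can be used to subset the feature matrix before fitting a classifier such as Gaussian Naive Bayes, so that the same model can be scored once on all features and once on the selected ones.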

About the Dataset

The Pima Indians Diabetes Dataset, originally from the National Institute of Diabetes and Digestive and Kidney Diseases, contains data on 768 women from a population near Phoenix, Arizona. The outcome is a diabetes diagnosis, with 268 positive and 500 negative cases. There are eight predictor variables:

  • Pregnancies: Number of times pregnant

  • OGTT: Oral glucose tolerance test result (two-hour plasma glucose concentration after 75 g anhydrous glucose)

  • Blood Pressure: Diastolic blood pressure (mmHg)

  • Skin Thickness: Triceps skin fold thickness (mm)

  • Insulin: 2-hour serum insulin (μU/mL)

  • BMI: Body Mass Index (kg/m²)

  • Age: Age in years

  • Pedigree Diabetes Function: A score summarizing the likelihood of diabetes based on family history
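Because the classes are imbalanced, it helps to know the accuracy that a trivial model would reach before comparing classifiers. The short sketch below takes the 768 records and 500 negative cases stated above and computes the majority-class baseline. The variable names are illustrative only.

```python
n_total = 768      # records in the dataset
n_negative = 500   # negative (non-diabetic) cases
n_positive = n_total - n_negative

# Accuracy of always predicting the more common class
baseline_accuracy = max(n_positive, n_negative) / n_total

print(f"positives: {n_positive}, baseline accuracy: {baseline_accuracy:.3f}")
# prints "positives: 268, baseline accuracy: 0.651"
```

Any model, with or without feature selection, should be judged against this figure of roughly 65% rather than against 50%.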

Access and Submission

  1. Access the assignment through this link and start your work by clicking on the Colab notebook in the red box, as shown below.

    [Screenshot: Google Classroom assignment page]

  2. Important: After completing the notebook, click “Turn in” in Google Classroom to submit your work.

Note: You must click “Turn in” for this homework; otherwise, your notebook will not be graded.