Homework 3

Overview

Objectives

In this assignment, you will implement and evaluate multiple algorithms to classify the Pima Indians Diabetes Dataset. Using 5-fold stratified cross-validation, you will calculate metrics such as Accuracy, F1 Score (weighted), and AUC-ROC (weighted). Additionally, you will perform hyperparameter tuning for certain algorithms to improve their performance and visualize the ROC curves for all classifiers. If class imbalance is identified, you will handle it using appropriate methods.

  • The target feature is Outcome, which indicates whether a patient has diabetes (1) or not (0).
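To make the term "stratified" concrete, here is a minimal sketch of how stratified folds can be formed. It is an illustration only, not the required implementation: the function name `stratified_folds` and the toy label list (65 zeros, 35 ones) are assumptions made up for this example and are not the dataset's counts. In practice, scikit-learn's `StratifiedKFold` does this job for you.

```python
import random
from collections import Counter


def stratified_folds(labels, n_splits=5, seed=0):
    """Assign sample indices to folds so that each fold keeps roughly
    the same class ratio as the full label list (hypothetical helper)."""
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the indices of this class round-robin across the folds,
        # so every fold receives a near-equal share of the class.
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)
    return folds


# Toy labels (an assumption for illustration): 65 negatives, 35 positives.
labels = [0] * 65 + [1] * 35
folds = stratified_folds(labels, n_splits=5)

fold_counts = [Counter(labels[i] for i in fold) for fold in folds]
for k, counts in enumerate(fold_counts):
    print(f"fold {k}: negatives={counts[0]}, positives={counts[1]}")
# Each fold ends up with 13 negatives and 7 positives.
```

Each fold serves once as the test set while the remaining four are used for training, so every sample is evaluated exactly once and every test fold reflects the overall class ratio.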

About the Dataset

The Pima Indians Diabetes Dataset, originally from the National Institute of Diabetes and Digestive and Kidney Diseases, contains data on 768 women from a population near Phoenix, Arizona. The outcome tested is Diabetes, with 268 positive and 500 negative cases. There are eight predictor variables:

  • Pregnancies: Number of times pregnant

  • OGTT: Oral glucose tolerance test result (two-hour plasma glucose concentration after 75 g anhydrous glucose)

  • Blood Pressure: Diastolic blood pressure (mmHg)

  • Skin Thickness: Triceps skin fold thickness (mm)

  • Insulin: 2-hour serum insulin (μU/mL)

  • BMI: Body Mass Index (kg/m²)

  • Age: Age in years

  • Pedigree Diabetes Function: Likelihood of diabetes based on family history
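As a quick sanity check on class balance, the short sketch below uses only the record count and the negative-case count from the description above. The variable names are illustrative assumptions. It shows why accuracy alone can be misleading here: a classifier that always predicts "no diabetes" already scores the majority-class share.

```python
# Counts taken from the dataset description: 768 records, 500 of them negative.
total_records = 768
negative_cases = 500

# Share of the majority (negative) class.
majority_share = negative_cases / total_records

# A classifier that always predicts "negative" is correct exactly on the
# negative cases, so its accuracy equals the majority-class share.
baseline_accuracy = majority_share

print(f"majority class share: {majority_share:.3f}")
print(f"always-negative baseline accuracy: {baseline_accuracy:.3f}")
```

Any model you train should be compared against this baseline, which is one reason the assignment also asks for F1 and AUC-ROC.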

Access and Submission

  1. Access the assignment through this link and start your work by clicking on the Colab notebook in the red box, as shown below.

    [Image: google_classroom]

  2. Important: After completing the notebook, click “Turn in” on Google Classroom to submit your work.

Note: You must “Turn in” your notebook for this homework; otherwise, it will not be graded.