Exam Review Exercises
Consider a dataset D containing two classes, A and B, with the following distribution:
40 instances belong to class A
60 instances belong to class B.
Calculate the Gini impurity for the dataset D:
\[Gini(D) = ... \]

Given the following dataset with two categorical attributes (Weather and Time of Day) and a class label, use the Naïve Bayes classifier to predict the class label of a new sample (Weather = Sunny, Time of Day = Evening).
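
| Weather | Time of Day | Class Label (Go Outside) |
|---------|-------------|--------------------------|
| Sunny   | Morning     | Yes                      |
| Rainy   | Afternoon   | No                       |
| Cloudy  | Evening     | Yes                      |
| Sunny   | Evening     | No                       |
| Cloudy  | Morning     | Yes                      |
| Rainy   | Night       | No                       |
| Sunny   | Afternoon   | Yes                      |
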
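To check a hand computation for this exercise, the Naïve Bayes scores can be sketched in a few lines. This is a minimal sketch that assumes plain relative-frequency estimates with no Laplace smoothing; the names `rows` and `naive_bayes_scores` are illustrative and not part of the exercise.

```python
from collections import Counter
from fractions import Fraction

# Training data copied from the exercise: (Weather, Time of Day, Go Outside).
rows = [
    ("Sunny", "Morning", "Yes"),
    ("Rainy", "Afternoon", "No"),
    ("Cloudy", "Evening", "Yes"),
    ("Sunny", "Evening", "No"),
    ("Cloudy", "Morning", "Yes"),
    ("Rainy", "Night", "No"),
    ("Sunny", "Afternoon", "Yes"),
]


def naive_bayes_scores(rows, weather, time_of_day):
    """Return P(class) * P(weather | class) * P(time | class) for each class.

    Assumption: probabilities are raw frequency ratios (no smoothing).
    """
    class_counts = Counter(label for _, _, label in rows)
    total = len(rows)
    scores = {}
    for label, n in class_counts.items():
        prior = Fraction(n, total)
        weather_hits = sum(1 for w, _, c in rows if w == weather and c == label)
        time_hits = sum(1 for _, t, c in rows if t == time_of_day and c == label)
        scores[label] = prior * Fraction(weather_hits, n) * Fraction(time_hits, n)
    return scores


scores = naive_bayes_scores(rows, "Sunny", "Evening")
prediction = max(scores, key=scores.get)
for label, score in scores.items():
    print(label, score, float(score))
print("Predicted class:", prediction)
```

The scores are unnormalized, which is enough for picking the larger one. Exact fractions are used so the printed values can be compared directly with a hand calculation.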
Given the following confusion matrix, calculate Precision, Recall, and F1 Score.

|                 | Predicted Positive | Predicted Negative |
|-----------------|--------------------|--------------------|
| Actual Positive | 20                 | 5                  |
| Actual Negative | 10                 | 30                 |

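The three metrics follow directly from the counts in the matrix. The sketch below assumes the usual reading of the table (rows are actual classes, columns are predictions, so TP = 20, FN = 5, FP = 10, TN = 30); the helper name `precision_recall_f1` is illustrative.

```python
from fractions import Fraction

# Counts read from the confusion matrix (rows = actual, columns = predicted).
tp, fn = 20, 5
fp, tn = 10, 30  # tn is not needed for precision, recall, or F1


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 as exact fractions."""
    precision = Fraction(tp, tp + fp)   # TP / (TP + FP)
    recall = Fraction(tp, tp + fn)      # TP / (TP + FN)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(f"precision={float(precision):.4f}")
print(f"recall={float(recall):.4f}")
print(f"f1={float(f1):.4f}")
```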
A probabilistic classifier has been applied to a test set of 10 tuples. Below are the probability values of these tuples belonging to the positive class, sorted in decreasing order. Based on these probabilities and the actual class labels, calculate the True Positive Rate (TPR) and False Positive Rate (FPR) at each threshold, and then sketch the ROC curve.

| Probability | Actual Class |
|-------------|--------------|
| 0.90        | Positive     |
| 0.85        | Positive     |
| 0.80        | Negative     |
| 0.75        | Positive     |
| 0.70        | Negative     |
| 0.65        | Negative     |
| 0.60        | Positive     |
| 0.55        | Negative     |
| 0.50        | Positive     |
| 0.45        | Negative     |

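One way to tabulate the ROC points is to sweep the threshold down the sorted list and count true and false positives as each tuple is admitted. The sketch below assumes a tuple is predicted positive when its probability is greater than or equal to the threshold, and that all probabilities are distinct (ties would need to be grouped); `scored` and `roc_points` are illustrative names.

```python
# (probability of positive class, actual class), copied from the exercise.
scored = [
    (0.90, "Positive"), (0.85, "Positive"), (0.80, "Negative"),
    (0.75, "Positive"), (0.70, "Negative"), (0.65, "Negative"),
    (0.60, "Positive"), (0.55, "Negative"), (0.50, "Positive"),
    (0.45, "Negative"),
]


def roc_points(scored):
    """Return (threshold, TPR, FPR) for each probability used as a threshold.

    Assumption: predict positive when probability >= threshold.
    """
    n_pos = sum(1 for _, y in scored if y == "Positive")
    n_neg = len(scored) - n_pos
    tp = fp = 0
    points = []
    for prob, actual in sorted(scored, key=lambda s: s[0], reverse=True):
        if actual == "Positive":
            tp += 1
        else:
            fp += 1
        points.append((prob, tp / n_pos, fp / n_neg))
    return points


points = roc_points(scored)
for threshold, tpr, fpr in points:
    print(f"threshold={threshold:.2f}  TPR={tpr:.1f}  FPR={fpr:.1f}")
```

Plotting FPR on the x-axis against TPR on the y-axis, starting from (0, 0), gives the ROC curve to sketch.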
Given a dataset with 5 tuples {A, B, C, D, E} and their distance matrix as shown below, perform two rounds of Agglomerative Clustering.
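
|   | A    | B    | C    | D    | E    |
|---|------|------|------|------|------|
| A | 0.00 | 2.24 | 4.12 | 7.07 | 7.00 |
| B | 2.24 | 0.00 | 3.16 | 5.00 | 6.32 |
| C | 4.12 | 3.16 | 0.00 | 4.12 | 3.26 |
| D | 7.07 | 5.00 | 4.12 | 0.00 | 5.39 |
| E | 7.00 | 6.32 | 3.26 | 5.39 | 0.00 |
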
Round 1: Merge the closest pair A and B into a cluster.
Round 2: Apply Single Linkage, Average Linkage, and Complete Linkage to determine the next merge. For each linkage method, identify the next cluster merge and explain how the choice of linkage affects the clustering result.
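The Round 2 comparison can be checked with a small script that recomputes the cluster-to-cluster distances under each linkage rule. This is a sketch under the standard definitions (single = minimum, complete = maximum, average = mean of pairwise distances); the names `dist`, `linkage_distance`, and `next_merge` are illustrative.

```python
from itertools import combinations

labels = "ABCDE"
# Distance matrix copied from the exercise (rows and columns in order A..E).
matrix = [
    [0.00, 2.24, 4.12, 7.07, 7.00],
    [2.24, 0.00, 3.16, 5.00, 6.32],
    [4.12, 3.16, 0.00, 4.12, 3.26],
    [7.07, 5.00, 4.12, 0.00, 5.39],
    [7.00, 6.32, 3.26, 5.39, 0.00],
]
# Look up a pairwise distance by an unordered pair of point labels.
dist = {
    frozenset((labels[i], labels[j])): matrix[i][j]
    for i in range(len(labels))
    for j in range(i + 1, len(labels))
}


def linkage_distance(c1, c2, method):
    """Distance between two clusters (tuples of labels) under a linkage rule."""
    pairwise = [dist[frozenset((a, b))] for a in c1 for b in c2]
    if method == "single":
        return min(pairwise)
    if method == "complete":
        return max(pairwise)
    if method == "average":
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage method: {method}")


def next_merge(clusters, method):
    """Return the pair of clusters with the smallest linkage distance."""
    return min(
        combinations(clusters, 2),
        key=lambda pair: linkage_distance(pair[0], pair[1], method),
    )


# State after Round 1: A and B have been merged.
clusters = [("A", "B"), ("C",), ("D",), ("E",)]
for method in ("single", "average", "complete"):
    c1, c2 = next_merge(clusters, method)
    d = linkage_distance(c1, c2, method)
    print(f"{method:>8}: merge {c1} and {c2} at distance {d:.2f}")
```

Comparing the three printed merges shows how the linkage rule alone can change which clusters join next, even though the underlying distance matrix is identical.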