Quiz 4: Midterm Exam Practice
A realtor is using historical data to predict whether houses will sell easily. The dataset below records the sale status of four houses:
| House | Price (USD) | Area (sqft) | Easy to Sell? |
|---|---|---|---|
| 1 | $320k | 1200 | Yes |
| 2 | $520k | 3000 | No |
| 3 | $360k | 1350 | Yes |
| 4 | $720k | 4200 | No |
Descriptive Statistics: Compute the minimum, maximum, mean, median, and interquartile range (IQR) for the house prices.
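A minimal sketch of the descriptive statistics in plain Python, with prices in $k. Quartiles have several competing conventions. This sketch assumes the "median of halves" rule: Q1 and Q3 are the medians of the lower and upper halves of the sorted data. `iqr_median_of_halves` is a hypothetical helper name.

```python
from statistics import mean, median

# House prices in $k for Houses 1-4, taken from the table above
prices = [320, 520, 360, 720]


def iqr_median_of_halves(values):
    """Return Q3 - Q1, where Q1 and Q3 are the medians of the lower and upper halves.

    For an odd number of values, the overall median is left out of both halves.
    This is only one of several common quartile conventions.
    """
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]
    upper = s[half + 1:] if len(s) % 2 else s[half:]
    return median(upper) - median(lower)


stats = {
    "min": min(prices),
    "max": max(prices),
    "mean": mean(prices),
    "median": median(prices),
    "IQR": iqr_median_of_halves(prices),
}
for name, value in stats.items():
    print(f"{name:>6}: {value:.1f}")
```

Under this convention Q1 = 340 and Q3 = 620. NumPy's default `np.percentile` uses linear interpolation and gives Q1 = 350 and Q3 = 570 instead, so state which convention you use.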
Normalization: Perform min-max normalization on both the house prices and areas (sqft).
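Min-max normalization rescales each feature to [0, 1] with \(x' = (x - \min) / (\max - \min)\). The sketch below assumes the minimum and maximum are taken over the four historical houses. `min_max_normalize` is a hypothetical helper name.

```python
def min_max_normalize(values):
    """Rescale values linearly to [0, 1] via (x - min) / (max - min); assumes max > min."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]


prices = [320, 520, 360, 720]     # $k, Houses 1-4
areas = [1200, 3000, 1350, 4200]  # sqft, Houses 1-4

norm_prices = min_max_normalize(prices)
norm_areas = min_max_normalize(areas)
print(norm_prices)  # [0.0, 0.5, 0.1, 1.0]
print(norm_areas)   # [0.0, 0.6, 0.05, 1.0]
```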
Prediction using Euclidean Distance: A new property (House 5) is listed at $400k with an area of 2700 sqft. Predict whether House 5 will sell easily by finding the most similar house in the historical dataset, using Euclidean distance as the similarity measure.
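A sketch of the 1-nearest-neighbour prediction. It assumes the distance is computed on min-max normalized features, and that House 5 is scaled with the minimum and maximum of the four historical houses. `math.dist` (Python 3.8+) returns the Euclidean distance between two points. `normalize` is an illustrative helper name.

```python
import math

# Historical houses from the table: (house number, price in $k, area in sqft, easy to sell?)
houses = [
    (1, 320, 1200, "Yes"),
    (2, 520, 3000, "No"),
    (3, 360, 1350, "Yes"),
    (4, 720, 4200, "No"),
]
new_price, new_area = 400, 2700  # House 5

# Min-max parameters come from the historical data only (assumption)
p_min = min(h[1] for h in houses)
p_max = max(h[1] for h in houses)
a_min = min(h[2] for h in houses)
a_max = max(h[2] for h in houses)


def normalize(price, area):
    """Map (price, area) to min-max normalized coordinates."""
    return ((price - p_min) / (p_max - p_min), (area - a_min) / (a_max - a_min))


target = normalize(new_price, new_area)
distances = []
for number, price, area, label in houses:
    d = math.dist(normalize(price, area), target)  # Euclidean distance
    distances.append((d, number, label))
    print(f"House {number}: distance = {d:.4f}")

# Tuples compare by distance first, so min() picks the most similar house
best_distance, best_house, prediction = min(distances)
print(f"Most similar: House {best_house} (easy to sell? {prediction})")
```

Normalizing matters because raw areas, in thousands of sqft, would otherwise dominate raw prices, in hundreds of $k, in the distance.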
Based on your previous analysis, is House 5 easy to sell?
Yes
No
We surveyed 100 attendees at our theater to learn about their viewing and purchasing behavior. For each attendee, we recorded the genre of the movie they watched and whether they bought snacks. The goal is to determine whether there is a relationship between movie genre and the likelihood of buying snacks. Consider the two variables `movie_genre` and `snack_purchase`.
| Movie Genre | Snacks Purchased | No Snacks Purchased |
|---|---|---|
| Action | a = 20 | b = 40 |
| Comedy | c = 30 | d = 10 |
Calculate the χ² statistic for the given data.
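A minimal sketch of the calculation in plain Python with no external libraries. Each expected count is \(E = (\text{row total} \times \text{column total}) / N\), and \(\chi^2 = \sum (O - E)^2 / E\) is summed over all four cells. This is Pearson's statistic without Yates' continuity correction. `scipy.stats.chi2_contingency` applies that correction by default when there is one degree of freedom, so its value would differ unless you pass `correction=False`. The variable names are illustrative.

```python
# Observed counts (rows: genre; columns: snacks purchased, no snacks purchased)
observed = {
    "Action": [20, 40],  # a, b
    "Comedy": [30, 10],  # c, d
}

rows = list(observed.values())
n = sum(sum(r) for r in rows)
row_totals = [sum(r) for r in rows]
col_totals = [sum(col) for col in zip(*rows)]

chi2 = 0.0
for i, row in enumerate(rows):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n  # E = row total x column total / N
        chi2 += (obs - expected) ** 2 / expected

df = (len(rows) - 1) * (len(col_totals) - 1)  # (r - 1)(c - 1)
print(f"chi2 = {chi2:.3f}, df = {df}")  # chi2 = 16.667, df = 1
```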
Using the critical values of the χ² distribution in Table 1 and a significance level of 0.001, can we infer that the variables movie genre and snack purchase are related?
Yes
No
Table 1: Upper-tail critical values of the χ² distribution with ν degrees of freedom
| Degrees of freedom \(\nu\) | 0.1 | 0.05 | 0.025 | 0.01 | 0.001 |
|---|---|---|---|---|---|
| 1 | 2.706 | 3.841 | 5.024 | 6.635 | 10.828 |
| 2 | 4.605 | 5.991 | 7.378 | 9.210 | 13.816 |
| 3 | 6.251 | 7.815 | 9.348 | 11.345 | 16.266 |
| 4 | 7.779 | 9.488 | 11.143 | 13.277 | 18.467 |
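Reading Table 1 is a lookup keyed by degrees of freedom and significance level. The sketch below assumes the usual rule: reject independence when the statistic exceeds the tabulated critical value. For an \(r \times c\) contingency table the degrees of freedom are \((r - 1)(c - 1)\). `CRITICAL` and `reject_independence` are hypothetical names.

```python
# Table 1 as a lookup: CRITICAL[df][alpha] = upper-tail critical value
CRITICAL = {
    1: {0.1: 2.706, 0.05: 3.841, 0.025: 5.024, 0.01: 6.635, 0.001: 10.828},
    2: {0.1: 4.605, 0.05: 5.991, 0.025: 7.378, 0.01: 9.210, 0.001: 13.816},
    3: {0.1: 6.251, 0.05: 7.815, 0.025: 9.348, 0.01: 11.345, 0.001: 16.266},
    4: {0.1: 7.779, 0.05: 9.488, 0.025: 11.143, 0.01: 13.277, 0.001: 18.467},
}


def reject_independence(chi2_stat, df, alpha):
    """Return True if chi2_stat exceeds the critical value, i.e. reject independence."""
    return chi2_stat > CRITICAL[df][alpha]


df = (2 - 1) * (2 - 1)  # a 2x2 contingency table has one degree of freedom
print(CRITICAL[df][0.001])  # 10.828
```

Pass the χ² value from the calculation above to `reject_independence` with this `df` and `alpha=0.001`. A result of `True` means the data support a relationship between the two variables at that level.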
A predictive model is trained to classify animals into four categories: Cat, Dog, Rabbit, and Hamster. For one example image, the table below gives the true (ground-truth) probability distribution along with the model's predicted probabilities for the same categories.
| Category | True Probability (p) | Predicted Probability (q) |
|---|---|---|
| Cat | 1 | 0.5 |
| Dog | 0 | 0.3 |
| Rabbit | 0 | 0.1 |
| Hamster | 0 | 0.1 |
Calculate the cross-entropy and KL divergence between the true and predicted probability distributions.
Calculate the cross-entropy between the true distribution \(p\) and the predicted distribution \(q\) using the formula:
\[H(p, q) = -\sum_i p(i) \log_2 q(i)\]
Calculate the Kullback-Leibler (KL) divergence between the true distribution \(p\) and the predicted distribution \(q\) using the formula:
\[D_{KL}(p \parallel q) = \sum_i p(i) \log_2 \frac{p(i)}{q(i)}\]
Terms with \(p(i) = 0\) contribute zero to both sums, by the convention \(0 \log 0 = 0\).
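A minimal sketch of both formulas in plain Python. It assumes \(q(i) > 0\) wherever \(p(i) > 0\). Terms with \(p(i) = 0\) are skipped, which implements the \(0 \log 0 = 0\) convention. `cross_entropy` and `kl_divergence` are illustrative names.

```python
import math

p = [1.0, 0.0, 0.0, 0.0]  # true distribution: Cat, Dog, Rabbit, Hamster
q = [0.5, 0.3, 0.1, 0.1]  # predicted distribution


def cross_entropy(p, q):
    """H(p, q) = -sum p(i) log2 q(i); terms with p(i) = 0 are skipped."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    """D_KL(p || q) = sum p(i) log2(p(i) / q(i)); terms with p(i) = 0 are skipped."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


print(cross_entropy(p, q))  # 1.0
print(kl_divergence(p, q))  # 1.0
```

Because \(p\) is one-hot, its entropy \(H(p)\) is 0. The identity \(H(p, q) = H(p) + D_{KL}(p \parallel q)\) therefore makes the two quantities equal here.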
Given the dataset [-2, 4, 8, 6, -5], your task is to compress the data. After applying the discrete wavelet transform (DWT), the data will be transformed into __
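As a hedged illustration, the sketch below computes one level of a Haar DWT. It assumes three conventions that textbooks do not all share:

- The input is zero-padded to a power-of-two length.
- Smoothing coefficients are pairwise averages \((a + b)/2\).
- Detail coefficients are half-differences \((a - b)/2\).

Orthonormal Haar implementations scale by \(1/\sqrt{2}\) instead, which changes the numbers. `haar_one_level` and `next_power_of_two` are hypothetical helper names.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n and at least 2, so values can be paired."""
    p = 2
    while p < n:
        p *= 2
    return p


def haar_one_level(data):
    """One Haar DWT level: pairwise averages (smoothing) and half-differences (detail).

    Assumption: the input is zero-padded to a power-of-two length first.
    """
    padded = list(data) + [0] * (next_power_of_two(len(data)) - len(data))
    averages = [(padded[i] + padded[i + 1]) / 2 for i in range(0, len(padded), 2)]
    details = [(padded[i] - padded[i + 1]) / 2 for i in range(0, len(padded), 2)]
    return averages, details


averages, details = haar_one_level([-2, 4, 8, 6, -5])
print(averages)  # [1.0, 7.0, -2.5, 0.0]
print(details)   # [-3.0, 1.0, -2.5, 0.0]
```

Repeating the step on `averages` gives the next decomposition level. For compression, the largest coefficients are kept and small ones are set to zero.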
The following dataset records the hours studied and test scores of five students. Calculate Spearman's rank correlation coefficient to determine whether there is a relationship between hours studied and test scores.
| Student | Hours Studied | Test Score |
|---|---|---|
| 1 | 5 | 82 |
| 2 | 3 | 76 |
| 3 | 4 | 88 |
| 4 | 2 | 70 |
| 5 | 1 | 60 |
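A minimal sketch using the no-ties shortcut formula \(\rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}\), where \(d_i\) is the difference between a student's two ranks. Neither column in the table has tied values, so the shortcut applies. With ties, you would need average ranks and a Pearson correlation of the ranks instead. `ranks` and `spearman_rho` are illustrative helper names.

```python
hours = [5, 3, 4, 2, 1]        # Students 1-5
scores = [82, 76, 88, 70, 60]


def ranks(values):
    """Rank values 1..n in ascending order (assumes no ties)."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman_rho(x, y):
    """rho = 1 - 6 * sum(d^2) / (n (n^2 - 1)), valid when there are no ties."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


print(ranks(hours))   # [5, 3, 4, 2, 1]
print(ranks(scores))  # [4, 3, 5, 2, 1]
rho = spearman_rho(hours, scores)
print(f"rho = {rho:.2f}")
```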