Quiz 4: Midterm Exam Practice

A realtor is using historical data to predict the likelihood of houses being sold easily. The dataset below provides details on the sale status of four houses:

House	Price (USD)	Area (sqft)	Easy to Sell?
1	$320k	1200	Yes
2	$520k	3000	No
3	$360k	1350	Yes
4	$720k	4200	No

Descriptive Statistics: Compute the minimum, maximum, mean, median, and interquartile range (IQR) for the house prices.
Normalization: Perform min-max normalization on both the house prices and areas (sqft).
Prediction using Euclidean Distance: A new property (House 5) is listed with a price of $400k and an area of 2700 sqft. Predict whether House 5 will be easily sold by finding its most similar house from the historical dataset using the Euclidean distance formula.
Based on your previous analysis, is House 5 easy to sell?
- Yes
- No
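
One way to work through the three steps above is sketched below in plain Python. The variable and function names are illustrative. Two assumptions are not part of the quiz statement: the IQR uses the median-of-halves convention (other quartile conventions give different values), and House 5 is scaled with the minimum and maximum of the four historical houses.

```python
import math
import statistics

# Historical data from the table: (price in $k, area in sqft, easy to sell?).
houses = {
    1: (320, 1200, "Yes"),
    2: (520, 3000, "No"),
    3: (360, 1350, "Yes"),
    4: (720, 4200, "No"),
}
prices = sorted(p for p, _, _ in houses.values())
areas = [a for _, a, _ in houses.values()]

# Descriptive statistics for the prices.
price_min, price_max = min(prices), max(prices)
price_mean = statistics.mean(prices)
price_median = statistics.median(prices)

# IQR under the median-of-halves convention (an assumption): Q1 and Q3 are
# the medians of the lower and upper halves of the sorted prices.
half = len(prices) // 2
q1 = statistics.median(prices[:half])
q3 = statistics.median(prices[-half:])
price_iqr = q3 - q1

area_min, area_max = min(areas), max(areas)


def min_max(value, lo, hi):
    """Scale value into [0, 1] given the observed minimum and maximum."""
    return (value - lo) / (hi - lo)


def normalize(price, area):
    """Min-max normalize a (price, area) pair with the historical ranges."""
    return (min_max(price, price_min, price_max),
            min_max(area, area_min, area_max))


normalized = {h: normalize(p, a) for h, (p, a, _) in houses.items()}

# House 5 ($400k, 2700 sqft), scaled with the historical min/max (assumption).
house5 = normalize(400, 2700)

# Euclidean distance from House 5 to every historical house.
distances = {h: math.dist(house5, point) for h, point in normalized.items()}
nearest = min(distances, key=distances.get)
prediction = houses[nearest][2]

print("min, max, mean, median, IQR:",
      price_min, price_max, price_mean, price_median, price_iqr)
print("normalized:", normalized)
print("distances:", distances)
print("nearest house:", nearest, "-> easy to sell?", prediction)
```

The prediction is simply the label of the nearest historical house in the normalized (price, area) space. Skipping the normalization would let the area, which has the larger raw range, dominate the distance.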

We conducted a survey of 100 attendees at our theater to ascertain their viewing and purchasing behaviors. For each attendee, we recorded the movie genre they watched and whether they bought snacks. The goal of this survey is to determine if there is a relationship between the movie genre and the likelihood of purchasing snacks. Consider the two variables: movie_genre and snack_purchase.

Genre	Snacks Purchased	No Snacks Purchased
Action	a = 20	b = 40
Comedy	c = 30	d = 10

Calculate the χ² statistic for the given data.
Using the critical values of the χ² distribution in Table 1 and a significance level of 0.001, can we infer that the variables movie_genre and snack_purchase are related?
- Yes
- No

Table 1: Upper-tail critical values of the χ² distribution with ν degrees of freedom

Degrees of freedom ν	0.1	0.05	0.025	0.01	0.001
1	2.706	3.841	5.024	6.635	10.828
2	4.605	5.991	7.378	9.210	13.816
3	6.251	7.815	9.348	11.345	16.266
4	7.779	9.488	11.143	13.277	18.467
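
The χ² test of independence for the genre and snack data can be sketched as follows. This is a minimal illustration that assumes the plain Pearson statistic without a continuity correction. The variable names are illustrative, and the critical value is read from Table 1 for ν = (rows - 1) × (columns - 1) degrees of freedom.

```python
# Observed counts from the contingency table
# (rows: Action, Comedy; columns: snacks purchased, no snacks purchased).
observed = [[20, 40],
            [30, 10]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count under independence: row total * column total / N.
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Pearson chi-square statistic, no continuity correction (assumption).
chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))

dof = (len(observed) - 1) * (len(observed[0]) - 1)
critical = 10.828  # Table 1, nu = 1, significance level 0.001
related = chi2 > critical

print(f"chi2 = {chi2:.3f}, dof = {dof}, critical = {critical}")
print("reject independence (variables related)?", related)
```

If the statistic exceeds the critical value, the hypothesis that genre and snack purchase are independent is rejected at the chosen significance level.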

A predictive model is trained to classify animals into four categories: Cat, Dog, Rabbit, and Hamster. The true probability distribution for a specific example image (ground truth) is given, along with the model’s predicted probabilities for the same categories.

Category	True Probability (p)	Predicted Probability (q)
Cat	1	0.5
Dog	0	0.3
Rabbit	0	0.1
Hamster	0	0.1

Calculate the cross-entropy and KL divergence between the true and predicted probability distributions.

Calculate the Cross-Entropy between the true distribution and the predicted distribution using the formula:

\[H(p, q) = - \sum_i p(i) \log_2 q(i)\]
Calculate the KL Divergence (Kullback-Leibler Divergence) between the true distribution (p) and the predicted distribution (q) using the formula:

\[D_{KL}(p \,\|\, q) = \sum_i p(i) \log_2 \frac{p(i)}{q(i)}\]
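
Both formulas can be evaluated directly, as in the short sketch below. It assumes the usual convention that terms with p(i) = 0 contribute nothing to either sum; the function names are illustrative.

```python
import math

# True (p) and predicted (q) distributions from the table.
p = {"Cat": 1.0, "Dog": 0.0, "Rabbit": 0.0, "Hamster": 0.0}
q = {"Cat": 0.5, "Dog": 0.3, "Rabbit": 0.1, "Hamster": 0.1}


def cross_entropy(p, q):
    """H(p, q) in bits; terms with p(i) = 0 are skipped (0 * log = 0)."""
    return -sum(p[k] * math.log2(q[k]) for k in p if p[k] > 0)


def kl_divergence(p, q):
    """D_KL(p || q) in bits; terms with p(i) = 0 are skipped."""
    return sum(p[k] * math.log2(p[k] / q[k]) for k in p if p[k] > 0)


h_pq = cross_entropy(p, q)
d_kl = kl_divergence(p, q)
print(f"H(p, q) = {h_pq} bits, D_KL(p || q) = {d_kl} bits")
```

Because the true distribution is one-hot, its own entropy is zero, so the cross-entropy and the KL divergence coincide for this example.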

Given the dataset [-2, 4, 8, 6, -5], your task is to compress the data. After applying the Discrete Wavelet Transform (DWT), the data will be transformed to __
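
The question does not say which wavelet or padding scheme is intended, so the sketch below is only one possible reading. It assumes a Haar transform built from pairwise averages and half-differences, with the input zero-padded to the next power of two (length 8 here). The helper name `haar_dwt` is hypothetical.

```python
def haar_dwt(data):
    """Full Haar decomposition using pairwise averages and half-differences.

    Assumptions: the input is zero-padded to the next power of two, and the
    output is ordered as [overall average, coarsest detail, ..., finest details].
    """
    n = 1
    while n < len(data):
        n *= 2
    coeffs = [float(x) for x in data] + [0.0] * (n - len(data))

    length = n
    while length > 1:
        half = length // 2
        # Smooth (average) and detail (half-difference) of each adjacent pair.
        averages = [(coeffs[2 * i] + coeffs[2 * i + 1]) / 2 for i in range(half)]
        details = [(coeffs[2 * i] - coeffs[2 * i + 1]) / 2 for i in range(half)]
        coeffs[:length] = averages + details
        length = half  # recurse on the averages only
    return coeffs


data = [-2, 4, 8, 6, -5]
coefficients = haar_dwt(data)
print(coefficients)
```

Compression then typically keeps only the coefficients whose magnitude exceeds a chosen threshold and sets the rest to zero. An orthonormal Haar variant would scale the sums and differences by 1/√2 instead of 1/2 and give different numbers.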

You have the following dataset representing Hours Studied and Test Scores of students. Calculate Spearman’s rank correlation to determine whether there is a relationship between hours studied and test scores.

Student	Hours Studied	Test Score
1	5	82
2	3	76
3	4	88
4	2	70
5	1	60
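
Spearman's rank correlation for this table can be sketched as below. The code uses the shortcut formula ρ = 1 - 6 Σd² / (n(n² - 1)), which is valid only when there are no tied values (true for this dataset). The helper name `ranks` is illustrative.

```python
hours = [5, 3, 4, 2, 1]
scores = [82, 76, 88, 70, 60]


def ranks(values):
    """Rank from 1 (smallest) to n (largest); assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


rank_hours = ranks(hours)
rank_scores = ranks(scores)

# Sum of squared rank differences per student.
d_squared = sum((a - b) ** 2 for a, b in zip(rank_hours, rank_scores))

n = len(hours)
rho = 1 - 6 * d_squared / (n * (n ** 2 - 1))

print("hour ranks: ", rank_hours)
print("score ranks:", rank_scores)
print(f"sum of d^2 = {d_squared}, rho = {rho:.2f}")
```

A value of ρ close to +1 indicates a strong monotonic increasing relationship between hours studied and test score.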