Quiz 4: Midterm Exam Practice


A realtor is using historical data to predict the likelihood of houses being sold easily. The dataset below provides details on the sale status of four houses:

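| House | Price (USD) | Area (sqft) | Easy to Sell? |
|-------|-------------|-------------|---------------|
| 1     | $320k       | 1200        | Yes           |
| 2     | $520k       | 3000        | No            |
| 3     | $360k       | 1350        | Yes           |
| 4     | $720k       | 4200        | No            |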

  1. Descriptive Statistics: Compute the minimum, maximum, mean, median, and interquartile range (IQR) for the house prices.

  2. Normalization: Perform min-max normalization on both the house prices and areas (sqft).

  3. Prediction using Euclidean Distance: A new property (House 5) is listed with a price of $400k and an area of 2700 sqft. Predict whether House 5 will be easily sold by finding its most similar house from the historical dataset using the Euclidean distance formula.

  4. Based on your previous analysis, is House 5 easy to sell?

    • Yes

    • No
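The workflow in the questions above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: prices are expressed in thousands of USD, the quartiles are taken as the medians of the lower and upper halves of the sorted prices (other quartile conventions give a different IQR), and the Euclidean distance is computed on the min-max normalized features, with House 5 scaled by the minimum and maximum of the four historical houses. The names `houses`, `min_max`, and `summary` are hypothetical.

```python
import math
import statistics

# Data from the table above: (price in thousands of USD, area in sqft, easy to sell?)
houses = {
    1: (320, 1200, "Yes"),
    2: (520, 3000, "No"),
    3: (360, 1350, "Yes"),
    4: (720, 4200, "No"),
}
new_house = (400, 2700)  # House 5

prices = [p for p, _, _ in houses.values()]
areas = [a for _, a, _ in houses.values()]

# Descriptive statistics for price. Assumption: Q1 and Q3 are the medians
# of the lower and upper halves of the sorted data.
ordered = sorted(prices)
half = len(ordered) // 2
q1 = statistics.median(ordered[:half])
q3 = statistics.median(ordered[-half:])
summary = {
    "min": min(prices),
    "max": max(prices),
    "mean": statistics.mean(prices),
    "median": statistics.median(prices),
    "iqr": q3 - q1,
}


def min_max(value, values):
    """Rescale value to [0, 1] using the min and max of values."""
    lo, hi = min(values), max(values)
    return (value - lo) / (hi - lo)


normalized = {
    h: (min_max(p, prices), min_max(a, areas))
    for h, (p, a, _) in houses.items()
}
# House 5 is scaled with the historical min/max, not its own.
query = (min_max(new_house[0], prices), min_max(new_house[1], areas))

# Nearest neighbour in the normalized feature space.
distances = {h: math.dist(query, point) for h, point in normalized.items()}
nearest = min(distances, key=distances.get)
prediction = houses[nearest][2]

print(summary)
print(nearest, prediction)
```

Normalizing before measuring distance keeps the area column, whose raw values are much larger, from dominating the price column.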


We conducted a survey of 100 attendees at our theater to ascertain their viewing and purchasing behaviors. For each attendee, we recorded the movie genre they watched and whether they bought snacks. The goal of this survey is to determine if there is a relationship between the movie genre and the likelihood of purchasing snacks. Consider the two variables: movie_genre and snack_purchase.

|        | Snacks Purchased | No Snacks Purchased |
|--------|------------------|---------------------|
| Action | a = 20           | b = 40              |
| Comedy | c = 30           | d = 10              |

  1. Calculate the χ² statistic for the given data.

  2. Using the critical values of the χ² distribution in Table 1 and a significance level of 0.001, can we infer that the variables movie genre and snack purchase are related?

    • Yes

    • No

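Table 1: Upper-tail critical values of the χ² distribution with ν degrees of freedom

| Degrees of freedom \(\nu\) | 0.1   | 0.05  | 0.025  | 0.01   | 0.001  |
|----------------------------|-------|-------|--------|--------|--------|
| 1                          | 2.706 | 3.841 | 5.024  | 6.635  | 10.828 |
| 2                          | 4.605 | 5.991 | 7.378  | 9.210  | 13.816 |
| 3                          | 6.251 | 7.815 | 9.348  | 11.345 | 16.266 |
| 4                          | 7.779 | 9.488 | 11.143 | 13.277 | 18.467 |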
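One way to check a hand calculation of the χ² statistic is the short sketch below. It assumes the standard Pearson statistic without a continuity correction, with expected counts computed as (row total × column total) / grand total, and it copies the 0.001 column of Table 1 into a hypothetical lookup dictionary named `critical_0001`.

```python
# Observed counts from the contingency table above.
observed = [
    [20, 40],  # Action: snacks, no snacks
    [30, 10],  # Comedy: snacks, no snacks
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Pearson chi-square: sum of (observed - expected)^2 / expected over all cells.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / total
        chi2 += (count - expected) ** 2 / expected

# Degrees of freedom for an r x c table: (r - 1) * (c - 1).
dof = (len(observed) - 1) * (len(observed[0]) - 1)

# 0.001 column of Table 1, keyed by degrees of freedom.
critical_0001 = {1: 10.828, 2: 13.816, 3: 16.266, 4: 18.467}
reject_independence = chi2 > critical_0001[dof]

print(round(chi2, 3), dof, reject_independence)
```

If the statistic exceeds the critical value, the hypothesis that the two variables are independent is rejected at that significance level.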


A predictive model is trained to classify animals into four categories: Cat, Dog, Rabbit, and Hamster. The true probability distribution for a specific example image (ground truth) is given, along with the model’s predicted probabilities for the same categories.

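| Category | True Probability (p) | Predicted Probability (q) |
|----------|----------------------|---------------------------|
| Cat      | 1                    | 0.5                       |
| Dog      | 0                    | 0.3                       |
| Rabbit   | 0                    | 0.1                       |
| Hamster  | 0                    | 0.1                       |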

Calculate the cross-entropy and KL divergence between the true and predicted probability distributions.

  1. Calculate the Cross-Entropy between the true distribution and the predicted distribution using the formula:

    \[H(p, q) = - \sum_i p(i) \log_2 q(i)\]

  2. Calculate the KL divergence (Kullback-Leibler divergence) between the true distribution (p) and the predicted distribution (q), treating terms with p(i) = 0 as contributing 0, using the formula:

    \[D_{KL}(p||q) = \sum_i p(i) \log_2 \frac{p(i)}{q(i)} \]
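
The two formulas above can be evaluated with the following minimal sketch. It assumes base-2 logarithms, as in the formulas, and the usual convention that terms with p(i) = 0 contribute nothing to either sum. The function names `cross_entropy` and `kl_divergence` are hypothetical.

```python
import math

# Distributions from the table above.
p = {"Cat": 1.0, "Dog": 0.0, "Rabbit": 0.0, "Hamster": 0.0}
q = {"Cat": 0.5, "Dog": 0.3, "Rabbit": 0.1, "Hamster": 0.1}


def cross_entropy(p, q):
    """H(p, q) = -sum_i p(i) * log2 q(i), skipping terms with p(i) = 0."""
    return -sum(p[k] * math.log2(q[k]) for k in p if p[k] > 0)


def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p(i) * log2(p(i) / q(i)), skipping terms with p(i) = 0."""
    return sum(p[k] * math.log2(p[k] / q[k]) for k in p if p[k] > 0)


h_pq = cross_entropy(p, q)
d_kl = kl_divergence(p, q)
print(h_pq, d_kl)
```

Because the true distribution is one-hot, its entropy is 0, so the identity H(p, q) = H(p) + D_KL(p||q) implies the two quantities coincide for this example.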

  1. Given the dataset [-2, 4, 8, 6, -5], your task is to compress the data. After applying the Discrete Wavelet Transform (DWT), the data will be transformed to __
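
The question does not name a wavelet or a normalization, so the sketch below is one possible reading rather than the expected answer. It assumes a Haar transform with zero-padding to the next power-of-two length, pairwise averaging for the smoothed values, and half-differences for the detail coefficients. Courses that scale by 1/√2 instead of 1/2 will obtain different numbers. The function name `haar_dwt` is hypothetical.

```python
def haar_dwt(data):
    """Unnormalized Haar DWT: repeated pairwise averages and half-differences."""
    # Zero-pad to the next power-of-two length (an assumption of this sketch).
    n = 1
    while n < len(data):
        n *= 2
    approx = list(data) + [0.0] * (n - len(data))

    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        level_details = [(a - b) / 2 for a, b in pairs]
        approx = [(a + b) / 2 for a, b in pairs]
        # Coarser levels go in front of finer ones.
        details = level_details + details

    # Overall average first, then detail coefficients from coarse to fine.
    return approx + details


coefficients = haar_dwt([-2, 4, 8, 6, -5])
print(coefficients)
```

Compression then typically keeps only the largest-magnitude coefficients and sets the rest to zero.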


  1. The following dataset records the Hours Studied and Test Scores of five students. Calculate Spearman's rank correlation coefficient to determine whether there is a relationship between hours studied and test scores.

| Student | Hours Studied | Test Score |
|---------|---------------|------------|
| 1       | 5             | 82         |
| 2       | 3             | 76         |
| 3       | 4             | 88         |
| 4       | 2             | 70         |
| 5       | 1             | 60         |
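
A hand calculation of Spearman's coefficient can be checked with the sketch below. It assumes the no-ties formula ρ = 1 − 6 Σd² / (n(n² − 1)), which applies here because neither column contains repeated values. The helper `ranks` is hypothetical and does not handle ties.

```python
# Data from the table above, in student order 1..5.
hours = [5, 3, 4, 2, 1]
scores = [82, 76, 88, 70, 60]


def ranks(values):
    """Rank values from 1 (smallest) to n; assumes no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


rank_hours = ranks(hours)
rank_scores = ranks(scores)

n = len(hours)
# Sum of squared rank differences.
d_squared = sum((a - b) ** 2 for a, b in zip(rank_hours, rank_scores))
rho = 1 - 6 * d_squared / (n * (n**2 - 1))

print(rank_hours, rank_scores, d_squared, rho)
```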