10. Analysis of Categorical Data.md

Quiz

  1. Questions (a)-(d) below relate to the following: Some people suspect that child births may not be equally distributed over the seven days of the week because hospital staff (who can influence the time of delivery in some cases) may prefer to work on certain days of the week. Question (a): Which of the following is the null hypothesis?
  • child births occur equally likely on the seven days of the week

  • child births are more likely on certain days of the week

This is the "nothing extraordinary is going on" hypothesis.


  1. To investigate, you note the day of the week of 300 births that were randomly selected from all births that occurred in New York City last year. Question (b): What test should you use to test the null hypothesis?
  • z-test

  • chi-square test for goodness-of-fit

  • chi-square test of independence

  • chi-square test of homogeneity

This test evaluates how well the observed counts of a categorical variable in a given sample conform to (i.e., fit) the expected counts of that variable. In our case, we want to test how well the counts of births on each day of the week conform to the assumption that births are equally likely on each day of the week, so this test is appropriate for testing our null hypothesis.
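
Written out for this question, the test statistic could be sketched as follows. The symbols $O_i$ (observed number of births on day $i$) and $E_i$ (expected number under the null hypothesis) are notation introduced here for illustration, not taken from the quiz:

$$
\chi^2 = \sum_{i=1}^{7} \frac{(O_i - E_i)^2}{E_i}, \qquad E_i = \frac{300}{7} \approx 42.9 \ \text{for every day } i
$$

Large values of $\chi^2$ indicate that the observed counts fit the "equally likely" assumption poorly.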


  1. Question (c): What is the degrees of freedom for the test from Question (b)?

6: one less than the number of categories, that is, one less than the number of possible values of the categorical variable, which in our case is "day of the week".
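
As a minimal sketch of how the degrees of freedom enter the calculation, the snippet below runs the goodness-of-fit test on made-up counts for the 300 births. The counts and the helper name `chi2_sf_even_df` are assumptions for illustration only. The helper uses the closed-form upper-tail probability that holds when the degrees of freedom are even, which is the case here (6).

```python
import math

# Hypothetical counts for 300 births, Monday through Sunday (illustrative only).
observed = [45, 48, 46, 47, 44, 36, 34]

n = sum(observed)          # 300 births in total
k = len(observed)          # 7 categories (days of the week)
expected = n / k           # equal expected count per day under the null hypothesis

# Chi-square statistic: sum of (observed - expected)^2 / expected over all days.
chi2 = sum((o - expected) ** 2 / expected for o in observed)

# Degrees of freedom: one less than the number of categories.
df = k - 1


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable with EVEN df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form used here only holds for positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


p_value = chi2_sf_even_df(chi2, df)
print(f"chi2 = {chi2:.3f}, df = {df}, p-value = {p_value:.3f}")
```

A small p-value would be evidence against births being equally likely on all seven days.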


  1. Question (d): What would be the answer to Question (b) if you wanted to investigate a simpler question, namely whether the percentage of births on weekends is lower than expected?
  • z-test

  • chi-square test for goodness-of-fit

  • chi-square test of independence

  • chi-square test of homogeneity

The z-test is for evaluating how unusual an observed value of a numerical variable is relative to its expected value. Our case concerns a single percentage, so this test is appropriate for testing the null hypothesis.
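
A minimal sketch of such a z-test for the weekend question follows. The observed weekend count is made up, and the null value of 2/7 assumes that "as expected" means births are equally likely on all seven days.

```python
import math
from statistics import NormalDist

n = 300                 # sampled births
weekend_births = 70     # hypothetical observed count (illustrative only)

p0 = 2 / 7              # expected weekend share if all days are equally likely
p_hat = weekend_births / n

# Standard error of the sample proportion, computed under the null hypothesis.
se = math.sqrt(p0 * (1 - p0) / n)

# z-statistic: how many standard errors the observed share lies from p0.
z = (p_hat - p0) / se

# One-sided p-value: the alternative is "lower than expected", so use the lower tail.
p_value = NormalDist().cdf(z)
print(f"z = {z:.3f}, one-sided p-value = {p_value:.4f}")
```

The test is one-sided because the question asks specifically whether the weekend percentage is lower than expected.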


  1. This question and the next one relate to the following context: A food delivery start-up decides to advertise its service by placing ads on web pages. They wonder whether the percentage of viewers who click on the ad changes depending on how often the viewers were shown the ad. They randomly select 100 viewers from among those who were shown the ad once, 135 from among those who were shown the ad twice, and 150 from among those who were shown the ad three times. Which is the null hypothesis?
  • the chance that a user clicks on the ad increases with the number of ads shown

  • the chance that a user clicks on the ad is the same for all three groups

This is the "nothing extraordinary is going on" hypothesis.


  1. In the previous question, which test is appropriate to test the null hypothesis?
  • z-test

  • chi-square test for goodness-of-fit

  • chi-square test of independence

  • chi-square test of homogeneity

This test is for evaluating whether a categorical variable measured on several samples has the same distribution in each of the samples. Our case concerns the distribution of the categorical variable "clicked on ad" in each of three samples, so this test is appropriate for testing the null hypothesis.
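
The homogeneity test can be sketched as below. The group sizes (100, 135, 150) come from the question, while the click counts and the function name `chi_square_statistic` are assumptions for illustration. Expected counts come from the pooled click rate, and with three groups and two outcomes there are (3 - 1) × (2 - 1) = 2 degrees of freedom.

```python
import math

# Rows: shown once, twice, three times. Columns: clicked, did not click.
# Row totals match the question (100, 135, 150); the click counts are hypothetical.
table = [
    [8, 92],
    [15, 120],
    [21, 129],
]


def chi_square_statistic(table):
    """Return (statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count if every group shared the same pooled distribution.
            exp = row_totals[i] * col_totals[j] / n
            stat += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


stat, df = chi_square_statistic(table)

# For df = 2 the chi-square upper-tail probability is exactly exp(-x / 2).
p_value = math.exp(-stat / 2)
print(f"chi2 = {stat:.3f}, df = {df}, p-value = {p_value:.3f}")
```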


  1. A county wants to check whether the racial composition of the teachers in the county corresponds to that of the population in the county. It samples 500 teachers at random and wants to compare that sample with the census numbers about the racial groups in that county. Which test would be appropriate?
  • z-test

  • chi-square test for goodness-of-fit

  • chi-square test of independence

  • chi-square test of homogeneity

  • none of these

This test is for evaluating how well the observed counts of a categorical variable for a given sample conform to (i.e., fit) the expected counts of the variable. In our case we want to see how the observed counts of categorical variable "race" in our sample conform to the corresponding counts found in the census, so this test is appropriate for testing the null hypothesis.
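
Unlike the births example, the expected counts here are generally not equal: each one is the sample size times the census share of that group. The sketch below uses placeholder group names and made-up shares and counts purely for illustration.

```python
# Hypothetical census shares and sample counts (illustrative only).
census_share = {"group A": 0.50, "group B": 0.30, "group C": 0.15, "group D": 0.05}
teacher_counts = {"group A": 280, "group B": 130, "group C": 70, "group D": 20}

n = sum(teacher_counts.values())  # 500 sampled teachers

# Expected count per group: sample size times that group's census share.
expected = {group: n * share for group, share in census_share.items()}

# Goodness-of-fit statistic against the census-based expected counts.
chi2 = sum(
    (teacher_counts[group] - expected[group]) ** 2 / expected[group]
    for group in census_share
)
df = len(census_share) - 1  # one less than the number of categories

print(f"chi2 = {chi2:.2f}, df = {df}")
```

The statistic would then be compared with the chi-square distribution with 3 degrees of freedom, whose 5% critical value is about 7.815.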


  1. An airline wants to find out whether there is a connection between the customer's status in its frequent flyer program and the class of ticket that the customer buys. It samples 1,000 ticket records at random and for each ticket notes the status level ('none', 'silver', 'gold') and the ticket class ('economy', 'business', 'first'). Which test would be appropriate?
  • z-test

  • chi-square test for goodness-of-fit

  • chi-square test of independence

  • chi-square test of homogeneity

  • none of these

This test is for evaluating the dependence between two categorical variables measured on the same sample. In our case, we want to understand whether there is a connection between the two categorical variables "status level" and "ticket class" measured on one sample, so this test is appropriate for testing the null hypothesis.
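
A minimal sketch of the independence test on a 3 × 3 table follows. All counts are made up (they only sum to the 1,000 records from the question), and the function names are hypothetical. Under independence, each expected count is (row total × column total) / n, and the degrees of freedom are (3 - 1) × (3 - 1) = 4.

```python
import math

# Hypothetical counts (illustrative only), 1,000 ticket records in total.
# Rows: status 'none', 'silver', 'gold'. Columns: 'economy', 'business', 'first'.
observed = [
    [520, 60, 10],
    [200, 50, 20],
    [80, 40, 20],
]


def expected_counts(table):
    """Expected counts if the row and column variables were independent."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table):
    """Chi-square statistic comparing observed counts with expected counts."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


stat = chi_square(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 4 the chi-square upper-tail probability is exp(-x/2) * (1 + x/2).
half = stat / 2
p_value = math.exp(-half) * (1 + half)
print(f"chi2 = {stat:.2f}, df = {df}, p-value = {p_value:.3g}")
```

The arithmetic matches the homogeneity test; what differs is the design, since here one sample is classified by two variables rather than several samples by one.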


  1. The airline wants to find out whether there is a connection between the customer's status in its frequent flyer program and the amount that the customer spends on tickets in the following year. It samples 1,000 ticket records at random and for each ticket notes the status level ('none', 'silver', 'gold') and the amount spent on tickets in the following year. Which test would be appropriate?
  • z-test

  • chi-square test for goodness-of-fit

  • chi-square test of independence

  • chi-square test of homogeneity

  • none of these