## References
Put anything you referred to here including books, online articles (provide links), online videos (provide links), etc. Number everything and use the number to refer to it in your test.
1.
2.
3.
## Collaborators
List all collaborators here. Number everyone and use the number to refer to them in the test if you quote them.
1.
2.
3.
# Loading Data and Libraries
Don’t touch any of this. I am loading the libraries and data you will need.
```{r, warning=F, message=F}
library(knitr)
opts_chunk$set(warning=FALSE, message=FALSE)
library(dplyr)
library(ggplot2)
library(janitor)
library(tidyr)
load("ACS_clean.RData")
```
# Statistics Questions
You don’t need to write any code. Your job is to interpret the code and results.
1. **Lately I’ve wondered if households with children pay more than households without children for an apartment of the same size. Below are the results for the hypothesis test.**
a. __Write down null and alternative hypotheses for my statement above.__ (5 points)
ANSWER HERE.
```{r, echo=F}
mydata_clean %>%
  filter(new_HUPAC %in% "With children 6 to 17 years",
         !is.na(RNTP),
         NP == 5) %>%
  distinct(SERIALNO, RNTP, NP) %>%
  pull(RNTP) ->
  with_children_rent
mydata_clean %>%
  filter(new_HUPAC %in% "No children",
         !is.na(RNTP),
         NP == 5) %>%
  distinct(SERIALNO, RNTP, NP) %>%
  pull(RNTP) ->
  no_children_rent
```
```{r}
t.test(with_children_rent, no_children_rent)
```
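As a reading aid for the output above, here is a minimal sketch of where the pieces of a `t.test()` result live. The rents are simulated and the names `rent_a`, `rent_b`, and `result` are made up for illustration; none of these numbers come from the ACS data.

```r
# Simulated rents; these values are made up and are NOT from the ACS data.
set.seed(1)
rent_a <- rnorm(50, mean = 1200, sd = 300)  # hypothetical group A
rent_b <- rnorm(50, mean = 1100, sd = 300)  # hypothetical group B

# Two-sample t-test; R uses the Welch (unequal variance) version by default.
result <- t.test(rent_a, rent_b)

result$estimate   # sample means, labelled "mean of x" and "mean of y"
result$conf.int   # 95% confidence interval for mean(x) - mean(y)
result$p.value    # two-sided p-value for H0: the two true means are equal
```

The same three pieces appear in the printed output: the sample estimates at the bottom, the confidence interval in the middle, and the p-value on the line with the t statistic.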
b. __What is the average rent for households with children? What is the average rent for households without children?__ (5 points)
ANSWER HERE
c. __Write down the interpretation for the confidence interval.__ (5 points)
ANSWER HERE
2. __Some households are comprised entirely of adults that live together. `NP` is the number of people in the household. `median_PINCP` is the typical personal income (per person) in a household of that size. `mean_earners` is the typical number of people in the household who earn money. `mean_AGEP` is the typical age of people in the household. `count` is the number of households of that size.__
__For example, in a household of size three, each person typically earns 46k; there are typically slightly fewer than three people working but most households have three workers; the typical age is 48. There were 1,108 such households in the sample.__
```{r, echo=F}
mydata_clean %>%
  filter(new_HUPAC %in% "No children") %>%
  distinct(SERIALNO, PINCP, AGEP, NP) %>%
  group_by(SERIALNO, NP) %>%
  summarize(median_PINCP = median(PINCP, na.rm=T),
            earners = sum(!is.na(PINCP)),
            mean_AGEP = mean(AGEP, na.rm=T),
            n = n()) %>%
  ungroup() %>%
  group_by(NP) %>%
  summarize(median_PINCP = round(mean(median_PINCP, na.rm=T), 0),
            mean_earners = round(mean(earners), 2),
            mean_AGEP = round(mean(mean_AGEP, na.rm=T), 0),
            count = n()) %>%
  kable()
```
a. __As the size of the household increases, what happens to typical earnings, age and number of earners? Why do you think this is the case?__ (5 points)
ANSWER HERE.
b. __What does it mean that the typical number of earners is less than the number of people in the household?__ (2 points)
ANSWER HERE.
3. __Approximately 20% of people in the ACS dataset are children (under the age of 18). If children were randomly distributed across the population, then on average about 20% of each household would be children. This isn’t the case, as the histogram below of the proportion of children in each household shows. Many households have zero children.__
```{r, echo=F}
mydata_clean %>%
  group_by(SERIALNO) %>%
  summarize(kids = sum(AGEP < 18),
            youngest = min(AGEP, na.rm=T),
            oldest = max(AGEP),
            age_range = oldest - youngest,
            NP = max(NP),
            pct_kids = kids/(NP) ) %>%
  ungroup() ->
  a
a %>%
  filter(pct_kids != 1) %>%
  ggplot(aes(x=pct_kids)) +
  geom_histogram(boundary= 0,
                 closed = "left",
                 binwidth = 0.06125) +
  xlab("Proportion of children in household")
```
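The "randomly distributed" benchmark mentioned above can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the household sizes, the number of households, and the names `household_size`, `kids`, and `pct_kids_sim` are hypothetical and do not come from the ACS data.

```r
# Hypothetical simulation; sizes and counts are made up, not ACS values.
set.seed(42)
n_households <- 5000
household_size <- sample(1:6, n_households, replace = TRUE)

# Each person is independently a child with probability 0.20.
kids <- rbinom(n_households, size = household_size, prob = 0.20)
pct_kids_sim <- kids / household_size

mean(pct_kids_sim)       # average proportion of children per household, near 0.20
mean(pct_kids_sim == 0)  # share of simulated households with no children
```

Comparing a histogram of `pct_kids_sim` with the histogram above is one way to see how far the real distribution is from random assignment.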
__If you include all households in the data, the typical proportion of children in a household is between 12.4% and 13.1%.__
```{r, echo=F}
a %>%
  pull(pct_kids) %>%
  t.test()
```
__If you only look at households with children, the typical proportion of children in a household is between 42.2% and 43.1%.__
```{r, echo=F}
a %>%
  filter(pct_kids > 0) %>%
  pull(pct_kids) %>%
  t.test()
```
__What are the benefits and drawbacks of using the confidence interval (CI) computed from all households versus the CI computed from households that have children? Which do you think is the appropriate CI to use to describe the typical proportion of children in a household?__ (10 points)
ANSWER HERE.
4. __The table below shows the results of a regression in which the response variable was rent (RNTP) and the explanatory variables are listed below. Please explain the context and meaning of the coefficients and their significance as shown in the regression output.__ (5 points)
1. __number of people (NP),__
2. __number of low wage workers in the household (made less than 20k per year) (low_wage),__
3. __the number of high wage workers in the household (made more than 100k per year) (high_wage),__
4. __the number of part time workers in the household (worked less than 30 hours per week) (part_time)__.
```{r, echo=F}
mydata_clean %>%
  group_by(SERIALNO) %>%
  summarize(part_time = sum(WKHP < 30),
            NP = max(NP),
            RNTP = max(RNTP),
            low_wage = sum(WAGP < 20000),
            high_wage = sum(WAGP > 100000)
  ) %>%
  ungroup() %>%
  drop_na() ->
  rent
lm.rent <- lm(RNTP ~ NP + low_wage + high_wage + part_time, data=rent)
```
```{r}
summary(lm.rent)
```
ANSWER HERE.
# Don't Touch This
```{r, eval=F}
edits <- format(Sys.time(), "%a %b %d %X %Y", tz = "UTC+07:00")
saveRDS(edits, file="edits.rds")
```
```{r, echo=F}
edits <- readRDS("edits.rds")
edits <- append(edits, format(Sys.time(), "%a %b %d %X %Y", tz = "UTC+07:00"))
saveRDS(edits, file="edits.rds")
edits
```