Question Set 1
1. Get to know your scientific question (Chapter 1)
(a) Identify the variable of interest.
(b) Identify the population(s) and sample(s).
(c) Identify the parameter(s) and statistic(s).
(d) What is the scientific question? Is this Descriptive Statistics or Inferential Statistics?
2. Get to know your data (Chapter 1)
(a) Identify the types of your data: nominal data, ordinal data or quantitative data.
(b) Identify the types of your data: time series data or cross-sectional data.
(c) Identify the source of your data: primary data or secondary data. Do you think the data is
reliable? Are there possible issues with your data?
3. Calculate descriptive statistics in Excel (Chapter 3)
(a) Calculate the sample statistics for your variable of interest, such as sample mean (x̄), median,
mode, variance (s²), and standard deviation (s).
(b) Identify two different groups based on the qualitative data. Calculate the above statistics for
each group to compare.
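As a cross-check on the Excel results for item 3, the same statistics can be sketched in Python. This is a minimal illustration, not part of the required Excel workflow: the list `values` holds made-up placeholder numbers, and the function name `describe` is hypothetical. The Excel counterparts would be AVERAGE, MEDIAN, MODE.SNGL, VAR.S and STDEV.S.

```python
import statistics

# Placeholder values invented for illustration; replace with your own column.
values = [28, 31, 25, 31, 34, 29, 27]

def describe(data):
    """Return the sample statistics named in item 3(a)."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "variance": statistics.variance(data),  # sample variance (n - 1 denominator)
        "std_dev": statistics.stdev(data),      # sample standard deviation
    }

summary = describe(values)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

For item 3(b), the same function could be called once per group after splitting the data by the qualitative variable.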
4. Display your data with charts and graphs in Excel (Chapter 2)
(a) Construct displays that best describe your qualitative variable (e.g. bar chart, pie chart) and
describe the distribution.
(b) Construct displays that best describe your variable of interest and describe its distribution.
(Use frequency distribution tables, histograms and/or the empirical rule to discuss normality,
symmetry and skewness.)
(c) Construct displays that best describe the relationship/association between two quantitative
variables (the variable of interest as the dependent variable, y, and another quantitative
variable as the independent variable, x) and describe the relationship.
5. Distributions (Chapters 5-6)
(a) Consider the distribution of your quantitative data in 4(b). Would it be appropriate to use the
Binomial or Normal distribution to model your data? Why or why not? Hint: The binomial
distribution models success/failure discrete data while the normal distribution is for bell-shaped continuous data.
Question Set 2
1. Construct a confidence interval for a population mean (Chapter 8)
(a) Do you need to make assumptions in order to perform the procedure of constructing a
confidence interval? If so, what assumptions need to be made? If not, why?
(b) Construct a confidence interval for the average highway MPG.
i. Should you use a z-interval or a t-interval? Why?
ii. Compute the necessary statistics for constructing a confidence interval.
iii. Find the margin of error of the confidence interval at confidence levels of 90% and 94%,
respectively.
iv. Calculate these two confidence intervals.
(c) Someone believes that the average highway MPG is 30.35. Does the sample support the
claim? Explain if you have different conclusions using the above two confidence intervals.
(You must discuss in terms of accuracy and precision.)
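The margin-of-error calculation in item 1(b) can be sketched as follows. This is a minimal illustration under stated assumptions: the MPG values are made-up placeholders, the function names are hypothetical, and the critical value is taken from the standard normal distribution only because it is available in the Python standard library. Whether a z-interval or a t-interval is appropriate is the judgment asked for in item (b)i; for a t-interval, a t critical value (for example from Excel's =T.INV.2T(alpha, n-1)) would be passed in place of `z_critical(level)`.

```python
import math
import statistics
from statistics import NormalDist

def z_critical(confidence_level):
    """Two-sided standard normal critical value for the given confidence level."""
    alpha = 1 - confidence_level
    return NormalDist().inv_cdf(1 - alpha / 2)

def confidence_interval(sample, critical_value):
    """Return (lower, upper, margin_of_error) for the population mean."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)                # sample standard deviation
    margin = critical_value * s / math.sqrt(n)  # margin of error
    return x_bar - margin, x_bar + margin, margin

# Placeholder highway MPG values invented for illustration.
highway_mpg = [28, 31, 25, 31, 34, 29, 27, 30, 33, 26]

for level in (0.90, 0.94):
    lower, upper, margin = confidence_interval(highway_mpg, z_critical(level))
    print(f"{level:.0%}: margin = {margin:.2f}, interval = ({lower:.2f}, {upper:.2f})")
```

Both intervals share the same center; the higher confidence level only widens the margin, which is the trade-off item (c) asks you to discuss.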
2. Conduct a hypothesis test for a population mean (Chapter 9)
(a) Do you need to make assumptions in order to perform the procedure of conducting a hypothesis
test? If so, what assumptions need to be made? If not, why?
(b) Using α = 0.01, perform a hypothesis test to determine if the average highway MPG is higher
than 30.
i. Write down the hypotheses.
ii. Calculate the test statistic, critical value(s) and p-value.
iii. Describe your decision of the test and make a conclusion based on the context.
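The test statistic in item 2(b) can be sketched as below. This is an illustration only: the data are made-up placeholders, the function names are hypothetical, and the p-value uses a standard normal reference distribution as a large-sample approximation. A t-based p-value would come from Excel instead (for example =T.DIST.RT(statistic, n-1)).

```python
import math
import statistics
from statistics import NormalDist

def one_sample_test_statistic(sample, mu_0):
    """(x_bar - mu_0) / (s / sqrt(n)) for a test about a population mean."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu_0) / standard_error

def upper_tail_p_value_normal(statistic):
    """Upper-tail p-value, assuming a standard normal reference distribution."""
    return 1 - NormalDist().cdf(statistic)

# Placeholder highway MPG values invented for illustration.
highway_mpg = [28, 31, 25, 31, 34, 29, 27, 30, 33, 26]
mu_0 = 30     # hypothesized mean from item 2(b)
alpha = 0.01  # significance level from item 2(b)

statistic = one_sample_test_statistic(highway_mpg, mu_0)
p_value = upper_tail_p_value_normal(statistic)
print(f"test statistic = {statistic:.2f}, approximate p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```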
3. Compare two population means (Chapter 10)
(a) Do you need to make assumptions in order to perform the procedure of conducting a hypothesis
test or constructing a confidence interval? If so, what assumptions need to be made? If not,
why?
(b) Using α = 0.08, perform a hypothesis test to determine if the mean highway MPG differs between
the two groups identified by your qualitative variable. We cannot assume equal variances.
List the results of all key steps before you reach your conclusion, such as the hypotheses, test
statistic, critical value(s) and/or p-value. (Use the Data Analysis Toolpak in Excel.)
(c) Find the 96% confidence interval to estimate the average difference in highway MPG between
the two populations according to the qualitative variable.
(d) Interpret the above confidence interval.
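For the unequal-variances comparison in item 3, the quantities reported by Excel's "t-Test: Two-Sample Assuming Unequal Variances" tool can be sketched by hand. The sketch below is an illustration under assumptions: the two groups are made-up placeholder numbers, the function names are hypothetical, and the critical value `t_critical` must be supplied from a t table or Excel rather than computed here.

```python
import math
import statistics

def welch_t_test(group_1, group_2):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n1, n2 = len(group_1), len(group_2)
    v1 = statistics.variance(group_1) / n1  # s1^2 / n1
    v2 = statistics.variance(group_2) / n2  # s2^2 / n2
    t_stat = (statistics.mean(group_1) - statistics.mean(group_2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df

def difference_interval(group_1, group_2, t_critical):
    """Confidence interval for mu_1 - mu_2 given a supplied t critical value."""
    n1, n2 = len(group_1), len(group_2)
    standard_error = math.sqrt(
        statistics.variance(group_1) / n1 + statistics.variance(group_2) / n2
    )
    diff = statistics.mean(group_1) - statistics.mean(group_2)
    return diff - t_critical * standard_error, diff + t_critical * standard_error

# Placeholder groups invented for illustration.
group_a = [1, 2, 3, 4, 5]
group_b = [3, 4, 5, 6, 7]

t_stat, df = welch_t_test(group_a, group_b)
print(f"t = {t_stat:.2f}, df = {df:.2f}")
```

Excel reports a rounded whole-number df for this test, so small differences from the hand calculation are expected.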
Question Set 3
1. Building a Simple Linear Regression Model: Preprocess.
(a) Identify all quantitative variables from the dataset.
(b) Construct a scatter plot to show the relationship between Highway MPG (Y) and each
independent variable. Calculate the sample correlation coefficients for all pairs. Describe the
association.
(c) Which pair has the strongest linear association?
(d) Write down the general formula for the Simple Linear Regression Model between Y and X.
(Write the formula using the general parameter notation β0 and β1. What should be capitalized or
lowercase? What should be added, if anything?)
2. Describe the linear relationship between Highway MPG (Y) and the variable you answered in
1(c) (above) as x.
(a) Calculate the slope and y-intercept of the least squares regression line using Excel. Write
down the linear equation.
(b) Interpret the regression slope.
(c) What percentage of the total variation in y can be explained by this independent variable x?
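The least squares quantities in item 2 can be sketched as follows, as a cross-check on Excel's SLOPE, INTERCEPT and RSQ functions. This is a minimal illustration: the x and y lists are made-up placeholder numbers and the function name `least_squares_fit` is hypothetical.

```python
import statistics

def least_squares_fit(x, y):
    """Return (slope, intercept, r_squared) of the least squares line of y on x."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    r_squared = s_xy ** 2 / (s_xx * s_yy)  # share of variation in y explained by x
    return slope, intercept, r_squared

# Placeholder data invented for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept, r_squared = least_squares_fit(x, y)
print(f"y-hat = {intercept:.2f} + {slope:.2f} x, R^2 = {r_squared:.2f}")
```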
3. Use the regression model to predict Highway MPG (Y).
(a) What is the predicted highway MPG with 5.5 ________? (Fill in the blank with units
and name of the independent variable you chose.)
(b) Calculate the 94% confidence interval for the average Highway MPG (Y) with 5.5 ________
and interpret. (Fill in the blank with units and name of the independent
variable you chose.)
(c) Calculate the 94% prediction interval for a SINGLE highway MPG (Y) with 5.5 ________
and interpret. (Fill in the blank with units and name of the independent variable you chose.)
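The difference between the confidence interval in 3(b) and the prediction interval in 3(c) can be sketched as below. This is an illustration under assumptions: the data are made-up placeholders, the function name is hypothetical, and `t_critical` is a placeholder that must be replaced by the t value for your confidence level with n - 2 degrees of freedom (for example from Excel's =T.INV.2T).

```python
import math
import statistics

def regression_intervals(x, y, x0, t_critical):
    """Return (y_hat, confidence interval, prediction interval) at x = x0."""
    n = len(x)
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))  # standard error of the estimate
    y_hat = intercept + slope * x0
    spread = 1 / n + (x0 - x_bar) ** 2 / s_xx
    ci_margin = t_critical * s_e * math.sqrt(spread)      # mean response
    pi_margin = t_critical * s_e * math.sqrt(1 + spread)  # single new observation
    return (
        y_hat,
        (y_hat - ci_margin, y_hat + ci_margin),
        (y_hat - pi_margin, y_hat + pi_margin),
    )

# Placeholder data invented for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
t_critical = 2.0  # placeholder only, NOT a table value

y_hat, ci, pi = regression_intervals(x, y, 3, t_critical)
print(f"y-hat = {y_hat:.2f}, CI = {ci}, PI = {pi}")
```

The extra 1 under the square root accounts for the variability of a single observation around the line, which is why the prediction interval is always wider than the confidence interval at the same x.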
4. Is there a linear relationship between Y and X?
(a) Test the significance of the slope of the regression equation. Use α = 0.06.
i. Write down the hypotheses.
ii. What is the p-value?
iii. Describe your decision.
(b) Develop a 90% confidence interval for the population slope. Does this confidence interval
include 0?
(c) State your conclusion. (Hint: You may need to re-calculate the regression analysis:
Data → Data Analysis → Regression → Confidence level.)
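The slope test and slope interval in item 4 rest on the standard error of the slope, which can be sketched as follows. This is an illustration only: the data are made-up placeholders, the function name is hypothetical, and `t_critical` is a placeholder to be replaced by the t value for your confidence level with n - 2 degrees of freedom.

```python
import math
import statistics

def slope_inference(x, y, t_critical):
    """Return (slope, standard error, t statistic, confidence interval)."""
    n = len(x)
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))   # standard error of the estimate
    se_slope = s_e / math.sqrt(s_xx)
    t_stat = slope / se_slope        # tests H0: beta_1 = 0
    interval = (slope - t_critical * se_slope, slope + t_critical * se_slope)
    return slope, se_slope, t_stat, interval

# Placeholder data invented for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, se_slope, t_stat, interval = slope_inference(x, y, t_critical=2.0)  # placeholder critical value
print(f"b1 = {slope:.2f}, se = {se_slope:.4f}, t = {t_stat:.2f}")
```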
5. Check the assumptions for regression analysis. Make necessary plots in Excel to justify and
include them in your answers.
(a) Is the relationship between the dependent and independent variables linear? Which plot
should you check?
(b) Do the residuals exhibit some pattern across values for the independent variable? Which plot
should you check?
(c) Is the variation of the dependent variable the same across all values of the independent
variable? Which plot should you check?
(d) Do the residuals follow the normal probability distribution? Which plot should you check?
(e) Conclusion: Are the results from the regression analysis reliable?
Question Set 4
1. Model 1: Develop a multiple regression model to predict the Highway MPG (Y) using all the
other variables of interest as listed above. (Round all numerical answers to two decimal places
as needed.)
(a) Identify qualitative variable(s) from the list of variables of interest, if there are any, and create
a dummy variable in Excel. (Note: use the Excel function =IF() and assign the values 0 and 1
in alphabetical order.)
(b) Perform a multiple regression with the Data Analysis Toolpak in Excel, and write down the
regression equation for Model 1. (Enter in Excel the confidence level given in question 1(e).
Note: Excel requires that the independent variables be located in adjacent columns)
(c) Explain the variation of the dependent variable after accounting for the effects of the other
independent variables:
i. What percentage of total variation in the Highway MPG (Y) can be explained by Model 1?
ii. What is the value of the adjusted multiple coefficient of determination, R²_A?
(d) Is the overall regression model significant using α = 0.03? State the hypotheses and your
conclusion.
(e) Which independent variables are significant predictors using α = 0.3 or confidence level 70%?
Which are not significant? (After accounting for the effects of the other independent variables)
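Two of the smaller steps in item 1, the dummy coding in (a) and the adjusted R² in (c)ii, can be sketched as below. This is an illustration under assumptions: the category labels "Car" and "Truck" are hypothetical examples, the function names are hypothetical, and the sketch assumes the alphabetically first category receives 0. The adjusted R² uses the usual formula 1 - (1 - R²)(n - 1)/(n - k - 1), where k is the number of independent variables.

```python
def dummy_codes(labels):
    """Code a two-category variable as 0/1, with 0 for the alphabetically first label."""
    categories = sorted(set(labels))
    if len(categories) != 2:
        raise ValueError("expected exactly two categories")
    return [0 if label == categories[0] else 1 for label in labels]

def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 for n observations and k independent variables."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Hypothetical labels invented for illustration.
vehicle_type = ["Truck", "Car", "Truck", "Car"]
print(dummy_codes(vehicle_type))

# Placeholder inputs invented for illustration.
print(f"{adjusted_r_squared(0.5, n=11, k=2):.3f}")
```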
2. Develop a second multiple regression model (Model 2) using ONE step of the “backward
elimination method”. (Remember: variables should be removed one at a time, and the regression
analysis, i.e. coefficients, R², p-values, etc., must be re-calculated at each step.) (Round all
numerical answers to two decimal places as needed.)
(a) Which variable should you remove from Model 1? Why?
(b) Perform a multiple regression with the Data Analysis Toolpak in Excel, and write down the
regression equation for Model 2. (Enter in Excel the confidence level given in question 2(e).
Note: Excel requires that the independent variables be located in adjacent columns)
(c) Explaining the variation of the dependent variable:
i. What percentage of total variation in the Highway MPG (Y) can be explained by Model 2?
How does this compare with the percentage you obtained with Model 1?
ii. What is the value of the adjusted multiple coefficient of determination, R²_A? How does this
compare with the one you obtained with Model 1?
(d) Is the overall regression model (Model 2) significant using α = 0.09?
(e) Are all the independent variables in Model 2 significant predictors using α = 0.1 or confidence
level 90% after accounting for the effects of the other independent variables?
(f) Prediction:
i. Is Model 2 better than Model 1?
ii. Predict the highway MPG (Y) with Vehicle Type = Truck; Weight (pounds) = 3280; Displacement (liters) = 6.4; Horsepower() = 184; Number of Cylinders() = 7 using “the best” model
(between Model 1 and Model 2). NOTE: you may or may not need to use all given values.
(g) Interpret regression coefficients.
i. Interpret the coefficient of Displacement.
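One step of backward elimination, as in item 2(a), is commonly carried out by dropping the predictor with the largest p-value, provided that p-value exceeds the chosen α. The sketch below illustrates that rule only; the p-values are made-up placeholders, not results from any dataset, and the function name `variable_to_remove` is hypothetical.

```python
def variable_to_remove(p_values, alpha):
    """Return the predictor with the largest p-value if it exceeds alpha, else None."""
    name, p_value = max(p_values.items(), key=lambda item: item[1])
    return name if p_value > alpha else None

# Placeholder p-values invented for illustration; read yours from the Excel output.
model_1_p_values = {
    "Vehicle Type": 0.02,
    "Weight": 0.001,
    "Displacement": 0.15,
    "Horsepower": 0.45,
    "Number of Cylinders": 0.08,
}

candidate = variable_to_remove(model_1_p_values, alpha=0.3)
print(candidate)
```

After removing the chosen variable, the regression must be re-run so that the coefficients, R² and p-values of Model 2 reflect the reduced set of predictors.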
3. Check the assumptions for regression analysis for the model you have chosen. Make necessary
plots in Excel to justify.
(a) Is the relationship between the dependent and independent variables linear?
(b) Do the residuals exhibit some patterns across values of the independent variables?
(c) Are the variations of the dependent variable the same across all values of the independent
variables?
(d) Do the residuals follow the normal probability distribution?
(e) Conclusion: Are the results from the regression analysis reliable?