Your friend (who has only studied a brief introduction to econometrics) is interested in
identifying the factors which cause people to choose to smoke, and has estimated the
following model:
$$\text{smoker}_i = \beta_0 + \beta_1\,\text{age}_i + \beta_2\,\text{educ}_i + \beta_3\,\text{income}_i + \beta_4\,\text{pcigs}_i + u_i.$$
Here $\text{smoker}_i$ is a binary variable, taking a value of 1 for someone who smokes and zero
otherwise. The variable $\text{age}_i$ is the age of person $i$, $\text{educ}_i$ is his/her years of
education completed, $\text{income}_i$ is his/her annual income and $\text{pcigs}_i$ is the price of a
packet of cigarettes in his/her location.
Your friend has allowed you to see the results of their regression which they intend to
report.
Call:
lm(formula = smoker ~ age + educ + income + pcigs79, data = gujsmoke)

Residuals:
    Min      1Q  Median      3Q     Max
-0.6417 -0.3880 -0.2782  0.5563  0.8405

Coefficients:
              Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)      1.123   1.884e-01    5.963  3.27e-09 ***
age            -0.0047   8.290e-04   -5.701  1.50e-08 ***
educ           -0.0206   4.616e-03   -4.465  8.76e-06 ***
income      0.00000102   1.632e-06    0.629    0.5298
pcigs79       -0.00513   2.852e-03   -1.799    0.0723 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.477 on 1191 degrees of freedom
Multiple R-squared: 0.03877, Adjusted R-squared: 0.03554
F-statistic: 12.01 on 4 and 1191 DF, p-value: 1.431e-09
(a) Explain some of the problems affecting your friend’s model and results.
[8 MARKS]