
“Calculator” was once a job description. This problem set gives you an opportunity to do some calculations on the relation between smoking and lung cancer, using a (very) small sample of five countries. The purpose of this exercise is to illustrate the mechanics of ordinary least squares (OLS) regression. You will calculate the regression “by hand” using formulas from class and the textbook. For these calculations, you may relive history and use long multiplication, long division, and tables of square roots and logarithms; or you may use an electronic calculator or a spreadsheet.

The data are summarized in the following table. The variables are per capita cigarette consumption in 1930 (the independent variable, “X”) and the death rate from lung cancer in 1950 (the dependent variable, “Y”). The cancer rates are shown for a later time period because it takes time for lung cancer to develop and be diagnosed.

| Observation # | 1 | 2 | 3 | 4 | 5 |
| --- | --- | --- | --- | --- | --- |
| Country | Switzerland | Finland | Great Britain | Canada | Denmark |
| Cigarettes consumed per capita in 1930 (X) | 530 | 1115 | 1145 | 510 | 380 |
| Lung cancer deaths per million people in 1950 (Y) | 250 | 350 | 465 | 150 | 165 |

Source: Edward R. Tufte, Data Analysis for Politics and Management, Table 3.3.

1. Use a calculator, a spreadsheet, or “by hand” methods to compute the following; refer to the textbook for the necessary formulas. (Note: if you use a spreadsheet, attach a printout.)

a) The sample means of X and Y, $\bar{X}$ and $\bar{Y}$.

b) The standard deviations of X and Y, $s_X$ and $s_Y$.

c) The correlation coefficient, $r$, between X and Y.

d) $\hat{\beta}_1$, the OLS estimated slope coefficient from the regression $Y_i = \beta_0 + \beta_1 X_i + u_i$.

e) $\hat{\beta}_0$, the OLS estimated intercept term from the same regression.

f) $\hat{Y}_i$, $i = 1, \dots, n$, the predicted values for each country from the regression.

g) $\hat{u}_i$, the OLS residual for each country.
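The calculations in items a) through g) can be sketched in a few lines of Python, used here only as a stand-in for a calculator or spreadsheet. This is a minimal sketch, assuming the usual textbook formulas: sample standard deviations with the $n - 1$ divisor, slope equal to the sum of cross-products of deviations divided by the sum of squared deviations of X, and intercept equal to $\bar{Y} - \hat{\beta}_1 \bar{X}$. All variable names are illustrative and not part of the assignment.

```python
from math import sqrt

# Data from the table above (Tufte, Table 3.3).
countries = ["Switzerland", "Finland", "Great Britain", "Canada", "Denmark"]
x = [530, 1115, 1145, 510, 380]  # cigarettes consumed per capita, 1930
y = [250, 350, 465, 150, 165]    # lung cancer deaths per million, 1950

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squared deviations and of cross-products of deviations.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Sample standard deviations (n - 1 divisor) and correlation coefficient.
s_x = sqrt(s_xx / (n - 1))
s_y = sqrt(s_yy / (n - 1))
r = s_xy / sqrt(s_xx * s_yy)

# OLS slope and intercept.
beta1_hat = s_xy / s_xx
beta0_hat = y_bar - beta1_hat * x_bar

# Predicted values and residuals for each country.
y_hat = [beta0_hat + beta1_hat * xi for xi in x]
u_hat = [yi - yh for yi, yh in zip(y, y_hat)]

print(f"means: {x_bar:.1f}, {y_bar:.1f}; s_X = {s_x:.2f}, s_Y = {s_y:.2f}; r = {r:.3f}")
print(f"slope = {beta1_hat:.4f}, intercept = {beta0_hat:.2f}")
for c, yh, uh in zip(countries, y_hat, u_hat):
    print(f"{c:14s} predicted = {yh:8.2f}  residual = {uh:8.2f}")
```

A useful check on any by-hand solution is that the residuals sum to zero (up to rounding) whenever the regression includes an intercept.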

2. On graph paper or using a spreadsheet, graph the scatterplot of the five data points and the regression line. Be sure to label the axes, the data points, the residuals, and the slope and intercept of the regression line.


3. This time, please calculate the same statistics using STATA. On the STATA output file, find and label the following items.

a) The sample means of X and Y, $\bar{X}$ and $\bar{Y}$.

b) The standard deviations of X and Y, $s_X$ and $s_Y$.

c) The correlation coefficient, $r$, between X and Y.

d) $\hat{\beta}_1$, the OLS estimated slope coefficient from the regression $Y_i = \beta_0 + \beta_1 X_i + u_i$.

e) $\hat{\beta}_0$, the OLS estimated intercept term from the same regression.

f) $\hat{Y}_i$, $i = 1, \dots, n$, the predicted values for each country from the regression.

g) $\hat{u}_i$, the OLS residual for each country.

STATA HINTS: First load STATA and type “edit,” which brings up something that looks like a spreadsheet. Enter the smoking and cancer values in the first two columns. Double-click the column headers to enter variable names (e.g. “smoke”, “death”). Close the editor window when you are done. The following commands will be useful:

| Command | Description |
| --- | --- |
| list | lists the data (to be sure you typed it in correctly) |
| summarize | computes sample means and standard deviations (the option “, detail” gives additional statistics, including the sample variance) |
| correlate | produces correlation coefficients (with the option “, covariance” this command produces covariances) |
| regress | estimates regression by OLS |
| predict | computes OLS predicted values and residuals |

Note that STATA has on-line help.

Do not be concerned if you do not yet understand all the statistics shown in the output – we will discuss them in class in due course.

4. [graded] Using the “graph twoway” command in STATA, graph the scatterplot of the five data points and the regression line. Interpret the sample slope and sample intercept.
5. [graded] Using the data file birthweight_smoking, which contains data for a random sample of babies born in Pennsylvania in 1989, answer the following questions. The data include the baby’s birth weight together with various characteristics of the mother, including whether she smoked during the pregnancy. Let $Y_i$ denote the baby’s birth weight (in grams) for mother $i$ and $X_i$ an indicator variable that equals one if the mother smoked during pregnancy and zero otherwise. Consider the linear regression model

$$Y_i = \beta_0 + \beta_1 X_i + u_i, \quad i = 1, \dots, n.$$

(a) Run a regression of $Y_i$ on $X_i$. Report your estimation result in the following form:

$$\widehat{BirthWeight} = \underset{(???)}{???} + \underset{(???)}{???} \, Smoker,$$

where the numbers in the parentheses are standard errors.

(b) In view of your estimation result in part (a), what is the predicted value of the birth weight for mothers who do not smoke? What is the predicted value of the birth weight for those who smoke?

(c) Compute the sample correlation coefficient between the birth weight and education. Interpret your estimation result.

(d) In view of part (c), what does the term $u_i$ represent here? Why do different mothers have different values of $u_i$?

(e) In view of part (c), do you think that $E[u_i \mid Smoker_i] = 0$?

(f) The regression error term $u_i$ is homoskedastic if the conditional variance of $u_i$ given $X_i = x$ does not depend on $x$. When you computed your standard error in part (a), did you report homoskedasticity-only standard errors or heteroskedasticity-robust standard errors? Justify your choice briefly.

(g) Using your preferred standard errors, report the 95% confidence interval for $\beta_1$. Using your confidence interval, carry out the hypothesis test for the null hypothesis $H_0$ that smoking is associated with a decrease of 300 grams in the birth weight.

(h) What is the value of $R^2$? A friend of yours claims that a very low $R^2$ means that the estimate of $\beta_1$ is insignificant. Would you agree? Explain briefly.
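The mechanics behind parts (b), (f), and (g) can be sketched in Python on a tiny made-up sample. The numbers below are hypothetical and are not taken from birthweight_smoking; the function name `ols_with_se` and all variable names are likewise illustrative. The sketch assumes the standard large-sample formulas: the homoskedasticity-only standard error uses a single error variance estimated with an $n - 2$ divisor, and the heteroskedasticity-robust one uses the common $n/(n-2)$ degrees-of-freedom adjustment. The 1.96 critical value is a large-sample approximation, so with only ten observations the interval is for illustration only.

```python
from math import sqrt

def ols_with_se(x, y):
    """Regress y on x with an intercept; return (b0, b1, se_homo, se_robust)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    s_xx = sum(d * d for d in dx)
    b1 = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    u = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # Homoskedasticity-only: one common error variance for all x.
    se_homo = sqrt(sum(ui * ui for ui in u) / (n - 2) / s_xx)
    # Heteroskedasticity-robust: each squared residual is weighted by its
    # squared deviation of x, so the error variance may differ with x.
    se_robust = sqrt(n / (n - 2) * sum((d * ui) ** 2 for d, ui in zip(dx, u))) / s_xx
    return b0, b1, se_homo, se_robust

# Hypothetical data: 1 = mother smoked, 0 = did not; weights in grams.
smoker = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
weight = [3400, 3550, 3300, 3600, 3450, 3500, 3100, 3350, 2900, 3250]

b0, b1, se_homo, se_robust = ols_with_se(smoker, weight)

# With a 0/1 regressor, b0 is the sample mean for X = 0 and
# b0 + b1 is the sample mean for X = 1.
pred_nonsmoker = b0
pred_smoker = b0 + b1

# Large-sample 95% confidence interval for the slope (robust SE).
ci_low = b1 - 1.96 * se_robust
ci_high = b1 + 1.96 * se_robust

# A two-sided 5% test of H0: beta_1 = -300 rejects exactly when
# -300 lies outside the 95% confidence interval.
null_value = -300
reject = not (ci_low <= null_value <= ci_high)

print(f"predicted: non-smoker {pred_nonsmoker:.1f}, smoker {pred_smoker:.1f}")
print(f"SE homoskedasticity-only {se_homo:.2f}, robust {se_robust:.2f}")
print(f"95% CI [{ci_low:.1f}, {ci_high:.1f}], reject H0: {reject}")
```

The two standard errors generally differ when the spread of the outcome is not the same in the two groups, which is the situation the definition in part (f) is about.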

The following questions will not be graded; they are for practice and will be discussed at the recitation by your teaching assistant:

1. [Practice question, not graded] SW Exercise 4.1
2. [Practice question, not graded] Let $KIDS$ denote the number of children born to a woman, and let $EDUC$ denote years of education for the woman. A simple model relating fertility to years of education is $KIDS = a + b \cdot EDUC + u$, where $u$ is the unobserved residual.

(a) What kinds of factors are contained in $u$? Are these likely to be correlated with level of education?

(b) Will a simple regression of $KIDS$ on $EDUC$ uncover the ceteris paribus (‘all else equal’) effect of education on fertility? Explain.

3. [Practice question, not graded] SW Exercise 5.1
4. [Practice question, not graded] Any of the Empirical Exercises at the end of Chapter 4 and Chapter 5. (The teaching assistant will go over one of the empirical exercises in the recitation as practice in using Stata.)
