Geography 3000
Lab Assignment #3
Simple Regression Analysis Using Excel
Submit through Canvas: one spreadsheet file and one document file
Introduction: You are asked to investigate the relationship between two variables for 50 Metropolitan Statistical
Areas. This is a random sample of 380 total MSAs in the US. The data come from the US Census 2019 estimates.
Three variables are included in the dataset. You are only asked to compare two of them. The first variable,
PERCAP, represents the amount of income each person in the MSA would have if all income were distributed
equally. The second variable, PCTPOV, represents the percent of the population living in poverty in each MSA. The
third variable, EDAVG, represents the average educational attainment1 of people over 25 years of age. You will plot
the relationship between two of the three variables, calculate the degree and significance of the correlation between
them, perform a simple linear regression, and examine the residuals. You must find a significant relationship in your
model. Using these results, you will write a 1-2 page analysis of the relationship between the variables. It is
Write-up Guidelines (14pts):
a. Introduction: Present a brief problem statement. Which of the two variables would you consider to be the
dependent variable? Why? (2pts)
b. XY Scatterplot: Looking at the shape and pattern of the plot, how would you describe the apparent relationship
between X and Y? Does the plot suggest linearity? Is there a positive or negative relationship between X and Y? (2pts)
c. Strength of Relationship: Report the value of the correlation coefficient and results of hypothesis test. Is r
significant? What can you conclude from the magnitude of r and the significance test? How strong does the
relationship appear to be? If r is not significantly different from zero, what does that say about the relationship
between X and Y? (2pts)
d. Regression equation: State the regression equation. What does the intercept tell you? Does it have a meaningful
interpretation? What does the slope coefficient indicate? (2pts)
e. Residual Plot: Normally distributed residuals should be fairly randomly distributed around zero. There should be
no linear trends or fan shapes. Briefly discuss the pattern of the residuals for this data. Does the residual plot indicate
any overt violations of the assumptions? Identify any obvious outliers by name. (2pts)
f. Coefficient of Determination: Interpret r2. What can you conclude from it? (2pts)
g. Conclusions: Briefly summarize the results of the analysis and discuss your conclusions about the nature of the
relationship between X and Y. Is there a relationship? On what do you base your answer? (2pts)

1. Descriptive Statistics. Calculate the mean, variance, and standard deviation for both sets of data. Please be certain,
however, that you choose the proper variance and standard deviation. Ask yourself whether you are dealing with a
population or a sample. (1pt)
X variable Y variable
Mean = Mean =
Variance = Variance =
Std. Dev. = Std. Dev. =
N = _
2. Create an XY scatterplot of the two variables in Excel by selecting the data, going to Insert > Scatterplot and
choosing the first option. Be sure to put the dependent variable on the Y-axis. Include a chart title and label the axes.
(1pt)
3. Calculate the correlation coefficient ( r ) for this relationship. Use the formula below, noting there are numerous
ways to calculate r. Don’t forget to pay attention to signs (negative and positive). (1pt)
   
y
x
n
i
i
i
S
S
n
y
y
x
x
r
*
*
) 1
(
*
1

Note: if the value of r is larger than 1 or smaller than –1, there are errors in your calculations. )
4. Test the hypothesis that the correlation coefficient (r) is significantly different than 0. Your test statistic (r statistic)
will be the correlation coefficient divided by its standard error. It is distributed as a t-distribution with n-2 degrees of
freedom. (3pts)
H0: __________
HA: __________
SEr=
_________ r stat_____________________
Level of Significance: __________ Critical T-value(s)_____________
Decision Rule (i.e. rejection region): ____________________________________________
Conclusion: _______________________________________________________
5. Given a linear relationship between X and Y, one can summarize the basic relationship by fitting a straight line
through the data points on an XY coordinate plane. This line can then be used to estimate (predict) a value of Y for
any given value of X. The basic equation for a line is:
where is the predicted value of the dependent variable, X is the known value of the independent variable, B0 is the
intercept of the line on the Y-axis, and B1 is the slope of the line. Calculate the Least-Squares Line for X and Y. First
calculate B1 using either of the following two formulas. The second formula is generally considered the more
computationally efficient, but if you have already calculated r, Sy, and Sx, you may prefer to use the first formula.
(1pt)
6. Calculate the residuals for your data and plot them against the original X values to generate a residual plot. To
calculate residuals, first calculate for each value of X. Next, subtract from actual Y. (1pt)
7. To generate a residual plot, create an XY scatterplot. Specify the column containing your residuals as the Y series.
Specify the column containing X values as the X series. (1pt)
Note: the default for XY graphs has the axes crossing at zero. For residual plots, this will cause the axes to cross in the
middle of the plot area when the graph is generated.
8. Calculate the coefficient of determination, r2. This coefficient indicates how much of the variability in Y is
explained by X. (1pt)
Coefficient of Determination (r2)= ___
9. Check your calculations by using the built-in regression routine in Excel. The routine can be found by selecting
Tools-Data Analysis-Regression from the menu bar. In the output, you can check the r, r2
, intercept and slope
values. In addition, locate the t stat of the slope and compare it with the slope divided by the standard error of the
slope. They should be equal. Include the results as an additional sheet in your Excel file (1pt)
1 The categories of educational attainment are coded as follows:less than 9th grade=1, some high school=2,
high school diploma=3, some college=4, associate degree=5, bachelor’s degree=6, graduate degree=7.
Average educational attainment was calculated by multiplying the number of people in a given category by
the number of points assigned to that category, summing the total number of points, and then dividing by the
total number of people.

CoursePear™提供各类学术服务，Essay代写Assignment代写Exam / Quiz助攻Dissertation / Thesis代写Problem Set代做等。