AP STAT Section 3.3: Correlation and Regression Wisdom

EQ: What are influential points and lurking variables and how do they impact the association between two variables?

Recall:

Outliers --- points that are well removed from the trend that the other points seem to follow.

Outliers in a Univariate Data Set
   a value that does not fit with the rest of the data
   more than 3 standard deviations from the mean
   lies outside the 1.5(IQR) fences (below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR))

Outliers in Bivariate Data
   extreme with respect to the other y-values
   in regression, a point that has an unusually large residual

Influential Point in Bivariate Data
   when it is removed, the regression line changes
   has leverage on the regression coefficient (also known as the slope)
   normally an outlier in the x-direction, but not always an outlier in terms of regression (i.e., its residual may not be large)
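
The two univariate rules recalled above can be checked side by side. The sketch below is only an illustration: the data values are hypothetical, the function names are made up for this example, and the quartiles come from Python's `statistics.quantiles` with its default method, which can differ slightly from the quartiles a calculator or textbook reports.

```python
import statistics

def iqr_fences(data):
    """Return (lower, upper) fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    # Assumption: default quartile method of statistics.quantiles;
    # other quartile conventions can give slightly different fences.
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def iqr_outliers(data):
    """Values lying outside the 1.5(IQR) fences."""
    lower, upper = iqr_fences(data)
    return [x for x in data if x < lower or x > upper]

def sd_outliers(data, k=3):
    """Values more than k sample standard deviations from the mean."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return [x for x in data if abs(x - mean) > k * sd]

# Hypothetical data set with one suspicious value (48).
values = [12, 14, 15, 15, 16, 17, 18, 19, 20, 48]

print("fences:", iqr_fences(values))
print("outside fences:", iqr_outliers(values))
print("beyond 3 SD:", sd_outliers(values))
```

For this small hypothetical sample the fence rule flags 48 but the 3-standard-deviation rule does not, because the extreme value itself inflates the standard deviation. The two rules do not always agree.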

[Figure: scatterplot of heart attacks (vertical axis, 0 to 350) against wine consumption (horizontal axis, 0 to 16), with one point labeled "Outlier" and another labeled "Influential observation".]

The original data set is graphed at the right. Classify the new point as an outlier and/or an influential point. State whether its presence increases or decreases the strength of the association of the variables.
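
A classification like this can also be checked numerically by fitting the least-squares line with and without the new point and comparing slope, y-intercept, and correlation. The sketch below uses small hypothetical data (not the heart attack and wine data from the figure), and `fit` is a helper written just for this illustration.

```python
from math import sqrt

def fit(xs, ys):
    """Least-squares slope, y-intercept, and correlation r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical original data with a clear positive trend.
base_x = [1, 2, 3, 4, 5]
base_y = [2, 3, 5, 6, 9]

# Hypothetical new points to add, one at a time.
new_points = {
    "y-outlier at the mean of x": (3, 12),
    "far out in x, off the trend": (12, 4),
    "far out in x, on the trend": (12, 20.3),
}

results = {"original": fit(base_x, base_y)}
for label, (x_new, y_new) in new_points.items():
    results[label] = fit(base_x + [x_new], base_y + [y_new])

for label, (slope, intercept, r) in results.items():
    print(f"{label:30s} slope={slope:6.3f} intercept={intercept:7.3f} r={r:6.3f}")
```

In this made-up example, the y-outlier placed at the mean of x leaves the slope unchanged but shifts the y-intercept and weakens r. The point far out in x and off the trend drags the slope toward zero and weakens r. The point far out in x that falls on the trend leaves the line alone and strengthens r.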

Answers:
   Outlier; decreases strength of association
   Outlier and influential; increases strength of association
   Influential; decreases strength of association

Go over graphs on pp. 235-236.

Important Notes:
   Outlier points are almost always influential, but not vice versa.
   Outliers in the y-direction near the middle of the x-values may shift the y-intercept but have little effect on the slope of the regression line.
   The test for an influential point is what happens to the regression line when the point is removed.
   There is no RULE for determining outliers and influential points.
   Be able to explain what happens to the correlation, slope, y-intercept, and coefficient of determination when these points are added to or removed from a scatterplot.

Assignment: pp. 238-239, #59-62

During the months of March and April of a certain year, the weekly weight increases of a puppy in New York were collected. For the same time frame, the retail price increases of snowshoes in Alaska were collected.

Create a scatterplot for this data. Analyze both your graph and the summary statistics in a few sentences.

Conclusion? The weight increase of a puppy in New York is CAUSING the price of snowshoes in Alaska to increase, or vice versa. OF COURSE NOT!! Be sure your relationship makes sense.

Scatterplots and correlation do not demonstrate causation. Association does not imply causation! Linear regression alone does not establish causation either; only a well-designed experiment can.

Lurking Variables --- a lurking variable has an important effect on the relationship and yet is not included among the predictor variables under consideration. Perhaps its existence is unknown or its effect unsuspected.

What could be a lurking variable in these examples?
   a. There is a strong positive correlation between the foot length of K-12 students and reading scores.
   b. Students who need tutors have lower test scores than students who don't.
   c. A survey shows a strong positive correlation between the percentage of a country's inhabitants that use cell phones and the life expectancy in that country.

A group of college students believes that herbal tea has remarkable powers. To test this belief, they make weekly visits to a local nursing home, where they visit with the residents and serve them herbal tea. The nursing home staff reports that after several months many of the residents are more cheerful and healthy. A skeptical sociologist commends the students for their good deeds but scoffs at the idea that herbal tea helped the residents. Identify the explanatory and response variables in this informal study. Then explain why lurking variables account for the observed association.

KEY IDEAS TO FOCUS ON:
   Univariate Data
   Bivariate Data

Assignment: pp. 242-243, #63, 64, 66, 67; pp. 244-247, #69, 70, 73
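
Example (a) above can be explored with a small simulation. The sketch below assumes, purely for illustration, that grade level is the lurking variable: it drives both foot length and reading score, while the two are unrelated within any single grade. Every number in it is invented, and `correlation` is a helper written for this example.

```python
import random
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical model: grade (K = 0 through 12) raises both variables,
# but within a grade the two are generated independently.
by_grade = {}
for grade in range(13):
    feet = [15 + 1.0 * grade + rng.gauss(0, 1) for _ in range(200)]
    scores = [20 + 6.0 * grade + rng.gauss(0, 5) for _ in range(200)]
    by_grade[grade] = (feet, scores)

# Pool all grades together, as a study ignoring grade level would.
all_feet = [f for feet, _ in by_grade.values() for f in feet]
all_scores = [s for _, scores in by_grade.values() for s in scores]

overall_r = correlation(all_feet, all_scores)
within_r = {g: correlation(f, s) for g, (f, s) in by_grade.items()}

print(f"all grades pooled: r = {overall_r:.3f}")
for grade, r in within_r.items():
    print(f"grade {grade:2d} only:     r = {r:.3f}")
```

Pooled across grades, the simulated correlation is strong and positive, yet within each grade it stays close to zero. In this invented setup the association comes entirely from grade level, not from feet causing better reading.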

