EDUC 200C Section 2 - Describing Data

EDUC 200C Section 4 Review
Melissa Kemmerle
October 19, 2012

Goals
  • Review regression and measures of fit
  • Review Spearman correlation and its relationship to Pearson correlation
  • Talk briefly about normal distributions
  • Quick review of everything: midterm next Wednesday
  • Questions

Regression
  • Use regression to predict how one variable changes in response to another variable.
  • The prediction line is calculated by minimizing the total squared difference between the line representing our prediction and the actual data.

Regression line notation and formulas

  • Regression line: $\hat{Y} = b_{YX} X + a_{YX}$
  • Regression line slope: $b_{YX} = r_{YX} (\sigma_Y / \sigma_X)$
  • Regression line intercept: $a_{YX} = \bar{Y} - b_{YX} \bar{X}$

How do we know if we've explained the data well?
  • Standard error is the same as standard deviation except that we look at deviation from the prediction rather than deviation from the mean:
    $\sigma_{YX} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{N}}$

Extreme examples
  • Same Y data with differing relationships to X.
  • Standard deviation is relative to the mean. Since the Y data is identical in both graphs, the total variance of Y is also identical.
  • Standard error is relative to the prediction. The different relationships of Y to X are reflected by how close the predicted values of Y are to the actual values.

Explained vs. unexplained variance
[Figure: scatterplot with axis and legend labels Estimated hand width, Measured hand width, and Fitted values]
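The slope, intercept, and standard error definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the data are hypothetical, the helper names (`pop_sd`, `fit_line`, `standard_error`) are invented for the example, and it assumes the standard deviations in the slope formula divide by N.

```python
import math

def mean(values):
    return sum(values) / len(values)

def pop_sd(values):
    """Standard deviation with N in the denominator (an assumption of this sketch)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pop_sd(x) * pop_sd(y))

def fit_line(x, y):
    """Return (slope, intercept): b = r * (sd_y / sd_x), a = mean_y - b * mean_x."""
    b = pearson_r(x, y) * pop_sd(y) / pop_sd(x)
    a = mean(y) - b * mean(x)
    return b, a

def standard_error(x, y):
    """Root mean squared deviation of Y from the predicted values."""
    b, a = fit_line(x, y)
    squared_errors = [(yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y)]
    return math.sqrt(sum(squared_errors) / len(y))

# Hypothetical data: the same Y values paired with two different X orderings.
y = [2.0, 4.0, 5.0, 7.0, 9.0]
x_close = [1.0, 2.0, 3.0, 4.0, 5.0]   # Y tracks X closely
x_loose = [3.0, 1.0, 5.0, 2.0, 4.0]   # Y is only weakly related to X

print("SD of Y:", pop_sd(y))
print("Standard error, close fit:", standard_error(x_close, y))
print("Standard error, loose fit:", standard_error(x_loose, y))
```

The standard deviation of Y is the same in both cases because the Y values are the same; only the standard error changes with how well X predicts Y.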

How much variance have we explained?
  • You can crudely think of the error variance as how much variance in Y is left over after accounting for X.
  • Knowing X gets us close, but probably not all the way, to knowing Y.
  • $\sigma_{YX}^2 / \sigma_Y^2$ gives the percent of total variance in Y that we have not explained with X.
  • $1 - \sigma_{YX}^2 / \sigma_Y^2$ thus gives the percent of variance in Y explained by X.
  • Conveniently, this is equal to $r_{YX}^2$.
  • Can also think of this as the percent of shared variance between X and Y.

Stata

. reg Y X

      Source |       SS       df       MS              Number of obs =      50
-------------+------------------------------           F(  1,    48) = 4312.81
       Model |  3.92107903     1  3.92107903           Prob > F      =  0.0000
    Residual |  .043640216    48  .000909171           R-squared     =  0.9890
-------------+------------------------------           Adj R-squared =  0.9888
       Total |  3.96471925    49  .080912638           Root MSE      =  .03015

------------------------------------------------------------------------------
           Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           X |   .0194055   .0002955    65.67   0.000     .0188114    .0199996
       _cons |   .0418653    .008658     4.84   0.000     .0244573    .0592733
------------------------------------------------------------------------------

Where to find each quantity in the output:
  • Error variance, $s_{YX}^2$: Residual MS (.000909171)
  • Total variance, $s_Y^2$: Total MS (.080912638)
  • R-squared, $r_{YX}^2$: R-squared (0.9890)
  • Standard error, $s_{YX}$: Root MSE (.03015)
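As a rough check on how the pieces of the Stata output relate to one another, the sketch below recomputes the summary statistics from the sums of squares and degrees of freedom printed in the table. The variable names are made up for this example; the numbers are copied from the output.

```python
import math

# Values copied from the Stata output above.
model_ss, model_df = 3.92107903, 1
resid_ss, resid_df = 0.043640216, 48
total_ss, total_df = 3.96471925, 49
n_obs = 50

resid_ms = resid_ss / resid_df            # error variance (divides by N - 2)
total_ms = total_ss / total_df            # total variance of Y (divides by N - 1)
r_squared = model_ss / total_ss           # share of variance in Y explained by X
adj_r_squared = 1 - resid_ms / total_ms   # same idea, using the corrected variances
root_mse = math.sqrt(resid_ms)            # standard error of the regression
f_stat = (model_ss / model_df) / resid_ms

# With uncorrected (divide-by-N) variances, 1 - error/total reproduces R-squared.
sigma2_error = resid_ss / n_obs
sigma2_total = total_ss / n_obs
r_squared_from_sigma = 1 - sigma2_error / sigma2_total

print(f"R-squared     = {r_squared:.4f}")
print(f"Adj R-squared = {adj_r_squared:.4f}")
print(f"Root MSE      = {root_mse:.5f}")
print(f"F             = {f_stat:.2f}")
print(f"1 - error/total (divide by N) = {r_squared_from_sigma:.4f}")
```

The unadjusted R-squared matches the divide-by-N ratio, while Adj R-squared and Root MSE rest on the degrees-of-freedom-corrected variances.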

Note that these values have bias corrections that make them more like $s$ than $\sigma$.

Spearman Correlation
  • Identical to Pearson correlation except that we are specifically dealing with rank-order data rather than continuous data.
  • Gives a measure of the relationship of the relative ranks rather than relative values:
    $r_S = 1 - \frac{6 \sum D^2}{N (N^2 - 1)}$
  • D is the difference in ranks, rather than the difference in values, for the same observation, and N is the number of observations.
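To make the rank-difference idea concrete, here is a small sketch that ranks two hypothetical variables, applies the usual rank-difference formula $1 - 6 \sum D^2 / (N (N^2 - 1))$, and compares the result with a Pearson correlation computed on the ranks. The data and function names are illustrative assumptions, and the ranking helper assumes there are no tied values. The last lines inflate the largest X value to show how rank-based and raw-value correlations respond.

```python
import math

def ranks(values):
    """Rank from 1 (smallest) to N. Assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_r(x, y):
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical scores with no ties.
x = [10, 20, 30, 40, 50]
y = [12, 25, 21, 48, 39]

print("Spearman:", spearman_r(x, y))
print("Pearson on ranks:", pearson_r(ranks(x), ranks(y)))
print("Pearson on raw values:", pearson_r(x, y))

# Inflate the largest X value: the ranks, and so the Spearman value, stay the same.
x_stretched = [10, 20, 30, 40, 5000]
print("Spearman, stretched X:", spearman_r(x_stretched, y))
print("Pearson, stretched X:", pearson_r(x_stretched, y))
```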

Spearman vs. Pearson
  • When using rank-order data, the Spearman formula and the Pearson formula give identical results (provided there are no tied ranks).
  • The Spearman rank-order correlation coefficient is usually different from the Pearson r correlation coefficient if the Pearson r is calculated using untransformed data (i.e., not rank-order data).
  • Consider the case where you keep increasing the highest value of one of the variables of interest: this will affect the Pearson correlation, but not the Spearman correlation.

The Normal Curve

The null hypothesis
  • Example: A study compares the results of a new reading program for middle school students.
  • In this study, 36 students received the experimental reading program.
  • Each student's reading score was measured before and after the program.
  • The variable of interest was score change.

  • Score change was positive if a student's score improved and negative if the score got worse.
  • What is our null hypothesis?

Hypothesis testing vocabulary
  • Null Hypothesis: A hypothesis to be tested. Use the symbol $H_0$ (e.g., $H_0: \mu = 0$).
  • Alternative Hypothesis: A hypothesis that represents the opposite of the null hypothesis.
      One or the other must be true; there can be no third option.
      Use the symbol $H_A$ or $H_1$ (e.g., $H_A: \mu \neq 0$).
  • Hypothesis Test: The test of whether the null hypothesis ($H_0$) should be rejected in favor of the alternative hypothesis.

Questions so far?

Review of Everything: Measures of central tendency
  • Mean: $\bar{X} = \frac{\sum X}{N}$
  • Median: the middle value, with 50% of the observations falling below it

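  • Mode: most common value

Review of Everything: Measures of spread
  • Population variance, $\sigma^2$: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
  • (Unbiased) sample variance, $s^2$: $s^2 = \frac{\sum (X - \bar{X})^2}{N - 1}$
  • Population standard deviation, $\sigma$: $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
  • (Unbiased) sample standard deviation, $s$: $s = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$

Review of Everything: Z scores
  • Data transformation to give data a mean of 0 and a standard deviation of 1:
    $z = \frac{X - \bar{X}}{\sigma_X}$

Review of Everything: Correlation
Pearson r correlation coefficient
  • Z-score difference formula: $r_{YX} = 1 - \frac{\sum (z_X - z_Y)^2}{2N}$
  • Z-score product formula: $r_{YX} = \frac{\sum z_X z_Y}{N}$
  • Raw score formula: $r_{YX} = \frac{N \sum XY - \sum X \sum Y}{\sqrt{[N \sum X^2 - (\sum X)^2][N \sum Y^2 - (\sum Y)^2]}}$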
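The following sketch checks, on a small hypothetical data set, that z scores have a mean of 0 and a standard deviation of 1, and that the z-score difference, z-score product, and raw score formulas for Pearson r give the same value. It assumes the textbook forms of the three formulas with N in the denominator of the standard deviation; the function names are invented for the example.

```python
import math

def z_scores(values):
    """Convert to z scores using the divide-by-N standard deviation."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / n)
    return [(v - m) / sd for v in values]

def r_z_product(x, y):
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

def r_z_difference(x, y):
    zx, zy = z_scores(x), z_scores(y)
    return 1 - sum((a - b) ** 2 for a, b in zip(zx, zy)) / (2 * len(x))

def r_raw_score(x, y):
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 7.0, 9.0]

zx = z_scores(x)
print("Mean of z scores:", sum(zx) / len(zx))
print("SD of z scores:", math.sqrt(sum(z ** 2 for z in zx) / len(zx)))
print("Z-score product:", r_z_product(x, y))
print("Z-score difference:", r_z_difference(x, y))
print("Raw score:", r_raw_score(x, y))
```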

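Review of Everything: Correlation
Spearman rank-order correlation coefficient
  • $r_S = 1 - \frac{6 \sum D^2}{N (N^2 - 1)}$, where D is the difference in ranks for the same observation

Review of Everything: Regression
  • Predict Y from X: $\hat{Y} = b_{YX} X + a_{YX}$
  • Error (or residual): $Y - \hat{Y}$
  • Standard error: $\sigma_{YX} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{N}}$
  • R-squared: $r_{YX}^2 = 1 - \frac{\sigma_{YX}^2}{\sigma_Y^2}$
  • $R^2$ gives us the percent of variance in Y explained by X. This is sometimes called percent of shared variance.

Questions?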
