Data Analysis - Rowan

Statistics: Data Presentation & Analysis
Fr Clinic I

Overview
  • Tables & Graphs
  • Populations & Samples
  • Mean, Median, & Variance
  • Error Bars
      • Standard Deviation, Standard Error & 95% Confidence Interval (CI)
  • Comparing Means of Two Populations
  • Linear Regression (LR)

Warning
Statistics is a huge field; I've simplified considerably here. For example:

  • Mean, Median, and Standard Deviation: there are alternative formulas.
  • 95% Confidence Interval: there are other ways to calculate CIs (e.g., z statistic instead of t; difference between two means rather than a single mean).
  • Error Bars: don't go beyond the interpretations I give here!
  • Comparing Means of Two Data Sets: we cover only the t test for two means when the variances are unknown but equal; there are other tests.
  • Linear Regression: we look only at simple LR and calculate only the intercept, slope, and $R^2$. There is much more to LR!

Tables
Table 1: Average Turbidity and Color of Water Treated by Portable Water Filters

    (1) Water     (2) Turbidity (NTU)   (3) True Color (Pt-Co)   (4) Apparent Color (Pt-Co)
    Pond Water    10                    13                       30
    Sweetwater    4                     5                        12
    Hiker         3                     8                        11

  • Consistent format, title, units, big fonts
  • Differentiate headings, number columns

Figures
  • Consistent format, title, units
  • Good axis titles, big fonts

[Figure 1: Turbidity of Pond Water, Treated and Untreated. Bar chart of turbidity (NTU) by filter: Pond Water, Sweetwater, Miniworks, Hiker, Pioneer, Voyager.]

Populations and Samples
  • Population: all possible outcomes of an experiment or observation
      • US population
      • Particular type of steel beam
  • Sample: finite number of outcomes measured or observations made
      • 1000 US citizens
      • 5 beams

  • Use samples to estimate population properties (mean, variance)
      • E.g., heights of 1000 US citizens used to estimate the mean height of the US population

Central Tendency: Mean and Median
Sample turbidity data (NTU): 1 3 3 6 8 10
  • Mean = $\bar{x}$ = sum of values divided by sample size = (1+3+3+6+8+10)/6 = 5.2 NTU
  • Median = m = middle number of the ranked data

    Rank     1  2  3  4  5  6
    Number   1  3  3  6  8  10

  • For an even number of sample points, average the middle two: (3+6)/2 = 4.5 NTU
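The mean and median of the six turbidity readings can be checked with a short sketch. Python and its standard `statistics` module are assumptions here (the slides themselves work in Excel), and the variable names are illustrative only.

```python
from statistics import mean, median

# The six turbidity readings from the slide, in NTU
turbidity_ntu = [1, 3, 3, 6, 8, 10]

# Mean: sum of values divided by sample size
sample_mean = mean(turbidity_ntu)

# Median: with an even number of points, the two middle values are averaged
sample_median = median(turbidity_ntu)

print(f"mean = {sample_mean:.1f} NTU, median = {sample_median:.1f} NTU")
```

The unrounded mean is 31/6, which the slide reports as 5.2 NTU.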

  • Excel: mean = AVERAGE; median = MEDIAN

Variability
  • Variance, $s^2$: sum of the squares of the deviations about the mean, divided by the degrees of freedom
        $s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 / (n-1)$
    where $x_i$ = a data point and n = number of data points
  • Example (cont.):
        $s^2 = [(1-5.2)^2 + (3-5.2)^2 + (3-5.2)^2 + (6-5.2)^2 + (8-5.2)^2 + (10-5.2)^2] / (6-1) = 11.8$ NTU$^2$
  • Excel: variance = VAR

Error Bars
  • Show data variability on a plot of mean values
  • Types of error bars include: max/min, standard deviation, standard error, 95% CI

[Figure: turbidity (NTU) by filter type (Filter 1, Filter 2, Filter 3).]

Standard Deviation, s
  • Square root of the variance: $s = \sqrt{s^2}$
  • If a phenomenon follows a Normal Distribution (bell curve), 95% of the population lies within 1.96 standard deviations of the mean
  • Error bar is s above & below the mean
  • Excel: standard deviation = STDEV

[Figure: Normal Distribution curve; x-axis: Standard Deviations from Mean; the central 95% lies between -1.96 and 1.96.]

Standard Error of Mean

  • $s_{\bar{x}}$, also called St-Err
  • For a sample of size n taken from a population with standard deviation estimated as s:
        $s_{\bar{x}} = s / \sqrt{n}$
  • As n increases, $s_{\bar{x}}$ decreases, i.e., the estimate of the population mean improves
  • Error bar is St-Err above & below the mean

95% Confidence Interval (CI) for Mean
  • A 95% confidence interval is expected to contain the population mean 95% of the time (i.e., of the 95% CIs from 100 samples, about 95 are expected to contain the population mean)
        $\bar{x} \pm t_{95\%,n-1}\, s_{\bar{x}}$
  • $t_{95\%,n-1}$ is a statistic for the 95% CI from a sample of size n
  • Excel: $t_{95\%,n-1}$ = TINV(0.05, n-1)
  • If n ≥ 30, $t_{95\%,n-1} \approx 1.96$ (Normal Distribution)
  • Error bar is $t_{95\%,n-1}\, s_{\bar{x}}$ above & below the mean

Using Error Bars to Compare Data
  • Standard Deviation: demonstrates data variability, but no comparison is possible
  • Standard Error
      • If bars overlap, any difference in means is not statistically significant
      • If bars do not overlap, indicates nothing!
  • 95% Confidence Interval
      • If bars overlap, indicates nothing!
      • If bars do not overlap, the difference is statistically significant

We'll use the 95% CI in this class
  • Any time you have 3 or more data points, determine the mean, standard deviation, standard error, and $t_{95\%,n-1}$, then plot the mean with error bars showing the 95% confidence interval

Adding Error Bars to an Excel Graph
  • Create graph

      • Column, scatter, ...
  • Select the data series
  • In the Layout tab, Analysis group, select Error Bars
  • Select More Error Bar Options
  • Select Custom and Specify Values, then select the cells containing the values of $t_{95\%,n-1}\, s_{\bar{x}}$

Example 1: 95% CI
Turbidity data:

    Filter      1      2      3      Mean   St Dev  n   St-Err  t95%,2  +/- 95% CI
                (NTU)  (NTU)  (NTU)  (NTU)  (NTU)       (NTU)
    Filter 1    2.1    2.1    2.2    2.1    0.06    3   0.03    4.30    0.14
    Filter 2    3.2    4.4    5      4.2    0.92    3   0.53    4.30    2.28
    Filter 3    4.3    4.2    4.5    4.3    0.15    3   0.09    4.30    0.38

    +/- 95% CI = t95%,2 × St-Err

[Figure: bar chart of mean turbidity (NTU) by portable water filter: Filter 1 = 2.1, Filter 2 = 4.2, Filter 3 = 4.3.]

What can we do?
  • Lift the weight multiple times using different solar panel combinations (or hydroturbines, or gear boxes) and plot the mean with 95% confidence interval error bars.
  • If error bars overlap between two different test conditions, indicates nothing!
  • If error bars do not overlap, the difference is statistically significant

T Test
  • A more sophisticated way to compare means
  • Use the t test to determine if the means of two populations are different
      • E.g., lift times with different solar panel combinations or turbines or ...

Comparing Two Data Sets Using the t Test
  • Example: you lift the weight with two panels in series and with two in parallel.
      • Series: mean = 2 min, s = 0.5 min, n = 20
      • Parallel: mean = 3 min, s = 0.6 min, n = 20
  • You ask the question: do the different panel combinations result in different lift times?
      • Different in a statistically significant way

Are the Lift Times Different?
  • Use TTEST (Excel) to compare the Series and Parallel lift times

      • TTEST returns the fractional probability of being wrong if you claim the two populations are different
      • We'll say they are significantly different if the probability is ≤ 0.05

    Lift times (min):
    Series:    1.5 2 2.2 1.8 3 1.6 1.2 2.1 1.9 2.2 2.6 1.7 1.8 1.5 2.4 2.5 2.7 1.4 1.5 2.6
    Parallel:  3 2.4 2.2 2.6 3.4 3.6 3.8 3.5 2.7 2.4 3.5 3.8 2.1 2.5 3.4 3.3 2.4 3.6 2.3 3.7

Marbles

Linear Regression
  • Fit the best straight line to a data set

[Figure: scatter plot with fitted trendline y = 1.897x + 0.8667, $R^2$ = 0.9762; y-axis: Grade Point Average; x-axis: Height (m).]

  • Excel: right-click on a data point and select trendline. Select the options to show the equation and $R^2$.

$R^2$: Coefficient of Multiple Determination
        $R^2 = \sum (\hat{y}_i - \bar{y})^2 / \sum (y_i - \bar{y})^2$
  • $\hat{y}_i$ = predicted y values, from the regression equation
  • $y_i$ = observed y values
  • $\bar{y}$ = mean of y
  • $R^2$ = fraction of variance explained by the regression
  • $R^2$ = 1 if the data lie along a straight line
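The slope, intercept, and R^2 of a simple linear regression can be sketched in a few lines of Python. This is a minimal illustration, not the method behind the slide's trendline: the data points are hypothetical (the slide does not list the plotted values), and `fit_line` is a made-up helper name.

```python
def fit_line(xs, ys):
    """Least-squares slope, intercept, and R^2 for a simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Slope = sum of cross-deviations over sum of squared x deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # R^2 = sum (predicted - mean)^2 / sum (observed - mean)^2
    predicted = [intercept + slope * x for x in xs]
    ss_regression = sum((y_hat - y_bar) ** 2 for y_hat in predicted)
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    return slope, intercept, ss_regression / ss_total


# Hypothetical data, NOT the points behind the slide's trendline
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept, r_squared = fit_line(xs, ys)
print(f"y = {slope:.3f}x + {intercept:.3f}, R^2 = {r_squared:.3f}")
```

Feeding the function points that lie exactly on a line, such as (0, 1), (1, 3), (2, 5), gives R^2 = 1, matching the last bullet above.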
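Looking back at the error-bar quantities (standard deviation, standard error, 95% CI), the Example 1 turbidity table can be reproduced with a short Python sketch. Python's standard library has no t distribution, so the t value 4.303 is an assumption taken from a t table for 2 degrees of freedom (the slides show 4.30, from TINV(0.05, 2)); `summarize` and the other names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Assumed: two-tailed 95% t value for n - 1 = 2 degrees of freedom,
# matching the 4.30 shown in Example 1 (Excel: TINV(0.05, 2)).
T_95_DF2 = 4.303


def summarize(sample, t_crit):
    """Return mean, sample std dev, standard error, and 95% CI half-width."""
    n = len(sample)
    s = stdev(sample)        # square root of the (n - 1) variance
    std_err = s / sqrt(n)    # standard error of the mean
    return mean(sample), s, std_err, t_crit * std_err


# Turbidity readings (NTU) from Example 1
filters = {
    "Filter 1": [2.1, 2.1, 2.2],
    "Filter 2": [3.2, 4.4, 5.0],
    "Filter 3": [4.3, 4.2, 4.5],
}

results = {name: summarize(data, T_95_DF2) for name, data in filters.items()}
for name, (m, s, se, ci) in results.items():
    print(f"{name}: mean={m:.1f} s={s:.2f} St-Err={se:.2f} +/-95% CI={ci:.2f} NTU")

# Variance of the earlier six-point turbidity sample (NTU^2)
six_point_variance = variance([1, 3, 3, 6, 8, 10])
```

Rounded to two decimals, the standard deviation, standard error, and CI half-width agree with the table for all three filters, and the six-point variance rounds to 11.8 NTU^2.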
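For the series versus parallel lift-time comparison, the t statistic behind an equal-variance two-sample t test can be sketched from the summary statistics alone. This is an illustration under stated assumptions, not a replacement for Excel's TTEST: TTEST returns a probability, whereas this sketch only computes the statistic and compares it with a critical value of 2.024, which is assumed from a t table (two-tailed, 0.05, 38 degrees of freedom). The function name is hypothetical.

```python
from math import sqrt


def pooled_t_statistic(mean1, s1, n1, mean2, s2, n2):
    """t statistic and degrees of freedom, assuming unknown but equal variances."""
    dof = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / dof
    std_err = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, dof


# Summary statistics from the slide (minutes)
t_stat, dof = pooled_t_statistic(2.0, 0.5, 20, 3.0, 0.6, 20)

# Assumed tabulated critical value for a two-tailed test at 0.05 with 38 dof
T_CRIT_95_DF38 = 2.024
significant = abs(t_stat) > T_CRIT_95_DF38

print(f"t = {t_stat:.2f} with {dof} degrees of freedom; significant: {significant}")
```

Because |t| is well above the critical value, the sketch reaches the conclusion a TTEST probability below 0.05 would indicate: the lift times differ in a statistically significant way.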
