Poisson Regression

Caution Flags (Crashes) in NASCAR Winston Cup Races, 1975-1979

Source: L. Winner (2006). NASCAR Winston Cup Race Results for 1975-2003, Journal of Statistics Education, Vol. 14, #3, www.amstat.org/publications/jse/v14n3/datasets.winner.html

Data Description
Units: NASCAR Winston Cup races (1975-1979), n = 151 races
Dependent variable: Y = number of caution flags/crashes (CAUTIONS)
Independent variables:
X1 = number of drivers in the race (DRIVERS)
X2 = circumference of the track (TRKLENGTH)
X3 = number of laps in the race (LAPS)

Generalized Linear Model
Random component: Poisson distribution for the number of caution flags, with probability mass function
$$P(Y = y \mid X_1, X_2, X_3) = \frac{e^{-\mu(X_1,X_2,X_3)}\,\mu(X_1,X_2,X_3)^{y}}{y!}, \qquad y = 0, 1, 2, \ldots$$
Link function: $g(\mu) = \log(\mu)$
Systematic component: $g(\mu) = \log(\mu) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3$, so that
$$\mu(X_1,X_2,X_3) = e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3}$$

Testing For Overall Model
$H_0: \beta_1 = \beta_2 = \beta_3 = 0$ (number of cautions independent of all predictors)
$H_A$: not all $\beta_j = 0$ (number of cautions associated with at least one predictor)
Test statistic: $X^2_{obs} = -2(\ln L_0 - \ln L_1)$
Rejection region: $X^2_{obs} \ge \chi^2_{\alpha,3}$
P-value: $P(\chi^2_3 \ge X^2_{obs})$
where $\ln L_0$ is the maximized log likelihood under $H_0$ and $\ln L_1$ is the maximized log likelihood under $H_A$.

NASCAR Caution Flag Example

Model: $g(\mu) = \beta_0$

Criterion            DF    Value       Value/DF
Deviance             150   215.4915    1.4366
Scaled Deviance      150   215.4915    1.4366
Pearson Chi-Square   150   201.6050    1.3440
Scaled Pearson X2    150   201.6050    1.3440
Log Likelihood             410.8784

Model: $g(\mu) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3$

Criterion            DF    Value       Value/DF
Deviance             147   171.2162    1.1647
Scaled Deviance      147   171.2162    1.1647
Pearson Chi-Square   147   158.8281    1.0805
Scaled Pearson X2    147   158.8281    1.0805
Log Likelihood             433.0160

Test statistic: $X^2_{obs} = -2(\ln L_0 - \ln L_1) = -2(410.8784 - 433.0160) = 44.2752$
Rejection region ($\alpha = 0.05$): $X^2_{obs} \ge \chi^2_{.05,3} = 7.815$
P-value: $P(\chi^2_3 \ge 44.2752) \approx 0$
Statistical output obtained from SAS PROC GENMOD. The log likelihoods are positive because GENMOD reports the kernel $\sum_i [y_i \ln\hat\mu_i - \hat\mu_i]$ and omits the constant $-\sum_i \ln y_i!$, which cancels in $\ln L_0 - \ln L_1$.

Testing for Individual (Partial) Regression Coefficients
$H_0: \beta_j = 0 \qquad H_A: \beta_j \ne 0$
Z test statistic: $z_{obs} = \dfrac{\hat\beta_j}{\widehat{SE}(\hat\beta_j)}$, P-value: $2P(Z \ge |z_{obs}|)$
Chi-square test statistic: $X^2_{obs} = \left(\dfrac{\hat\beta_j}{\widehat{SE}(\hat\beta_j)}\right)^2 = z_{obs}^2$, P-value: $P(\chi^2_1 \ge X^2_{obs})$
One-sided tests: confirm that the sign of $\hat\beta_j$ agrees with the alternative, then cut the P-value in half.

NASCAR Caution Flag Example

Parameter   DF   Estimate   Std Error   Chi-Square   Pr > ChiSq
Intercept    1    -0.7963      0.4117         3.74       0.0531
Drivers      1     0.0365      0.0125         8.55       0.0035
TrkLength    1     0.1145      0.1684         0.46       0.4966
Laps         1     0.0026      0.0008        10.82       0.0010

Conclude the following:
Controlling for Track Length and Laps, as Drivers increases, Cautions increase.
Controlling for Drivers and Laps, there is no association between Cautions and Track Length.
Controlling for Drivers and Track Length, as Laps increases, Cautions increase.

Reduced model (Track Length dropped): $\log(\hat\mu) = -0.6876 + 0.0428\,\text{Drivers} + 0.0021\,\text{Laps}$, where $\hat\mu$ is the estimated mean number of cautions.

Testing Model Goodness-of-Fit
Two common measures of goodness of fit: Pearson's chi-square and the deviance.

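Both measures have approximate chi-square distributions under the hypothesis that the current model is appropriate, provided the number of combinations of the independent variables is fixed and the counts are large.

Pearson's chi-square:
$$X^2 = \sum_{i=1}^{n} \frac{(y_i - \hat\mu_i)^2}{\hat V(\hat\mu_i)}, \qquad \text{where } \hat V(\hat\mu_i) = \hat\mu_i \text{ for the Poisson distribution}$$
Deviance:
$$G^2 = 2\sum_{i=1}^{n}\left[y_i \log\left(\frac{y_i}{\hat\mu_i}\right) - (y_i - \hat\mu_i)\right], \qquad \text{with } 0\log 0 = 0$$
For a log-link model with an intercept the fitted means satisfy $\sum_i (y_i - \hat\mu_i) = 0$, so the deviance reduces to $G^2 = 2\sum_{i} y_i \log(y_i/\hat\mu_i)$.

NASCAR Caution Flags Example

Null Model
Criterion    DF    Value       Value/DF   P-Value
Pearson X2   150   201.6050    1.3440     0.0032
Deviance     150   215.4915    1.4366     0.0004

Full Model
Criterion    DF    Value       Value/DF   P-Value
Pearson X2   147   158.8281    1.0805     0.2386
Deviance     147   171.2162    1.1647     0.0838

Note that the null model clearly does not fit well, while for the full model neither statistic rejects, at $\alpha = 0.05$, the null hypothesis that the model is appropriate. However, there are many distinct combinations of Laps, Track Length, and Drivers, so the fixed-combinations, large-count condition is not met and these chi-square approximations should be treated with caution.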
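The P-values in the goodness-of-fit table can be reproduced from the reported statistics and degrees of freedom. The sketch below is a minimal pure-Python version, assuming integer degrees of freedom so the chi-square upper tail has a closed-form series (an even-df and an odd-df form). The helper names `pearson_x2`, `poisson_deviance`, and `chi2_sf` are hypothetical; the first two implement the formulas above for per-race data, which are not reproduced here, so they are only shown on made-up numbers.

```python
import math


def pearson_x2(y, mu):
    """Pearson chi-square for a Poisson model: sum of (y - mu)^2 / mu."""
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))


def poisson_deviance(y, mu):
    """Poisson deviance 2 * sum[y*log(y/mu) - (y - mu)], taking 0*log(0) = 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        total += (yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
    return 2.0 * total


def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df >= x) for a positive integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 0.0
        for r in range(df // 2):
            total += term
            term *= (x / 2.0) / (r + 1)
        return math.exp(-x / 2.0) * total
    # Odd df: 2*P(Z >= sqrt(x)) + 2*phi(sqrt(x)) * sum_{r=1}^{(df-1)/2} x^(r-1/2) / (1*3*...*(2r-1))
    s = math.sqrt(x)
    term, total = s, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    phi = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)  # standard normal density at sqrt(x)
    return math.erfc(s / math.sqrt(2.0)) + 2.0 * phi * total


# Statistics reported by PROC GENMOD for the null (df = 150) and full (df = 147) models.
for model, df, pearson, deviance in [("Null", 150, 201.6050, 215.4915),
                                     ("Full", 147, 158.8281, 171.2162)]:
    print(f"{model}: Pearson P = {chi2_sf(pearson, df):.4f}, "
          f"Deviance P = {chi2_sf(deviance, df):.4f}")

# Hypothetical toy counts and fitted means, only to exercise the two formulas.
print(pearson_x2([1, 3], [2, 2]), poisson_deviance([1, 3], [2, 2]))
```

The closed-form sums avoid computing $1 - P$ for a lower-tail probability, so small P-values such as the null-model deviance are not lost to cancellation.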

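The overall likelihood-ratio test and the partial Wald tests in this example are simple arithmetic on the reported SAS output. A minimal Python sketch follows; the function names are hypothetical, and the P-value uses the closed form $P(\chi^2_3 \ge x) = \operatorname{erfc}(\sqrt{x/2}) + \sqrt{2x/\pi}\,e^{-x/2}$, which holds only for 3 degrees of freedom. Because it starts from the rounded estimates and standard errors shown in the parameter table, the recomputed chi-square values differ slightly from SAS's, most visibly for Laps (about 10.56 versus 10.82).

```python
import math

# Maximized log likelihoods reported by PROC GENMOD (kernel values).
LNL0 = 410.8784  # null model, g(mu) = beta0
LNL1 = 433.0160  # full model with Drivers, TrkLength, Laps
CRITICAL_05_3 = 7.815  # chi-square(0.05, 3) critical value used on the slide


def chi2_sf_df3(x):
    """P(chi2_3 >= x) = erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


def wald_test(estimate, std_error):
    """Return (z, X2 = z^2, two-sided P-value 2*P(Z >= |z|)) for H0: beta_j = 0."""
    z = estimate / std_error
    return z, z * z, math.erfc(abs(z) / math.sqrt(2.0))


lr_stat = -2.0 * (LNL0 - LNL1)
print(f"LR X2 = {lr_stat:.4f}, reject H0: {lr_stat >= CRITICAL_05_3}, "
      f"P = {chi2_sf_df3(lr_stat):.1e}")

# Estimates and standard errors as displayed in the parameter table (rounded).
params = {"Intercept": (-0.7963, 0.4117), "Drivers": (0.0365, 0.0125),
          "TrkLength": (0.1145, 0.1684), "Laps": (0.0026, 0.0008)}
for name, (est, se) in params.items():
    z, x2, p = wald_test(est, se)
    print(f"{name:<10} z = {z:6.2f}  X2 = {x2:6.2f}  P = {p:.4f}")

# One-sided test of HA: beta_Drivers > 0. The estimate is positive,
# so the two-sided P-value is cut in half.
z_d, _, p_d = wald_test(*params["Drivers"])
print(f"One-sided P for Drivers = {p_d / 2:.4f}")
```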
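SAS Program

```sas
options ps=54 ls=76;

/* Data set one contains the data for analysis. Variable names and column
   specifications are given in the INPUT statement, so each data line must
   place its values in those columns. Only the first and last observations
   are included here. */
data one;
  input serrace 6-8 year 13-16 searace 23-24 drivers 31-32
        trklength 34-40 laps 46-48 road 56 cautions 63-64 leadchng 71-72;
  cards;
       1    1975       1      35 2.54        191       1       5      13
...
     151    1979      31      37 2.5         200       0       6      35
;
run;

/* Generalized linear model with a Poisson random component and a constant
   mean: systematic component g(mu) = alpha, link function g(mu) = log(mu),
   so mu = exp(alpha). */
proc genmod data=one;
  model Cautions = / dist=poi link=log;
run;

/* Generalized linear model with a Poisson random component, systematic
   component g(mu) = alpha + beta1*drivers + beta2*trklength + beta3*laps,
   and link function g(mu) = log(mu), so
   mu = exp(alpha + beta1*drivers + beta2*trklength + beta3*laps). */
proc genmod data=one;
  model Cautions = drivers trklength laps / dist=poi link=log;
run;
quit;
```

SPSS Output (screenshot not reproduced)

Goodness-of-Fit Test for Grouped Data
Used when there are many distinct levels of the explanatory variables.
Cases are lumped together by their predicted values into J groups (J = 10 is common).
Observed and expected counts are compared by group using Pearson and deviance residuals. For the Poisson model, with $obs_j$ the observed and $exp_j$ the expected count in group j:
Pearson: $r_j = \dfrac{obs_j - exp_j}{\sqrt{exp_j}}$, $\quad X^2 = \sum_j r_j^2$
Deviance: $d_j = \operatorname{sign}(obs_j - exp_j)\sqrt{2\left[obs_j \log(obs_j/exp_j) - (obs_j - exp_j)\right]}$, $\quad G^2 = \sum_j d_j^2$
Degrees of freedom: $J - p - 1$, where p = number of predictor variables.

NASCAR Caution Flags Example
Fitted (reduced) model: $\hat\mu_i = e^{-0.6876 + 0.0428 D_i + 0.0021 L_i}$, where $D_i$ = drivers and $L_i$ = laps in race i.

Group   Fitted        # Races   # Crashes   Expected   Pearson
1       < 3.50        15         37         46.05      -1.33
2       3.50-3.80     14         60         50.37       1.36
3       3.80-4.08     18         72         71.24       0.09
4       4.08-4.25     20         68         84.03      -1.75
5       4.25-4.42     12         51         52.35      -0.19
6       4.42-5.15     17        100         81.39       2.06
7       5.15-5.50     15         88         78.19       1.11
8       5.50-6.25     15         91         87.40       0.38
9       6.25-6.70     14         94         90.81       0.33
10      > 6.70        11         63         78.46      -1.75

Pearson $X^2$ = 15.5119, df = 10 - 2 - 1 = 7, P-value = 0.0300
Note that there is evidence that the Poisson model does not provide a good fit.

Computational Approach
Poisson probability mass function: $P(Y = y) = \dfrac{e^{-\mu}\mu^{y}}{y!}, \quad y = 0, 1, 2, \ldots$
Systematic component: $g(\mu) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3$
Link function: $g(\mu) = \log(\mu) \;\Rightarrow\; \mu = e^{g(\mu)} = e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3}$
For subject i: $g(\mu_i) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} = \mathbf{x}_i'\boldsymbol{\beta}$, where
$$\mathbf{x}_i = \begin{bmatrix} 1 \\ X_{1i} \\ X_{2i} \\ X_{3i} \end{bmatrix}, \quad \boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix}, \quad \mathbf{X} = \begin{bmatrix} \mathbf{x}_1' \\ \mathbf{x}_2' \\ \vdots \\ \mathbf{x}_n' \end{bmatrix}, \quad \mathbf{Y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad \boldsymbol{\mu} = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{bmatrix}$$

Likelihood function:
$$L(\boldsymbol{\beta}) = \prod_{i=1}^{n} \frac{e^{-\mu_i}\mu_i^{y_i}}{y_i!} = \prod_{i=1}^{n} \frac{\exp\left(-e^{\mathbf{x}_i'\boldsymbol{\beta}}\right)\left(e^{\mathbf{x}_i'\boldsymbol{\beta}}\right)^{y_i}}{y_i!}$$
$$l = \ln L = -\sum_{i=1}^{n} e^{\mathbf{x}_i'\boldsymbol{\beta}} + \sum_{i=1}^{n} y_i\,\mathbf{x}_i'\boldsymbol{\beta} - \sum_{i=1}^{n}\ln(y_i!)$$
$$\frac{\partial l}{\partial \boldsymbol{\beta}} = \sum_{i=1}^{n}\left(-\mathbf{x}_i e^{\mathbf{x}_i'\boldsymbol{\beta}} + y_i\mathbf{x}_i\right) = \sum_{i=1}^{n} (y_i - \mu_i)\begin{bmatrix} 1 \\ X_{1i} \\ X_{2i} \\ X_{3i} \end{bmatrix}$$
Setting $\partial l/\partial\boldsymbol\beta = \mathbf{0}$ gives $\mathbf{X}'(\mathbf{Y} - \boldsymbol{\mu}) = \mathbf{0}$.
$$\frac{\partial^2 l}{\partial\boldsymbol\beta\,\partial\boldsymbol\beta'} = -\sum_{i=1}^{n}\mathbf{x}_i e^{\mathbf{x}_i'\boldsymbol\beta}\mathbf{x}_i' = -\mathbf{X}'\mathbf{W}\mathbf{X}, \quad \text{where } \mathbf{W} = \operatorname{diag}\left(e^{\mathbf{x}_1'\boldsymbol\beta}, \ldots, e^{\mathbf{x}_n'\boldsymbol\beta}\right) = \operatorname{diag}(\mu_1, \ldots, \mu_n)$$
Setting $\mathbf{G}(\boldsymbol\beta) = \mathbf{X}'\mathbf{W}\mathbf{X}$ and $\mathbf{g}(\boldsymbol\beta) = \mathbf{X}'(\mathbf{Y} - \boldsymbol\mu)$ leads to the estimate of $\boldsymbol\beta$ via the Newton-Raphson algorithm:
$$\hat{\boldsymbol\beta}^{New} = \hat{\boldsymbol\beta}^{Old} + \left[\mathbf{G}(\hat{\boldsymbol\beta}^{Old})\right]^{-1}\mathbf{g}(\hat{\boldsymbol\beta}^{Old})$$
with a reasonable starting vector $\hat{\boldsymbol\beta}^{0} = (\ln\bar{y}, 0, 0, 0)'$, and approximate large-sample estimated variance-covariance matrix
$$\hat{V}(\hat{\boldsymbol\beta}) = \left[\mathbf{G}(\hat{\boldsymbol\beta})\right]^{-1} = \left(\mathbf{X}'\hat{\mathbf{W}}\mathbf{X}\right)^{-1}$$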
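The Newton-Raphson iteration above can be sketched in pure Python (no NumPy), assuming the design matrix is given as a list of rows whose first entry is 1 for the intercept. The function names and the toy data are hypothetical: the full NASCAR data are not reproduced here, so the example fits made-up counts. Because the log link is canonical for the Poisson distribution, the observed and expected information coincide and this Newton-Raphson update is the same as Fisher scoring.

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fitted_means(X, beta):
    """mu_i = exp(x_i' beta) for every row x_i of X."""
    return [math.exp(sum(xij * bj for xij, bj in zip(xi, beta))) for xi in X]


def information(X, mu):
    """G = X'WX with W = diag(mu)."""
    p = len(X[0])
    return [[sum(xi[j] * m * xi[k] for xi, m in zip(X, mu)) for k in range(p)]
            for j in range(p)]


def poisson_newton_raphson(X, y, tol=1e-10, max_iter=50):
    """Return (beta_hat, estimated covariance (X'WX)^{-1}) for log(mu_i) = x_i' beta."""
    n, p = len(X), len(X[0])
    beta = [math.log(sum(y) / n)] + [0.0] * (p - 1)  # start at (ln ybar, 0, ..., 0)'
    for _ in range(max_iter):
        mu = fitted_means(X, beta)
        g = [sum(X[i][j] * (y[i] - mu[i]) for i in range(n)) for j in range(p)]  # X'(Y - mu)
        step = solve(information(X, mu), g)
        beta = [bj + sj for bj, sj in zip(beta, step)]  # beta_new = beta_old + G^{-1} g
        if max(abs(s) for s in step) < tol:
            break
    G = information(X, fitted_means(X, beta))
    cols = [solve(G, [1.0 if r == c else 0.0 for r in range(p)]) for c in range(p)]
    cov = [[cols[c][r] for c in range(p)] for r in range(p)]  # G^{-1}
    return beta, cov


# Hypothetical toy data (not the NASCAR races): intercept plus one predictor.
X_toy = [[1.0, float(x)] for x in range(6)]
y_toy = [1, 2, 2, 4, 6, 9]
beta_hat, cov_hat = poisson_newton_raphson(X_toy, y_toy)
se = [math.sqrt(cov_hat[j][j]) for j in range(len(beta_hat))]
print("beta_hat =", [round(b, 4) for b in beta_hat], "SE =", [round(s, 4) for s in se])
```

At convergence the score $\mathbf{X}'(\mathbf{Y} - \hat{\boldsymbol\mu})$ is numerically zero; with an intercept column this also forces $\sum_i y_i = \sum_i \hat\mu_i$, and an intercept-only fit returns $\hat\beta_0 = \ln\bar y$ immediately.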

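The grouped goodness-of-fit check for the NASCAR example can be recomputed from the group totals in the table. This is a minimal sketch with hypothetical helper names; it takes p = 2 predictors (Drivers and Laps in the reduced model), so df = 10 - 2 - 1 = 7, and compares $X^2$ with 14.067, the standard upper 5% point of the chi-square distribution with 7 degrees of freedom. It also evaluates the reduced-model fitted mean for the first race listed in the SAS data (35 drivers, 191 laps), which falls in group 1.

```python
import math

# Group totals from the grouped goodness-of-fit table (J = 10 groups).
observed = [37, 60, 72, 68, 51, 100, 88, 91, 94, 63]
expected = [46.05, 50.37, 71.24, 84.03, 52.35, 81.39, 78.19, 87.40, 90.81, 78.46]
J, p = len(observed), 2  # p = predictors in the reduced model (Drivers, Laps)
CRITICAL_05_7 = 14.067   # chi-square(0.05, 7)


def fitted_mean(drivers, laps):
    """Reduced-model fitted mean exp(-0.6876 + 0.0428*drivers + 0.0021*laps)."""
    return math.exp(-0.6876 + 0.0428 * drivers + 0.0021 * laps)


def pearson_residual(obs, exp):
    return (obs - exp) / math.sqrt(exp)


def deviance_residual(obs, exp):
    """sign(obs - exp) * sqrt(2 * [obs*log(obs/exp) - (obs - exp)])."""
    inner = (obs * math.log(obs / exp) if obs > 0 else 0.0) - (obs - exp)
    return math.copysign(math.sqrt(2.0 * max(inner, 0.0)), obs - exp)


r = [pearson_residual(o, e) for o, e in zip(observed, expected)]
d = [deviance_residual(o, e) for o, e in zip(observed, expected)]
x2 = sum(ri * ri for ri in r)
g2 = sum(di * di for di in d)
df = J - p - 1
print([round(ri, 2) for ri in r])
print(f"Pearson X2 = {x2:.4f}, G2 = {g2:.4f}, df = {df}, reject fit: {x2 > CRITICAL_05_7}")
print(f"Race 1 (35 drivers, 191 laps): fitted mean = {fitted_mean(35, 191):.2f}")
```

Small differences from the reported $X^2$ = 15.5119 come from the expected counts being rounded to two decimals in the table.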