Applied Econometrics, Second edition
Dimitrios Asteriou and Stephen G. Hall

MISSPECIFICATION
1. Omitting Influential or Including Non-Influential Explanatory Variables
2. Various Functional Forms
3. Measurement Errors
4. Tests for Misspecification
5. Approaches in Choosing an Appropriate Model

Learning Objectives
1. Understand the various forms of possible misspecification in the CLRM.
2. Appreciate the importance and learn the consequences of omitting influential variables in the CLRM.
3. Distinguish among the wide range of functional forms and understand the meaning and interpretation of their coefficients.
4. Understand the importance of measurement errors in the data.
5. Perform misspecification tests using econometric software.
6. Understand the meaning of nested and non-nested models.
7. Be familiar with the concept of data mining and choose an appropriate econometric model.

Omitting Influential Variables
Omitting influential variables from a regression model causes these variables to become part of the error term. Therefore, one or more of the assumptions of the CLRM will be violated.

Consider the population regression function
$Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$
where $\beta_2 \neq 0$ and $\beta_3 \neq 0$, and assume that this is the correct specification.

However, we estimate
$Y = \beta_1 + \beta_2 X_2 + u$
where $X_3$ is wrongly omitted. The error term of this estimated equation is then
$u = \beta_3 X_3 + e$
where $e$ plays the role of the well-behaved disturbance of the correct model.

It is clear that the assumption that the error term has a zero mean is now violated:
$E(u) = E(\beta_3 X_3 + e) = E(\beta_3 X_3) + E(e) = E(\beta_3 X_3) \neq 0$

Furthermore, if the excluded variable $X_3$ happens to be correlated with $X_2$, then the error term is no longer independent of $X_2$. As a result, the estimators of $\beta_1$ and $\beta_2$ are biased and inconsistent. This is called omitted variable bias.

Including Non-Influential Variables
This is the opposite case. The correct model is
$Y = \beta_1 + \beta_2 X_2 + u$
and we estimate
$Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + e$
where $X_3$ is wrongly included in the model.

Since $X_3$ does not belong to the correct model, its population coefficient should be equal to zero (i.e. $\beta_3 = 0$). If $\beta_3 = 0$, then none of the CLRM assumptions is violated and the OLS estimators are both unbiased and consistent. However, they are unlikely to be efficient.

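If $X_2$ is correlated with $X_3$, then an additional, unnecessary element of multicollinearity will be introduced.

Omission and Inclusion at the Same Time
In this case the correct model is
$Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + v$
and we estimate
$Y = \beta_1 + \beta_2 X_2 + \beta_4 X_4 + w$
It should now be easy to understand the problems that this double mistake causes: omitting $X_3$ brings the bias and inconsistency of an omitted variable, while including the irrelevant $X_4$ costs efficiency.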
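The two mistakes discussed above can be illustrated with a small Monte Carlo sketch. Everything in it is an assumption made for illustration: the parameter values, the sample size, the correlation `rho` between $X_2$ and $X_3$, and the helper names `ols` and `simulate`. With unit-variance regressors, the slope on $X_2$ in the short regression should centre near $\beta_2 + \beta_3 \rho$ when $X_3$ is wrongly omitted, whereas wrongly including $X_3$ should leave the slope centred on $\beta_2$ but with a larger spread.

```python
import numpy as np


def ols(y, X):
    """OLS coefficients for y = X b + error; X must already contain a constant."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def simulate(n=200, reps=1000, b1=1.0, b2=2.0, b3=1.5, rho=0.6, seed=0):
    """Monte Carlo for the estimated slope on X2 (all settings are illustrative)."""
    rng = np.random.default_rng(seed)
    const = np.ones(n)
    omitted, included, correct = [], [], []
    for _ in range(reps):
        x2 = rng.normal(size=n)
        # X3 has unit variance and correlation rho with X2.
        x3 = rho * x2 + np.sqrt(1.0 - rho**2) * rng.normal(size=n)
        u = rng.normal(size=n)
        short = np.column_stack([const, x2])
        long = np.column_stack([const, x2, x3])

        # Case A: X3 belongs in the model but is omitted.
        y_a = b1 + b2 * x2 + b3 * x3 + u
        omitted.append(ols(y_a, short)[1])

        # Case B: X3 does not belong (beta_3 = 0) but is included.
        y_b = b1 + b2 * x2 + u
        included.append(ols(y_b, long)[1])
        correct.append(ols(y_b, short)[1])

    return {
        "omitted_mean": float(np.mean(omitted)),
        "included_mean": float(np.mean(included)),
        "included_sd": float(np.std(included)),
        "correct_sd": float(np.std(correct)),
    }


results = simulate()
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

Comparing `included_sd` with `correct_sd` shows the efficiency loss from the irrelevant regressor; the gap widens as `rho` moves towards 1.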

The Plug-in Solution
Sometimes omitted variable bias arises because a key variable that affects Y is not available. For example, consider a model where the monthly salary of an individual is associated with:
(a) Whether he/she is male or female
(b) The years he/she has spent in education

Both of these factors can be quantified and included in the model. However, if we also assume that the salary level can be affected by the socio-economic environment in which each person was brought up, then this is hard to measure and therefore hard to include in the model:
$\text{salary} = \beta_1 + \beta_2\,\text{sex} + \beta_3\,\text{educ} + \beta_4\,\text{background} + u$

Not including the background variable in the model leads to biased estimates of $\beta_2$ and $\beta_3$. Our major interest, however, is to get appropriate estimates for those two coefficients (we do not care that much about $\beta_4$, because we will never get the appropriate coefficient for it). A way to resolve this is to include an alternative proxy variable for the omitted variable.

For this example, we can use family income. Family income is, of course, not exactly what we mean by background, but it is definitely a variable that is highly correlated with it.

To illustrate this, consider the model
$Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4^* + u$
where $X_2$ and $X_3$ are observed and $X_4^*$ is unobserved. We know, though, that
$X_4^* = \gamma_1 + \gamma_2 X_4 + e$
where the error term $e$ is included because the two variables are not exactly the same, and $\gamma_1$ is included to allow them to be measured on different scales. We need variables that are positively correlated (i.e. $\gamma_2 > 0$).

So we estimate
$Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4(\gamma_1 + \gamma_2 X_4 + e) + u$
$\;\;\; = (\beta_1 + \beta_4\gamma_1) + \beta_2 X_2 + \beta_3 X_3 + \beta_4\gamma_2 X_4 + (\beta_4 e + u)$
$\;\;\; = a_1 + \beta_2 X_2 + \beta_3 X_3 + a_4 X_4 + w$
where $a_1 = \beta_1 + \beta_4\gamma_1$, $a_4 = \beta_4\gamma_2$ and $w = \beta_4 e + u$. By estimating this model we do not get unbiased estimates of $\beta_1$ and $\beta_4$, but we do get unbiased estimators of $a_1$, $\beta_2$, $\beta_3$ and $a_4$.
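The plug-in argument above can be checked numerically. The sketch below is only an illustration under assumed values: the coefficients, the strength of the link between $X_2$ and the proxy, and the names `x4_star` and `ols` are all hypothetical. It also assumes that $e$ is unrelated to $X_2$, $X_3$ and $X_4$, which is what the unbiasedness claim needs. One large sample is used so that the estimates sit close to their population targets.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Illustrative (assumed) parameter values.
b1, b2, b3, b4 = 1.0, 2.0, -1.0, 3.0
g1, g2 = 0.5, 0.8

x4 = rng.normal(size=n)                  # observed proxy (e.g. family income)
x2 = 0.7 * x4 + rng.normal(size=n)       # regressor correlated with the proxy
x3 = rng.normal(size=n)
e = rng.normal(scale=0.5, size=n)        # proxy error, unrelated to the regressors
x4_star = g1 + g2 * x4 + e               # unobserved variable (e.g. background)
u = rng.normal(size=n)
y = b1 + b2 * x2 + b3 * x3 + b4 * x4_star + u


def ols(y, X):
    """OLS coefficients; X must already contain a constant column."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


const = np.ones(n)
# Regression that simply drops the unobserved variable.
omitted = ols(y, np.column_stack([const, x2, x3]))
# Regression that plugs in the proxy X4 instead.
proxy = ols(y, np.column_stack([const, x2, x3, x4]))

print("slope on X2, variable omitted:", round(float(omitted[1]), 3))
print("slope on X2, proxy included:  ", round(float(proxy[1]), 3))
print("intercept vs a1 = b1 + b4*g1: ", round(float(proxy[0]), 3), b1 + b4 * g1)
print("proxy slope vs a4 = b4*g2:    ", round(float(proxy[3]), 3), b4 * g2)
```

The coefficient on the proxy estimates $a_4 = \beta_4\gamma_2$, not $\beta_4$ itself, which is why the text gives up on recovering $\beta_1$ and $\beta_4$.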

Various Functional Forms
Linear: $Y = \beta_1 + \beta_2 X_2$
Linear-Log: $Y = \beta_1 + \beta_2 \ln X_2$
Reciprocal: $Y = \beta_1 + \beta_2 (1/X_2)$
Quadratic: $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_2^2$
Interaction: $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_2 Z$
Log-Linear: $\ln Y = \beta_1 + \beta_2 X_2$
Double Log: $\ln Y = \beta_1 + \beta_2 \ln X_2$

The Box-Cox Transformation
The choice of functional form plays an important role; thus, we need a formal test for comparing alternative models (functional forms). If the models have the same dependent variable, things are easy: estimate both models and choose the one with the higher $R^2$. However, if the dependent variables are different, a direct comparison is impossible.

Assume we have these two models:
$Y = \beta_1 + \beta_2 X_2$ and $\ln Y = \beta_1 + \beta_2 \ln X_2$
In such cases we need to scale the Y variable in such a way that we will be able to compare the two models. The procedure that does this is called the Box-Cox transformation.

Step 1: Obtain the geometric mean of the sample Y values:
$\tilde{Y} = (Y_1 Y_2 Y_3 \cdots Y_n)^{1/n} = \exp\left[(1/n)\sum \ln Y_i\right]$
Step 2: Transform the sample Y values by dividing each of them by $\tilde{Y}$ obtained from Step 1 to get
$Y_i^* = Y_i / \tilde{Y}$
Step 3: Estimate both models with $Y^*$ as the dependent variable (that is, $Y^*$ in the linear model and $\ln Y^*$ in the logarithmic model).

The equation with the lower RSS should be preferred.
Step 4: If we want to check whether it is significantly better, calculate
$\tfrac{1}{2}\, n \ln(RSS_2 / RSS_1)$
and compare it with the critical value of the chi-square distribution with one degree of freedom. Here $RSS_2$ is the higher RSS and $RSS_1$ the lower, so that the statistic is positive.

Measurement Errors
Sometimes the data are not measured appropriately. We can have measurement errors in the dependent variable, in the explanatory variables, or in both.
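A small simulation can make the two cases just distinguished concrete. It is a sketch under assumptions that are not in the slides: the measurement error is "classical" (pure noise, independent of the true values and of the regression disturbance), the true regressor has unit variance, and the parameter values and the names `slope` and `simulate` are illustrative. Under these assumptions the slope estimated from a noisy regressor is expected to shrink towards zero by the factor var(X) / (var(X) + var(noise)), here one half.

```python
import numpy as np


def slope(y, x):
    """OLS slope from a regression of y on a constant and x."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


def simulate(n=200, reps=1000, b1=1.0, b2=2.0, noise_sd=1.0, seed=0):
    """Compare the slope estimate with clean data and with noisy Y or noisy X."""
    rng = np.random.default_rng(seed)
    clean, noisy_y, noisy_x = [], [], []
    for _ in range(reps):
        x = rng.normal(size=n)
        y = b1 + b2 * x + rng.normal(size=n)
        # Classical measurement error: independent noise added to the true values.
        y_obs = y + rng.normal(scale=noise_sd, size=n)
        x_obs = x + rng.normal(scale=noise_sd, size=n)
        clean.append(slope(y, x))
        noisy_y.append(slope(y_obs, x))   # error in the dependent variable
        noisy_x.append(slope(y, x_obs))   # error in the explanatory variable
    return {
        "clean_mean": float(np.mean(clean)),
        "clean_sd": float(np.std(clean)),
        "noisy_y_mean": float(np.mean(noisy_y)),
        "noisy_y_sd": float(np.std(noisy_y)),
        "noisy_x_mean": float(np.mean(noisy_x)),
    }


results = simulate()
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```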

If the error is in the dependent variable, the OLS coefficients have larger variances; this is unavoidable. If it is in the explanatory variables, the estimators are biased and inconsistent, and the results can be seriously misleading.

Tests for Misspecification
We have the following tests:
(a) Test for normality of the residuals
(b) The Ramsey RESET test
(c) Tests for non-nested models

Normality of Residuals
Step 1: Calculate the Jarque-Bera (JB) statistic (reported by EViews).
Step 2: Find the chi-square critical value with 2 degrees of freedom from the corresponding tables.
Step 3: If JB > chi-square critical, reject the null hypothesis of normality.
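For readers without EViews, the three steps above can be sketched by hand. The function below uses the textbook form of the statistic, $JB = n\left[S^2/6 + (K-3)^2/24\right]$, where $S$ is the sample skewness and $K$ the sample kurtosis of the residuals. The function name `jarque_bera` and the constant `CHI2_2DF_5PCT` are illustrative choices, and the sample is simulated purely to show the mechanics; software packages may apply small-sample adjustments that this sketch does not.

```python
import numpy as np

# 5% critical value of the chi-square distribution with 2 degrees of freedom.
CHI2_2DF_5PCT = 5.991


def jarque_bera(resid):
    """JB = n * (S^2 / 6 + (K - 3)^2 / 24) computed from a vector of residuals."""
    e = np.asarray(resid, dtype=float)
    n = e.size
    e = e - e.mean()
    s2 = np.mean(e**2)                    # second central moment
    skew = np.mean(e**3) / s2**1.5        # sample skewness S
    kurt = np.mean(e**4) / s2**2          # sample kurtosis K
    return float(n * (skew**2 / 6.0 + (kurt - 3.0) ** 2 / 24.0))


# Illustration with deliberately non-normal (exponential) "residuals".
rng = np.random.default_rng(0)
sample = rng.exponential(size=2000)
jb = jarque_bera(sample)
print(f"JB = {jb:.1f}, reject normality at 5%: {jb > CHI2_2DF_5PCT}")
```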

The Ramsey RESET Test
Step 1: Estimate the model that we think is correct and obtain the fitted values of Y; call them $\hat{Y}$.
Step 2: Estimate the model of Step 1 again, this time including $\hat{Y}^2$ and $\hat{Y}^3$ as additional explanatory variables.
Step 3: The model in Step 1 is the restricted model and the model in Step 2 is the unrestricted model. Calculate the F-statistic for these two models.
Step 4: Compare the F-statistic with the F-critical value and conclude: if F-stat > F-crit, we reject the null of correct specification.

Tests for Non-Nested Models
If we want to test models that are not nested, we cannot use the F-statistic approach. Non-nested models are those in which neither equation is a special case of the other; in other words, we do not have a restricted and an unrestricted model. Suppose, for example, that we have the following:
$Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$ (1)
$Y = \beta_1 + \beta_2 \ln X_2 + \beta_3 \ln X_3 + u$ (2)

One approach (Mizon and Richard) suggests estimating a comprehensive model of the form
$Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 \ln X_2 + \beta_5 \ln X_3 + e$
and then applying an F-test for the joint significance of $\beta_4$ and $\beta_5$, with equation (1) as the restricted model.
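The comprehensive-model F-test described above can be sketched as follows. The data are simulated and entirely hypothetical: they are generated from the logarithmic model (2), so the test is expected to reject the linear model (1). The helper names `rss` and `f_statistic` are illustrative, and the statistic is the usual restricted-versus-unrestricted form, $F = \dfrac{(RSS_R - RSS_U)/m}{RSS_U/(n-k)}$, with $m$ restrictions and $k$ coefficients in the unrestricted model.

```python
import numpy as np


def rss(y, X):
    """Residual sum of squares from an OLS regression of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


def f_statistic(y, X_restricted, X_unrestricted):
    """F = ((RSS_R - RSS_U) / m) / (RSS_U / (n - k)) for nested OLS models."""
    n, k = X_unrestricted.shape
    m = k - X_restricted.shape[1]          # number of restrictions
    rss_r = rss(y, X_restricted)
    rss_u = rss(y, X_unrestricted)
    return ((rss_r - rss_u) / m) / (rss_u / (n - k))


# Simulated data in which model (2), the log specification, is the true one.
rng = np.random.default_rng(2)
n = 500
x2 = rng.uniform(1.0, 10.0, size=n)
x3 = rng.uniform(1.0, 10.0, size=n)
y = 1.0 + 2.0 * np.log(x2) + 3.0 * np.log(x3) + rng.normal(scale=0.5, size=n)

const = np.ones(n)
X_model1 = np.column_stack([const, x2, x3])                      # restricted: model (1)
X_comprehensive = np.column_stack([const, x2, x3, np.log(x2), np.log(x3)])

F = f_statistic(y, X_model1, X_comprehensive)
print(f"F-statistic for beta_4 = beta_5 = 0: {F:.1f}")
```

With two restrictions and a large sample, the 5% critical value is roughly 3.0. The same `f_statistic` helper would also cover Step 3 of the RESET test if the unrestricted matrix were built by appending $\hat{Y}^2$ and $\hat{Y}^3$ to the original regressors.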

A second approach (Davidson and MacKinnon) suggests that if model (1) is true, then the fitted values of model (2) should be insignificant in (1), and vice versa. So they suggest estimating
$Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \delta Y^* + e$
where $Y^*$ denotes the fitted values of model (2). A simple t-test on $\delta$, the coefficient of $Y^*$, settles the question.
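The fitted-values test just described can be sketched in a few lines. As before, the data are simulated under the assumption that the logarithmic model (2) is the true one, and the helper names `ols` and `j_test` are hypothetical. The t-statistic uses conventional OLS standard errors.

```python
import numpy as np


def ols(y, X):
    """Return OLS coefficients and their conventional standard errors."""
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - k)                 # error variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef, np.sqrt(np.diag(cov))


def j_test(y, X_null, X_rival):
    """t-statistic on the rival model's fitted values added to the null model."""
    coef_rival, _ = ols(y, X_rival)
    fitted_rival = X_rival @ coef_rival              # Y* in the text
    coef, se = ols(y, np.column_stack([X_null, fitted_rival]))
    return float(coef[-1] / se[-1])


# Simulated data generated by the log model (2).
rng = np.random.default_rng(3)
n = 500
x2 = rng.uniform(1.0, 10.0, size=n)
x3 = rng.uniform(1.0, 10.0, size=n)
y = 1.0 + 2.0 * np.log(x2) + 3.0 * np.log(x3) + rng.normal(scale=0.5, size=n)

const = np.ones(n)
X1 = np.column_stack([const, x2, x3])                    # model (1)
X2 = np.column_stack([const, np.log(x2), np.log(x3)])    # model (2)

t_model1_null = j_test(y, X1, X2)   # is model (1) adequate against model (2)?
t_model2_null = j_test(y, X2, X1)   # and the reverse comparison
print(f"t on fitted values of (2) inside (1): {t_model1_null:.2f}")
print(f"t on fitted values of (1) inside (2): {t_model2_null:.2f}")
```

Running the test in both directions matters: it is possible for both models, or neither, to be rejected, in which case the test does not single out a preferred specification.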

Choosing the Appropriate Model
There are two major approaches:
(a) The traditional view: Average Economic Regression (AER)
(b) Hendry's general-to-specific approach

The AER essentially starts with a simple model and then builds up the model as the situation demands. It is also called the simple-to-general approach. It has two disadvantages:
(a) It suffers from data mining: only the final model is presented by the researcher.
(b) The alterations to the original model are carried out in an arbitrary manner, based on the beliefs of the researcher.

The Hendry approach starts with a general model that contains, nested within it as special cases, other simpler models, and then uses appropriate tests to narrow the model down to simpler ones. The model should:
(a) Be data admissible
(b) Be consistent with the theory
(c) Use regressors that are not correlated with the error term
(d) Exhibit parameter constancy
(e) Exhibit data coherency
(f) Be encompassing, meaning that it includes all possible rival models