  • Solving model misspecification / omitted-variable bias using Driscoll-Kraay standard errors (-xtscc-)

    Dear honored Statalist experts,

    I have already had the chance to read many of your interesting discussions, posts and ideas. Now, as I do not know how to overcome the misspecification of my Stata model, I am writing to you to seek advice.

    I am evaluating the impact of the gold price on the capital structure of 63 gold mining companies from Q1 2003 to Q4 2017. The panel data set is unbalanced because, for example, not all companies reported their Q4 2017 figures:
    Variable   Definition                                              Obs    Mean       Std. Dev.   Min       Max
    id         Gold miners                                             3780   131        18.18665    100       162
    gold       ln(gold price)                                          3780   6.813      0.4902495   5.85      7.45
    tang_n     Total Fixed Assets / Total Assets                       3729   0.5399061  0.2334447   0         0.99
    prof_n     EBIT / Total Assets                                     3729   0.0046581  0.0717691   -0.89     0.37
    lev_n      Total Debt / Total Assets                               3726   0.145314   0.152689    -0.29     1.13
    growth_n   (Total Assets_t - Total Assets_t-1) / Total Assets_t-1  3713   0.0828764  0.4107697   -0.91     11.36
    risk_n     (EBIT_t - EBIT_t-1) / EBIT_t-1                          3715   0.194498   13.30551    -279.71   321.22
    sizeii_n   ln(Total Assets)                                        3729   6.050724   2.141712    -1.69     11.2
    Steps conducted:
    • After an intensive literature review, I decided to build a basic model including all independent variables repeatedly tested in the literature, and then to add gold
    • Checked for outliers using scatter plots --> There are a few, but all seem reasonable, which is why I decided not to drop them or the entire individual
    • Hausman test for fixed versus random effects --> Rejected H0 -> FE model
    • Breusch-Pagan LM test for random effects versus pooled OLS --> Rejected H0 -> RE model to be favoured over pooled OLS
    • Wooldridge test for autocorrelation --> Rejected H0 -> Existence of autocorrelation
    • Modified Wald test for heteroskedasticity --> Rejected H0 -> Existence of heteroskedasticity
    • Friedman's as well as Pesaran's test for cross-sectional dependence --> Cross-sectional dependence present
    • Decided to use Driscoll-Kraay standard errors (-xtscc-) to overcome autocorrelation, heteroskedasticity and, most importantly, cross-sectional dependence
    • Further, I use time and individual dummy variables to adjust for potential omitted variables
    The issue I am facing now is that, when applying the Shapiro-Wilk test and plotting the residuals, the assumption of normally distributed errors appears to be violated. So I checked for omitted variables using the Ramsey RESET test (-ovtest-) and -linktest-, concluding that my model is misspecified. Do you have any idea whether there is a model that adjusts for omitted variables?
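    For reference, the sequence of checks above might be sketched in Stata roughly as follows. This is only a sketch: the variable names come from the summary table, the panel time variable -quarter- is an assumption (it is not named in the post), and -xtserial-, -xttest3-, -xtcsd- and -xtscc- are user-written commands (findable via -search- or -ssc install-).

```stata
* Declare the panel structure (time variable name assumed)
xtset id quarter

* FE vs RE: Hausman test
xtreg lev_n gold tang_n prof_n growth_n risk_n sizeii_n, fe
estimates store fe
xtreg lev_n gold tang_n prof_n growth_n risk_n sizeii_n, re
estimates store re
hausman fe re

* RE vs pooled OLS: Breusch-Pagan LM test (after the -re- fit)
xttest0

* Serial correlation: Wooldridge test
xtserial lev_n gold tang_n prof_n growth_n risk_n sizeii_n

* Groupwise heteroskedasticity: modified Wald test (after a -fe- fit)
xtreg lev_n gold tang_n prof_n growth_n risk_n sizeii_n, fe
xttest3

* Cross-sectional dependence: Pesaran and Friedman tests (after -xtreg-)
xtcsd, pesaran
xtcsd, friedman

* Driscoll-Kraay standard errors
xtscc lev_n gold tang_n prof_n growth_n risk_n sizeii_n, fe
```

    Note that -xttest0- applies after the -re- estimation, while -xttest3- and -xtcsd- apply after the -fe- one.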
    [Attached image: Unbenannt.jpg (residual distribution plot)]


  • #2
    Knut:
    some comments about your query:
    - you have an (approximately) N=T panel dataset; hence, you can also consider -xtgls-;
    - the BP test for random effects should be conducted before -hausman-;
    - if you decide to stick with -xtreg- and you have detected heteroskedasticity and/or autocorrelation, you should robustify/cluster your standard errors before performing -hausman-;
    - -hausman- does not allow for non-default standard errors; however, you can check whether the -re- specification holds via the user-written programme -xtoverid- (type -search xtoverid- from within Stata to install it);
    - you have a pretty large sample, hence heteroskedasticity should not bite that hard (with 4 quarters * 15 years = 60 waves of data I would indeed be more concerned about autocorrelation); you are correct in visually inspecting the residual distribution, which is frankly leptokurtic. At first glance, that shape may be influenced by the (weirdly) large dispersion of the -risk_n- predictor. I would check for any error in data entry before considering any alternative data analysis strategy;
    - unfortunately, despite its ambitious goal (detecting omitted-variable bias), the RESET test implies no black/white magic about omitted predictors; that said, I would consider whether a quadratic relationship between one of your predictors and the dependent variable is allowed by your data.
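    As a hedged sketch of the last two suggestions (variable names taken from the summary table in #1; the time variable -quarter- is an assumption, and -xtoverid- must be installed first):

```stata
* Check the -re- specification with cluster-robust standard errors
xtset id quarter
xtreg lev_n gold tang_n prof_n growth_n risk_n sizeii_n, re vce(cluster id)
xtoverid            // rejection speaks against the -re- specification

* Allow a quadratic relationship for one predictor, e.g. gold
xtreg lev_n c.gold##c.gold tang_n prof_n growth_n risk_n sizeii_n, fe vce(cluster id)
```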
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Dear Carlo,
      much appreciated that you took time to reply to my inquiry!
      • I thought -xtgls- would require N>T; anyway, reading about -xtgls-, the applied method seems quite similar to -xtscc-. Do you have a personal preference here?
      • Running -xtoverid- leads to a p-value of 0.0038 -> I would stick to the -fe- model. As a side note, when I change the definition of size to ln(sales), both -hausman- and -xtoverid- suggest the -re- model (difference in p-values is >0.25).
      • As suggested, I checked for errors in data entry and unfortunately concluded that those figures are reasonable, even if they represent significant outliers. I therefore initially decided to keep those outliers, but it seems that excluding/dropping the most extreme ones could make sense.
      • I know that squaring a predictor can help when linearity is lacking. What exactly should I look for when doing this?
      Further, I found the following statement: "Panel data allows us to eliminate the effects of unobserved variables, as long as they remain constant through time. However, if the unobserved variables change through time, panel data will not completely eliminate the bias." -> Would you agree that, by including the dummies explained in my initial post, I at least have a good argument that I can address the problem?

      Many thanks and kind regards,
      Knut



      • #4
        Knut:
        - if you look at the -xtreg- and -xtgls- entries, you will see that the first is for N>T panel datasets, whereas the latter is for T>N ones; I would go with -xtscc-, provided that the -re- specification is out of debate;
        - the -fe- or -re- specification should be chosen in the light of the data-generating process; I would skim through the literature of your research field and see whether changing the definition of -size- the way you did is recommended;
        - outliers are often a simple matter of fact; I would keep all the observations you collected if no data-entry error came to light after your inspection;
        - you should look at turning points after squaring. Any decent econometrics textbook covers that issue;
        - the statement you found reports on how the fixed-effects estimator actually works. Obviously, if heterogeneity lurks behind a time-varying predictor, bias still remains.
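        On the turning-point remark: for a fitted quadratic y = b0 + b1*x + b2*x^2, the turning point sits at x* = -b1/(2*b2), which -nlcom- can compute together with a standard error. A minimal sketch (variable names follow the thread and are assumptions):

```stata
* Quadratic fit and its turning point, x* = -b1/(2*b2)
xtreg lev_n c.gold##c.gold tang_n prof_n growth_n risk_n sizeii_n, fe
nlcom -_b[gold] / (2*_b[c.gold#c.gold])
```

        If the estimated turning point falls inside the observed range of -gold-, the relationship genuinely bends within the sample; otherwise the quadratic term mainly captures curvature at the margins.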

        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Hey Carlo,
          quick update (sorry for the late reply).
          • So far I want to stick to -xtscc-. After testing for quadratic relations between my independent variables and y, I indeed found turning points for some variables, e.g. gold. However, as a quadratic model is difficult to interpret, I decided to stick to log-transforming independent variables where necessary, e.g. as with gold:
          • [ASCII scatter plot: Leverage (0 to 1.4) against Gold Price (5.98 to 7.45)]
          • According to the literature, changing the definition of size as I did is allowed; the problem was more in the data itself, which still included firms with sales of "0" -> Cleaned my data once again
          • Coming to omitted-variable bias, I decided to include control variables, especially focusing on studies investigating factors that impact the gold price, because my research question concerns the impact of gold price movements. Potential control variables: US CPI (inflation), trade-weighted USD (the USD as the major index currency), the yield of the US 10-year bond (interest rates) and the S&P 500 (performance of the equity market). The thing is that Pearson's r indicated a huge difference between the level (e.g. 0.43) and the growth rate (e.g. -0.12) of a variable, making it difficult to decide which to use for the model. The existing literature is divided when it comes to definitions. In a time-series regression I would perform a unit-root test for this purpose, but I could not find any information on how to deal with this in panel data. Probably I am just getting confused.
          Your advice would be much appreciated.
          Best,
          Knut



          • #6
            Knut:
            - so you decided to switch to a log-log regression model to measure the elasticity of the price of gold vs your predictors;
            - it's wise that the choice of your predictors is supported by previous research (especially if you intend to submit a paper to a technical journal in your research field);
            - -xtunitroot- might be what you're looking for.
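            A minimal sketch of -xtunitroot-, assuming -quarter- as the time variable and a hypothetical variable -cpi_us- holding the candidate control. Note that macro controls such as the CPI do not vary across firms, so a plain time-series test on the aggregate series is an alternative:

```stata
* Fisher-type panel unit-root test (handles unbalanced panels)
xtset id quarter
xtunitroot fisher cpi_us, dfuller lags(4)

* For a series constant across firms, a time-series test on one panel suffices
preserve
keep if id == 100          // any single firm carries the full macro series
tsset quarter
dfuller cpi_us, lags(4)
restore
```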
            Kind regards,
            Carlo
            (Stata 19.0)



            • #7
              I have a similar question:

              I am evaluating the impact of options on the payout decision for firms in the period 2012-2016. The panel data set is unbalanced and I'm aware that the data could be incomplete.
              Variables             N     Mean      Std. Dev.  Min       Max
              Dependent variables:
              Repurchase Payout     708   0.0021    0.0098     0.0000    0.1899
              Dividend Payout       678   0.0323    0.0660     0.0000    0.9686
              Independent variable:
              Options               782   0.0044    0.0297     0.0000    0.5684
              Control variables:
              Free Cash Flow        778   -0.0215   0.2049     -2.3313   0.4926
              Leverage              794   0.2830    0.2352     0.0000    1.9068
              Financing Costs       800   21.5723   2.2830     15.2656   28.6068
              I started by doing the following:
              xtset company_id year


              I want to know which model I should be using:
              • Conducted a BP test for RE vs OLS (xttest0)
                -> rejected H0 for the dividend variable (xtreg: Dividend= Options + Cash flow + Leverage + Financing costs)
                -> failed to reject H0 for the repurchase variable (xtreg: Repurchase= Options + Cash flow + Leverage + Financing costs)
                ---> Is it possible to use two different models for these regressions when they are based on the same dataset?
              • Additionally, I visually inspected the residual distributions to check for heteroskedasticity (as Carlo mentioned in another thread), with the following results:
                [Attached images: Skjermbilde 2018-11-22 kl. 15.10.29.png and Skjermbilde 2018-11-22 kl. 15.10.13.png]
                --> How do I interpret these outputs (dividend to the left, repurchase to the right)? If there is evidence of heteroskedasticity, how do I fix it?
              • What other tests should I run in order to see if the assumptions hold?
              And how do I export the test results to Word (preferably RTF format)?

              All answers are appreciated
              Kind regards,
              Ola



              • #8
                Ola:
                welcome to this forum.
                Please, start a new thread. Thanks.
                Kind regards,
                Carlo
                (Stata 19.0)



                • #9
                  Thank you for your reply, Carlo. Here's the new thread:
                  https://www.statalist.org/forums/for...for-panel-data

                  Kind regards,
                  Ola
