  • Solving model misspecification / omitted-variable bias using Driscoll-Kraay standard errors (-xtscc-)

    Dear honored Statalist experts,

    I have already had the chance to read many of your interesting discussions, posts and ideas. Now, as I do not know how to overcome the misspecification of my Stata model, I am writing to you to seek advice.

    I am evaluating the impact of the gold price on the capital structure of 63 gold mining companies from Q1 2003 to Q4 2017. The panel data set is unbalanced because, for example, not all companies reported their Q4 2017 figures:
    Variable   Definition                                              Obs    Mean       Std. Dev.   Min       Max
    id         Gold miners                                             3780   131        18.18665    100       162
    gold       ln(gold price)                                          3780   6.813      0.4902495   5.85      7.45
    tang_n     Total Fixed Assets / Total Assets                       3729   0.5399061  0.2334447   0         0.99
    prof_n     EBIT / Total Assets                                     3729   0.0046581  0.0717691   -0.89     0.37
    lev_n      Total Debt / Total Assets                               3726   0.145314   0.152689    -0.29     1.13
    growth_n   (Total Assets_t - Total Assets_t-1) / Total Assets_t-1  3713   0.0828764  0.4107697   -0.91     11.36
    risk_n     (EBIT_t - EBIT_t-1) / EBIT_t-1                          3715   0.194498   13.30551    -279.71   321.22
    sizeii_n   ln(Total Assets)                                        3729   6.050724   2.141712    -1.69     11.2
    Steps conducted:
    • After an intensive literature review, I decided to build a basic model including all independent variables repeatedly tested in the literature, and then to add gold
    • Checked for outliers using scatter plots --> There are a few, but all seem reasonable, which is why I decided not to drop them or the entire individual
    • Hausman test for fixed versus random effects --> Rejected H0 -> FE model
    • Breusch-Pagan LM test for random effects versus pooled OLS --> Rejected H0 -> RE model to be favoured over pooled OLS
    • Wooldridge test for autocorrelation --> Rejected H0 -> Existence of autocorrelation
    • Modified Wald test for heteroskedasticity --> Rejected H0 -> Existence of heteroskedasticity
    • Friedman's as well as Pesaran's test for cross-sectional dependence --> Cross-sectional dependence present
    • Decided to use Driscoll-Kraay standard errors (-xtscc-) to overcome autocorrelation, heteroskedasticity and, most importantly, cross-sectional dependence
    • Further, I use time and individual dummy variables to adjust for potential omitted variables
    The issue I am facing now is that, when applying the Shapiro-Wilk test and plotting the residuals, the assumption of normally distributed errors appears to be violated. So I checked for omitted variables using the Ramsey RESET test (-ovtest-) and -linktest-, concluding that my model is misspecified. Do you have any idea whether there is a model that adjusts for omitted variables?
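    For reference, the sequence of checks above might be sketched in Stata roughly as follows. This is only a sketch: the variable names come from the summary table, the panel time variable -quarter- is an assumption (it is not named in the post), and -xtserial-, -xttest3-, -xtcsd- and -xtscc- are user-written commands (findable via -search- or -ssc install-).

```stata
* Declare the panel structure (time variable name assumed)
xtset id quarter

* FE vs RE: Hausman test
xtreg lev_n gold tang_n prof_n growth_n risk_n sizeii_n, fe
estimates store fe
xtreg lev_n gold tang_n prof_n growth_n risk_n sizeii_n, re
estimates store re
hausman fe re

* RE vs pooled OLS: Breusch-Pagan LM test (after the -re- fit)
xttest0

* Serial correlation: Wooldridge test
xtserial lev_n gold tang_n prof_n growth_n risk_n sizeii_n

* Groupwise heteroskedasticity: modified Wald test (after a -fe- fit)
xtreg lev_n gold tang_n prof_n growth_n risk_n sizeii_n, fe
xttest3

* Cross-sectional dependence: Pesaran and Friedman tests (after -xtreg-)
xtcsd, pesaran
xtcsd, friedman

* Driscoll-Kraay standard errors
xtscc lev_n gold tang_n prof_n growth_n risk_n sizeii_n, fe
```

    Note that -xttest0- applies after the -re- estimation, while -xttest3- and -xtcsd- apply after the -fe- one.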
    [Attached image: Unbenannt.jpg (residual distribution plot)]


  • #2
    Knut:
    some comments about your query:
    - you have an (approximately) N=T panel dataset; hence, you can also consider -xtgls-;
    - the BP test for random effects should be conducted before -hausman-;
    - if you decide to stick with -xtreg- and you have detected heteroskedasticity and/or autocorrelation, you should robustify/cluster your standard errors before performing -hausman-;
    - -hausman- does not allow for non-default standard errors; however, you can check whether the -re- specification holds via the user-written programme -xtoverid- (type -search xtoverid- from within Stata to install it);
    - you have a pretty large sample, hence heteroskedasticity should not bite that hard (with 4 quarters * 15 years = 60 waves of data I would indeed be more concerned about autocorrelation); you are correct in visually inspecting the residual distribution, which is frankly leptokurtic. At first glance, that shape may be influenced by the (weirdly) large dispersion of the -risk_n- predictor. I would check for any error in data entry before considering any alternative data analysis strategy;
    - unfortunately, despite its ambitious goal (detecting omitted-variable bias), the RESET test implies no black/white magic about omitted predictors; that said, I would consider whether a quadratic relationship between one of your predictors and the dependent variable is allowed by your data.
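    As a hedged sketch of the last two suggestions (variable names taken from the summary table in #1; the time variable -quarter- is an assumption, and -xtoverid- must be installed first):

```stata
* Check the -re- specification with cluster-robust standard errors
xtset id quarter
xtreg lev_n gold tang_n prof_n growth_n risk_n sizeii_n, re vce(cluster id)
xtoverid            // rejection speaks against the -re- specification

* Allow a quadratic relationship for one predictor, e.g. gold
xtreg lev_n c.gold##c.gold tang_n prof_n growth_n risk_n sizeii_n, fe vce(cluster id)
```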
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Dear Carlo,
      much appreciated that you took time to reply to my inquiry!
      • I thought -xtgls- would require N>T; anyway, reading about -xtgls-, the applied method seems quite similar to -xtscc-. Do you have a personal preference here?
      • Running -xtoverid- leads to a p-value of 0.0038 -> I would stick to the -fe- model. As a side note, when I change the definition of size to ln(sales), both -hausman- and -xtoverid- suggest the -re- model (difference in p-values is >0.25).
      • As suggested, I checked for errors in data entry and unfortunately concluded that those figures are reasonable, even if they represent significant outliers. I therefore initially decided to keep those outliers, but it seems that excluding/dropping the most extreme ones could make sense.
      • I know that squaring a predictor can help when linearity is lacking. What exactly should I look for when doing this?
      Further, I found the following statement: "Panel data allows us to eliminate the effects of unobserved variables, as long as they remain constant through time. However, if the unobserved variables change through time, panel data will not completely eliminate the bias." -> Would you agree that, by including the dummies explained in my initial post, I at least have a good argument that I can address the problem?

      Many thanks and kind regards,
      Knut



      • #4
        Knut:
        - if you look at the -xtreg- and -xtgls- entries, you will see that the first is for N>T panel datasets, whereas the latter is for T>N ones; I would go with -xtscc-, provided that the -re- specification is out of debate;
        - the -fe- or -re- specification should be chosen in the light of the data-generating process; I would skim through the literature of your research field and see whether changing the definition of -size- the way you did is recommended;
        - outliers are often a simple matter of fact; I would keep all the observations you collected if no data-entry error came to light after your inspection;
        - you should look at turning points after squaring. Any decent econometrics textbook covers that issue;
        - the statement you found reports on how the fixed-effects estimator actually works. Obviously, if heterogeneity lurks behind a time-varying predictor, bias still remains.
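        On the turning-point remark: for a fitted quadratic y = b0 + b1*x + b2*x^2, the turning point sits at x* = -b1/(2*b2), which -nlcom- can compute together with a standard error. A minimal sketch (variable names follow the thread and are assumptions):

```stata
* Quadratic fit and its turning point, x* = -b1/(2*b2)
xtreg lev_n c.gold##c.gold tang_n prof_n growth_n risk_n sizeii_n, fe
nlcom -_b[gold] / (2*_b[c.gold#c.gold])
```

        If the estimated turning point falls inside the observed range of -gold-, the relationship genuinely bends within the sample; otherwise the quadratic term mainly captures curvature at the margins.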

        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Hey Carlo,
          quick update (sorry for the late reply).
          • So far I want to stick to -xtscc-. After testing for quadratic relations between my independent variables and y, I indeed found turning points for some variables, e.g. gold. However, as a quadratic model is difficult to interpret, I decided to stick to log-transforming independent variables where necessary, e.g. as with gold:
          • [ASCII scatter plot: Leverage (0 to 1.4) against Gold Price (5.98 to 7.45)]
          • According to the literature, changing the definition of size as I did is allowed; the problem was more in the data itself, which still included firms with sales of "0" -> Cleaned my data once again
          • Coming to omitted-variable bias, I decided to include control variables, especially focusing on studies investigating factors that impact the gold price, because my research question concerns the impact of gold price movements. Potential control variables: US CPI (inflation), trade-weighted USD (the USD as the major index currency), the yield of the US 10-year bond (interest rates) and the S&P 500 (performance of the equity market). The thing is that Pearson's r indicated a huge difference between the level (e.g. 0.43) and the growth rate (e.g. -0.12) of a variable, making it difficult to decide which to use for the model. The existing literature is divided when it comes to definitions. In a time-series regression I would perform a unit-root test for this purpose, but I could not find any information on how to deal with this in panel data. Probably I am just getting confused.
          Your advice would be much appreciated.
          Best,
          Knut



          • #6
            Knut:
            - so you decided to switch to a log-log regression model to measure the elasticity of the price of gold vs your predictors;
            - it's wise that the choice of your predictors is supported by previous research (especially if you intend to submit a paper to a technical journal in your research field);
            - -xtunitroot- might be what you're looking for.
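            A minimal sketch of -xtunitroot-, assuming -quarter- as the time variable and a hypothetical variable -cpi_us- holding the candidate control. Note that macro controls such as the CPI do not vary across firms, so a plain time-series test on the aggregate series is an alternative:

```stata
* Fisher-type panel unit-root test (handles unbalanced panels)
xtset id quarter
xtunitroot fisher cpi_us, dfuller lags(4)

* For a series constant across firms, a time-series test on one panel suffices
preserve
keep if id == 100          // any single firm carries the full macro series
tsset quarter
dfuller cpi_us, lags(4)
restore
```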
            Kind regards,
            Carlo
            (Stata 19.0)



            • #7
              I have a similar question:

              I am evaluating the impact of options on the payout decision for firms in the period 2012-2016. The panel data set is unbalanced and I'm aware that the data could be incomplete.
              Variables             N     Mean      Std. Dev.  Min       Max
              Dependent variables:
              Repurchase Payout     708   0.0021    0.0098     0.0000    0.1899
              Dividend Payout       678   0.0323    0.0660     0.0000    0.9686
              Independent variable:
              Options               782   0.0044    0.0297     0.0000    0.5684
              Control variables:
              Free Cash Flow        778   -0.0215   0.2049     -2.3313   0.4926
              Leverage              794   0.2830    0.2352     0.0000    1.9068
              Financing Costs       800   21.5723   2.2830     15.2656   28.6068
              I started by doing the following:
              xtset company_id year


              I want to know which model I should be using:
              • Conducted a BP test for RE vs OLS (xttest0)
                -> rejected H0 for the dividend variable (xtreg: Dividend= Options + Cash flow + Leverage + Financing costs)
                -> failed to reject H0 for the repurchase variable (xtreg: Repurchase= Options + Cash flow + Leverage + Financing costs)
                ---> Is it possible to use two different models for these regressions when they are based on the same dataset?
              • Additionally, I visually inspected the residual distributions to check for heteroskedasticity (as Carlo mentioned in another thread), with the following results:
                [Attached images: Skjermbilde 2018-11-22 kl. 15.10.29.png and Skjermbilde 2018-11-22 kl. 15.10.13.png]
                --> How do I interpret these outputs (dividend to the left, repurchase to the right)? If there is evidence of heteroskedasticity, how do I fix it?
              • What other tests should I run in order to see if the assumptions hold?
              And how do I export the test results to Word (preferably RTF format)?

              All answers are appreciated
              Kind regards,
              Ola



              • #8
                Ola:
                welcome to this forum.
                Please, start a new thread. Thanks.
                Kind regards,
                Carlo
                (Stata 19.0)



                • #9
                  Thank you for your reply, Carlo. Here's the new thread:
                  https://www.statalist.org/forums/for...for-panel-data

                  Kind regards,
                  Ola
