Assessing negative dummies in panel data re

Joakim Eriksson

Join Date: Feb 2024

Posts: 13
#1

14 Feb 2024, 10:12

Hi members!

We are asking for assistance in reanalyzing our panel data to assess the hypothetical impact of a Sovereign Wealth Fund (SWF) on Sweden's financial balance sheet from 1993 to 2022.
Our study aims to evaluate the effect of the SWF on the Swedish government's balance sheet, which we've designated as the dependent variable, with the SWF considered an independent variable.

Our dataset includes three variations of the government's balance sheet:
1. The "actual" balance sheet.
2. The "actual balance sheet" plus a hypothetical SWF, assuming no SWF volatility.
3. The "actual balance sheet" plus a hypothetical SWF, allowing SWF volatility.

However, we're encountering issues with our findings. The dummy variables yield negative outcomes, while the SWF variable demonstrates positive effects.
Additionally, we've observed that both sigma_u and rho are zero, which typically indicates analytical problems.

Here is our Stata output from the regression:

We also did a check for multicollinearity:

Given the strong suggestion of multicollinearity from the dummy variables, we tried to drop these and test again.
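
For readers following along, a check of this kind is commonly run with -estat vif- after -regress-. The sketch below is an illustration on Stata's Grunfeld example data, not on the dataset discussed in this thread; the variable names (invest, mvalue, kstock, company) belong to that example dataset and are assumptions as far as this analysis is concerned.

```stata
* Illustration only: Grunfeld example data, not the SWF dataset
webuse grunfeld, clear

* Model with panel dummies, followed by variance inflation factors
regress invest mvalue kstock i.company
estat vif

* Same model without the dummies, for comparison
regress invest mvalue kstock
estat vif
```

Comparing the two VIF tables shows how much of the inflation comes from the dummies themselves rather than from the substantive regressors.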

We're seeking your opinion on these results.
Do you believe there are fundamental errors, or might we be overlooking important considerations?
Specifically, what are your thoughts on the dummy variables? Are they contributing meaningful insights, or are they merely complicating the analysis?
Tags: negative dummy, SWF
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17606
#2

14 Feb 2024, 10:25

Joakim:
two comments about your query:
1) you seem to have no group-wise effect (sigma_u=0);
2) you're dealing with a T>N panel dataset. I would leave -xtreg- and consider -xtregar- or -xtgls- instead.
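
As a rough sketch of what these two commands look like, here is an illustration on Stata's Grunfeld example data (10 firms observed for 20 years, so T>N as well), not on the dataset in this thread. The variable names and the options chosen for -xtgls- are assumptions made for illustration only.

```stata
* Illustration only: Grunfeld example data, not the SWF dataset
webuse grunfeld, clear
xtset company year

* Random-effects model with AR(1) disturbances
xtregar invest mvalue kstock, re

* FGLS allowing panel-level heteroskedasticity and a common AR(1) coefficient
xtgls invest mvalue kstock, panels(heteroskedastic) corr(ar1)
```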

Kind regards,
Carlo
(StataNow 18.5)
Joakim Eriksson

Join Date: Feb 2024

Posts: 13
#3

15 Feb 2024, 05:59

Hey Carlo! Thanks for responding so quickly.

We ran your suggestions and got the following results. What do you think?

Sincerely
Joakim Eriksson
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17606
#4

15 Feb 2024, 08:45

Joakim:
1) your models are not comparable due to the different structure of the epsilon correlation, which has a bearing on standard errors;
2) you have too few observations to perform a reliable panel data regression (in addition, your first regression does not show any evidence of group-wise effect). If you cannot increase your sample size, I would stick with a pooled OLS.
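
One common way to check for a group-wise effect before falling back on pooled OLS is the Breusch and Pagan Lagrange multiplier test, available as -xttest0- after -xtreg, re-. The sketch below uses Stata's Grunfeld example data rather than the data in this thread, so the variable names are assumptions for illustration only.

```stata
* Illustration only: Grunfeld example data, not the SWF dataset
webuse grunfeld, clear
xtset company year

* Random-effects model; sigma_u and rho appear in the output footer
xtreg invest mvalue kstock, re

* Breusch-Pagan LM test of H0: Var(u) = 0 (no group-wise effect)
xttest0

* If H0 is not rejected, pooled OLS is the simpler specification
regress invest mvalue kstock
```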

Kind regards,
Carlo
(StataNow 18.5)
Joakim Eriksson

Join Date: Feb 2024

Posts: 13
#5

15 Feb 2024, 10:21

Hey Carlo, thanks for the answer.

When we do a pooled OLS without the dummies we get this result. Does not look that good... What do you think?

We also ran a pooled OLS regression with the dummies, but that resulted in the same coefficients as in the previous picture (xtgls).
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17606
#6

15 Feb 2024, 12:22

Joakim:
1) the reported OLS is not pooled; it is simply a cross-sectional OLS (false, because you have panel data, but Stata does not know it) that takes heteroskedasticity into account (but not autocorrelation; -regress- and -xtreg- differ in this respect; see the related entries in the Stata .pdf manual for more details);
2) the equivalence of the coefficients estimated by -regress- and -xtgls- makes sense (the standard errors differ, though; please also note that clustering with 3 units does not make any sense):

Code:

. use "https://www.stata-press.com/data/r18/nlswork.dta"
(National Longitudinal Survey of Young Women, 14-24 years old in 1968)

. regress ln_wage c.age##c.age if idcode<=3

      Source |       SS           df       MS      Number of obs   =        39
-------------+----------------------------------   F(2, 36)        =      7.26
       Model |  1.48752894         2  .743764471   Prob > F        =    0.0022
    Residual |  3.68821002        36  .102450278   R-squared       =    0.2874
-------------+----------------------------------   Adj R-squared   =    0.2478
       Total |  5.17573896        38  .136203657   Root MSE        =    .32008

------------------------------------------------------------------------------
     ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .2270112   .0721084     3.15   0.003     .0807687    .3732538
             |
 c.age#c.age |  -.0035377   .0012214    -2.90   0.006    -.0060149   -.0010605
             |
       _cons |  -1.688438   1.024512    -1.65   0.108    -3.766245    .3893681
------------------------------------------------------------------------------

. xtgls ln_wage c.age##c.age if idcode<=3

Cross-sectional time-series FGLS regression

Coefficients:  generalized least squares
Panels:        homoskedastic
Correlation:   no autocorrelation

Estimated covariances      =         1          Number of obs     =         39
Estimated autocorrelations =         0          Number of groups  =          3
Estimated coefficients     =         3          Obs per group:
                                                              min =         12
                                                              avg =         13
                                                              max =         15
                                                Wald chi2(2)      =      15.73
Log likelihood             = -9.349405          Prob > chi2       =     0.0004

------------------------------------------------------------------------------
     ln_wage | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .2270112   .0692795     3.28   0.001      .091226    .3627965
             |
 c.age#c.age |  -.0035377   .0011735    -3.01   0.003    -.0058377   -.0012377
             |
       _cons |  -1.688438   .9843193    -1.72   0.086    -3.617669    .2407919
------------------------------------------------------------------------------

3) again, your main problem is the too limited sample size, which basically makes any inferential procedure unfeasible.

Kind regards,
Carlo
(StataNow 18.5)

Joakim Eriksson

Join Date: Feb 2024

Posts: 13
#7

16 Feb 2024, 03:51

Hey Carlo, thank you for so much help.

Did we run the pooled OLS correctly? What do you think?

Is it feasible, or should we just accept that this will not be possible because of our data limitations and abandon our analysis? Perhaps do a Difference-in-Differences instead?

Thank you so much.
Sincerely
Joakim
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17606
#8

16 Feb 2024, 08:42

Joakim:
1) your pooled OLS code is correct, but if the number of your panels is <30 you cannot safely cluster your standard errors (and this makes the pooled OLS unfeasible);
2) therefore, if you cannot increase your sample size, you are better off sticking with descriptive statistics and explaining in your research report why your dataset does not allow you to perform inferential statistics.
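
To make the two points concrete, here is a minimal sketch on Stata's Grunfeld example data, not on the dataset in this thread; the variable names are assumptions for illustration only. That dataset has just 10 panels, so its cluster-robust standard errors are subject to the same caveat about too few clusters.

```stata
* Illustration only: Grunfeld example data, not the SWF dataset
webuse grunfeld, clear
xtset company year

* Pooled OLS with standard errors clustered on the panel identifier
* (only 10 clusters here, so these standard errors are not reliable)
regress invest mvalue kstock, vce(cluster company)

* Descriptive alternative: panel structure and within/between variation
xtdescribe
xtsum invest mvalue kstock
```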

Kind regards,
Carlo
(StataNow 18.5)