xtgee AND repeated time values within panel

andrew rich

Join Date: Aug 2016

Posts: 35
#1


18 Jul 2018, 00:35

Hello all,

I'm hoping to get some help understanding something. I have what I believe is a panel dataset in which many firms, each operating in many states, are observed in each period. My analysis includes fixed effects for firm, state, and time, and I would prefer to use xtgee if possible in order to account for within-group correlation. My question is this. If I run:

xtset firm time

Then I get the message that there are repeated time values within panel. I am aware that this is because there are multiple observations for each firm in each time period (one per state), so firm and time together do not uniquely identify an observation. Therefore, is it valid to run only:

xtset firm

and go on with my analysis?

e.g., xtgee y x1 x2 x3 i.firm i.state i.time, family(gaussian) link(identity) robust

or would this not be appropriate?

I considered instead creating unique ids for each firm-state combination, but my concern is that this would in essence be tricking Stata into accepting the xtset of panelvar and timevar, e.g.,

xtset firmstateid time

then:

xtgee y x1 x2 x3 i.firm i.state i.time, family(gaussian) link(identity) robust
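For reference, a firm-state identifier like the firmstateid above could be built roughly as follows. This is only a sketch: firm, state, and time are the variable names used in this post, firmstateid is the hypothetical id variable, and it assumes each firm appears at most once per state in each period.

```stata
* Sketch only: firm, state, time are the variables described in this post;
* firmstateid is the hypothetical firm-state identifier.
egen firmstateid = group(firm state)

* Confirm that firm-state and time jointly identify each observation.
* isid exits with an error if duplicates remain.
isid firmstateid time

* With unique firm-state-time rows, xtset no longer reports repeated time values.
xtset firmstateid time
```

If isid fails, there is still more than one observation per firm-state-period, and the repeated-time-values message would come back.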

In this scenario, I don't think of the firm-state combination as a real higher-order unit in its own right, the way a 'firm' or a 'state' is. It is an artifact of the structure of the dataset that I am identifying for analysis purposes, right? However, the second approach does seem to adjust for within-group correlation of the firm-state observations over time (i.e., the idea that firm1's activity (the DV) in Texas at time 1 may be correlated with firm1's activity in Texas at time 2), which I like and which sounds like it could be right.

The first approach, by contrast, adjusts equally for correlation among all observations within a firm, regardless of state and time, even though I believe it is less likely that firm1's activity in Texas at time 1 is correlated with firm1's activity in Alaska at time 1.

Am I thinking about this correctly? What would be the valid approach to analyze the data?

I thank you all in advance for giving this a look

Last edited by andrew rich; 18 Jul 2018, 01:03.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17672
#2

18 Jul 2018, 00:51

Andrew:
if you do not plan to use time-series operators such as lags and leads, you can safely -xtset- your data with -panelvar- (i.e., -firm-) only.
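As a rough illustration of that caveat, here is a sketch on the nlswork example data (an assumption made for illustration, not your data). With only the panel variable set, an ordinary model runs, while a lag operator should stop with an error until a time variable is declared.

```stata
* Sketch on example data; idcode and year are that dataset's panel and time variables.
webuse nlswork, clear

* Panel variable only: a model without time-series operators runs.
xtset idcode
xtreg ln_wage hours tenure

* A lagged regressor needs a time variable, so this is expected to stop
* with an error; capture noisily shows the message without halting a do-file.
capture noisily xtreg ln_wage hours L.tenure

* After declaring the time variable, the lag operator can be used.
xtset idcode year
xtreg ln_wage hours L.tenure
```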

Kind regards,
Carlo
(StataNow 18.5)
andrew rich

Join Date: Aug 2016

Posts: 35
#3

18 Jul 2018, 07:56

Hi Carlo,

Thank you for your quick reply. I am wondering if I can get clarification on something though.

Would running 'xtset' on the panelvar (firm) alone ignore the structure of the data? For example, I believe it is less likely that firm1's activity (the DV) in Texas at time 1 is correlated with firm1's activity in Alaska at time 1. However, I believe that firm1's activity in Texas at time 1 is much more likely to be correlated with firm1's activity in Texas at time 2. If I just use 'xtset firm', though, it would treat a firm's observations across all 50 states and all time periods as being in the same cluster, right? Regardless of how strongly they are correlated, e.g. (approach 1),

xtset firm

xtgee y x1 x2 x3 i.firm i.state i.time, family(gaussian) link(identity)

In that case, perhaps it is more appropriate to create unique ids for each firm-state combination. This would appropriately treat firm1's activity in Texas at time 1 as correlated with firm1's activity in Texas at time 2, right? But my concern is that this would in essence be tricking Stata into accepting the xtset of panelvar and timevar, even if this is not a valid way of treating the data, e.g. (approach 2),

xtset firmstateid time

xtgee y x1 x2 x3 i.firm i.state i.time, family(gaussian) link(identity)

Does it not matter if Stata treats all of a firm's observations across states and time as being in the same group, even if some observations within that group are more correlated with each other than others (i.e., approach 1)? Or am I right to think it would be more correct to treat each firm-state cluster (firm1-Texas) as related to the same firm-state cluster at a different time point (i.e., firm1-Texas time1 and firm1-Texas time2), as in approach 2?
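One way to see how much the choice matters in practice might be to fit both approaches and put the estimates side by side. This is a sketch only: the variable names are the ones from my data, firmstateid is the hypothetical firm-state id, and corr(exchangeable) and vce(robust) are written out as assumptions about the options I would use. It does not settle which grouping is conceptually right.

```stata
* Approach 1: group on firm only, so all of a firm's observations form one cluster.
xtset firm
xtgee y x1 x2 x3 i.firm i.state i.time, family(gaussian) link(identity) ///
    corr(exchangeable) vce(robust)
estimates store gee_firm

* Approach 2: group on the firm-state combination (hypothetical id variable).
egen firmstateid = group(firm state)
xtset firmstateid time
xtgee y x1 x2 x3 i.firm i.state i.time, family(gaussian) link(identity) ///
    corr(exchangeable) vce(robust)
estimates store gee_firmstate

* Compare coefficients and standard errors on the regressors of interest.
estimates table gee_firm gee_firmstate, keep(x1 x2 x3) se
```

With vce(robust), xtgee bases the standard errors on the xtset panel variable, so the two fits differ both in the working correlation grouping and in the level of clustering.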

What would be the way to go and why?

Your help is very much appreciated!

Thanks

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17672
#4

18 Jul 2018, 08:10

Andrew:
I do not use -xtgee- and cannot give you specific advice on it.
However, looking at the following toy-example with -xtreg-, the way you -xtset- your data does not seem to influence your results (provided, as per my previous reply, that you do not plan to use time-series commands):

Code:

use "http://www.stata-press.com/data/r15/nlswork.dta"

. xtset idcode
       panel variable:  idcode (unbalanced)

. xtreg ln_wage hours tenure, robust

Random-effects GLS regression                   Number of obs     =     28,036
Group variable: idcode                          Number of groups  =      4,698

R-sq:                                           Obs per group:
     within  = 0.0972                                         min =          1
     between = 0.1997                                         avg =        6.0
     overall = 0.1379                                         max =         15

                                                Wald chi2(2)      =    1607.87
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                             (Std. Err. adjusted for 4,698 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       hours |   .0002921   .0004498     0.65   0.516    -.0005894    .0011736
      tenure |   .0375315   .0009405    39.90   0.000     .0356881    .0393749
       _cons |   1.545542   .0175164    88.23   0.000      1.51121    1.579873
-------------+----------------------------------------------------------------
     sigma_u |  .33795389
     sigma_e |  .30365406
         rho |  .55330682   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. xtset idcode year
       panel variable:  idcode (unbalanced)
        time variable:  year, 68 to 88, but with gaps
                delta:  1 unit

. xtreg ln_wage hours tenure, robust

Random-effects GLS regression                   Number of obs     =     28,036
Group variable: idcode                          Number of groups  =      4,698

R-sq:                                           Obs per group:
     within  = 0.0972                                         min =          1
     between = 0.1997                                         avg =        6.0
     overall = 0.1379                                         max =         15

                                                Wald chi2(2)      =    1607.87
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                             (Std. Err. adjusted for 4,698 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       hours |   .0002921   .0004498     0.65   0.516    -.0005894    .0011736
      tenure |   .0375315   .0009405    39.90   0.000     .0356881    .0393749
       _cons |   1.545542   .0175164    88.23   0.000      1.51121    1.579873
-------------+----------------------------------------------------------------
     sigma_u |  .33795389
     sigma_e |  .30365406
         rho |  .55330682   (fraction of variance due to u_i)
------------------------------------------------------------------------------

.

Kind regards,
Carlo
(StataNow 18.5)


andrew rich

Join Date: Aug 2016

Posts: 35
#5

18 Jul 2018, 08:33

Hi Carlo, thank you for your quick response.

I understand that, in the context of xtreg, it does not influence the results whether I xtset both the panel and time variables or the panel variable only. But I was also hoping to get guidance on my particular scenario: does it make sense to create a unique id for each firm-state combination and xtset the data that way (i.e., xtset firmstateid time)?

Thank you
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17672
#6

18 Jul 2018, 11:54

Andrew:
sorry if my previous reply did not fully address your question(s).

As you report both pros and cons for the different approaches you compared, I would favor keeping things simple(r) and go with:

Code:

xtset firm

Kind regards,
Carlo
(StataNow 18.5)
andrew rich

Join Date: Aug 2016

Posts: 35
#7

19 Jul 2018, 07:59

Thank you Carlo, appreciate your input.
