
  • xtgee AND repeated time values within panel

    Hello all,

    I'm hoping to get some help understanding something. I have what I believe is a panel dataset in which many firms, each operating in many states, are observed every period. My analysis includes fixed effects for firm, state, and time, and I would prefer to use xtgee if possible in order to account for within-group correlation. My question is this. If I run:

    xtset firm time

    then I get the message that there are repeated time values within panel. I understand that this is because each firm has multiple observations (one per state) in each time period, so firm and time together do not uniquely identify an observation. Is it therefore valid to run only:

    xtset firm

    and go on with my analysis?

    e.g., xtgee y x1 x2 x3 i.firm i.state i.time, family(gaussian) link(identity) robust

    or would this not be appropriate?
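
    The repeated-time-values message described above can be reproduced and diagnosed on a small made-up dataset. This is a minimal sketch only: the toy data (2 firms, 3 states, 2 periods) are hypothetical, and firm, state, and time simply mirror the variable names used in this post.

```stata
* Toy data (hypothetical): 2 firms x 3 states x 2 periods = 12 rows
clear
set obs 2
generate firm = _n
expand 3
bysort firm: generate state = _n
expand 2
bysort firm state: generate time = _n

* Fails with "repeated time values within panel"; capture lets the script continue
capture noisily xtset firm time

* Each firm-time pair appears 3 times here, once per state
duplicates report firm time

* Silent if firm, state, and time together uniquely identify the rows
isid firm state time

* Declaring the panel variable alone is accepted
xtset firm
```

    If isid were to fail on the real data, there would be duplicate firm-state-time rows to sort out before choosing between the two setups.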

    I also considered creating a unique id for each firm-state combination, but my concern is that this would in essence be tricking Stata into accepting a panelvar and timevar for me, e.g.,

    xtset firmstateid time

    then:

    xtgee y x1 x2 x3 i.firm i.state i.time, family(gaussian) link(identity) robust


    In this scenario, I don't think of the firm-state combination as a real higher-order unit in the way a 'firm' or a 'state' is; it is an artifact of the dataset's structure that I am identifying for analysis purposes, right? However, the second approach does seem to adjust for within-group correlation of firm-state observations over time (i.e., the idea that firm1's activity (the DV) in Texas at time 1 may be correlated with firm1's activity in Texas at time 2), which I like and which sounds like it could be right.

    The first approach, by contrast, adjusts equally for correlation among all observations within a firm, regardless of state and time, even though I believe it is less likely that firm1's activity in Texas at time 1 is correlated with firm1's activity in Alaska at time 1.

    Am I thinking about this correctly? What would be the valid approach to analyze the data?

    Thank you all in advance for taking a look.



    Last edited by andrew rich; 18 Jul 2018, 01:03.

  • #2
    Andrew:
    if you do not plan to use time-series operators such as lags and leads, you can safely -xtset- your data with -panelvar- (i.e., -firm-) only.
    Kind regards,
    Carlo
    (StataNow 18.5)



    • #3
      Hi Carlo,

      Thank you for your quick reply. I am wondering if I can get clarification on something though.

      Would running 'xtset' on the panelvar (firm) alone ignore the structure of the data? For example, I believe it is less likely that firm1's activity (the DV) in Texas at time 1 is correlated with firm1's activity in Alaska at time 1, whereas firm1's activity in Texas at time 1 is much more likely to be correlated with firm1's activity in Texas at time 2. If I just use 'xtset firm', though, Stata would treat a firm's observations across all 50 states and all time periods as belonging to the same cluster, regardless of how strongly they are correlated, right? E.g. (approach 1):

      xtset firm

      xtgee y x1 x2 x3 i.firm i.state i.time, family(gaussian) link(identity)


      In that case, perhaps it is more appropriate to create a unique id for each firm-state combination. This would treat firm1's activity in Texas at time 1 as correlated with firm1's activity in Texas at time 2, right? But my concern is that this would in essence be tricking Stata into accepting a panelvar and timevar for me even if this is not a valid way of treating the data. E.g. (approach 2):

      xtset firmstateid time

      xtgee y x1 x2 x3 i.firm i.state i.time, family(gaussian) link(identity)


      Does it not matter if Stata treats all of a firm's observations across states and time as being in the same group, even if some observations within that group are more correlated with each other than others (i.e., approach 1)? Or am I right to think it would be more correct to treat each firm-state cluster (e.g., firm1-Texas) as related to the same firm-state cluster at a different time point (i.e., firm1-Texas at time 1 and firm1-Texas at time 2), as in approach 2?

      What would be the way to go and why?
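
      The two approaches can be compared side by side on simulated data. The following is only a rough sketch: the data are made up (20 firms, 5 states, 6 periods, with no true within-group correlation), the firm dummies are left out to keep the toy model small, and corr(ar1) is just one example of a working correlation that needs a time variable. It is not a recommendation of either approach.

```stata
* Simulated data (hypothetical): 20 firms x 5 states x 6 periods
clear
set seed 12345
set obs 20
generate firm = _n
expand 5
bysort firm: generate state = _n
expand 6
bysort firm state: generate time = _n
generate x1 = rnormal()
generate y  = 1 + 0.5*x1 + rnormal()

* Approach 1: clusters are firms; an exchangeable working correlation
* treats every pair of observations within a firm alike and needs no timevar
xtset firm
xtgee y x1 i.state i.time, family(gaussian) link(identity) corr(exchangeable) vce(robust)
estat wcorrelation

* Approach 2: clusters are firm-state pairs; with a timevar declared,
* time-ordered structures such as AR(1) become available
egen firmstateid = group(firm state)
xtset firmstateid time
xtgee y x1 i.state i.time, family(gaussian) link(identity) corr(ar1) vce(robust)
estat wcorrelation
```

      estat wcorrelation displays the estimated working correlation matrix after each fit, which may help in judging whether the assumed structure looks plausible for the real data.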

      Your help is very much appreciated!

      Thanks






      • #4
        Andrew:
        I do not use -xtgee- and cannot give you specific advice on it.
        However, looking at the following toy example with -xtreg-, the way you -xtset- your data does not seem to influence your results (provided, as per my previous reply, that you do not plan to use time-series operators):
        Code:
        use "http://www.stata-press.com/data/r15/nlswork.dta"
        
        . xtset idcode
               panel variable:  idcode (unbalanced)
        
        . xtreg ln_wage hours tenure, robust
        
        Random-effects GLS regression                   Number of obs     =     28,036
        Group variable: idcode                          Number of groups  =      4,698
        
        R-sq:                                           Obs per group:
             within  = 0.0972                                         min =          1
             between = 0.1997                                         avg =        6.0
             overall = 0.1379                                         max =         15
        
                                                        Wald chi2(2)      =    1607.87
        corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
        
                                     (Std. Err. adjusted for 4,698 clusters in idcode)
        ------------------------------------------------------------------------------
                     |               Robust
             ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
               hours |   .0002921   .0004498     0.65   0.516    -.0005894    .0011736
              tenure |   .0375315   .0009405    39.90   0.000     .0356881    .0393749
               _cons |   1.545542   .0175164    88.23   0.000      1.51121    1.579873
        -------------+----------------------------------------------------------------
             sigma_u |  .33795389
             sigma_e |  .30365406
                 rho |  .55330682   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        
        . xtset idcode year
               panel variable:  idcode (unbalanced)
                time variable:  year, 68 to 88, but with gaps
                        delta:  1 unit
        
        . xtreg ln_wage hours tenure, robust
        
        Random-effects GLS regression                   Number of obs     =     28,036
        Group variable: idcode                          Number of groups  =      4,698
        
        R-sq:                                           Obs per group:
             within  = 0.0972                                         min =          1
             between = 0.1997                                         avg =        6.0
             overall = 0.1379                                         max =         15
        
                                                        Wald chi2(2)      =    1607.87
        corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
        
                                     (Std. Err. adjusted for 4,698 clusters in idcode)
        ------------------------------------------------------------------------------
                     |               Robust
             ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
               hours |   .0002921   .0004498     0.65   0.516    -.0005894    .0011736
              tenure |   .0375315   .0009405    39.90   0.000     .0356881    .0393749
               _cons |   1.545542   .0175164    88.23   0.000      1.51121    1.579873
        -------------+----------------------------------------------------------------
             sigma_u |  .33795389
             sigma_e |  .30365406
                 rho |  .55330682   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        
        .
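
        The same comparison could be repeated with -xtgee- itself. The sketch below is not output from an actual run. It assumes an internet connection for -webuse- and relies on the default exchangeable working correlation, which does not use the time variable, so the two sets of estimates should coincide. The stored-estimate names panel_only and panel_time are arbitrary.

```stata
* Load the nlswork example data from the Stata website
webuse nlswork, clear

* Panel variable only
xtset idcode
xtgee ln_wage hours tenure, vce(robust)
estimates store panel_only

* Panel and time variable
xtset idcode year
xtgee ln_wage hours tenure, vce(robust)
estimates store panel_time

* Coefficients and standard errors side by side
estimates table panel_only panel_time, se
```

        Correlation structures that depend on time ordering, such as corr(ar1), are a different matter: they require a time variable to be declared in -xtset-.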
        Kind regards,
        Carlo
        (StataNow 18.5)



        • #5
          Hi Carlo, thank you for your quick response.

          I understand that, in the context of xtreg, it does not change the results whether I xtset the data with both a panelvar and a timevar or with a panelvar only. I was hoping to also get guidance on my particular scenario: does it make sense to create a unique id for each firm-state combination and xtset the data that way (i.e., xtset firmstateid time)?

          Thank you



          • #6
            Andrew:
            sorry if my previous reply did not fully address your question(s).

            As you report both pros and cons for the different approaches you compared, I would favour keeping things simple(r) and go with:
            Code:
            xtset firm
            Kind regards,
            Carlo
            (StataNow 18.5)



            • #7
              Thank you Carlo, appreciate your input.

