  • BGLW Test for Attrition: Is Exit from the Sample Random?

    Hi all,

    A dataex extract of my data is at the end of this post.

    I have a panel dataset that was generated by sending surveys to firms every two months (e.g., March 2021, May 2021, July 2021). The goal is to perform two-way fixed-effects estimation with these data.

    The surveys ran from November 2020 to March 2022. Respondents were not obliged to respond, so most of them have an erratic response pattern, which makes the panel very unbalanced. To add to the complexity, some respondents joined in later months than others. A short example of what I mean:

    - Respondent A responds in March 2021, May 2021, July 2021, September 2021

    - Respondent B responds in March 2021, May 2021

    - Respondent C first responds in March 2021, skips May and July 2021, and then responds again in September 2021.
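    In Stata, `xtdescribe` tabulates these participation patterns. A minimal sketch using the estimation-sample rows of three respondents from the extract at the end of this post, assuming (this is not stated above) that the waves fall on even-numbered %tm months, hence `delta(2)` so that one time step is one wave:

```stata
* Estimation-sample rows of respondents 8, 16 and 20 from the extract
clear
input float(ID yearmonth)
 8 738
 8 740
 8 742
16 734
16 740
20 736
20 742
end
format %tm yearmonth

* ASSUMPTION: bimonthly waves, so one time step = two months
xtset ID yearmonth, delta(2)

* One line per response pattern; gaps show up as "."
xtdescribe
```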

    I would like to find out whether the time of exit from our sample (e.g., September 2021 for respondent A, May 2021 for respondent B, and March 2021 for respondent C) is random with respect to the dependent variable, conditional on the other regressors and the fixed effects. For this, I would like to run a BGLW test (Becketti et al., 1988).
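    In equation form, the BGLW test as it is usually described regresses each unit's first observed outcome on its first observed covariates and an indicator for later attrition, then tests whether the coefficient on that indicator is zero. A sketch in this thread's notation, where $t_{i0}$ (firm $i$'s first wave in the estimation sample), $A_i$ (equal to 1 if firm $i$ later leaves the sample) and $\gamma$ are symbols introduced here for illustration:

```latex
y_{i,t_{i0}} = \alpha + \mathbf{x}_{i,t_{i0}}'\boldsymbol{\beta} + \gamma A_i + u_i,
\qquad H_0:\ \gamma = 0
```

    Under $H_0$, firms that later leave do not differ in their initial $y$ from firms that stay, conditional on $\mathbf{x}$.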

    I have run a two-way fixed-effects regression like
    Code:
    xtreg y x1 x2 x3 i.yearmonth, fe vce(cluster ID)
    generate byte full_est_sample = e(sample)
    and have saved the estimation sample as full_est_sample (a binary variable indicating whether each observation was used in the estimation).

    I would like to create a dummy capturing whether a respondent later attrites. I am not sure whether the strategy or the code is correct (comments are more than welcome!):

    1. Code a dummy called exit, equal to 1 if a respondent is not observed in the following survey wave (period t+1) given that they are observed in the current period t, and 0 otherwise.

    I tried the following code, but it does not give me what I wanted: it does not equal 1 every time a respondent leaves the sample in the next wave.

    Code:
    bys ID: g attrition=cond(full_est_sample[_n]==1 & full_est_sample[_n+1]==0,1,0)
    Could someone point out the mistake I have made and how to correct it? The data are xtset, so I also tried replacing [_n+1] with F., but I had the same issue, so I must be making a mistake somewhere.
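    Two things probably explain the output. First, the rows in the extract are calendar months, not waves: respondent 73, for instance, has a row for every month from 734 to 747, with the sample flag equal to 0 in the odd months, so both full_est_sample[_n+1] and F. (with the data xtset by yearmonth) look at the in-between month rather than the next wave. Second, when no next row exists, full_est_sample[_n+1] and F.full_est_sample are missing, and missing == 0 is false, so the dummy is 0 exactly when a respondent leaves for good. (Writing bysort ID (yearmonth): would also make the time order within ID explicit.) A minimal sketch of one fix, assuming (not confirmed by the post) that the waves are the even-numbered %tm months from November 2020 to March 2022; `wave` is a hypothetical helper variable:

```stata
* A few rows from the extract; the flag is called full_est_sample as in the
* text (full_est_sample_snskills in the dataex)
clear
input float(ID yearmonth full_est_sample)
 4 744 1
 4 745 0
 8 735 0
 8 738 1
 8 739 0
 8 740 1
 8 741 0
 8 742 1
47 735 0
47 738 1
47 745 0
47 746 1
47 747 0
end
format %tm yearmonth

* ASSUMPTION: waves are the even %tm months from Nov 2020 (730) to Mar 2022 (746)
keep if mod(yearmonth, 2) == 0 & inrange(yearmonth, ym(2020,11), ym(2022,3))
generate wave = (yearmonth - ym(2020,11))/2 + 1

* Index time by wave, so F. means "next wave" rather than "next calendar month"
xtset ID wave

* exit = 1 if in the estimation sample now but not in the next wave. When no
* row exists for the next wave, F.full_est_sample is missing: "!= 1" counts
* that as not observed, whereas "== 0" would return 0.
generate byte exit = (F.full_est_sample != 1) if full_est_sample == 1

* No wave follows March 2022, so exit is undefined there
replace exit = . if yearmonth == ym(2022,3)

list ID yearmonth wave full_est_sample exit, sepby(ID) noobs
```

    With these rows, respondent 8 gets exit = 1 only at November 2021 (its last wave), and respondent 47 gets exit = 1 at July 2021 and a missing value at March 2022.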

    2. Regress the initial value of y on the initial values of the covariates and on the dummy above. Should time fixed effects be included in this regression? Is this the correct way to run the BGLW test?
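    A minimal sketch of step 2 on a few rows of nine firms from the extract (only y and x1; x2 and x3 are zero in these rows). Two choices here are assumptions, not taken from the post: "later attrites" means not being in the estimation sample in the final wave (March 2022), and the "initial values" are each firm's first estimation-sample observation. Entry-month dummies are left out only because the sample is tiny; whether to include them is the open question above. The names `lastm`, `attriter` and `first` are hypothetical.

```stata
* Selected rows from the extract (ID yearmonth y x1 sample flag)
clear
input float(ID yearmonth y x1 full_est_sample)
 4 744  0 0         1
 8 735  . 0         0
 8 738  0 0         1
 8 742  0 0         1
16 733  . 1         0
16 734 -1 0         1
16 740  0 0         1
20 736  0 .8068182  1
20 742 -1 1         1
47 738  0 0         1
47 746 -1 0         1
47 747  . 0         0
50 746 -1 1         1
66 734  1 .125      1
66 744  0 0         1
73 734 -1 1         1
73 746 -1 0         1
77 730  0 0         1
77 746 -1 0         1
end
format %tm yearmonth

* ASSUMPTION: attriter = 1 if the firm's last estimation-sample month is
* before March 2022. To use the step-1 dummy instead:
*   bysort ID: egen byte attriter = max(exit)
bysort ID: egen lastm = max(cond(full_est_sample == 1, yearmonth, .))
generate byte attriter = lastm < ym(2022,3) if !missing(lastm)

* Initial values = each firm's first observation in the estimation sample
bysort ID full_est_sample (yearmonth): generate byte first = (_n == 1 & full_est_sample == 1)

* BGLW-style regression: initial y on initial covariates and the attrition
* dummy; the t-test on attriter is the test of H0 (use x1 x2 x3 with real data)
regress y x1 attriter if first, vce(robust)
```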


    Code:
    input float(ID yearmonth y x1 x2 x3 full_est_sample_snskills)
     4 744  0         0 0 0 1
     4 745  .         0 0 0 0
     8 735  .         0 0 0 0
     8 738  0         0 0 0 1
     8 739  .         0 0 0 0
     8 740  0         0 0 0 1
     8 741  .         0 0 0 0
     8 742  0         0 0 0 1
    15 729  0         0 0 0 0
    15 731  0         0 0 0 0
    16 733  .         1 0 0 0
    16 734 -1         0 0 0 1
    16 737  .         1 0 0 0
    16 740  0         0 0 0 1
    20 733  .         1 0 0 0
    20 736  0  .8068182 0 0 1
    20 739  .  .9302326 0 0 0
    20 742 -1         1 0 0 1
    20 747  .         0 0 0 0
    26 730  0         0 0 0 1
    27 725  .         0 0 0 0
    27 726  .         0 0 0 0
    27 728  .         0 0 0 0
    27 733  .         0 0 0 0
    27 736  0         0 0 0 1
    27 738  0         0 0 0 1
    27 739  .         0 0 0 0
    27 741  .         0 0 0 0
    27 747  .         0 0 0 0
    35 734  0  .8333333 0 0 1
    47 735  .         0 0 0 0
    47 738  0         0 0 0 1
    47 745  .         0 0 0 0
    47 746 -1         0 0 0 1
    47 747  .         0 0 0 0
    48 744  0         0 0 0 1
    50 746 -1         1 0 0 1
    58 744 -1         0 0 0 0
    66 734  1      .125 0 0 1
    66 735  .      .125 0 0 0
    66 736  1         0 0 0 1
    66 737  . .13333334 0 0 0
    66 739  .         0 0 0 0
    66 740  1 .14285715 0 0 1
    66 744  0         0 0 0 1
    68 726  .         0 0 0 0
    68 733  .         0 0 0 0
    70 744  0         0 0 0 0
    70 745  .         0 0 0 0
    71 734  0         0 0 0 1
    71 736  0         0 0 0 1
    73 734 -1         1 0 0 1
    73 735  .         1 0 0 0
    73 736 -1         1 0 0 1
    73 737  .         1 0 0 0
    73 738 -1         0 0 0 1
    73 739  .         0 0 0 0
    73 740 -1         0 0 0 1
    73 741  .         0 0 0 0
    73 742 -1         0 0 0 1
    73 743  .         0 0 0 0
    73 744 -1         0 0 0 1
    73 745  .         0 0 0 0
    73 746 -1         0 0 0 1
    73 747  .         0 0 0 0
    74 744  0         0 0 0 1
    75 734 -1         1 0 0 1
    75 735  .         1 0 0 0
    77 728  .         0 0 0 0
    77 730  0         0 0 0 1
    77 731  0         0 0 0 1
    77 732  0         0 0 0 1
    77 733  .         0 0 0 0
    77 734 -1         0 0 0 1
    77 735  .         0 0 0 0
    77 736  0         0 0 0 1
    77 737  .         0 0 0 0
    77 738  0         0 0 0 1
    77 739  .         0 0 0 0
    77 740  0         0 0 0 1
    77 742 -1         0 0 0 1
    77 744  0         0 0 0 1
    77 745  .         0 0 0 0
    77 746 -1         0 0 0 1
    77 747  .         0 0 0 0
    79 726  .         0 0 0 0
    79 744  0         0 0 0 1
    83 745  .         0 0 0 0
    86 734  0         0 0 0 1
    91 726  .         0 0 0 0
    91 728  .         0 0 0 0
    91 736  0         0 0 0 1
    92 744  0         0 0 0 1
    92 745  .         0 0 0 0
    94 734 -1         1 0 0 1
    95 726  .         0 0 0 0
    96 726  .         1 0 0 0
    97 733  .         1 0 0 0
    97 737  .         1 0 0 0
    98 736  0  .3333333 0 0 1
    end

    Last edited by Maxence Morlet; 28 Apr 2023, 11:46. Reason: one of my questions was badly formulated; I've corrected it.

  • #2
    Here is an easy way to implement a similar test if you're willing to expand the rows in the data set so that it is nominally "balanced", i.e., each identifier has the same list of time periods. The months with no data are simply filled in with missing values. Then define the complete-case selection indicator, s, which equals 1 if y and all of x1, ..., xK are observed in that period and 0 otherwise. This is the "all or nothing" indicator. Then simply use

    xtreg y x1 ... xK i.yearmonth L.s F.s, fe vce(cluster ID)

    I discuss this possibility in my 1995 Journal of Econometrics paper on selection in panel data, and also briefly in Chapter 19 of my MIT Press book. Strange how I have more details in my lecture slides, but that often happens.

    You can drop the lag or the lead to get a one-degree-of-freedom test. By using FE, you allow selection to be correlated with the unobserved heterogeneity in an arbitrary way.
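    A minimal sketch of the balanced-panel version described above, on the even-month rows of six firms from the extract in #1 (y and x1 only; x2 and x3 are zero there). It assumes, as the thread does not confirm, that the waves fall on even-numbered %tm months, so `xtset ..., delta(2)` makes L. and F. move by one wave:

```stata
* Even-month rows for six firms from the extract (ID yearmonth y x1)
clear
input float(ID yearmonth y x1)
 8 738  0 0
 8 740  0 0
 8 742  0 0
16 734 -1 0
16 740  0 0
20 736  0 .8068182
20 742 -1 1
66 734  1 .125
66 736  1 0
66 740  1 .14285715
66 744  0 0
73 734 -1 1
73 736 -1 1
73 738 -1 0
73 740 -1 0
73 742 -1 0
73 744 -1 0
73 746 -1 0
77 730  0 0
77 732  0 0
77 734 -1 0
77 736  0 0
77 738  0 0
77 740  0 0
77 742 -1 0
77 744  0 0
77 746 -1 0
end
format %tm yearmonth

* ASSUMPTION: waves fall on even %tm months, so odd-month rows are dropped
keep if mod(yearmonth, 2) == 0

* delta(2): one time step = one bimonthly wave
xtset ID yearmonth, delta(2)

* Nominally balanced panel: every ID gets every wave; added rows are missing
tsfill, full

* Complete-case ("all or nothing") selection indicator; real data: y x1 x2 x3
generate byte s = !missing(y, x1)

* FE with lagged and lead selection; joint test of the two selection terms
xtreg y x1 i.yearmonth L.s F.s, fe vce(cluster ID)
test L.s F.s

* One-degree-of-freedom variant: keep only L.s or only F.s in the regression
```

    Because L.s is missing in the first wave and F.s in the last, both of those waves drop out when the two terms are included; keeping only one term retains one of them.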

    You can still do the test if you don't create the "balanced" panel but it is a bit tricky. I'm sure I have a Stata do file somewhere showing that way, too.



    • #3
      Thank you very much Professor Wooldridge!

      I have read your 1995 paper, which was very informative.



      • #4
        Maxence Morlet
        Hi Maxence, I am in exactly the same situation as you: I have data from the Covid period with attrition similar to yours.
        I wonder whether you implemented Jeff's suggestion. If so, could you please share the code you used, in particular to create the balanced panel with missing values for the waves in which respondents did not participate?

        Also, could you please share how you interpreted the results of this test? Does it only tell us about the characteristics of those who attrite, or can it also say something about whether attrition is random? I ask because we usually say that we cannot know whether data are missing at random or not.

        Thank you!
        Best
