  • Removing collinearity between dummy variables in difference-in-difference regression

    Hello all,
    I'm doing a difference-in-difference regression to analyse the impact of removing financial incentives from 26 quality indicators across 450 health clinics.

    I have a dummy variable for whether the indicator is in the treatment or control group. (26 in treatment group, 2 in control group)

    I also have dummy (categorical) variables for practice and for quality indicator.

    In the regression I'd like to absorb the time-invariant fixed effects of each indicator and practice:

    Code:
    reg Performance Year##Treated i.Indicator i.Practice
    The problem is that 'Indicator' is perfectly collinear with the 'Treated' dummy.

    What's the best strategy to deal with this in Stata? Many thanks. (I'm using Stata 15).

  • #2
    I gather the best strategy would be to remove one of the predictors.
    Best regards,

    Marcos

    • #3
      You'll increase your chances of a useful answer by following the FAQ on asking questions - in addition to code (better in code delimiters), provide sample data and output.

      If you have multiple years and observations before and after treatment, then there is a data construction problem if treat is collinear with a practice indicator. You may be using treat for all observations for the practice, in which case you obviously can't differentiate between treat and practice.
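
      A quick way to check is to cross-tabulate the treatment dummy against the grouping variables; if every row of the table falls entirely in one column, treat does not vary within that grouping and its main effect cannot be separated from the group dummies. For example (just a sketch, using the variable names from #1):
      Code:
      tab Indicator Treated
      tab Practice Treated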

      • #4
        The problem is that 'Indicator' is perfectly collinear with the 'Treated' dummy.
        No, the problem is that you think this is a problem. There is no problem.

        Of course, the way the variables are constructed, there is collinearity between Treated and the Indicator variables. So, to identify the model, Stata will either omit Treated (but, crucially, not the Year#Treated interaction) or one of the Indicator variables (beyond the one that is automatically omitted as a reference category) from the regression. But this harms nothing. Your Year#Treated coefficient will still be the DID estimator of the causal effect of being treated, and all model predictions will be made consistently regardless of which way Stata eliminates this collinearity.

        You are worrying about a non-issue. Your model is fine. Run it and move on.
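
        (For what it's worth, an equivalent way to fit the same model is to absorb the indicator fixed effects rather than listing the dummies, for example with -areg-. This is just a sketch; -areg- will drop the collinear Treated main effect in exactly the same way, and the Year#Treated coefficients are unchanged.)
        Code:
        areg Performance Year##Treated i.Practice, absorb(Indicator)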

        • #5
          Thanks all for your advice.

          Clyde - to put it a different way, my actual problem (I think) with the collinearity between Indicator and Treat is that it means I can't use 'pwcompare' afterwards.

          As we've discussed before, I have three time points (2016, 2017, 2018) - if I regress Performance over the interaction of Treat and Year (adjusting for Indicator effects), that only gives me the difference between 2018 and 2016, and 2017 and 2016 (not 2018 and 2017, which I also want). You helpfully suggested 'pwcompare' for getting that info. But I can't do that if there's collinearity.

          My alternative approach is to use two datasets (one with 2016 and 2017, and one with 2017 and 2018) and do the regression separately for each. But then I'm re-calculating the fixed effects (and getting different values) for each subset, so I don't think (?) the two Year#Treated interactions are entirely comparable.
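
          Concretely, I mean something like this (an untested sketch; restricting the estimation sample rather than literally splitting the data):
          Code:
          reg Performance Year##Treated i.Indicator i.Practice if inlist(Year, 2016, 2017)
          reg Performance Year##Treated i.Indicator i.Practice if inlist(Year, 2017, 2018)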

          Or am I wrong about that? But you are right that I'm probably quibbling over a relatively small issue!

          • #6
            As we've discussed before, I have three time points (2016, 2017, 2018) - if I regress Performance over the interaction of Treat and Year (adjusting for Indicator effects), that only gives me the difference between 2018 and 2016, and 2017 and 2016 (not 2018 and 2017, which I also want). You helpfully suggested 'pwcompare' for getting that info. But I can't do that if there's collinearity.
            That's simply not true. If you are unable to use -pwcompare- here then something else is going on. Here's an example with a toy data set.

            Code:
            . //  CREATE A DEMONSTRATION DATA SET
            . 
            . clear*
            
            . set obs 26
            number of observations (_N) was 0, now 26
            
            . gen indicator = _n
            
            . expand 3
            (52 observations created)
            
            . by indicator, sort: gen year = 2015+_n
            
            . gen treated = (indicator <= 24)
            
            . 
            . set seed 1234
            
            . gen performance = runiform()
            
            . 
            . regress performance i.year##i.treated i.indicator
            note: 26.indicator omitted because of collinearity
            
                  Source |       SS           df       MS      Number of obs   =        78
            -------------+----------------------------------   F(29, 48)       =      0.88
                   Model |  2.89148185        29  .099706271   Prob > F        =    0.6345
                Residual |  5.42331652        48  .112985761   R-squared       =    0.3478
            -------------+----------------------------------   Adj R-squared   =   -0.0463
                   Total |  8.31479837        77  .107984394   Root MSE        =    .33613
            
            ------------------------------------------------------------------------------
             performance |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                    year |
                   2017  |  -.1083207   .3361335    -0.32   0.749    -.7841625     .567521
                   2018  |   .5168094   .3361335     1.54   0.131    -.1590323    1.192651
                         |
               1.treated |   .5301863   .3407701     1.56   0.126    -.1549779     1.21535
                         |
            year#treated |
                 2017 1  |   .0037008   .3498589     0.01   0.992    -.6997376    .7071392
                 2018 1  |  -.5757054   .3498589    -1.65   0.106    -1.279144     .127733
                         |
               indicator |
                      2  |   .0354521   .2744519     0.13   0.898    -.5163704    .5872746
                      3  |  -.0192441   .2744519    -0.07   0.944    -.5710666    .5325785
                      4  |  -.0541893   .2744519    -0.20   0.844    -.6060118    .4976333
                      5  |  -.1303415   .2744519    -0.47   0.637     -.682164     .421481
                      6  |  -.3673918   .2744519    -1.34   0.187    -.9192143    .1844307
                      7  |  -.3002853   .2744519    -1.09   0.279    -.8521078    .2515373
                      8  |   .0697431   .2744519     0.25   0.800    -.4820794    .6215656
                      9  |  -.2013266   .2744519    -0.73   0.467    -.7531491     .350496
                     10  |  -.3125422   .2744519    -1.14   0.260    -.8643647    .2392804
                     11  |  -.2011923   .2744519    -0.73   0.467    -.7530148    .3506302
                     12  |   -.127676   .2744519    -0.47   0.644    -.6794985    .4241465
                     13  |  -.1013973   .2744519    -0.37   0.713    -.6532198    .4504252
                     14  |  -.0524789   .2744519    -0.19   0.849    -.6043014    .4993436
                     15  |    .065767   .2744519     0.24   0.812    -.4860555    .6175895
                     16  |  -.4151613   .2744519    -1.51   0.137    -.9669838    .1366612
                     17  |   .1766241   .2744519     0.64   0.523    -.3751984    .7284466
                     18  |  -.1755816   .2744519    -0.64   0.525    -.7274041    .3762409
                     19  |  -.4211959   .2744519    -1.53   0.131    -.9730184    .1306266
                     20  |  -.2707734   .2744519    -0.99   0.329    -.8225959    .2810491
                     21  |  -.0506832   .2744519    -0.18   0.854    -.6025057    .5011393
                     22  |  -.2585107   .2744519    -0.94   0.351    -.8103332    .2933118
                     23  |   .1432751   .2744519     0.52   0.604    -.4085474    .6950976
                     24  |  -.4116369   .2744519    -1.50   0.140    -.9634594    .1401856
                     25  |   .1591635   .2744519     0.58   0.565     -.392659     .710986
                     26  |          0  (omitted)
                         |
                   _cons |   .1822435   .2744519     0.66   0.510    -.3695791     .734066
            ------------------------------------------------------------------------------
            
            . 
            . margins year, pwcompare
            
            Pairwise comparisons of predictive margins      Number of obs     =         78
            Model VCE    : OLS
            
            Expression   : Linear prediction, predict()
            
            ---------------------------------------------------------------
                          |            Delta-method         Unadjusted
                          |   Contrast   Std. Err.     [95% Conf. Interval]
            --------------+------------------------------------------------
                     year |
            2017 vs 2016  |  -.1049046   .0932267     -.2923494    .0825402
            2018 vs 2016  |   -.014611   .0932267     -.2020558    .1728338
            2018 vs 2017  |   .0902936   .0932267     -.0971511    .2777384
            ---------------------------------------------------------------
            Edited: Changed example to one more closely resembling the structure of Mr. Butler's data.
            Last edited by Clyde Schechter; 06 Sep 2019, 18:12.

            • #7
              Yes, that does work, thanks Clyde. I may have slightly misrepresented what I was looking to do.

              I was also trying to use:
              Code:
              margins year#treat
              (to get mean adjusted performance rates for the control and treated groups).

              Also, if my 'treat' variable had several levels (0 for control, then treatment 1, treatment 2, etc.) and I wanted to compare the effect of treatment 1 to treatment 2, then I think I need:
              Code:
              margins year, dydx(treat) pwcompare
              These commands don't work for me if there's collinearity between my 'treat' and 'indicator' variables and I've adjusted for 'indicator' effects in the regression.

              The commands do work if I only adjust for 'practice' fixed effects.
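
              In other words, this variant runs for me and the margins are estimable (a sketch using the variable names from #1):
              Code:
              reg Performance Year##Treated i.Practice
              margins Year#Treated
              margins Year, dydx(Treated) pwcompare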

              So - assuming I haven't missed something else - I suppose my question is whether I can get round that problem without dropping the 'indicator' dummies from my regression.

              Apologies for my confused explanation; thank you for your patience with a relative newcomer!

              • #8
                Yes, that's different. No, you can't get around that. The quantities you are trying to estimate there are actually undefined: they depend on the specific way in which the collinearity is broken, so they are not identified by the model.

                • #9
                  Thanks Clyde, I think I understand now. I'll make do with the slightly cruder model (it hardly impacts my DiD estimation).
