  • Removing collinearity between dummy variables in difference-in-difference regression

    Hello all,
    I'm doing a difference-in-difference regression to analyse the impact of removing financial incentives from 26 quality indicators across 450 health clinics.

    I have a dummy variable for whether the indicator is in the treatment or control group. (26 in treatment group, 2 in control group)

    I also have dummy (categorical) variables for practice and for quality indicator.

    In the regression I'd like to absorb the time-invariant fixed effects of each indicator and practice:

    Code:
    reg Performance Year##Treated i.Indicator i.Practice
    The problem is that 'Indicator' is perfectly collinear with the 'Treated' dummy.

    What's the best strategy to deal with this in Stata? Many thanks. (I'm using Stata 15).

  • #2
    I gather the best strategy would be to remove one of the predictors.
    Best regards,

    Marcos

    • #3
      You'll increase your chances of a useful answer by following the FAQ on asking questions - in addition to code (better in code delimiters), provide sample data and output.

      If you have multiple years and observations before and after treatment, then there is a data construction problem if treat is collinear with a practice indicator. You may be using treat for all observations for the practice, in which case you obviously can't differentiate between treat and practice.
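
      A quick way to check is to cross-tabulate the treatment dummy against the grouping variables; if every row of the table falls entirely in one column, treat does not vary within that grouping and its main effect cannot be separated from the group dummies. For example (just a sketch, using the variable names from #1):
      Code:
      tab Indicator Treated
      tab Practice Treated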

      • #4
        The problem is that 'Indicator' is perfectly collinear with the 'Treated' dummy.
        No, the problem is that you think this is a problem. There is no problem.

        Of course, the way the variables are constructed, there is collinearity between Treated and the Indicator variables. So, to identify the model, Stata will either omit Treated (but, crucially, not the Year#Treated interaction) or one of the Indicator variables (beyond the one that is automatically omitted as a reference category) from the regression. But this harms nothing. Your Year#Treated coefficient will still be the DID estimator of the causal effect of being treated, and all model predictions will be made consistently regardless of which way Stata eliminates this collinearity.

        You are worrying about a non-issue. Your model is fine. Run it and move on.
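
        (For what it's worth, an equivalent way to fit the same model is to absorb the indicator fixed effects rather than listing the dummies, for example with -areg-. This is just a sketch; -areg- will drop the collinear Treated main effect in exactly the same way, and the Year#Treated coefficients are unchanged.)
        Code:
        areg Performance Year##Treated i.Practice, absorb(Indicator)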

        • #5
          Thanks all for your advice.

          Clyde - to put it a different way, my actual problem (I think) with the collinearity between Indicator and Treat is that it means I can't use 'pwcompare' afterwards.

          As we've discussed before, I have three time points (2016, 2017, 2018) - if I regress Performance over the interaction of Treat and Year (adjusting for Indicator effects), that only gives me the difference between 2018 and 2016, and 2017 and 2016 (not 2018 and 2017, which I also want). You helpfully suggested 'pwcompare' for getting that info. But I can't do that if there's collinearity.

          My alternative approach is to use two datasets (one with 2016 and 2017, and one with 2017 and 2018) and do the regression separately for each. But then I'm re-calculating the fixed effects (and getting different values) for each subset, so I don't think (?) the two Year#Treated interactions are entirely comparable.
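
          Concretely, I mean something like this (an untested sketch; restricting the estimation sample rather than literally splitting the data):
          Code:
          reg Performance Year##Treated i.Indicator i.Practice if inlist(Year, 2016, 2017)
          reg Performance Year##Treated i.Indicator i.Practice if inlist(Year, 2017, 2018)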

          Or am I wrong about that? But you are right that I'm probably quibbling over a relatively small issue!

          • #6
            As we've discussed before, I have three time points (2016, 2017, 2018) - if I regress Performance over the interaction of Treat and Year (adjusting for Indicator effects), that only gives me the difference between 2018 and 2016, and 2017 and 2016 (not 2018 and 2017, which I also want). You helpfully suggested 'pwcompare' for getting that info. But I can't do that if there's collinearity.
            That's simply not true. If you are unable to use -pwcompare- here then something else is going on. Here's an example with a toy data set.

            Code:
            . //  CREATE A DEMONSTRATION DATA SET
            . 
            . clear*
            
            . set obs 26
            number of observations (_N) was 0, now 26
            
            . gen indicator = _n
            
            . expand 3
            (52 observations created)
            
            . by indicator, sort: gen year = 2015+_n
            
            . gen treated = (indicator <= 24)
            
            . 
            . set seed 1234
            
            . gen performance = runiform()
            
            . 
            . regress performance i.year##i.treated i.indicator
            note: 26.indicator omitted because of collinearity
            
                  Source |       SS           df       MS      Number of obs   =        78
            -------------+----------------------------------   F(29, 48)       =      0.88
                   Model |  2.89148185        29  .099706271   Prob > F        =    0.6345
                Residual |  5.42331652        48  .112985761   R-squared       =    0.3478
            -------------+----------------------------------   Adj R-squared   =   -0.0463
                   Total |  8.31479837        77  .107984394   Root MSE        =    .33613
            
            ------------------------------------------------------------------------------
             performance |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                    year |
                   2017  |  -.1083207   .3361335    -0.32   0.749    -.7841625     .567521
                   2018  |   .5168094   .3361335     1.54   0.131    -.1590323    1.192651
                         |
               1.treated |   .5301863   .3407701     1.56   0.126    -.1549779     1.21535
                         |
            year#treated |
                 2017 1  |   .0037008   .3498589     0.01   0.992    -.6997376    .7071392
                 2018 1  |  -.5757054   .3498589    -1.65   0.106    -1.279144     .127733
                         |
               indicator |
                      2  |   .0354521   .2744519     0.13   0.898    -.5163704    .5872746
                      3  |  -.0192441   .2744519    -0.07   0.944    -.5710666    .5325785
                      4  |  -.0541893   .2744519    -0.20   0.844    -.6060118    .4976333
                      5  |  -.1303415   .2744519    -0.47   0.637     -.682164     .421481
                      6  |  -.3673918   .2744519    -1.34   0.187    -.9192143    .1844307
                      7  |  -.3002853   .2744519    -1.09   0.279    -.8521078    .2515373
                      8  |   .0697431   .2744519     0.25   0.800    -.4820794    .6215656
                      9  |  -.2013266   .2744519    -0.73   0.467    -.7531491     .350496
                     10  |  -.3125422   .2744519    -1.14   0.260    -.8643647    .2392804
                     11  |  -.2011923   .2744519    -0.73   0.467    -.7530148    .3506302
                     12  |   -.127676   .2744519    -0.47   0.644    -.6794985    .4241465
                     13  |  -.1013973   .2744519    -0.37   0.713    -.6532198    .4504252
                     14  |  -.0524789   .2744519    -0.19   0.849    -.6043014    .4993436
                     15  |    .065767   .2744519     0.24   0.812    -.4860555    .6175895
                     16  |  -.4151613   .2744519    -1.51   0.137    -.9669838    .1366612
                     17  |   .1766241   .2744519     0.64   0.523    -.3751984    .7284466
                     18  |  -.1755816   .2744519    -0.64   0.525    -.7274041    .3762409
                     19  |  -.4211959   .2744519    -1.53   0.131    -.9730184    .1306266
                     20  |  -.2707734   .2744519    -0.99   0.329    -.8225959    .2810491
                     21  |  -.0506832   .2744519    -0.18   0.854    -.6025057    .5011393
                     22  |  -.2585107   .2744519    -0.94   0.351    -.8103332    .2933118
                     23  |   .1432751   .2744519     0.52   0.604    -.4085474    .6950976
                     24  |  -.4116369   .2744519    -1.50   0.140    -.9634594    .1401856
                     25  |   .1591635   .2744519     0.58   0.565     -.392659     .710986
                     26  |          0  (omitted)
                         |
                   _cons |   .1822435   .2744519     0.66   0.510    -.3695791     .734066
            ------------------------------------------------------------------------------
            
            . 
            . margins year, pwcompare
            
            Pairwise comparisons of predictive margins      Number of obs     =         78
            Model VCE    : OLS
            
            Expression   : Linear prediction, predict()
            
            ---------------------------------------------------------------
                          |            Delta-method         Unadjusted
                          |   Contrast   Std. Err.     [95% Conf. Interval]
            --------------+------------------------------------------------
                     year |
            2017 vs 2016  |  -.1049046   .0932267     -.2923494    .0825402
            2018 vs 2016  |   -.014611   .0932267     -.2020558    .1728338
            2018 vs 2017  |   .0902936   .0932267     -.0971511    .2777384
            ---------------------------------------------------------------
            Edited: Changed example to one more closely resembling the structure of Mr. Butler's data.
            Last edited by Clyde Schechter; 06 Sep 2019, 18:12.

            • #7
              Yes, that does work, thanks Clyde. I may have slightly misrepresented what I was looking to do.

              I was also trying to use:
              Code:
              margins year#treat
              (to get mean adjusted performance rates for the control and treated groups).

              Also, if my 'treat' variable had several levels (0 for control, then treatment 1, treatment 2, etc.) and I wanted to compare the effect of treatment 1 to treatment 2, then I think I need:
              Code:
              margins year, dydx(treat) pwcompare
              These commands don't work for me if there's collinearity between my 'treat' and 'indicator' variables and I've adjusted for 'indicator' effects in the regression.

              The commands do work if I only adjust for 'practice' fixed effects.
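
              In other words, this variant runs for me and the margins are estimable (a sketch using the variable names from #1):
              Code:
              reg Performance Year##Treated i.Practice
              margins Year#Treated
              margins Year, dydx(Treated) pwcompare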

              So - assuming I haven't missed something else - I suppose my question is whether I can get round that problem without dropping the 'indicator' dummies from my regression.

              Apologies for my confused explanation; thank you for your patience with a relative newcomer!

              • #8
                Yes, that's different. No, you can't get around that. The quantities you are trying to estimate there are actually undefined: they depend on the specific way in which the collinearity is broken, so they are not identified by the model.

                • #9
                  Thanks Clyde, I think I understand now. I'll make do with the slightly cruder model (it hardly impacts my DiD estimation).
