  • Difference-in-difference

    Hi Statalist friends

    I want to run a diff-in-diff regression, but I always get the following error: Model is not identified. The treatment variable treated was omitted because of collinearity. How can I fix that?

    Code:
     didregress (Arbeitslosenrate) (treated), group(code) time(numvar)
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float code int numvar float(Anteil_Betroffene Arbeitslosenrate) str2 anos2 str3 estu str2(nuts1 sexo) float(Anzahl_Betroffene treated post)
    12 226 23.6 39.9 "05" "2" "4" "6" 89 1 0
    12 230 23.6 42.2 "05" "2" "4" "6" 89 1 0
    12 234 23.6 39.9 "05" "2" "4" "6" 89 1 0
    12 238 23.6   42 "05" "2" "4" "6" 89 1 1
    12 240 23.6 44.4 "05" "2" "4" "6" 89 1 1
    49 226   .2 12.7 "05" "5" "2" "1"  1 0 0
    49 230   .2  9.2 "05" "5" "2" "1"  1 0 0
    49 234   .2    7 "05" "5" "2" "1"  1 0 0
    49 238   .2  6.4 "05" "5" "2" "1"  1 0 1
    49 240   .2  6.5 "05" "5" "2" "1"  1 0 1
    end


    I would be very thankful if someone could help me.

    Last edited by Felix Chappuis; 27 May 2021, 10:14.

  • #2
    Next time, please share your data using dataex and not as an attachment (see the FAQ for details). I don't have Stata 17, so I can't use didregress, but you're probably looking to do something along the lines of:

    Code:
    reg Arbeitslosenrate i.treated##i.post
    
          Source |       SS           df       MS      Number of obs   =        10
    -------------+----------------------------------   F(3, 6)         =    243.73
           Model |  2795.41785         3   931.80595   Prob > F        =    0.0000
        Residual |  22.9383336         6  3.82305561   R-squared       =    0.9919
    -------------+----------------------------------   Adj R-squared   =    0.9878
           Total |  2818.35618         9  313.150687   Root MSE        =    1.9553
    
    ------------------------------------------------------------------------------
    Arbeitslos~e |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       1.treated |   31.03333   1.596466    19.44   0.000     27.12692    34.93975
          1.post |  -3.183333   1.784903    -1.78   0.125    -7.550834    1.184168
                 |
    treated#post |
            1 1  |   5.716666   2.524234     2.26   0.064    -.4599131    11.89325
                 |
           _cons |   9.633333   1.128872     8.53   0.000     6.871083    12.39558
    ------------------------------------------------------------------------------
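
    As a rough check on what the treated#post coefficient measures, the same number can be rebuilt from the four cell means. This is only a sketch: it assumes the -dataex- example from #1 is in memory, and the scalar names (t_pre, t_post, c_pre, c_post) are made up for illustration.

    ```stata
    * Sketch: rebuild the DiD estimate from the 2x2 cell means.
    * Assumes the -dataex- example from #1 is in memory.
    * Scalar names below are hypothetical, chosen for illustration.
    quietly summarize Arbeitslosenrate if treated == 1 & post == 1
    scalar t_post = r(mean)
    quietly summarize Arbeitslosenrate if treated == 1 & post == 0
    scalar t_pre = r(mean)
    quietly summarize Arbeitslosenrate if treated == 0 & post == 1
    scalar c_post = r(mean)
    quietly summarize Arbeitslosenrate if treated == 0 & post == 0
    scalar c_pre = r(mean)

    * (change in treated group) minus (change in control group)
    display (t_post - t_pre) - (c_post - c_pre)
    ```

    Because the regression above is fully saturated in treated and post, the displayed value should agree with the treated#post coefficient, up to float precision.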


    • #3
      Thank you very much. Could someone explain how this works in Stata 17 with the didregress command? I would be very thankful.


      • #4
        Dear Felix,

        The issue is that the variable -treated- indicates the treated group, whereas -didregress- requires that the variable in the second set of parentheses indicate which individual observations are treated. In your example, this is equivalent to -1.treated*1.post-.

        Code:
        generate indicate = 1.treated*1.post
        didregress (Arbeitslosenrate) (indicate), group(code) time(numvar)
        This will get you the point estimate you want. Notice, however, that the default standard errors are cluster-robust standard errors, clustered at the -code- level, and with only two clusters they are not well defined for the data you sent. Perhaps you just sent a subset of your data. In any case, this is what I get:

        Code:
        . generate indicate = 1.treated*1.post
        
        . didregress (Arbeitslosenrate) (indicate), group(code) time(numvar)
        
        Number of groups and treatment time
        
        Time variable: numvar
        Control:       indicate = 0
        Treatment:     indicate = 1
        -----------------------------------
                     |   Control  Treatment
        -------------+---------------------
        Group        |
                code |         1          1
        -------------+---------------------
        Time         |
             Minimum |       226        238
             Maximum |       226        238
        -----------------------------------
        
        Difference-in-differences regression                        Number of obs = 10
        Data type: Repeated cross-sectional
        
                                           (Std. err. adjusted for 2 clusters in code)
        ------------------------------------------------------------------------------
                     |               Robust
        Arbeitslos~e | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
        ATET         |
            indicate |
           (1 vs 0)  |   5.716666          .        .       .            .           .
        ------------------------------------------------------------------------------
        Note: ATET estimate adjusted for group effects and time effects.
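
        To see where the collinearity message comes from, a plain regression with group and time dummies can stand in for what -didregress- adjusts for. This is a sketch under assumptions: it uses the -dataex- example from #1, and the variable name did is hypothetical (it holds the same values as -indicate- above).

        ```stata
        * Sketch, assuming the -dataex- example from #1 is in memory.
        * -did- is a hypothetical name; it equals -indicate- from above.
        generate byte did = treated * post

        * -treated- is constant within each -code-, so it is collinear
        * with the group dummies and Stata omits one of the terms:
        regress Arbeitslosenrate treated i.code i.numvar

        * The observation-level indicator varies within the treated
        * group over time, so its coefficient is identified:
        regress Arbeitslosenrate did i.code i.numvar, vce(cluster code)
        ```

        The coefficient on did in the second regression should match the ATET point estimate shown above. The standard errors are not expected to be usable here either, for the same two-cluster reason.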


        • #5
          Dear Enrique
          Thank you very much. It was only a simple example. Here is the result with the full dataset. Is it correct that I can do this with the panel-data command? At the beginning, I had a cross-sectional dataset for every time period. Then I merged the datasets and collapsed them by groups with equal combinations of variable values. So my data tell me how individuals with certain characteristics developed over time. Is it correct to treat my data as a panel? I would be very thankful for an answer. Have a great day.

          Code:
          . xtdidregress (Arbeitslosenrate) (indicate), group(code) time(numvar)

          Number of groups and treatment time

          Time variable: numvar
          Control:       indicate = 0
          Treatment:     indicate = 1
          -----------------------------------
                       |   Control  Treatment
          -------------+---------------------
          Group        |
                  code |        79         50
          -------------+---------------------
          Time         |
               Minimum |       226        234
               Maximum |       240        234
          -----------------------------------

          Difference-in-differences regression                       Number of obs = 512
          Data type: Longitudinal

                                           (Std. err. adjusted for 129 clusters in code)
          ------------------------------------------------------------------------------
                       |               Robust
          Arbeitslos~e | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
          ATET         |
              indicate |
             (1 vs 0)  |  -1.116403    .791332    -1.41   0.161    -2.682189    .4493825
          ------------------------------------------------------------------------------
          Note: ATET estimate adjusted for panel effects and time effects.




          • #6
            Hi Felix,

            You can use the new DID command for panel data sets or repeated cross-sections. From your description it is unclear to me if you have a panel data set or a repeated cross-section. In your example, it depends on the behavior of the variable code. For example, say -code- is a person and you have repeated observations of that person across time. You have a panel. Say -code- is like a country and every year you sample a different set of individuals from the country. You have a repeated cross section.
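
            One way to inspect the structure of the data is to check whether -code- and -numvar- together identify each observation. The following is only a sketch using the variable names from this thread. It checks the layout of the dataset, not whether the same units are really followed over time, which remains a judgment about how the data were collected.

            ```stata
            * Sketch: does each -code- appear at most once per -numvar-?
            duplicates report code numvar

            * If there are no duplicates, the data can be declared as a
            * panel; -xtset- stops with an error when time values repeat
            * within a panel.
            xtset code numvar
            xtdescribe
            ```

            If -duplicates report- shows surplus observations, several rows share the same -code- and period, which points toward a repeated cross-section layout rather than a panel.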

            In terms of estimation, the difference between -didregress- and -xtdidregress- is equivalent to the difference between using -regress- or -areg- and using -xtreg ..., fe-.
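
            The two routes can be written out by hand as a sketch. It assumes the -indicate- variable from #4 exists and uses the variable names from this thread; it illustrates the analogy and is not a description of what the DID commands do internally.

            ```stata
            * Sketch: the two fixed-effects routes, using -indicate- from #4.

            * Group effects absorbed, in the spirit of -didregress-:
            areg Arbeitslosenrate indicate i.numvar, absorb(code) vce(cluster code)

            * Panel fixed effects, in the spirit of -xtdidregress-:
            xtset code numvar
            xtreg Arbeitslosenrate indicate i.numvar, fe vce(cluster code)
            ```

            The coefficient on indicate should be the same in both. The reported standard errors can differ, because -areg- and -xtreg, fe- apply different degrees-of-freedom adjustments with clustered errors.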


            • #7
              Hi Enrique
              Now, I understand. Thank you very much for your time.
