  • Difference-in-Difference Estimation using xtdidregress command for panel data

    Hi,
    I have a data set covering 11 years for 2 areas: one is the treatment area and the other is the control area. The policy change occurred in 2016. I want to estimate the effect of the change on the treatment area using DID, and I want to do so with -xtdidregress-.
    It would be helpful if anyone could guide me on how to use -xtdidregress- with my data set.
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str9 area int year float(loggrdp logvkt logpopden cng petrol tti) byte(treat_dummy_var post_time_dummy)
    "CONTROL"   2009 4.08982 3.81882 3.54331 .201 1.17 1.85141 0 0
    "CONTROL"   2010 4.14094 3.82747 3.55197 .201 1.09 1.97962 0 0
    "CONTROL"   2011 4.18856 3.83617 3.56066 .201 1.09 2.10783 0 0
    "CONTROL"   2012 4.20421 3.84479 3.56928 .201 1.15 2.23604 0 0
    "CONTROL"   2013 4.25524 3.85344 3.57793 .201 1.15 2.36425 0 0
    "CONTROL"   2014 4.31695 3.86212 3.58661 .201  1.3 2.49246 0 0
    "CONTROL"   2015 4.36941 3.87073 3.59522  .42  1.3 2.62067 0 0
    "CONTROL"   2016  4.4244 3.87935 3.60385  .42 1.12 2.77925 0 1
    "CONTROL"   2017 4.47662 3.88809 3.61258  .48 1.12 2.93784 0 1
    "CONTROL"   2018   4.517 3.89674 3.62123  .48 1.12 3.09642 0 1
    "CONTROL"   2019 4.56001 3.90558 3.63007 .516 1.12   3.255 0 1
    "TREATMENT" 2009  4.6127 4.74174  3.9492 .201 1.17   2.272 1 0
    "TREATMENT" 2010 4.66381 4.75719 3.96466 .201 1.09   2.331 1 0
    "TREATMENT" 2011 4.71144 4.77263 3.98009 .201 1.09    2.39 1 0
    "TREATMENT" 2012 4.72709 4.78808 3.99552 .201 1.15 2.48918 1 0
    "TREATMENT" 2013 4.77812 4.80352 4.01098 .201 1.15 2.58836 1 0
    "TREATMENT" 2014 4.83983 4.81897 4.02641 .201  1.3 2.68755 1 0
    "TREATMENT" 2015 4.89229 4.83441 4.04186  .42  1.3 2.78673 1 0
    "TREATMENT" 2016 4.94728 4.84986 4.05731  .42 1.12    2.94 1 1
    "TREATMENT" 2017  4.9995  4.8653 4.07275  .48 1.12   3.286 1 1
    "TREATMNET" 2018 5.03987 4.88075 4.08819  .48 1.12    3.63 1 1
    "TREATMENT" 2019 5.08289 4.89612 4.10358 .516 1.12 3.97927 1 1
    end

  • #2
    I want to add that my dependent variable is tti and my independent variables are loggrdp logpopden logvkt petrol cng.
    Thank you.



    • #3
      Hi Sakib,

      Below is the code I would use. Note that there might be a typo in the second-to-last observation: instead of TREATMENT it reads TREATMNET. I corrected this assuming it was a typo, but you know your data best. Also, I do not know whether this is just a subset of your data that you provided to get guidance on the syntax or your full dataset. I ask because with 2 panels and 22 observations I would be skeptical of the results.

      Code:
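      * fix the apparent typo in the second-to-last observation before encoding
      replace area = "TREATMENT" if area == "TREATMNET"
      encode area, generate(narea)
      xtset narea year
      * treatment indicator: 1 only for the treated area in post-policy years
      generate did = treat_dummy_var*post_time_dummy
      xtdidregress (tti loggrdp logpopden logvkt petrol cng) (did), group(narea) time(year)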



      • #4
        Thanks for your reply. You were right to correct that. Also, this is my full data set. Could you let me know how I can include time fixed effects and robust standard errors clustered at the area level to deal with potential heteroscedasticity?
        When I estimated this model with xtreg, the command was:
        Code:
        xtset treat_dummy_var year
        xtreg tti i.treat_dummy_var##i.post_time_dummy loggrdp logpopden logvkt petrol cng , cluster(treat_dummy_var)robust
        I think the results from my previous analysis with -xtreg- differ from those I get with -xtdidregress-.
        Last edited by Sakib Nazmus; 13 Feb 2022, 09:47.



        • #5
          Hi Sakib,

          -xtdidregress- automatically adds time fixed effects and gives you cluster-robust standard errors at the group level. The equivalent -xtreg- command would be:

          Code:
          xtreg tti loggrdp logpopden logvkt petrol cng i.year did, fe vce(cluster narea)
          Last edited by Enrique Pinzon (StataCorp); 13 Feb 2022, 10:02.
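
          A minimal sketch of the comparison described above, assuming the -dataex- example from #1 is in memory and nothing from #3 has been run yet. With only 2 panels the standard errors remain unreliable; the point is only that the coefficient on did from -xtreg, fe- should match the ATET reported by -xtdidregress-.

          ```stata
          * assumption: data from #1 loaded, narea and did not yet created
          replace area = "TREATMENT" if area == "TREATMNET"
          encode area, generate(narea)
          xtset narea year
          generate did = treat_dummy_var*post_time_dummy

          * DID with group and time fixed effects, SEs clustered on narea
          xtdidregress (tti loggrdp logpopden logvkt petrol cng) (did), group(narea) time(year)

          * same specification written out by hand
          xtreg tti loggrdp logpopden logvkt petrol cng i.year did, fe vce(cluster narea)

          * coefficient to compare with the ATET line of the xtdidregress output
          display _b[did]
          ```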



          • #6
            Is the command I wrote in #4 wrong? If so, please enlighten me on this matter. Also, will this data set be a problem for DID estimation because it is small?
            Thank you.



            • #7
              Hi Sakib,

              I think more than the specification, the issue is that you have 22 observations and 2 panels. For the within estimator that is used by -xtreg- and -xtdidregress- to work as expected, you need to have a large number of panels. What we mean by a large number of panels is an asymptotic statement, but I think in your case it is not met. I would not suggest you use DID estimation with this number of observations.



              • #8
                Hello,
                Could you suggest an analysis with which I can find the effect of the policy in the treatment area? And how much data would DID estimation require at a minimum?
                Thank you.



                • #9
                  For DD to make sense you need (usually) many treated units. If you have only two groups, what you're interested in is decidedly not DD, but interrupted time series/segmented regression, which Ariel Linden's -itsa- command handles. Either way, you want many more than two panels.

                  I'm most curious: What question are you studying anyway?
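
                  A rough sketch of what an interrupted time series run on this data could look like. -itsa- is user-written and has to be installed from SSC first. The sketch assumes the data from #1 with narea created as in #3, so that -encode- has assigned CONTROL = 1 and TREATMENT = 2; the lag(1) choice is only illustrative, not a recommendation.

                  ```stata
                  * install the user-written command once
                  ssc install itsa

                  * assumption: narea exists as in #3 (CONTROL = 1, TREATMENT = 2)
                  xtset narea year

                  * single-group ITSA on the treated area, interruption in 2016;
                  * replace overwrites the variables itsa created on an earlier run
                  itsa tti, single treatid(2) trperiod(2016) lag(1) replace
                  ```

                  Dropping the single option would instead compare the treated area with the remaining panel, which here is just the one control area.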



                  • #10
                    Hello,
                    I am trying to find the impact of a ride-sharing service in our city. This is a new service that launched in 2016. I am evaluating the impact of the service on traffic congestion. I am building a model that includes independent variables that generally influence congestion, and then adding this dummy to check whether the service is increasing or decreasing congestion. As dependent variables, I am using several measures of congestion intensity.
                    Thank you.



                    • #11
                      Hello,
                      I would like clear suggestions and advice regarding my DD analysis. My data set is given below:
                      Code:
                      * Example generated by -dataex-. For more info, type help dataex
                      clear
                      input str3 area int year float(loggrdp logvkt logpopden cng petrol tti co2capita co2ac logdt logcost logco2 logcostac) byte(areaid treat post) float did
                      "DHK" 2009  4.6127 4.74174  3.9492 .201 1.17    2.272 .082753 .197501 2.46652 2.78779 3.07056 2.01281 1 1 0 0
                      "DHK" 2010 4.66381 4.75719 3.96466 .201 1.09    2.331 .091581 .218571 2.52446 2.84573 3.13002 2.05529 1 1 0 0
                      "DHK" 2011 4.71144 4.77263 3.98009 .201 1.09     2.39 .101267 .241686 2.58201 2.90328 3.18913  2.0974 1 1 0 0
                      "DHK" 2012 4.72709 4.78808 3.99552 .201 1.15  2.48918 .113573 .271058 2.64576 2.96703 3.25438 2.14571 1 1 0 0
                      "DHK" 2013 4.77812 4.80352 4.01098 .201 1.15  2.58836 .127138 .303432 2.70863  3.0299 3.31883 2.19313 1 1 0 0
                      "DHK" 2014 4.83983 4.81897 4.02641 .201  1.3  2.68755 .142177 .339324 2.77097 3.09224 3.38283 2.24002 1 1 0 0
                      "DHK" 2015 4.89229 4.83441 4.04186  .42  1.3  2.78673 .158951 .379358 2.83308 3.15435 3.44671  2.2867 1 1 0 0
                      "DHK" 2016 4.94728 4.84986 4.05731  .42 1.12 3.084865 .189826 .453046 2.93002 3.25129 3.53924 2.36819 1 1 1 1
                      "DHK" 2017  4.9995  4.8653 4.07275  .48 1.12    3.383 .229172  .54695 3.03242  3.3537 3.63649 2.45515 1 1 1 1
                      "DHK" 2018 5.03987 4.88075 4.08819  .48 1.12 3.681135   .2818 .672554 3.14392 3.46519 3.74172  2.5512 1 1 1 1
                      "DHK" 2019 5.08289 4.89612 4.10358 .516 1.12  3.97927  .35639 .850572 3.26923  3.5905 3.85907 2.66114 1 1 1 1
                      "CTG" 2009 4.08982 3.81882 3.54331 .201 1.17  1.85141 .113321  .15566 1.99974 2.31025 2.65908 1.84336 2 0 0 0
                      "CTG" 2010 4.14094 3.82747 3.55197 .201 1.09  1.97962 .134985 .185419 2.08437 2.39488  2.7437 1.91933 2 0 0 0
                      "CTG" 2011 4.18856 3.83617 3.56066 .201 1.09  2.10783 .156944 .215583 2.15852 2.46904 2.81786 1.98479 2 0 0 0
                      "CTG" 2012 4.20421 3.84479 3.56928 .201 1.15  2.23604 .179331 .246334 2.22506 2.53557 2.88439  2.0427 2 0 0 0
                      "CTG" 2013 4.25524 3.85344 3.57793 .201 1.15  2.36425 .202279 .277856 2.28601 2.59652 2.94534   2.095 2 0 0 0
                      "CTG" 2014 4.31695 3.86212 3.58661 .201  1.3  2.49246 .225925 .310336  2.3427 2.65321 3.00203 2.14301 2 0 0 0
                      "CTG" 2015 4.36941 3.87073 3.59522  .42  1.3  2.62067 .250414 .343975   2.396 2.70651 3.05533 2.18771 2 0 0 0
                      "CTG" 2016  4.4244 3.87935 3.60385  .42 1.12  2.77925 .275274 .378123  2.4442 2.75472 3.10506 2.22728 2 0 1 0
                      "CTG" 2017 4.47662 3.88809 3.61258  .48 1.12  2.93784 .300237 .412414 2.48916 2.79967  3.1515 2.26351 2 0 1 0
                      "CTG" 2018   4.517 3.89674 3.62123  .48 1.12  3.09642 .325438  .44703 2.53137 2.84188 3.19515 2.29706 2 0 1 0
                      "CTG" 2019 4.56001 3.90558 3.63007 .516 1.12    3.255 .351007 .482152 2.57161 2.88212 3.23684 2.32847 2 0 1 0
                      end
                      I am following a published article, https://papers.ssrn.com/sol3/papers....act_id=2843301, specifically the analysis described on page 8. For that analysis with my data set I am using the command below in Stata:

                      Code:
                      xtset areaid year
                      
                      xtreg Dependent_var i.treat##i.post loggrdp logpopden logvkt petrol cng , cluster (areaid) robust
                      As I am not very experienced with DD, I want to know whether I am heading in the right direction with my analysis. Expert guidance will be highly appreciated.
                      Thank you



                      • #12
                        Yeah this looks okay. I mean, if I were you, I would use synthetic controls, since this is superior to DD (generally speaking), but so long as you're sure your untreated units are good comparisons to the treated unit, go ahead and use this model.

                        EDIT: I didn't see that you only had two panels. This makes synthetic controls impossible, so you must either use interrupted time series or a simple 2 by 2 DD.

                        No need to use xtreg, just use

                        Code:
                         
                         xtdidregress (tti loggrdp logpopden logvkt petrol cng)(did), group(narea) time(year)
                        My honest advice to you is to get more panels, get more data on untreated units. What you can do is limited by your current data structure.
                        Last edited by Jared Greathouse; 22 Feb 2022, 12:18.
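
                        For reference, a minimal sketch of the simple 2 by 2 DD mentioned above, assuming the -dataex- data from #11 is in memory. Without covariates, the interaction coefficient equals the difference between the two areas' post-minus-pre changes in mean tti. With 22 observations from 2 areas the standard errors should not be taken at face value.

                        ```stata
                        * assumption: data from #11 loaded (variables tti, treat, post)
                        * mean of tti in each of the four treat-by-post cells
                        tabulate treat post, summarize(tti) means

                        * 2x2 DD: the coefficient on 1.treat#1.post is the DD estimate
                        regress tti i.treat##i.post
                        lincom 1.treat#1.post
                        ```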



                        • #13
                          Hello,

                          First of all, thank you for your valuable advice. If I use xtreg as described in #11, it gives me significant results and the relationships between the dependent and independent variables can be explained, but if I use xtdidregress as in #12, it gives me insignificant results for the same data set and the relationships between the variables seem abnormal. In this case, should I use #11?
                          Also, should I collect data on another untreated area, which would make 3 panels in total, or collect data on the existing panels over more time periods? And if I need to use a simple 2 by 2 DD, what would the command be, since I expect collecting more data will be difficult?

                          Thank you
                          Last edited by Sakib Nazmus; 22 Feb 2022, 23:49.
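
                          One thing worth checking when comparing #11 and #12: the two commands do not fit the same model. -xtreg- without the fe option fits the random-effects model by default, and the command in #11 has no year effects, whereas -xtdidregress- includes group and time fixed effects. In the data posted in #11, petrol and cng appear to take the same value in both areas in every year, so they would be collinear with the year effects. A minimal check, assuming the -dataex- data from #11 is in memory:

                          ```stata
                          * assumption: data from #11 loaded (variables year, areaid, petrol, cng)
                          * runs silently if petrol and cng are identical across the two areas
                          * in every year; stops with an error otherwise
                          bysort year (areaid): assert petrol[1] == petrol[_N] & cng[1] == cng[_N]
                          ```

                          If the assertion passes, any covariate that varies only by year cannot be separately identified once year fixed effects are in the model, which may be part of why the two sets of results look so different.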



                          • #14
                            Why does it matter if the results are significant or not? My advice to you is to collect data on as many untreated units as you possibly can. If you can get data on 20 other untreated units, use those.
