  • Difference-in-differences using cross-sectional data

    Respected Sir,
    I want to do a difference-in-differences analysis using repeated cross-sectional data. I have used two rounds of the NFHS data set and have appended them. With this repeated cross-sectional data I want to study the effect of a particular policy on the BPL population. The policy had not been implemented at the time of the first round of NFHS data, whereas it had been implemented by the second round. So I have a treatment group (the BPL population who are using the policy benefit) and also control variables such as age, education, gender, etc. I have included a year dummy (0 for the first round of data, 1 for the second). What are the Stata commands to estimate a difference-in-differences equation? Do I need to use fixed effects? If yes, how do I do it? I want to run a logit regression, as my dependent variable is categorical. Please help.
    Last edited by Moupiyali Koley; 19 Feb 2023, 03:29.

  • #2
    Moupiyali:
    welcome to this forum.
    See -help DID intro-.
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      Thank you so much, Sir, for replying. I have gone through the DID help in Stata, but I am confused about the steps I should follow to run a logit regression, as my data set is repeated cross-sectional data.


      • #4
        Can we run the "didregress" command in Stata 14.2? Please help.
        Last edited by Moupiyali Koley; 05 Mar 2023, 23:53.


        • #5
          Moupiyali:
          no, but see https://www.princeton.edu/~otorres/DID101.pdf
          Kind regards,
          Carlo
          (Stata 19.0)


          • #6
            Thank you so much, Sir.


            • #7
              Sir, I need one more piece of help. I am working with repeated cross-sectional data. My outcome variable is categorical, so I planned to use a logit model. I want to study the pre- and post-policy impact. I have two rounds of data, one from 2016 and the other from 2019. The policy was implemented after 2016, so I want to do a difference-in-differences analysis. But the problem is how to specify the control and treatment groups, since my outcome variable is categorical. Please help, Sir.


              • #8
                I was looking into this yesterday in this thread when I came across this other stats stack exchange thread:

                https://stats.stackexchange.com/ques...ic-regressions

                This makes me think it might be fairly difficult to do a difference-in-differences analysis with a logit model for the outcome. Can anyone here confirm this?


                • #9
                  Yes, Sir. I am facing challenges in specifying the control and treatment groups. Can you please suggest any other way?


                  • #10
                    Daniel Schaefer I can't mathematically prove it, but I don't think it's a good idea. Even before we get to estimation, how would we demonstrate that the trends of the treatment group are parallel to those of a donor pool? Maybe others will correct me, but I don't think it's possible, or at the very least I've never heard of it.


                    • #11
                      Jared Greathouse Then, Sir, how do I do difference-in-differences with a categorical outcome variable? Please help.


                      • #12
                        I just said you can't do that 🤣 or that you shouldn't do it.


                        • #13
                          While there are serious issues to consider about interpreting a DID analysis with non-linear regressions due to the question of what parallel trends even means in this context, I note that in this instance, O.P. only has one pre-intervention and one post-intervention time period, so that parallel trends would be a matter of faith, or, less likely available, prior knowledge and theory, rather than empiricism even if a linear regression were used.

                          Most likely, if I were in this situation, assuming that the outcome probabilities are generally not in the extreme ranges, I would use a linear probability model with robust standard errors to deal with heteroscedasticity (assuming the sample size is large enough to support robust standard errors).

                          There are a few other questions O.P. has asked along the way. Fixed-effects regression would not be appropriate here because there really aren't any repeating observations: you have cross-sectional data, and few if any units observed in 2016 were also observed in 2019. So this is not longitudinal data. Just a flat one-level regression along the lines of
                          -regress outcome i.group##i.year- (or, if sticking with logistic regression, -logit outcome i.group##i.year-) would be best.
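
                          A minimal sketch of both specifications, run on simulated data so that it executes as written. The variable names (outcome, group, year, age), the sample size, and the simulated effect sizes are assumptions made purely for illustration; they are not O.P.'s data. Nothing here relies on -didregress-, only on factor-variable notation and -margins-, so it should run in Stata 14.2.

                          ```stata
                          * Hypothetical simulated data: two independent cross-sections.
                          * year is the 0/1 round dummy (0 = first round, 1 = second round).
                          clear
                          set seed 12345
                          set obs 4000
                          generate byte year  = _n > 2000
                          generate byte group = runiform() < 0.5
                          generate double age = 20 + 40*runiform()

                          * Binary outcome with an assumed true DID effect of 0.10 on the probability scale
                          generate byte outcome = runiform() < (0.30 + 0.10*group + 0.05*year + 0.10*group*year)

                          * Linear probability model with robust standard errors.
                          * The coefficient on 1.group#1.year is the DID estimate.
                          regress outcome i.group##i.year c.age, vce(robust)

                          * Logit alternative: the interaction coefficient is on the log-odds scale
                          logit outcome i.group##i.year c.age, vce(robust)

                          * Predicted probabilities in each group-by-year cell after the logit fit
                          margins group#year

                          * Group difference in predicted probability within each year;
                          * the change in this difference across years is the DID on the probability scale
                          margins year, dydx(group)
                          ```

                          In the logit version the interaction coefficient is not itself a difference in probabilities, which is why the -margins- output is the more useful thing to report there.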

                          All of that said, it isn't clear from the description in #1 how one could define the intervention and control groups in this data. The situation describes data gathered in a population sample from before the policy was implemented, and then again after. O.P. wants to define the treatment group post-policy as those who actually "use" the policy. But there is no clear way to define the analogous group in the pre-policy era. If this were longitudinal data on the same people, that would make life simple. But there is no clear way to divide up the pre-policy sample, who are different people from the post-policy sample, into those who "would have" and "would not have" used the policy had it been available then. So I think that all of the technical issues that have arisen in this thread so far, including my own musings before this paragraph, are, at best, premature. It doesn't seem to me like there is data that would support a difference-in-differences analysis even if all of the technical problems were easily overcome. It seems to me the best we can squeeze from this data is a comparison of outcomes in users and non-users in the post-policy data. This is a non-randomized comparison, which is inherently weak. It might be improved somewhat by incorporating propensity scores or adjusting for numerous confounders.
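
                          For that fallback comparison of users and non-users in the post-policy round only, a rough sketch might look like the following. It again uses simulated data, and every variable name (user, outcome, age, female, educ) and every coefficient in the simulation is a hypothetical stand-in. It shows covariate adjustment and inverse-probability weighting with -teffects ipw-, which was introduced in Stata 13. Neither approach removes confounding by unmeasured variables.

                          ```stata
                          * Hypothetical simulated post-policy cross-section
                          clear
                          set seed 2023
                          set obs 3000
                          generate double age    = 20 + 40*runiform()
                          generate byte   female = runiform() < 0.5
                          generate double educ   = floor(16*runiform())

                          * Assumed: uptake of the policy depends on observed covariates
                          generate byte user = runiform() < invlogit(-1 + 0.02*age + 0.05*educ)

                          * Assumed: outcome depends on uptake and on covariates
                          generate byte outcome = runiform() < invlogit(-1 + 0.4*user + 0.01*age + 0.3*female)

                          * Option 1: adjust for confounders directly in a logit model
                          logit outcome i.user c.age i.female c.educ, vce(robust)

                          * Average difference in predicted probability, users vs. non-users
                          margins, dydx(user)

                          * Option 2: inverse-probability weighting on an estimated propensity score
                          * (treatment model is logit by default)
                          teffects ipw (outcome) (user c.age i.female c.educ), ate
                          ```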

                          Note to O.P.: In #1 you describe your situation including the abbreviations NFHS and BPL. I, for one, have no idea what these are. I suspect I am hardly alone in that regard. Before your next post here, please review the Forum FAQ and take to heart the excellent advice you will find there on how to post in the most effective ways, maximizing your chance of getting timely and helpful responses. Among the things you will learn there: always spell out abbreviations on first use unless they are the kind of abbreviations that everybody around the world, no matter what their area of work, would recognize immediately. While I think it is unlikely that knowledge of what NFHS and BPL are would materially matter for responding to your specific questions, perhaps there is something special about those that might lead to different conclusions. In any event, please don't use unexplained abbreviations.




                          • #14
                            Dear Moupiyali,

                            As always, the forum provides great advice.

                            Jeff Wooldridge has written on this recently and presented at the Stata Econ Virtual Symposium https://www.stata.com/symposiums/economics21/ . Below is a link to a working paper on the subject:

                            https://papers.ssrn.com/sol3/papers....act_id=4183726


                            • #15
                              "It doesn't seem to me like there is data that would support a difference-in-differences analysis even if all of the technical problems were easily overcome."
                              I forgot to say this, but yeah, that would be my other (my main, actually) point. Cross-sectional data = no DD. Not possible, even in my wildest dreams.

                              Barring that, if Dr. Jeff Wooldridge has a solution for the non linear angle of things, then I'm not about to argue with him. However, the pre-post idea still needs to be there. Even if you had one pre-period, I wouldn't like it/be the biggest fan, but I'd still say "go for it".
