  • creating treatment groups for difference in differences analysis using bysort

    I need to run a difference-in-differences analysis on some panel data, which means creating a control group and a treatment group. The panel consists of two waves (2019 and 2020). The treatment group is those whose second-wave interview occurred on or after March 2nd 2020.

    I use the command:
    gen treatment=0
    replace treatment = 1 if date >= 21976

    Here date is the variable showing the date of each interview, and 21976 is the Stata code for March 2nd 2020. treatment is to be a dummy: 1 for the treatment group and 0 for the control group.
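
    As a quick sanity check on that numeric code, and assuming date is stored as a Stata daily date (days since 01jan1960), the td() pseudofunction and the %td display format can confirm that 21976 corresponds to March 2nd 2020. This is only a check, not part of the commands above.

    ```stata
    // td() turns a date literal into Stata's daily-date number
    display td(02mar2020)
    // %td shows a daily-date number as a readable date
    display %td 21976
    ```

    The first line should display 21976 and the second 02mar2020.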

    This sets treatment to 0 for the control group and 1 for the treatment group in wave 2, but leaves it at 0 for every wave 1 observation.

    What I need to do is tell Stata that, for individuals with treatment = 1 in wave 2, treatment should also equal 1 in wave 1.

    I've been told that the bysort command will be useful for this task, but after hours of searching online I haven't found anything that shows me how to do it.

    Any help is much appreciated, because I'm at the end of my rope trying to solve this.

  • #2
    I am going to assume that you have the following variables:
    • id - an identifier for each individual
    • date - a Stata daily date of the interview; you have two interviews for each individual
    Then the following untested code may start you in a useful direction.
    Code:
    generate treatment = .
    // create the treatment indicator in the second wave for each individual
    bysort id (date): replace treatment = date>=td(2-3-2020) if _n==2  
    // copy the treatment from the second wave to the first wave for each individual
    bysort id (date): replace treatment = treatment[_2] if _n==1

    • #3
      Thank you, this seems like a good starting point. Yes, I do have an id variable (pidp) that uniquely identifies each person in the dataset. I also have a wave identifier, 9 for the first wave and 10 for the second, and a date variable showing the date of each person's interview in each wave.
      The first part of the code runs fine, both the generate and the first bysort. But when I run the second bysort, Stata returns an error: "_2 not found" and "r(111)".

      Can you please explain what the [_2] in your code means, so I can try to adapt it to run correctly?

      • #4
        I am sorry, that should not have had the underscore in front of the 2.
        Code:
        bysort id (date): replace treatment = treatment[2] if _n==1
        If you are unfamiliar with Stata's subscripting notation, like the "[2]" in the corrected code, see the output of
        Code:
        help subscripting
        for an introduction.
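
        For readers who want to see the corrected commands run end to end, here is a minimal sketch on a made-up four-row dataset. The id values and dates are hypothetical illustrations, not data from this thread, and the cutoff is written as the date literal td(02mar2020), which equals the 21976 used in the question.

        ```stata
        clear
        // toy data: id 1 has both interviews before the cutoff (21976),
        // id 2 has its second interview after the cutoff
        input id date
        1 21700
        1 21970
        2 21710
        2 21980
        end
        format date %td

        generate treatment = .
        // within each id, sorted by date, row 2 is the second interview
        bysort id (date): replace treatment = date >= td(02mar2020) if _n == 2
        // under by:, treatment[2] means the 2nd observation within that id
        bysort id (date): replace treatment = treatment[2] if _n == 1
        list, sepby(id)
        ```

        In this toy data, id 1 should end up with treatment 0 in both rows and id 2 with treatment 1 in both rows.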

        • #5
          That code has solved my problem, and now I understand what it is doing. I am very grateful for your help with this problem.

          Thanks again!
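
          For later readers, the same idea can be sketched with the variable names mentioned in #3, assuming pidp identifies the person, wave takes the values 9 and 10, and date is a Stata daily date. Sorting on wave rather than date and guarding against missing dates are assumptions added here, not steps from the thread; the toy rows are hypothetical.

          ```stata
          clear
          // hypothetical toy rows: pidp 101 is control, pidp 102 is treated
          input pidp wave date
          101  9 21700
          101 10 21970
          102  9 21710
          102 10 21980
          end
          format date %td

          generate treatment = .
          // Stata treats missing as larger than any number, so exclude
          // missing dates explicitly or they would count as "after"
          bysort pidp (wave): replace treatment = date >= td(02mar2020) ///
              if wave == 10 & !missing(date)
          // after sorting by wave, the last row within a person is wave 10
          bysort pidp (wave): replace treatment = treatment[_N] if wave == 9
          // check: treatment should not vary within a person
          bysort pidp (treatment): assert treatment[1] == treatment[_N]
          ```

          A person observed only in wave 9 would keep a missing treatment value under this sketch, which may or may not be what the analysis needs.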