
  • Difference in Differences with a continuous treatment

    Hello everyone,

    I am currently researching the labour-market effects of Venezuelan migration to Peru. As a first step, I want to estimate the effect of the recent mass migration on natives' mean wages in the three most populous cities. To set up the model, I computed yearly mean wages by city from a large dataset of yearly labour-market surveys (repeated cross-sections from 2014 to 2019), together with the yearly migrant share of each city's population, which starts in 2017. This means the treatment variable is 0 before 2017 and, from 2017 on, increases every year in each city with a different intensity. Before 2017, when the mass migration started, mean wages in these 3 cities follow a parallel trend, and the cities also share cultural and demographic characteristics.
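A minimal sketch of the city-year mean-wage construction described above, on simulated data. The individual wage variable `wage`, the native indicator `native`, and the toy sample sizes are all hypothetical; only `cities`, `year`, and `cities_wmean` come from the post. The natives' mean is computed within each city and year and then repeated on every respondent of that city-year.

```stata
* Hypothetical toy data standing in for the pooled yearly surveys:
* 3 cities x 6 survey years (2014-2019) x 50 respondents per city-year
clear
set seed 2019
set obs 900
generate cities = ceil(_n/300)                    // city id 1, 2, 3
generate year   = 2014 + mod(ceil(_n/50) - 1, 6)  // survey year 2014..2019
generate wage   = rnormal(115, 10)                // hypothetical individual wage
generate byte native = runiform() > 0.1           // hypothetical: 1 = native-born

* Natives' mean wage by city and year, repeated on every respondent of that city-year
* (cond() sets non-natives to missing, and egen mean() ignores missing values)
bysort cities year: egen cities_wmean = mean(cond(native, wage, .))

* Optional: look at one row per city-year without losing the respondent data
preserve
collapse (mean) city_mean = wage if native, by(cities year)
list, sepby(cities) noobs
restore
```

Because every respondent in a city-year carries the same value of `cities_wmean`, the collapsed city-year file holds the same outcome information in 18 rows.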

    So I tried this code:

    didregress (cities_wmean) (legshare_cities, continuous), group(cities) time(year)

    cities_wmean: the city's mean wage. Because of how I coded it, the value is the same for every respondent within a city in a given year.
    legshare_cities: the legal-migrant share, which I use as a proxy for actual migration. This variable runs from 0 to 1 because Stata does not seem to accept a percentage variable; I would like to know whether there is a different way to create a percentage variable.


    cities: categorical variable that groups the survey respondents by city.
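On the percentage question: Stata stores a percentage as an ordinary number, so both the 0 to 1 and the 0 to 100 scale work, and only the unit of the coefficient changes. The sketch below uses simulated city-year data (all values hypothetical) and plain `regress` with city and year dummies, purely to show the rescaling; it is not meant as a replacement for `didregress`.

```stata
* Hypothetical city-year data: 3 cities x 6 years, share zero before 2017
clear
set seed 42
set obs 18
generate cities = ceil(_n/6)                     // city id 1, 2, 3
generate year   = 2014 + mod(_n - 1, 6)          // 2014..2019
generate double legshare_cities = cond(year >= 2017, runiform()*0.05, 0)  // 0-1 scale
generate double legshare_pct    = 100*legshare_cities                     // 0-100 scale
generate y = 115 + 2*legshare_cities + rnormal()   // hypothetical outcome

* The same two-way fixed-effects regression with each scaling of the share
quietly regress y legshare_cities i.cities i.year
scalar b_share = _b[legshare_cities]
quietly regress y legshare_pct i.cities i.year
scalar b_pct = _b[legshare_pct]

* b_share: effect of moving the share from 0 to 1 (0% to 100%)
* b_pct:   effect of a 1 percentage-point increase, equal to b_share/100
display "per unit of share: " b_share "   per percentage point: " b_pct
```

Read this way, and assuming `didregress` enters the share linearly, an ATET of .5748748 on the 0 to 1 scale corresponds to about .0057 per percentage point of migrant share.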

    On the first try I did not set the legal-migrant share to 0 for the pre-treatment years, and the regression p-value indicated a statistically significant treatment effect. This did not happen once I set the share to 0 for the pre-treatment years. The following outputs show this:

    didregress (cities_wmean_n) (legshare_cities, continuous), group(cities) time(year) aeq


    Difference-in-differences regression                       Number of obs =  61,216
    Data type: Repeated cross-sectional

                                           (Std. err. adjusted for 3 clusters in what)
    ----------------------------------------------------------------------------------
                    |                 Robust
     cities_wmean_n | Coefficient  std. err.        t   P>|t|     [95% conf. interval]
    ----------------+-----------------------------------------------------------------
    ATET            |
    legshare_cities |    .5157201   .0411641    12.53   0.006     .3386054    .6928349
    ----------------+-----------------------------------------------------------------
    Controls        |
               year |
               2018 |    1.211922   .2154213     5.63   0.030      .285039    2.138805
               2019 |    1.465113   .1342818    10.91   0.008     .8873452    2.042881
                    |
              _cons |    115.5126   .0882052  1309.59   0.000     115.1331    115.8921
    ----------------------------------------------------------------------------------
    Note: ATET estimate adjusted for group effects and time effects.


    Difference-in-differences regression                       Number of obs = 120,939
    Data type: Repeated cross-sectional

                                           (Std. err. adjusted for 3 clusters in what)
    ----------------------------------------------------------------------------------
                    |                 Robust
     cities_wmean_n | Coefficient  std. err.        t   P>|t|     [95% conf. interval]
    ----------------+-----------------------------------------------------------------
    ATET            |
    legshare_cities |    .5748748   .7738795     0.74   0.535     -2.75486    3.904609
    ----------------+-----------------------------------------------------------------
    Controls        |
               year |
               2015 |    1.026905   .0330452    31.08   0.001     .8847226    1.169087
               2016 |    2.118325   1.136744     1.86   0.203    -2.772689     7.00934
               2017 |    1.573374   .7771873     2.02   0.180    -1.770592    4.917341
               2018 |    2.749396    1.51991     1.81   0.212    -3.790251    9.289042
               2019 |     2.89084   2.599513     1.11   0.382     -8.29396    14.07564
                    |
              _cons |    114.0902   .5663133   201.46   0.000     111.6536    116.5269
    ----------------------------------------------------------------------------------
    Note: ATET estimate adjusted for group effects and time effects.
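The two outputs are consistent with the pre-2017 rows having been dropped in the first run: it uses 61,216 observations against 120,939, and its only year controls are 2018 and 2019 (base 2017), so that regression had no pre-treatment year to compare against. A sketch of the recode on hypothetical city-year data, assuming the share was stored as missing (not 0) before 2017:

```stata
* Hypothetical city-year data where the share was only filled in from 2017 on
clear
set obs 18
generate cities = ceil(_n/6)                   // city id 1, 2, 3
generate year   = 2014 + mod(_n - 1, 6)        // 2014..2019
generate double legshare_cities = 0.01*cities*(year - 2016) if year >= 2017

* Rows with a missing share are dropped by estimation commands (listwise
* deletion), so the 2014-2016 pre-period would silently leave the sample
count if missing(legshare_cities)

* Treat the pre-migration years as zero exposure rather than missing
replace legshare_cities = 0 if year < 2017 & missing(legshare_cities)
count if missing(legshare_cities)
```

After the `replace`, the first `count` reports 9 missing rows and the second reports none, so all six years stay in the estimation sample.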

    I would like to know whether there is something wrong with my setup of this difference-in-differences regression, what the treatment coefficient would mean if the setup is correct, and any other suggestions you may have.

    Many thanks in advance.






  • #2
    I'm willing to bet... my next paycheck that didregress was intended for binary interventions, not continuous ones.

    If I'm keeping up with the literature well, continuous-treatment DD is still at the frontier of the field, and to be completely honest with you, it's not an area I'm very well versed in.



    • #3
      Originally posted by Jared Greathouse
      I'm willing to bet... my next paycheck that didregress was intended for binary interventions, not continuous ones.

      If I'm keeping up with the literature well, continuous-treatment DD is still at the frontier of the field, and to be completely honest with you, it's not an area I'm very well versed in.

      Thanks. However, regarding the percentage treatment variable, which measures the immigrant share of each city's population: is it correct to scale it from 0 to 1? I ask because Stata does not seem to allow percentage variables.



      • #4
        No, the point I'm making here is that, unless there's a new DD command I'm unaware of, you can't use a treatment variable parameterized as a percentage; it needs to be binary. That is, your intervention must be either 0 or 1, treated or untreated (generally).



        • #5
          Originally posted by Jared Greathouse
          No, the point I'm making here is that, unless there's a new DD command I'm unaware of, you can't use a treatment variable parameterized as a percentage; it needs to be binary. That is, your intervention must be either 0 or 1, treated or untreated (generally).

          This is what I find in Stata's help:

          "didregress (ovar omvarlist) (tvar[, continuous]) [if] [in] [weight], group(groupvars) [time(timevar) options]

          tvar must be a binary variable indicating observations subject to treatment or a continuous variable measuring treatment intensity."

          That is why I would like to know how to parameterize my percentage treatment variable.

          Sorry if I misunderstood; my English is not that good.









          • #6
            omardanles omar morales Okay, I shouldn't have bet my paycheck, then: apparently didregress does work with continuous variables. So yes, you can use a percentage variable perfectly well as your intervention.


            You'd just make a variable that divides the immigrant population by the total population.

            Code:
            * immpop, totpop: immigrant and total population of each city in each year
            generate treat = immpop/totpop



            • #7
              Perhaps check out the commands "fuzzydid", in case the implementation of your treatment is fuzzy, and "did_multiplegt". Both were developed by de Chaisemartin and D'Haultfoeuille.

              Both are available from the SSC archive.
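Installing them follows the usual SSC pattern. This sketch only installs the packages and checks that Stata can find them; the syntax and options of each command are documented in its help file (`help fuzzydid`, `help did_multiplegt`), which is the place to check before fitting anything.

```stata
* Install both packages from the SSC archive (requires internet access)
ssc install fuzzydid, replace
ssc install did_multiplegt, replace

* Confirm the ado-files are on the adopath before reading their help files
which fuzzydid
which did_multiplegt
```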
