  • Event Study with eventdd - Interpretation

    Hi Everyone,

    I recently posted a question here with the same goals and data that I describe below. I'm creating a new post because I want to ask about a different method with different code.

    Goal: I'm trying to see the effect that replications have on the citation counts of papers.

    For that, my thesis supervisor suggested that I do a "simple" DID. I implemented the generalized DID suggested in the previous question. However, my supervisor is now suggesting that I implement an event study so that I can plot a graph.

    My data is an unbalanced panel that looks something like this:

    . dataex paper_id published citations year rep_year replicated length n_authors m_authors h_index in 1111/1200

    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str19 paper_id int(published citations year rep_year) float replicated int length byte(n_authors m_authors) int h_index
    "Cassar_2013"   2013 11 2016 2023 1 34 3 1  92
    "Cassar_2013"   2013 11 2017 2023 1 34 3 1  92
    "Cassar_2013"   2013 22 2018 2023 1 34 3 1  92
    "Cassar_2013"   2013 12 2019 2023 1 34 3 1  92
    "Cassar_2013"   2013 18 2020 2023 1 34 3 1  92
    "Cassar_2013"   2013 16 2021 2023 1 34 3 1  92
    "Cassar_2013"   2013 16 2022 2023 1 34 3 1  92
    "Cassar_2013"   2013 30 2023 2023 1 34 3 1  92
    "Catao_2005"    2005  0 2005    . 0 17 2 2 399
    "Catao_2005"    2005  1 2006    . 0 17 2 2 399
    "Catao_2005"    2005  2 2007    . 0 17 2 2 399
    "Catao_2005"    2005  0 2008    . 0 17 2 2 399
    "Catao_2005"    2005  3 2009    . 0 17 2 2 399
    "Catao_2005"    2005  6 2010    . 0 17 2 2 399
    "Catao_2005"    2005  1 2011    . 0 17 2 2 399
    "Catao_2005"    2005  3 2012    . 0 17 2 2 399
    "Catao_2005"    2005  2 2013    . 0 17 2 2 399
    "Catao_2005"    2005  0 2014    . 0 17 2 2 399
    "Catao_2005"    2005  1 2015    . 0 17 2 2 399
    "Catao_2005"    2005  7 2016    . 0 17 2 2 399
    "Catao_2005"    2005  3 2017    . 0 17 2 2 399
    "Catao_2005"    2005  4 2018    . 0 17 2 2 399
    "Catao_2005"    2005  5 2019    . 0 17 2 2 399
    "Catao_2005"    2005  2 2020    . 0 17 2 2 399
    "Catao_2005"    2005  0 2021    . 0 17 2 2 399
    "Catao_2005"    2005  2 2022    . 0 17 2 2 399
    "Catao_2005"    2005  0 2023    . 0 17 2 2 399
    "Cattaneo_2009" 2009  0 2009 2019 1 31 5 4  83
    "Cattaneo_2009" 2009  4 2010 2019 1 31 5 4  83
    "Cattaneo_2009" 2009  4 2011 2019 1 31 5 4  83
    "Cattaneo_2009" 2009  4 2012 2019 1 31 5 4  83
    "Cattaneo_2009" 2009  4 2013 2019 1 31 5 4  83
    "Cattaneo_2009" 2009  6 2014 2019 1 31 5 4  83
    "Cattaneo_2009" 2009  6 2015 2019 1 31 5 4  83
    "Cattaneo_2009" 2009 10 2016 2019 1 31 5 4  83
    "Cattaneo_2009" 2009  7 2017 2019 1 31 5 4  83
    "Cattaneo_2009" 2009 12 2018 2019 1 31 5 4  83
    "Cattaneo_2009" 2009 10 2019 2019 1 31 5 4  83
    "Cattaneo_2009" 2009 15 2020 2019 1 31 5 4  83
    "Cattaneo_2009" 2009 16 2021 2019 1 31 5 4  83
    "Cattaneo_2009" 2009 13 2022 2019 1 31 5 4  83
    "Cattaneo_2009" 2009 13 2023 2019 1 31 5 4  83
    "Cerra_2008"    2008  3 2008 2012 1 19 2 0 399
    "Cerra_2008"    2008  7 2009 2012 1 19 2 0 399
    "Cerra_2008"    2008 18 2010 2012 1 19 2 0 399
    "Cerra_2008"    2008 22 2011 2012 1 19 2 0 399
    "Cerra_2008"    2008 35 2012 2012 1 19 2 0 399
    "Cerra_2008"    2008 30 2013 2012 1 19 2 0 399
    "Cerra_2008"    2008 35 2014 2012 1 19 2 0 399
    "Cerra_2008"    2008 37 2015 2012 1 19 2 0 399
    "Cerra_2008"    2008 35 2016 2012 1 19 2 0 399
    "Cerra_2008"    2008 43 2017 2012 1 19 2 0 399
    "Cerra_2008"    2008 41 2018 2012 1 19 2 0 399
    "Cerra_2008"    2008 36 2019 2012 1 19 2 0 399
    "Cerra_2008"    2008 50 2020 2012 1 19 2 0 399
    "Cerra_2008"    2008 44 2021 2012 1 19 2 0 399
    "Cerra_2008"    2008 44 2022 2012 1 19 2 0 399
    "Cerra_2008"    2008 33 2023 2012 1 19 2 0 399
    "Cesarini_2009" 2009  9 2009    . 0 34 5 5 337
    "Cesarini_2009" 2009 12 2010    . 0 34 5 5 337
    "Cesarini_2009" 2009 17 2011    . 0 34 5 5 337
    "Cesarini_2009" 2009 16 2012    . 0 34 5 5 337
    "Cesarini_2009" 2009 23 2013    . 0 34 5 5 337
    "Cesarini_2009" 2009 24 2014    . 0 34 5 5 337
    "Cesarini_2009" 2009 25 2015    . 0 34 5 5 337
    "Cesarini_2009" 2009 12 2016    . 0 34 5 5 337
    "Cesarini_2009" 2009 29 2017    . 0 34 5 5 337
    "Cesarini_2009" 2009 22 2018    . 0 34 5 5 337
    "Cesarini_2009" 2009 13 2019    . 0 34 5 5 337
    "Cesarini_2009" 2009 19 2020    . 0 34 5 5 337
    "Cesarini_2009" 2009 14 2021    . 0 34 5 5 337
    "Cesarini_2009" 2009 19 2022    . 0 34 5 5 337
    "Cesarini_2009" 2009 10 2023    . 0 34 5 5 337
    "Chanda_2014"   2014  2 2014 2021 1 28 3 3  94
    "Chanda_2014"   2014  1 2015 2021 1 28 3 3  94
    "Chanda_2014"   2014  6 2016 2021 1 28 3 3  94
    "Chanda_2014"   2014  1 2017 2021 1 28 3 3  94
    "Chanda_2014"   2014  4 2018 2021 1 28 3 3  94
    "Chanda_2014"   2014  2 2019 2021 1 28 3 3  94
    "Chanda_2014"   2014  7 2020 2021 1 28 3 3  94
    "Chanda_2014"   2014  9 2021 2021 1 28 3 3  94
    "Chanda_2014"   2014  6 2022 2021 1 28 3 3  94
    "Chanda_2014"   2014  2 2023 2021 1 28 3 3  94
    "Chay_2005"     2005  0 2005    . 0 22 3 3 399
    "Chay_2005"     2005  2 2006    . 0 22 3 3 399
    "Chay_2005"     2005  4 2007    . 0 22 3 3 399
    "Chay_2005"     2005  7 2008    . 0 22 3 3 399
    "Chay_2005"     2005  7 2009    . 0 22 3 3 399
    "Chay_2005"     2005 11 2010    . 0 22 3 3 399
    "Chay_2005"     2005  8 2011    . 0 22 3 3 399
    end
    ------------------ copy up to and including the previous line ------------------


    Here, paper_id identifies the papers, replicated is the treatment, and length, n_authors, m_authors, and h_index are additional characteristics of each paper that don't vary over time.

    As you can see, the panel is unbalanced because some papers have 20 years of citations overall (they were published a long time ago), while others have only 4 years of citations (they were published recently). My treatment group will be the papers that were replicated at some point (replicated = 1), and the control group those that were never replicated (replicated = 0).

    I came across the eventdd command and I implemented it as such:
    Code:
     eventdd citations replicated i.year, timevar(timeToTreat) method(,cluster(P_id)) graph_op(ytitle("Citations per Year"))
    ------------------------------------------------------------------------------
                 |               Robust
       citations |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
      replicated |   23.66014   6.326619     3.74   0.000     11.19963    36.12065
                 |
            year |
           1992  |   .4658272   .7919368     0.59   0.557    -1.093921    2.025576
           1993  |   .2147516    1.30747     0.16   0.870    -2.360358    2.789862
           1994  |  -.4756162   1.311588    -0.36   0.717    -3.058837    2.107605
           1995  |   .6927654   1.600165     0.43   0.665    -2.458819     3.84435
           1996  |  -.7758596   3.215401    -0.24   0.810     -7.10871     5.55699
           1997  |  -.8470427   3.359396    -0.25   0.801    -7.463496    5.769411
           1998  |   2.372269   3.526131     0.67   0.502    -4.572577    9.317114
           1999  |    3.28253   3.545548     0.93   0.355    -3.700559    10.26562
           2000  |   1.610422   3.667929     0.44   0.661    -5.613699    8.834542
           2001  |   1.132951   3.950361     0.29   0.775     -6.64743    8.913333
           2002  |   2.982771   3.897317     0.77   0.445    -4.693139    10.65868
           2003  |   5.875035   4.156226     1.41   0.159    -2.310805    14.06087
           2004  |   7.411438    4.16802     1.78   0.077     -.797632    15.62051
           2005  |    9.27638   4.416921     2.10   0.037     .5770927    17.97567
           2006  |    9.67896    4.51858     2.14   0.033     .7794497    18.57847
           2007  |   11.31239   4.514173     2.51   0.013     2.421557    20.20322
           2008  |   13.03319   4.787008     2.72   0.007     3.605004    22.46138
           2009  |   12.77196   4.616242     2.77   0.006     3.680102    21.86382
           2010  |   13.84908   4.783424     2.90   0.004     4.427947    23.27021
           2011  |   16.38837     5.0108     3.27   0.001     6.519414    26.25732
           2012  |    17.7978   5.269441     3.38   0.001     7.419443    28.17616
           2013  |   18.41073   5.232487     3.52   0.001      8.10515     28.7163
           2014  |   17.87649   5.119939     3.49   0.001     7.792578     27.9604
           2015  |   17.18381   5.040096     3.41   0.001     7.257156    27.11047
           2016  |   17.83357   5.139675     3.47   0.001     7.710794    27.95635
           2017  |   17.74431   5.258003     3.37   0.001     7.388476    28.10014
           2018  |   18.12558   5.335541     3.40   0.001     7.617031    28.63412
           2019  |   18.08194   5.381563     3.36   0.001     7.482753    28.68113
           2020  |   18.08149    5.39998     3.35   0.001     7.446035    28.71695
           2021  |   19.67737   5.594736     3.52   0.001     8.658328    30.69641
           2022  |   19.32571   5.691992     3.40   0.001     8.115118    30.53629
           2023  |   20.12481   5.920155     3.40   0.001     8.464843    31.78477
                 |
          lead15 |  -10.43364    5.68993    -1.83   0.068    -21.64016    .7728907
          lead14 |  -1.922303   10.17528    -0.19   0.850    -21.96288    18.11828
          lead13 |    1.21294    12.5336     0.10   0.923    -23.47244    25.89832
          lead12 |   3.075636   10.75358     0.29   0.775    -18.10393     24.2552
          lead11 |   .1882278   9.933739     0.02   0.985    -19.37664    19.75309
          lead10 |  -3.206425   8.839438    -0.36   0.717    -20.61603    14.20317
           lead9 |  -4.098672   7.901657    -0.52   0.604    -19.66128    11.46393
           lead8 |  -4.961981   6.543385    -0.76   0.449    -17.84942    7.925457
           lead7 |  -2.957679   5.069235    -0.58   0.560    -12.94172    7.026367
           lead6 |  -2.196207   4.942476    -0.44   0.657     -11.9306    7.538181
           lead5 |  -1.869734   3.536511    -0.53   0.597    -8.835022    5.095554
           lead4 |   -1.53696   2.691031    -0.57   0.568    -6.837045    3.763126
           lead3 |  -.3714522   2.162117    -0.17   0.864    -4.629822    3.886917
           lead2 |  -.6900516   .9966668    -0.69   0.489    -2.653024     1.27292
            lag0 |   7.462517   1.823163     4.09   0.000      3.87173     11.0533
            lag1 |   2.940426   1.411938     2.08   0.038      .159563    5.721289
            lag2 |   5.449579   1.659522     3.28   0.001      2.18109    8.718068
            lag3 |  -1.579101   6.151922    -0.26   0.798    -13.69554    10.53734
            lag4 |   2.559354    7.83628     0.33   0.744    -12.87449     17.9932
            lag5 |   8.502305   10.69625     0.79   0.427    -12.56436    29.56897
            lag6 |   10.16326   12.04497     0.84   0.400    -13.55975    33.88627
            lag7 |   9.891067   11.91394     0.83   0.407    -13.57388    33.35601
            lag8 |   9.594236   12.03238     0.80   0.426    -14.10397    33.29245
            lag9 |   20.03024    16.5615     1.21   0.228    -12.58824    52.64872
           lag10 |  -5.927529   10.57916    -0.56   0.576    -26.76357    14.90851
           _cons |  -4.640366   4.771616    -0.97   0.332    -14.03824    4.757506
    ------------------------------------------------------------------------------
    [Image: eventst.png]

    My questions are:
    1. I thought only the lags and leads should appear in the results table, but I'm also getting coefficients for the years. Is this normally the case?
    2. I see that only the last leads are significant. Would this mean that replications barely have an effect on citations? Or did I code something incorrectly?
    3. Is the coefficient for replicated relevant, given that it appears to be significant but the leads aren't?
    4. I believe this command doesn't allow for a Poisson regression. However, since my dependent variable, citations, is a non-negative count variable, would it be more appropriate to use a Poisson model?
    I hope I'm being clear with my questions. Thank you for your help.

  • #2
    1. The years are year fixed effects. Standard. You can set method to hdfe and absorb time and id to make those go away.
    2. None of the leads are significant (a good thing). Only the first three lags (treatment periods) are significant (as in the picture), so the added citations happen quickly. The point estimates are large later on, but so are the CIs.
    3. If you are trying to see what happens after replication, then replicated should not appear as a regressor. The time-to-treatment variable (timeToTreat) should only be defined for replicated studies. This will change the results, perhaps for the better.
    4. eventdd can't do Poisson. The coefficients may not be biased, but the standard errors are presumably wrong. Maybe use the wboot option. You can log the DV to approximate Poisson, but I suspect you've got a lot of zeros. Adding a constant to the dependent variable, as in ln(Y+c), causes problems.
    HTML Code:
    https://www.jonathandroth.com/assets/files/LogUniqueHOD0_Draft_Accepted.pdf
    You might spend some time studying that issue.

    What you can do is estimate the model directly rather than using eventdd. It's straightforward.

    Code:
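    * Load the example data
    use http://www.damianclarke.net/stata/bacon_example.dta , clear
    
    * Time to treatment: calendar year minus the treatment year stored in _nfd
    generate t2t = year - _nfd
    
    * Event study with eventdd, for comparison
    eventdd asmrs pcinc asmrh cases i.year , timevar(t2t) method(fe, cluster(stfips)) graph_op(ytitle("Suicides per 1m women")  xlabel(-20(5)25))
    
    * Lead dummies built by hand (t2t = -21, ..., -1)
    forv i = 21(-1)1 {
        g led`i' = t2t==-`i'
    }
    
    * Lag dummies built by hand (t2t = 0, ..., 27)
    forv i = 0/27 {
        g lagg`i' = t2t==`i'
    }
    
    * Same model estimated directly; led1 (t2t = -1) is left out as the baseline
    xtreg asmrs pcinc asmrh cases led21-led2 lagg0-lagg27 b1964.year, fe cluster(stfips)
    
    * Poisson with state dummies (i.stfips) and state-clustered standard errors
    poisson asmrs pcinc asmrh cases led21-led2 lagg0-lagg27 b1964.year i.stfips, vce(cluster stfips)
    
    * Fixed-effects Poisson
    xtpoisson asmrs pcinc asmrh cases led21-led2 lagg0-lagg27 b1964.year , fe vce(robust)  // robust = cluster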
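
Going back to point 3, one way the setup could look. This is a sketch rather than tested code: it assumes the variable names from the dataex listing in #1 and the P_id cluster variable from the original command, and it reuses the OLS syntax of that command with replicated dropped.

```stata
* Sketch only: run with the citation panel from #1 in memory.
* Assumed names: year, rep_year, replicated, citations, P_id (cluster id).

* Start from a clean slate if the variable already exists.
capture drop timeToTreat

* Event time is defined for replicated papers only. rep_year is already
* missing for never-replicated papers, so the if qualifier mainly documents
* the intent.
generate timeToTreat = year - rep_year if replicated == 1

* Never-replicated papers must have no event time.
assert missing(timeToTreat) if replicated == 0

* Same command as in #1, without replicated as a regressor.
eventdd citations i.year, timevar(timeToTreat) method(, cluster(P_id)) graph_op(ytitle("Citations per Year"))
```

The assert line stops with an error if any control paper ends up with a non-missing event time, which is a cheap check before looking at the plot.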



    • #3
      eventdd will save the dummies (keepdummies option), but you have to save the file just before.

      Producing the graph will take some effort, though you can probably borrow code from eventdd.

      I tried to add poisson as an option in eventdd. It would estimate, but only partly (it produces all the coefficients). It would take some time to figure out, but I think the additional stats at the end aren't being produced by xtpoisson. What gets graphed would take some thinking.

      Code:
      use http://www.damianclarke.net/stata/bacon_example.dta , clear
      
      generate t2t = year - _nfd
      
      save bacon2, replace
      eventdd asmrs pcinc asmrh cases i.year , timevar(t2t) method(fe, cluster(stfips)) graph_op(ytitle("Suicides per 1m women")  xlabel(-20(5)25)) keepdummies
      xtreg asmrs pcinc asmrh cases lead21-lead2 lag0-lag27 b1964.year, fe cluster(stfips)
      xtpoisson asmrs pcinc asmrh cases lead21-lead2 lag0-lag27 b1964.year, fe vce(robust)
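
One possible shortcut for the graph, as a sketch: plot the lead and lag coefficients from the last estimation with the community-contributed coefplot package. This assumes coefplot is installed and that the dummies kept by eventdd are named lead# and lag#, as in the commands above. The axis title is only an illustration.

```stata
* Assumes -coefplot- is installed:  ssc install coefplot
* Run directly after the xtpoisson command above, so its results are active.
coefplot, keep(lead* lag*) vertical yline(0) ///
    xlabel(, angle(90) labsize(vsmall)) ///
    ytitle("Poisson coefficient (log points)")
```

The omitted baseline period (lead1) has no coefficient, so it will not show up as a point on this plot, unlike in the eventdd graph.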



      • #4
        In trying to rewire eventdd (renamed eventdd2), I made these changes:

        Change line 678 from

        Code:
        local valid_method=inlist("`anything'", "fe", "ols", "hdfe", "")
        to

        Code:
        local valid_method=inlist("`anything'", "fe", "ols", "hdfe", "poisson", "")
        Dropped this in at line 249:
        Code:
        else if ("`method'"=="poisson")+("`method_old'"=="poisson")>= 1{
            xtpoisson `varlist2' `tot_leads' `tot_lags' `if' `in' `wt' , fe `options' `options_old'
            local estat_wboot=e(cmdline)
        }
        It will estimate the model, but fails here (line 523 or so):

        Code:
        else{
            qui sum `times'
            local minlead  = -r(min)
            local maxlag   = r(max)
            matrix vll = v["lead`minlead'".."lag`maxlag'", "lead`minlead'".."lag`maxlag'"]
        }
        `times' has all the values, so I'm not sure why it is failing.

        It does produce the graph, but without confidence bands.

        You'll need to change the name to eventdd2 at lines 5 and 6, then run it.

        This new code works with the normal options, but it is failing with poisson, so it is related to what xtpoisson is doing. (I also tried poisson, but it failed too.)
        Last edited by George Ford; 30 Jul 2024, 15:48.



        • #5
          This is purely mechanical. I'm not sure whether this is legit, but with Poisson I'd think it would be ok.



          • #6
            Hi George, thank you very much for your answer.

            I wanted to ask some additional things based on what I implemented after your suggestions. I ran the model again with different specifications, taking out "replicated" as a regressor and including other (time-constant) controls:

            1. OLS with clusters:
            Code:
            eventdd citations length n_authors m_share h_index i.year, timevar(timeToTreat) method(,cluster(P_id)) graph_op(ytitle("Citations per Year"))
            2. HDFE (year and paper fe):
            Code:
            eventdd citations length n_authors m_share h_index, timevar(timeToTreat) method(hdfe,absorb(i.P_id i.year)) graph_op(ytitle("Citations per Year"))
            3. FE:
            Code:
            eventdd citations length n_authors m_share h_index i.year, timevar(timeToTreat) method(fe, cluster(P_id)) graph_op(ytitle("Citations per Year"))

            As you can see, in all 3 models, after taking replicated out, the effect is now significant before the event. In my case, I think this could be either because citations start increasing when the working-paper versions of the replications start circulating (which is before the official replications are published), or simply because very famous papers attract both citations and replications, so their citations are already increasing.

            My issue is that if I implement model 1 (eventdd OLS), my CIs are very wide from lag 6 onwards. For models 2 and 3, the effect on the leads is just completely misleading.

            I was told I have a power issue, and that this is probably why I get such wide CIs. It was suggested that, instead of taking each year separately, I could "group" the years, so that, say, one interval runs from τ = 1 to τ = 3 (or try different intervals).

            For that I tried this:
            Code:
            generate timeToTreat5 = floor((timeToTreat + 2) / 5)
            eventdd citations replicated length n_authors m_share h_index i.year, timevar(timeToTreat5) method(, cluster(P_id)) graph_op(ytitle("Citations per Year"))
            The CIs do get a bit smaller (but are still wide), but now none of my leads or lags are significant.
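
A quick way to see which event times each bin covers, sketched on toy data rather than the citation panel. The variable timeToTreat5b is a hypothetical alternative introduced only for comparison; it is not part of the commands above.

```stata
* Toy data: event times -12, ..., 12 (not the citation data)
clear
set obs 25
generate timeToTreat = _n - 13

* Binning used above
generate timeToTreat5 = floor((timeToTreat + 2) / 5)

* Hypothetical alternative that keeps pre- and post-event years apart
generate timeToTreat5b = floor(timeToTreat / 5)

* Smallest and largest event time falling in each bin
tabstat timeToTreat, by(timeToTreat5) statistics(min max) nototal
tabstat timeToTreat, by(timeToTreat5b) statistics(min max) nototal
```

With the first binning, bin 0 runs from -2 to 2, so it mixes two pre-event years with the event year and two post-event years. eventdd omits timevar = -1 as the reference period unless the baseline() option says otherwise, so the reference bin here would be the years -7 to -3. With the alternative, bin -1 covers -5 to -1 and bin 0 covers 0 to 4, so leads and lags stay in separate bins.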

            Before I try the Poisson model, I thought one of these models should show something more promising, but none of them seems logical to me. Could this be only because of my data? Or am I doing something completely wrong with the specification?



            • #7
              Leave out the X's for now.

              Code:
              egen pid = group(paper_id)
              g t2t = year - rep_year  
              save try_data, replace
              eventdd citations , timevar(t2t) method(hdfe, absorb(pid year) cluster(pid)) keepdummies leads(4) lags(10) inrange  
              
              reghdfe citations lead* lag* , absorb(pid year) cluster(pid)  
              testparm lead4 lead3 lead2
              coefplot , drop(lead10 lead9 lead8 lead7 lead6 lead5 _cons) vertical xline(4, lp(dot)) yline(0)  
              
              ppmlhdfe citations lead* lag* , absorb(pid year) cluster(pid)  
              testparm lead4 lead3 lead2
              coefplot , drop(lead10 lead9 lead8 lead7 lead6 lead5 _cons) vertical xline(4, lp(dot)) yline(0)
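
When comparing the two plots, note that the reghdfe coefficients are in citations per year, while the ppmlhdfe coefficients are in log points. A small sketch of the conversion to a percentage change, using a made-up coefficient of 0.25 that is not an estimate from this thread:

```stata
* A Poisson coefficient b implies a proportional change of exp(b) - 1.
* Illustration with a made-up coefficient of 0.25:
display 100 * (exp(0.25) - 1)

* After the ppmlhdfe command above, the same conversion for one lag would be
* (assumes a dummy named lag0 is in the model):
* display 100 * (exp(_b[lag0]) - 1)
```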
              Last edited by George Ford; 11 Aug 2024, 09:05.
