Hi Everyone,
I recently posted a question here with the same goal and data that I describe below. I'm creating a new post because I want to ask about a different method with different code.
Goal: I'm trying to estimate the effect that being replicated has on a paper's citation count.
For that, my thesis supervisor suggested that I run a "simple" DID. I implemented the generalized DID suggested in the previous question. However, my supervisor is now suggesting that I implement an event study, from which I can plot a graph.
My data is an unbalanced panel that looks something like this:
. dataex paper_id published citations year rep_year replicated length n_authors m_authors h_index in 1111/1200
----------------------- copy starting from the next line -----------------------
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str19 paper_id int(published citations year rep_year) float replicated int length byte(n_authors m_authors) int h_index
"Cassar_2013"   2013 11 2016 2023 1 34 3 1  92
"Cassar_2013"   2013 11 2017 2023 1 34 3 1  92
"Cassar_2013"   2013 22 2018 2023 1 34 3 1  92
"Cassar_2013"   2013 12 2019 2023 1 34 3 1  92
"Cassar_2013"   2013 18 2020 2023 1 34 3 1  92
"Cassar_2013"   2013 16 2021 2023 1 34 3 1  92
"Cassar_2013"   2013 16 2022 2023 1 34 3 1  92
"Cassar_2013"   2013 30 2023 2023 1 34 3 1  92
"Catao_2005"    2005  0 2005    . 0 17 2 2 399
"Catao_2005"    2005  1 2006    . 0 17 2 2 399
"Catao_2005"    2005  2 2007    . 0 17 2 2 399
"Catao_2005"    2005  0 2008    . 0 17 2 2 399
"Catao_2005"    2005  3 2009    . 0 17 2 2 399
"Catao_2005"    2005  6 2010    . 0 17 2 2 399
"Catao_2005"    2005  1 2011    . 0 17 2 2 399
"Catao_2005"    2005  3 2012    . 0 17 2 2 399
"Catao_2005"    2005  2 2013    . 0 17 2 2 399
"Catao_2005"    2005  0 2014    . 0 17 2 2 399
"Catao_2005"    2005  1 2015    . 0 17 2 2 399
"Catao_2005"    2005  7 2016    . 0 17 2 2 399
"Catao_2005"    2005  3 2017    . 0 17 2 2 399
"Catao_2005"    2005  4 2018    . 0 17 2 2 399
"Catao_2005"    2005  5 2019    . 0 17 2 2 399
"Catao_2005"    2005  2 2020    . 0 17 2 2 399
"Catao_2005"    2005  0 2021    . 0 17 2 2 399
"Catao_2005"    2005  2 2022    . 0 17 2 2 399
"Catao_2005"    2005  0 2023    . 0 17 2 2 399
"Cattaneo_2009" 2009  0 2009 2019 1 31 5 4  83
"Cattaneo_2009" 2009  4 2010 2019 1 31 5 4  83
"Cattaneo_2009" 2009  4 2011 2019 1 31 5 4  83
"Cattaneo_2009" 2009  4 2012 2019 1 31 5 4  83
"Cattaneo_2009" 2009  4 2013 2019 1 31 5 4  83
"Cattaneo_2009" 2009  6 2014 2019 1 31 5 4  83
"Cattaneo_2009" 2009  6 2015 2019 1 31 5 4  83
"Cattaneo_2009" 2009 10 2016 2019 1 31 5 4  83
"Cattaneo_2009" 2009  7 2017 2019 1 31 5 4  83
"Cattaneo_2009" 2009 12 2018 2019 1 31 5 4  83
"Cattaneo_2009" 2009 10 2019 2019 1 31 5 4  83
"Cattaneo_2009" 2009 15 2020 2019 1 31 5 4  83
"Cattaneo_2009" 2009 16 2021 2019 1 31 5 4  83
"Cattaneo_2009" 2009 13 2022 2019 1 31 5 4  83
"Cattaneo_2009" 2009 13 2023 2019 1 31 5 4  83
"Cerra_2008"    2008  3 2008 2012 1 19 2 0 399
"Cerra_2008"    2008  7 2009 2012 1 19 2 0 399
"Cerra_2008"    2008 18 2010 2012 1 19 2 0 399
"Cerra_2008"    2008 22 2011 2012 1 19 2 0 399
"Cerra_2008"    2008 35 2012 2012 1 19 2 0 399
"Cerra_2008"    2008 30 2013 2012 1 19 2 0 399
"Cerra_2008"    2008 35 2014 2012 1 19 2 0 399
"Cerra_2008"    2008 37 2015 2012 1 19 2 0 399
"Cerra_2008"    2008 35 2016 2012 1 19 2 0 399
"Cerra_2008"    2008 43 2017 2012 1 19 2 0 399
"Cerra_2008"    2008 41 2018 2012 1 19 2 0 399
"Cerra_2008"    2008 36 2019 2012 1 19 2 0 399
"Cerra_2008"    2008 50 2020 2012 1 19 2 0 399
"Cerra_2008"    2008 44 2021 2012 1 19 2 0 399
"Cerra_2008"    2008 44 2022 2012 1 19 2 0 399
"Cerra_2008"    2008 33 2023 2012 1 19 2 0 399
"Cesarini_2009" 2009  9 2009    . 0 34 5 5 337
"Cesarini_2009" 2009 12 2010    . 0 34 5 5 337
"Cesarini_2009" 2009 17 2011    . 0 34 5 5 337
"Cesarini_2009" 2009 16 2012    . 0 34 5 5 337
"Cesarini_2009" 2009 23 2013    . 0 34 5 5 337
"Cesarini_2009" 2009 24 2014    . 0 34 5 5 337
"Cesarini_2009" 2009 25 2015    . 0 34 5 5 337
"Cesarini_2009" 2009 12 2016    . 0 34 5 5 337
"Cesarini_2009" 2009 29 2017    . 0 34 5 5 337
"Cesarini_2009" 2009 22 2018    . 0 34 5 5 337
"Cesarini_2009" 2009 13 2019    . 0 34 5 5 337
"Cesarini_2009" 2009 19 2020    . 0 34 5 5 337
"Cesarini_2009" 2009 14 2021    . 0 34 5 5 337
"Cesarini_2009" 2009 19 2022    . 0 34 5 5 337
"Cesarini_2009" 2009 10 2023    . 0 34 5 5 337
"Chanda_2014"   2014  2 2014 2021 1 28 3 3  94
"Chanda_2014"   2014  1 2015 2021 1 28 3 3  94
"Chanda_2014"   2014  6 2016 2021 1 28 3 3  94
"Chanda_2014"   2014  1 2017 2021 1 28 3 3  94
"Chanda_2014"   2014  4 2018 2021 1 28 3 3  94
"Chanda_2014"   2014  2 2019 2021 1 28 3 3  94
"Chanda_2014"   2014  7 2020 2021 1 28 3 3  94
"Chanda_2014"   2014  9 2021 2021 1 28 3 3  94
"Chanda_2014"   2014  6 2022 2021 1 28 3 3  94
"Chanda_2014"   2014  2 2023 2021 1 28 3 3  94
"Chay_2005"     2005  0 2005    . 0 22 3 3 399
"Chay_2005"     2005  2 2006    . 0 22 3 3 399
"Chay_2005"     2005  4 2007    . 0 22 3 3 399
"Chay_2005"     2005  7 2008    . 0 22 3 3 399
"Chay_2005"     2005  7 2009    . 0 22 3 3 399
"Chay_2005"     2005 11 2010    . 0 22 3 3 399
"Chay_2005"     2005  8 2011    . 0 22 3 3 399
end
Here, paper_id identifies the papers, replicated is the treatment indicator, and length, n_authors, m_authors, and h_index are additional characteristics of each paper that do not vary over time.
As you can see, the panel is unbalanced because some papers have 20 years of citations overall (they were published long ago) and others have only 4 years of citations (they were published recently). The treatment group consists of papers that were replicated at some point (replicated = 1), and the control group consists of papers that were never replicated (replicated = 0).
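The eventdd call further down uses an event-time variable, timeToTreat, that is not part of the dataex excerpt. For readers who want to reproduce the setup, here is a minimal sketch of how such a variable could be built from year and rep_year. It assumes that event time is simply the calendar year minus the replication year, and that never-replicated papers should be left missing (as far as I understand, that is what eventdd expects for untreated units). The toy rows are invented for illustration and are not from the real data.

```stata
* Toy rows using the same variable names as the excerpt above (values invented)
clear
input str19 paper_id int(year rep_year) float replicated
"A_2010" 2010 2012 1
"A_2010" 2011 2012 1
"A_2010" 2012 2012 1
"A_2010" 2013 2012 1
"B_2010" 2010    . 0
"B_2010" 2011    . 0
end

* Event time: years relative to the replication year.
* A missing rep_year propagates, so never-replicated papers stay missing.
generate timeToTreat = year - rep_year

* Sanity checks: controls are missing, and event time is 0 in the replication year
assert missing(timeToTreat) if replicated == 0
assert timeToTreat == 0 if year == rep_year

* Inspect how many observations fall in each event-time period
tabulate timeToTreat replicated, missing
```

Tabulating event time against treatment status on the real data would also show how thinly populated the far leads and lags are, which matters in an unbalanced panel like this one.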
I came across the eventdd command and implemented it as follows:
Code:
eventdd citations replicated i.year, timevar(timeToTreat) method(,cluster(P_id)) graph_op(ytitle("Citations per Year"))

------------------------------------------------------------------------------
             |               Robust
   citations |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  replicated |   23.66014   6.326619     3.74   0.000     11.19963    36.12065
             |
        year |
       1992  |   .4658272   .7919368     0.59   0.557    -1.093921    2.025576
       1993  |   .2147516    1.30747     0.16   0.870    -2.360358    2.789862
       1994  |  -.4756162   1.311588    -0.36   0.717    -3.058837    2.107605
       1995  |   .6927654   1.600165     0.43   0.665    -2.458819     3.84435
       1996  |  -.7758596   3.215401    -0.24   0.810     -7.10871     5.55699
       1997  |  -.8470427   3.359396    -0.25   0.801    -7.463496    5.769411
       1998  |   2.372269   3.526131     0.67   0.502    -4.572577    9.317114
       1999  |    3.28253   3.545548     0.93   0.355    -3.700559    10.26562
       2000  |   1.610422   3.667929     0.44   0.661    -5.613699    8.834542
       2001  |   1.132951   3.950361     0.29   0.775     -6.64743    8.913333
       2002  |   2.982771   3.897317     0.77   0.445    -4.693139    10.65868
       2003  |   5.875035   4.156226     1.41   0.159    -2.310805    14.06087
       2004  |   7.411438    4.16802     1.78   0.077     -.797632    15.62051
       2005  |    9.27638   4.416921     2.10   0.037     .5770927    17.97567
       2006  |    9.67896    4.51858     2.14   0.033     .7794497    18.57847
       2007  |   11.31239   4.514173     2.51   0.013     2.421557    20.20322
       2008  |   13.03319   4.787008     2.72   0.007     3.605004    22.46138
       2009  |   12.77196   4.616242     2.77   0.006     3.680102    21.86382
       2010  |   13.84908   4.783424     2.90   0.004     4.427947    23.27021
       2011  |   16.38837     5.0108     3.27   0.001     6.519414    26.25732
       2012  |    17.7978   5.269441     3.38   0.001     7.419443    28.17616
       2013  |   18.41073   5.232487     3.52   0.001      8.10515     28.7163
       2014  |   17.87649   5.119939     3.49   0.001     7.792578     27.9604
       2015  |   17.18381   5.040096     3.41   0.001     7.257156    27.11047
       2016  |   17.83357   5.139675     3.47   0.001     7.710794    27.95635
       2017  |   17.74431   5.258003     3.37   0.001     7.388476    28.10014
       2018  |   18.12558   5.335541     3.40   0.001     7.617031    28.63412
       2019  |   18.08194   5.381563     3.36   0.001     7.482753    28.68113
       2020  |   18.08149    5.39998     3.35   0.001     7.446035    28.71695
       2021  |   19.67737   5.594736     3.52   0.001     8.658328    30.69641
       2022  |   19.32571   5.691992     3.40   0.001     8.115118    30.53629
       2023  |   20.12481   5.920155     3.40   0.001     8.464843    31.78477
             |
      lead15 |  -10.43364    5.68993    -1.83   0.068    -21.64016    .7728907
      lead14 |  -1.922303   10.17528    -0.19   0.850    -21.96288    18.11828
      lead13 |    1.21294    12.5336     0.10   0.923    -23.47244    25.89832
      lead12 |   3.075636   10.75358     0.29   0.775    -18.10393     24.2552
      lead11 |   .1882278   9.933739     0.02   0.985    -19.37664    19.75309
      lead10 |  -3.206425   8.839438    -0.36   0.717    -20.61603    14.20317
       lead9 |  -4.098672   7.901657    -0.52   0.604    -19.66128    11.46393
       lead8 |  -4.961981   6.543385    -0.76   0.449    -17.84942    7.925457
       lead7 |  -2.957679   5.069235    -0.58   0.560    -12.94172    7.026367
       lead6 |  -2.196207   4.942476    -0.44   0.657     -11.9306    7.538181
       lead5 |  -1.869734   3.536511    -0.53   0.597    -8.835022    5.095554
       lead4 |   -1.53696   2.691031    -0.57   0.568    -6.837045    3.763126
       lead3 |  -.3714522   2.162117    -0.17   0.864    -4.629822    3.886917
       lead2 |  -.6900516   .9966668    -0.69   0.489    -2.653024     1.27292
        lag0 |   7.462517   1.823163     4.09   0.000      3.87173     11.0533
        lag1 |   2.940426   1.411938     2.08   0.038      .159563    5.721289
        lag2 |   5.449579   1.659522     3.28   0.001      2.18109    8.718068
        lag3 |  -1.579101   6.151922    -0.26   0.798    -13.69554    10.53734
        lag4 |   2.559354    7.83628     0.33   0.744    -12.87449     17.9932
        lag5 |   8.502305   10.69625     0.79   0.427    -12.56436    29.56897
        lag6 |   10.16326   12.04497     0.84   0.400    -13.55975    33.88627
        lag7 |   9.891067   11.91394     0.83   0.407    -13.57388    33.35601
        lag8 |   9.594236   12.03238     0.80   0.426    -14.10397    33.29245
        lag9 |   20.03024    16.5615     1.21   0.228    -12.58824    52.64872
       lag10 |  -5.927529   10.57916    -0.56   0.576    -26.76357    14.90851
       _cons |  -4.640366   4.771616    -0.97   0.332    -14.03824    4.757506
------------------------------------------------------------------------------
My questions are:
- I thought only the leads and lags should appear in the results table, but I'm also getting coefficients for the years. Is this normally the case?
- I see that only the last leads are significant. Would this mean that replications barely have an effect on citations? Or did I code something incorrectly?
- Is the coefficient on replicated relevant, given that it appears to be significant but the leads aren't?
- I believe this command doesn't allow for a Poisson regression. However, since my dependent variable, citations, is a non-negative count variable, would it be more appropriate to use a Poisson model?
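To make the last question concrete, a Poisson version of the event study could be set up by hand along the following lines. This is only a minimal sketch on simulated data, not something eventdd does itself as far as I know. The variable names paper and et, the binning of event time at -5 and +5, the choice of -1 as the omitted period, and the simulated effect size are all assumptions made for illustration.

```stata
* Simulated panel: 200 hypothetical papers observed 2005-2019 (all values invented)
clear
set seed 12345
set obs 200
generate paper = _n
* Odd-numbered papers are replicated between 2008 and 2012; the rest never are
generate rep_year = cond(mod(paper, 2), 2008 + mod(paper, 5), .)
expand 15
bysort paper: generate year = 2004 + _n
generate timeToTreat = year - rep_year

* Counts with an assumed post-replication effect of 0.3 on the log scale
generate post = timeToTreat >= 0 & !missing(timeToTreat)
generate citations = rpoisson(exp(1 + 0.3*post))

* Bin event time to [-5, 5] and shift by 5 so factor-variable levels are non-negative.
* Never-replicated papers are assigned to the base level (4, i.e. event time -1),
* so all of their event-time dummies are zero.
generate et = max(-5, min(5, timeToTreat)) + 5 if !missing(timeToTreat)
replace et = 4 if missing(timeToTreat)

* Fixed-effects Poisson with year effects; vce(robust) clusters on the panel variable
xtset paper year
xtpoisson citations ib4.et i.year, fe vce(robust)
```

In this sketch the coefficients on levels 0 to 3 of et are the leads, and levels 5 to 10 are the lags (level 5 is the replication year itself). They are on the log scale, so exp(b) - 1 gives an approximate proportional change in citations relative to the year before replication.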