Hello everybody,
First of all, I want to thank you for your contributions to this forum. I am currently working on my master's thesis. At the beginning of this project, I knew nothing about Stata. Thanks to your input in the forum, I have managed to tackle quite a few obstacles along the way. However, I now need some help, since some things have gone well beyond my area of knowledge. Please excuse my English; it is not my first language. I will try to give you precise information about my situation so that I don't waste your time.
In my thesis, I look at the effect of a funding program for scientists on three outcomes: the number of articles published by the scientists, the number of citations of these articles, and whether or not the scientists manage to get tenure.
I have a panel data set of originally 7878 persons; 78 of them are treated (they got the funding) and 7800 are controls. The data cover the years 1996 to 2015. It is important to note that funding is awarded once a year between 2004 and 2012 and lasts for one year, so I have 9 treatment times. A person can get funding only once.
I run Stata 13.1.
Below is an excerpt of my original data:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long(ID Förderjahr) byte(Sub_Life Sub_Health Sub_Physical Sub_Social) int PubRangeStart byte(ArtCount_1996 ArtCount_2004) int ArtCount_2015 float CiteCount_1996 int(CiteCount_2004 CiteCount_2015) float treated
1940 2004 0 0 1 0 2002 0 0 0 0 0 0 0
1189 2004 0 0 1 0 2003 0 0 0 0 0 0 0
1823 2004 0 0 1 0 2000 0 0 0 0 0 0 0
1985 2004 0 0 1 0 2003 0 0 0 0 0 0 0
2237 2004 0 0 1 0 1997 0 0 1 0 0 0 0
end
label values ID ID
label def ID 1189 "55480010900", modify
label def ID 1823 "6506679415", modify
label def ID 1940 "6602405762", modify
label def ID 1985 "6603812651", modify
label def ID 2237 "8268303400", modify
label values Förderjahr Förderjahr
My analysis is divided into two parts: the first part analyzes the number of publications and citations, and the second part analyzes having tenure. Therefore I "split" my data set during the matching:
First, I run coarsened exact matching. For the first part of my analysis, I restrict myself to 1:50 matching. I had some difficulties getting this to work because of the different treatment times, but that is done now. For the second part of my analysis, I restrict myself to 1:2 matching, since I have to research the tenure information for the matched persons by hand. So, after matching, I add the variables Tenure_1996 - Tenure_2015, where 1 = person has a job with tenure in this period and 0 = person does not have tenure in this period.
For both parts of my analysis, I want to run a difference-in-differences analysis for the number of publications, the number of citations, and having tenure. I use the specification from "Jaravel, X., Petkova, N., & Bell, A. (2018). Team-specific capital and innovation. American Economic Review, 108(4-5), 1034-73." on page 1048. They include a DummyReal, which turns to 1 after the treatment for the treated, and a DummyAll, which turns to 1 after the treatment for both treated and controls. The effect of the treatment is the coefficient on DummyReal. I adjust their fixed-effects specification: I do not include the interaction between year and individual fixed effects, only the two sets of effects on their own (see below).
For the first part of my analysis, I reshape my data and mark it as panel data using
Code:
reshape long ArtCount_ CiteCount_, i(ID) j(Jahr)
xtset ID Jahr
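For concreteness, the two dummies described above could be built after the reshape roughly as follows. This is only a sketch under assumptions: that "after the treatment" includes every year from Förderjahr onward, and that each control carries the Förderjahr of the treated person it was matched to (as in the data excerpt, where the controls shown have Förderjahr 2004). If the funding year itself should not count as "after", >= would become >.

```stata
* Sketch only: assumes long format with Jahr, Förderjahr and treated present
* DummyAll: 1 from the (matched) funding year onward, for treated and controls
gen byte DummyAll = (Jahr >= Förderjahr)

* DummyReal: 1 from the funding year onward, but only for treated persons
gen byte DummyReal = DummyAll * treated
```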
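Then I use a Poisson regression, since I have count data for both the number of articles and the number of citations. I want to include age, year, and individual fixed effects and use robust standard errors. Note: Jahr = year, Alter = age (different language, sorry). Age/Alter denotes the "career age" of a scientist, that is, the number of years since their first publication. It is categorical with 10-year intervals, which means Alter = 2 means the career age is between 10 and 20 years. weights is the weight given to each treated and control person by the coarsened exact matching (I adjusted it after restricting myself to 1:50 and 1:2, respectively). Here is my code:
Code: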
xtpoisson ArtCount_ DummyReal DummyAll i.Jahr i.Alter [iweight = weights], fe vce(robust) irr
xtpoisson CiteCount_ DummyReal DummyAll i.Jahr i.Alter [iweight = weights], fe vce(robust) irr
First question: Do you think my code for both regressions is OK?
Here is part of my output for the citations. I am using the "irr" option to interpret the results:
Code:
. xtpoisson CiteCount_ DummyReal DummyAll i.Jahr i.Alter [iweight = weights], fe vce(robust)
note: 951 groups (10261 obs) dropped because of all zero outcomes

Iteration 0:   log pseudolikelihood = -229837.47
Iteration 1:   log pseudolikelihood = -203398.87
Iteration 2:   log pseudolikelihood = -202098.39
Iteration 3:   log pseudolikelihood = -202066.86
Iteration 4:   log pseudolikelihood = -202066.67
Iteration 5:   log pseudolikelihood = -202066.67

Conditional fixed-effects Poisson regression    Number of obs      =     14998
Group variable: ID                              Number of groups   =      1392

                                                Obs per group: min =         9
                                                               avg =      10.8
                                                               max =        11

                                                Wald chi2(21)      =   3964.54
Log pseudolikelihood  = -202066.67              Prob > chi2        =    0.0000

                                     (Std. Err. adjusted for clustering on ID)
------------------------------------------------------------------------------
             |               Robust
  CiteCount_ |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   DummyReal |   .6479992     .19778     3.28   0.001     .2603576    1.035641
    DummyAll |  -1.110034    .042649   -26.03   0.000    -1.193624   -1.026443
             |
        Jahr |
        2000 |  -1.289927   .3651744    -3.53   0.000    -2.005656   -.5741984
        2001 |  -.7900851   .2966834    -2.66   0.008    -1.371574   -.2085964
        2002 |  -.3793931   .2921996    -1.30   0.194    -.9520938    .1933075
        2003 |  -.1417532   .2837028    -0.50   0.617    -.6978006    .4142941
        2004 |   .2008781   .2807665     0.72   0.474    -.3494142    .7511704
        2005 |   .1312249   .2811633     0.47   0.641    -.4198449    .6822948
        2006 |  -.3939565    .283458    -1.39   0.165    -.9495241     .161611
        2007 |    .726281   .2811001     2.58   0.010      .175335    1.277227
        2008 |   .3884179    .283314     1.37   0.170    -.1668674    .9437032
        2009 |   .5360209   .2834883     1.89   0.059    -.0196061    1.091648
        2010 |   .7834782   .2984399     2.63   0.009     .1985466     1.36841
        2011 |   .7296166    .290832     2.51   0.012     .1595964    1.299637
        2012 |   .1562271   .2922513     0.53   0.593    -.4165749    .7290291
        2013 |   .1695013   .3025355     0.56   0.575    -.4234575    .7624601
        2014 |  -.5033082   .3087599    -1.63   0.103    -1.108466    .1018501
        2015 |  -1.069772   .3037923    -3.52   0.000    -1.665194   -.4743499
             |
       Alter |
           1 |   .1143365   .0527683     2.17   0.030     .0109125    .2177605
           2 |      .2474   .1783992     1.39   0.166     -.102256     .597056
           3 |   .3351324   .1624508     2.06   0.039     .0167348      .65353
------------------------------------------------------------------------------
Second question: Am I interpreting this correctly, that treated persons have a number of citations that is 0.65 percent worse than that of untreated persons? I am struggling here because I am not sure how to interpret the IRR with regard to the difference-in-differences design and the definition of DummyReal by Jaravel et al.
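For reference, the output shown above was run without the irr option and reports coefficients rather than incidence-rate ratios. Assuming the usual relation IRR = exp(coefficient), the DummyReal estimate can be converted by hand, as in this small sketch:

```stata
* Sketch: convert the DummyReal coefficient from the output above to an IRR
display exp(.6479992)

* Percentage change in the expected count implied by that IRR
display 100 * (exp(.6479992) - 1)
```

The first value comes out at roughly 1.91, so whatever the correct reading of DummyReal in this design is, it would apply to a multiplicative factor of about that size and not to the raw coefficient of 0.65.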
-----------
For the second part of my analysis (Tenure), I also reshape my data and mark it as panel data using
Code:
reshape long Tenure_, i(ID) j(Jahr)
xtset ID Jahr
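Then I use a logit regression, since my outcome variable is binary. I want to include the same fixed effects as above. This leads me to the following code:
Code: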
* Creating a macro for the year dummies
forvalues i = 1996(1)2015 {
    local JahrDummys `JahrDummys' JahrDummy`i'
}

* Creating a macro for the age dummies
forvalues i = 0(1)3 {
    local AlterDummys `AlterDummys' AlterDummy`i'
}

* Regression for Tenure
clogit Tenure_ DummyReal DummyAll `JahrDummys' `AlterDummys' [iweight = weights], group(ID) vce(robust)
Here is the output I get for this regression:
Code:
note: JahrDummy2015 omitted because of collinearity
note: AlterDummy3 omitted because of collinearity
note: multiple positive outcomes within groups encountered.
note: 107 groups (2140 obs) dropped because of all positive or
      all negative outcomes.

Iteration 0:   log pseudolikelihood =   -195.918
Iteration 1:   log pseudolikelihood =  -114.3456
Iteration 2:   log pseudolikelihood = -101.83639
Iteration 3:   log pseudolikelihood = -99.917826
Iteration 4:   log pseudolikelihood = -99.636713
Iteration 5:   log pseudolikelihood = -99.569896
Iteration 6:   log pseudolikelihood = -99.557089
Iteration 7:   log pseudolikelihood = -99.555031
Iteration 8:   log pseudolikelihood = -99.554527
Iteration 9:   log pseudolikelihood = -99.554423
Iteration 10:  log pseudolikelihood = -99.554401
Iteration 11:  log pseudolikelihood = -99.554396

Conditional (fixed-effects) logistic regression   Number of obs   =       1100
                                                  Wald chi2(24)   = 5892735.82
                                                  Prob > chi2     =     0.0000
Log pseudolikelihood = -99.554396                 Pseudo R2       =     0.8206

                                      (Std. Err. adjusted for clustering on ID)
-------------------------------------------------------------------------------
              |               Robust
      Tenure_ |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
    DummyReal |   15.74403   1.079024    14.59   0.000     13.62918    17.85888
     DummyAll |   .7220081   .9697854     0.74   0.457    -1.178736    2.622753
JahrDummy1996 |  -15.39999   5.159302    -2.98   0.003    -25.51204   -5.287945
JahrDummy1997 |  -15.64311   5.125531    -3.05   0.002    -25.68897   -5.597253
JahrDummy1998 |  -16.00921    5.06794    -3.16   0.002    -25.94219   -6.076232
JahrDummy1999 |  -13.98576   4.843719    -2.89   0.004    -23.47928    -4.49225
JahrDummy2000 |  -13.19993   4.650074    -2.84   0.005    -22.31391   -4.085955
JahrDummy2001 |   -13.3446   4.651334    -2.87   0.004    -22.46104   -4.228151
JahrDummy2002 |  -12.63749   4.520511    -2.80   0.005    -21.49753   -3.777456
JahrDummy2003 |   -11.1608   4.461463    -2.50   0.012    -19.90511   -2.416496
JahrDummy2004 |  -10.61971   4.420429    -2.40   0.016     -19.2836   -1.955833
JahrDummy2005 |  -10.47379   4.541734    -2.31   0.021    -19.37543   -1.572158
JahrDummy2006 |    -10.729   4.448356    -2.41   0.016    -19.44762   -2.010386
JahrDummy2007 |  -9.759413   4.186473    -2.33   0.020    -17.96475   -1.554076
JahrDummy2008 |  -9.213294   3.922282    -2.35   0.019    -16.90083   -1.525763
JahrDummy2009 |  -8.063055   3.616174    -2.23   0.026    -15.15063   -.9754846
JahrDummy2010 |  -6.441235   2.870035    -2.24   0.025     -12.0664   -.8160707
JahrDummy2011 |  -4.453233   2.259879    -1.97   0.049    -8.882514   -.0239515
JahrDummy2012 |  -3.397512   1.727395    -1.97   0.049    -6.783143   -.0118808
JahrDummy2013 |  -1.874335   1.419931    -1.32   0.187    -4.657349    .9086792
JahrDummy2014 |  -.6120955   .9763561    -0.63   0.531    -2.525718    1.301527
JahrDummy2015 |          0  (omitted)
  AlterDummy0 |    9.14958   3.342895     2.74   0.006     2.597627    15.70153
  AlterDummy1 |   11.50867   3.453237     3.33   0.001      4.74045    18.27689
  AlterDummy2 |   10.82984   3.416407     3.17   0.002     4.133804    17.52587
  AlterDummy3 |          0  (omitted)
-------------------------------------------------------------------------------
My problem is that I am not sure how to interpret these results. I searched the forum and found some excellent posts on the problems of using the margins command after clogit with fixed effects. I wanted to use the aextlogit command, but I get a coefficient value greater than 1 for DummyReal, which makes no sense to me.
Third question: Is there any help you can give me so that I can interpret these results better? What else could I do, or should I just stick to the interpretation that treatment has a positive effect on having tenure?
Thank you very much in advance for your help. I hope I have made everything clear. If not, I apologize and will do my best to answer as quickly as I can.
Best regards!