Difference-in-Differences with Logit and Poisson: Interpretation

Philipp Machauer



17 Mar 2020, 14:30

Hello everybody,

First of all, I want to thank you for your contributions to this forum. I am currently working on my master's thesis. When I started this project, I knew nothing about Stata. Thanks to your input on the forum, I have managed to tackle quite a few obstacles along the way. Now, however, I need some help, because some things have gone well beyond my area of knowledge. Please excuse my English. It is not my first language, but I am doing my best. I will try to give you precise information about my situation so that I do not waste your time.

In my thesis, I look at the effect of a financial funding program for scientists on three outcomes: the number of articles the scientists publish, the number of citations of these articles, and whether or not the scientists obtain tenure.

I have a panel data set of originally 7,878 persons. Of these, 78 are treated (they received the funding) and 7,800 are controls. The data cover the years 1996 to 2015. Importantly, funding is awarded once a year between 2004 and 2012 and lasts for one year, so there are nine treatment years. Each person can receive funding only once.

I am using Stata 13.1.

Below is an excerpt of my original data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long(ID Förderjahr) byte(Sub_Life Sub_Health Sub_Physical Sub_Social) int PubRangeStart byte(ArtCount_1996 ArtCount_2004) int ArtCount_2015 float CiteCount_1996 int(CiteCount_2004 CiteCount_2015) float treated
1940 2004 0 0 1 0 2002 0 0 0 0 0 0 0
1189 2004 0 0 1 0 2003 0 0 0 0 0 0 0
1823 2004 0 0 1 0 2000 0 0 0 0 0 0 0
1985 2004 0 0 1 0 2003 0 0 0 0 0 0 0
2237 2004 0 0 1 0 1997 0 0 1 0 0 0 0
end
label values ID ID
label def ID 1189 "55480010900", modify
label def ID 1823 "6506679415", modify
label def ID 1940 "6602405762", modify
label def ID 1985 "6603812651", modify
label def ID 2237 "8268303400", modify
label values Förderjahr Förderjahr

My analysis is divided into two parts. The first part analyzes the number of publications and citations, and the second analyzes whether a person has tenure. I therefore "split" my dataset at the matching stage:

First, I run coarsened exact matching (CEM). For the first part of my analysis, I restrict myself to 1:50 matching. Getting this to work was difficult because of the different treatment years, but that is done now. For the second part of my analysis, I restrict myself to 1:2 matching, since I have to collect the tenure information for the matched persons by hand. After matching, I add the variables Tenure_1996 - Tenure_2015, where 1 = the person holds a tenured position in that year and 0 = the person does not have tenure in that year.

For both parts of my analysis, I want to run a difference-in-differences analysis: on the number of publications and the number of citations in the first part, and on having tenure in the second. I use the specification in "Jaravel, X., Petkova, N., & Bell, A. (2018). Team-specific capital and innovation. American Economic Review, 108(4-5), 1034-73." (page 1048). It includes a dummy DummyReal, which switches to 1 after treatment for the treated only, and a dummy DummyAll, which switches to 1 after treatment for both treated persons and controls. The treatment effect is the coefficient on DummyReal. I have adapted the fixed effects: instead of an interaction between year and individual fixed effects, I include year and individual fixed effects separately (see below).

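The two dummies and the index they enter can be written in equation form as follows. The notation is my own and is not copied from Jaravel et al.

- $T_i$ is person $i$'s funding year (Förderjahr). For a control, I assume it is the funding year of the treated person they were matched to.
- "After" is read as strictly after $T_i$. Use $\ge$ instead of $>$ if the funding year itself counts as treated.
- $\alpha_i$, $\lambda_t$ and $\gamma_{a(i,t)}$ are the individual, year and career-age effects.

$$\text{DummyAll}_{it} = \mathbb{1}[\,t > T_i\,], \qquad \text{DummyReal}_{it} = \text{treated}_i \times \text{DummyAll}_{it}$$

$$\eta_{it} = \alpha_i + \lambda_t + \gamma_{a(i,t)} + \delta\,\text{DummyAll}_{it} + \beta\,\text{DummyReal}_{it}$$

Here $\delta$ is the post-treatment shift shared by treated persons and controls. $\beta$ is the additional shift for treated persons only, on the scale of the link function (log for Poisson, log-odds for logit).
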
For the first part of my analysis, I reshape my data and mark it as panel data using

Code:

reshape long ArtCount_ CiteCount_, i(ID) j(Jahr)
xtset ID Jahr

Then I use a Poisson regression, since both the number of articles and the number of citations are count data. I want to include age, year and individual fixed effects and use robust standard errors. Note: Jahr = year and Alter = age (the variable names are German, sorry). Alter is the "career age" of a scientist, that is, the number of years since their first publication. It is categorical in 10-year intervals; for example, Alter = 2 means a career age between 10 and 20 years. weights is the weight that coarsened exact matching assigns to each treated and control person (I adjusted it after restricting myself to 1:50 and 1:2 matching, respectively). Here is my code:

Code:

xtpoisson ArtCount_ DummyReal DummyAll i.Jahr i.Alter [iweight = weights], fe vce(robust) irr
xtpoisson CiteCount_ DummyReal DummyAll i.Jahr i.Alter [iweight = weights], fe vce(robust) irr
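
The xtpoisson, fe calls above estimate a conditional mean of roughly the following form. The notation is the one used in this post:

- $\alpha_i$ is the individual effect, which the conditional fixed-effects Poisson likelihood eliminates.
- $\lambda_t$ are the year effects from i.Jahr.
- $\gamma_a$ are the career-age effects from i.Alter.
- The weights are omitted for readability.

$$E[y_{it} \mid \alpha_i, x_{it}] = \exp\big(\alpha_i + \beta\,\text{DummyReal}_{it} + \delta\,\text{DummyAll}_{it} + \lambda_t + \gamma_{a(i,t)}\big)$$

With the irr option, Stata reports $e^{\beta}$ rather than $\beta$. This is the factor by which the expected count is multiplied when DummyReal switches from 0 to 1, all else equal:

$$\text{IRR}_{\text{DummyReal}} = \frac{E[y_{it} \mid \text{DummyReal}_{it} = 1, \ldots]}{E[y_{it} \mid \text{DummyReal}_{it} = 0, \ldots]} = e^{\beta}$$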

First question: Do you think my code for both regressions is ok?

Here is part of my output for the citations. I am using the "irr" option to interpret the results:

Code:

. xtpoisson CiteCount_ DummyReal DummyAll i.Jahr i.Alter [iweight = weights], fe vce(robust)
note: 951 groups (10261 obs) dropped because of all zero outcomes

Iteration 0:   log pseudolikelihood = -229837.47  
Iteration 1:   log pseudolikelihood = -203398.87  
Iteration 2:   log pseudolikelihood = -202098.39  
Iteration 3:   log pseudolikelihood = -202066.86  
Iteration 4:   log pseudolikelihood = -202066.67  
Iteration 5:   log pseudolikelihood = -202066.67  

Conditional fixed-effects Poisson regression    Number of obs      =     14998
Group variable: ID                              Number of groups   =      1392

                                                Obs per group: min =         9
                                                               avg =      10.8
                                                               max =        11

                                                Wald chi2(21)      =   3964.54
Log pseudolikelihood  = -202066.67              Prob > chi2        =    0.0000

                                     (Std. Err. adjusted for clustering on ID)
------------------------------------------------------------------------------
             |               Robust
  CiteCount_ |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   DummyReal |   .6479992     .19778     3.28   0.001     .2603576    1.035641
    DummyAll |  -1.110034    .042649   -26.03   0.000    -1.193624   -1.026443
             |
        Jahr |
       2000  |  -1.289927   .3651744    -3.53   0.000    -2.005656   -.5741984
       2001  |  -.7900851   .2966834    -2.66   0.008    -1.371574   -.2085964
       2002  |  -.3793931   .2921996    -1.30   0.194    -.9520938    .1933075
       2003  |  -.1417532   .2837028    -0.50   0.617    -.6978006    .4142941
       2004  |   .2008781   .2807665     0.72   0.474    -.3494142    .7511704
       2005  |   .1312249   .2811633     0.47   0.641    -.4198449    .6822948
       2006  |  -.3939565    .283458    -1.39   0.165    -.9495241     .161611
       2007  |    .726281   .2811001     2.58   0.010      .175335    1.277227
       2008  |   .3884179    .283314     1.37   0.170    -.1668674    .9437032
       2009  |   .5360209   .2834883     1.89   0.059    -.0196061    1.091648
       2010  |   .7834782   .2984399     2.63   0.009     .1985466     1.36841
       2011  |   .7296166    .290832     2.51   0.012     .1595964    1.299637
       2012  |   .1562271   .2922513     0.53   0.593    -.4165749    .7290291
       2013  |   .1695013   .3025355     0.56   0.575    -.4234575    .7624601
       2014  |  -.5033082   .3087599    -1.63   0.103    -1.108466    .1018501
       2015  |  -1.069772   .3037923    -3.52   0.000    -1.665194   -.4743499
             |
       Alter |
          1  |   .1143365   .0527683     2.17   0.030     .0109125    .2177605
          2  |      .2474   .1783992     1.39   0.166     -.102256     .597056
          3  |   .3351324   .1624508     2.06   0.039     .0167348      .65353
------------------------------------------------------------------------------

Second question: Am I interpreting this correctly, namely that treated persons have a number of citations that is 0.65 percent lower than that of untreated persons? I am struggling here because I am not sure how to interpret the IRR in a difference-in-differences design with DummyReal defined as in Jaravel et al.
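
The command echoed in the output above was run without irr. The table therefore shows coefficients on the log scale (log incidence-rate ratios), not IRRs; the irr option would report their exponentials. Converting the DummyReal and DummyAll rows by hand gives the following (rounded to two decimals):

$$e^{0.6479992} \approx 1.91, \qquad 95\%\ \text{CI}: \big[e^{0.2603576},\, e^{1.035641}\big] \approx [1.30,\, 2.82]$$

$$e^{-1.110034} \approx 0.33$$

An IRR of 1.91 means the expected citation count is multiplied by about 1.91 (roughly 91% higher) when DummyReal switches to 1, holding the other regressors and the individual effect fixed. The 0.648 in the table is a log-scale coefficient, not a percentage.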

-----------

For the second part of my analysis (tenure), I again reshape my data and declare it as panel data using

Code:

reshape long Tenure_, i(ID) j(Jahr)
xtset ID Jahr

Then I use a logit regression, since my outcome variable is binary, with the same fixed effects as above. This leads to the following code:

Code:

*Creating a macro for the year dummies
forvalues i = 1996(1)2015 {
local JahrDummys `JahrDummys' JahrDummy`i'
}

*Creating a macro for the age dummies
forvalues i = 0(1)3 {
local AlterDummys `AlterDummys' AlterDummy`i'
}

*Regression for Tenure
clogit Tenure_ DummyReal DummyAll `JahrDummys' `AlterDummys' [iweight = weights], group(ID) vce(robust)
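
For comparison, clogit with group(ID) fits a fixed-effects logit of roughly the following form. The notation is the one used in this post:

- $\Lambda(z) = 1/(1+e^{-z})$ is the logistic function.
- $\alpha_i$ is the individual effect. It is eliminated by conditioning on each person's number of tenure years, which is why groups with all-0 or all-1 outcomes are dropped.
- $\lambda_t$ and $\gamma_a$ are the year and career-age dummies.

$$\Pr(\text{Tenure}_{it} = 1 \mid \alpha_i, x_{it}) = \Lambda\big(\alpha_i + \beta\,\text{DummyReal}_{it} + \delta\,\text{DummyAll}_{it} + \lambda_t + \gamma_{a(i,t)}\big)$$

The reported coefficients are log odds ratios, so the odds ratio for DummyReal is $e^{\beta}$; clogit's or option displays these directly. Because $\alpha_i$ is never estimated, predicted probabilities, and hence margins, require an assumption about $\alpha_i$.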

Here is the output I get for this regression:

Code:

note: JahrDummy2015 omitted because of collinearity
note: AlterDummy3 omitted because of collinearity
note: multiple positive outcomes within groups encountered.
note: 107 groups (2140 obs) dropped because of all positive or
      all negative outcomes.

Iteration 0:   log pseudolikelihood =   -195.918  
Iteration 1:   log pseudolikelihood =  -114.3456  
Iteration 2:   log pseudolikelihood = -101.83639  
Iteration 3:   log pseudolikelihood = -99.917826  
Iteration 4:   log pseudolikelihood = -99.636713  
Iteration 5:   log pseudolikelihood = -99.569896  
Iteration 6:   log pseudolikelihood = -99.557089  
Iteration 7:   log pseudolikelihood = -99.555031  
Iteration 8:   log pseudolikelihood = -99.554527  
Iteration 9:   log pseudolikelihood = -99.554423  
Iteration 10:  log pseudolikelihood = -99.554401  
Iteration 11:  log pseudolikelihood = -99.554396  

Conditional (fixed-effects) logistic regression   Number of obs   =       1100
                                                  Wald chi2(24)   = 5892735.82
                                                  Prob > chi2     =     0.0000
Log pseudolikelihood = -99.554396                 Pseudo R2       =     0.8206

                                      (Std. Err. adjusted for clustering on ID)
-------------------------------------------------------------------------------
              |               Robust
      Tenure_ |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
    DummyReal |   15.74403   1.079024    14.59   0.000     13.62918    17.85888
     DummyAll |   .7220081   .9697854     0.74   0.457    -1.178736    2.622753
JahrDummy1996 |  -15.39999   5.159302    -2.98   0.003    -25.51204   -5.287945
JahrDummy1997 |  -15.64311   5.125531    -3.05   0.002    -25.68897   -5.597253
JahrDummy1998 |  -16.00921    5.06794    -3.16   0.002    -25.94219   -6.076232
JahrDummy1999 |  -13.98576   4.843719    -2.89   0.004    -23.47928    -4.49225
JahrDummy2000 |  -13.19993   4.650074    -2.84   0.005    -22.31391   -4.085955
JahrDummy2001 |   -13.3446   4.651334    -2.87   0.004    -22.46104   -4.228151
JahrDummy2002 |  -12.63749   4.520511    -2.80   0.005    -21.49753   -3.777456
JahrDummy2003 |   -11.1608   4.461463    -2.50   0.012    -19.90511   -2.416496
JahrDummy2004 |  -10.61971   4.420429    -2.40   0.016     -19.2836   -1.955833
JahrDummy2005 |  -10.47379   4.541734    -2.31   0.021    -19.37543   -1.572158
JahrDummy2006 |    -10.729   4.448356    -2.41   0.016    -19.44762   -2.010386
JahrDummy2007 |  -9.759413   4.186473    -2.33   0.020    -17.96475   -1.554076
JahrDummy2008 |  -9.213294   3.922282    -2.35   0.019    -16.90083   -1.525763
JahrDummy2009 |  -8.063055   3.616174    -2.23   0.026    -15.15063   -.9754846
JahrDummy2010 |  -6.441235   2.870035    -2.24   0.025     -12.0664   -.8160707
JahrDummy2011 |  -4.453233   2.259879    -1.97   0.049    -8.882514   -.0239515
JahrDummy2012 |  -3.397512   1.727395    -1.97   0.049    -6.783143   -.0118808
JahrDummy2013 |  -1.874335   1.419931    -1.32   0.187    -4.657349    .9086792
JahrDummy2014 |  -.6120955   .9763561    -0.63   0.531    -2.525718    1.301527
JahrDummy2015 |          0  (omitted)
  AlterDummy0 |    9.14958   3.342895     2.74   0.006     2.597627    15.70153
  AlterDummy1 |   11.50867   3.453237     3.33   0.001      4.74045    18.27689
  AlterDummy2 |   10.82984   3.416407     3.17   0.002     4.133804    17.52587
  AlterDummy3 |          0  (omitted)
-------------------------------------------------------------------------------

My problem is that I am not sure how to interpret these results. I searched the forum and found some excellent posts on the problems of using the margins command after clogit with fixed effects. I then tried the aextlogit command, but it gives me an estimate greater than 1 for DummyReal, which makes no sense to me.

Third question: Can you help me interpret these results better? Is there anything else I could do, or should I simply stick to the interpretation that treatment has a positive effect on obtaining tenure?
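
In the clogit output, the DummyReal coefficient of 15.74403 corresponds to an odds ratio of

$$e^{15.74403} \approx 6.9 \times 10^{6}$$

An odds ratio of this size is unusual, as are the very large Wald statistic (5,892,735.82) and the pseudo R² of 0.82. It may be worth checking whether DummyReal (nearly) perfectly separates tenure outcomes within persons in the 1,100 remaining observations, since (quasi-)separation pushes logit coefficients toward infinity.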

Thank you all in advance for your help. I hope I have made everything clear; if not, I apologize and will answer any questions as quickly as I can.

Best regards!
