  • Difference-in-Differences with Logit and Poisson: Interpretation

    Hello everybody,

    First of all, I want to thank you for your contributions to this forum. I am currently working on my master's thesis. At the beginning of this project, I had no idea about Stata. Thanks to your input in the forum, I managed to tackle quite a few obstacles along the way. However, I now need some help, since some things have gone well beyond my area of knowledge. Please excuse my English; it is not my first language. I will try to give you precise information about my situation so that I don't waste your time.

    In my thesis, I look at the effect of a financial funding program for scientists on three outcomes: the number of articles published by the scientists, the number of citations of these articles, and whether or not the scientists obtain tenure.

    I have a panel data set of originally 7878 persons; 78 of them are treated (they received the funding) and 7800 are controls. The data cover the years 1996 to 2015. It is important to note that funding is awarded once a year between 2004 and 2012 and lasts for one year, so I have 9 treatment times. A person can receive funding only once.

    I run Stata 13.1.

    Below is an excerpt of my original data:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long(ID Förderjahr) byte(Sub_Life Sub_Health Sub_Physical Sub_Social) int PubRangeStart byte(ArtCount_1996 ArtCount_2004) int ArtCount_2015 float CiteCount_1996 int(CiteCount_2004 CiteCount_2015) float treated
    1940 2004 0 0 1 0 2002 0 0 0 0 0 0 0
    1189 2004 0 0 1 0 2003 0 0 0 0 0 0 0
    1823 2004 0 0 1 0 2000 0 0 0 0 0 0 0
    1985 2004 0 0 1 0 2003 0 0 0 0 0 0 0
    2237 2004 0 0 1 0 1997 0 0 1 0 0 0 0
    end
    label values ID ID
    label def ID 1189 "55480010900", modify
    label def ID 1823 "6506679415", modify
    label def ID 1940 "6602405762", modify
    label def ID 1985 "6603812651", modify
    label def ID 2237 "8268303400", modify
    label values Förderjahr Förderjahr

    My analysis is divided into two parts: the first part analyzes the number of publications and citations, and the second part analyzes tenure. I therefore "split" my data set during the matching:

    First, I run coarsened exact matching. For the first part of my analysis, I restrict myself to 1:50 matching. I had some difficulties getting this to work because of the different treatment times, but that is done now. For the second part of my analysis, I restrict myself to 1:2 matching, since I have to research the tenure information for the matched persons by hand. So, after matching, I add the variables Tenure_1996 - Tenure_2015, where 1 = the person has a tenured job in this period and 0 = the person does not have tenure in this period.

    For both parts of my analysis, I want to run a difference-in-differences analysis for the number of publications, the number of citations, and having tenure. I use the specification from
    "Jaravel, X., Petkova, N., & Bell, A. (2018). Team-specific capital and innovation. American Economic Review, 108(4-5), 1034-73." on page 1048. They include a DummyReal, which switches to 1 after the treatment for the treated only, and a DummyAll, which switches to 1 after the treatment for both treated and controls. The effect of the treatment is the coefficient on DummyReal. I adjust their fixed-effects specification: I do not include the interaction term between year and individual fixed effects, but include both sets of fixed effects separately (see below).

    For the first part of my analysis, I reshape my data and mark it as panel data using
    Code:
    reshape long ArtCount_ CiteCount_, i(ID) j(Jahr)
    xtset ID Jahr
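
    To make the two dummies concrete, here is a minimal sketch of how they could be built in the long format. It rests on assumptions that are not confirmed above: that every control carries the Förderjahr of the treated person it is matched to, and that "after the treatment" means strictly later than the funding year (use >= instead if the funding year itself should count).
    Code:
    * Sketch only: requires the long format created by -reshape- above
    * Assumption: controls inherit Förderjahr from their matched treated person
    * Assumption: the post-period starts the year after the funding year
    generate byte DummyAll  = (Jahr > Förderjahr) if !missing(Förderjahr)
    * DummyReal is the post-period indicator for treated persons only
    generate byte DummyReal = DummyAll * treated
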
    Then I use a Poisson regression, since I have count data for both the number of articles and the number of citations. I want to include age, year, and individual fixed effects and use robust standard errors. Note: Jahr = year, Alter = age (different language, sorry). Age/Alter denotes the "career age" of a scientist, that is, the number of years since his or her first publication. It is categorical with 10-year intervals, which means Alter = 2 means the career age is between 10 and 20 years. weights is the weight given to each treated and control person by the coarsened exact matching (I adjusted it after restricting myself to 1:50 and 1:2, respectively). Here is my code:
    Code:
    xtpoisson ArtCount_ DummyReal DummyAll i.Jahr i.Alter [iweight = weights], fe vce(robust) irr
    xtpoisson CiteCount_ DummyReal DummyAll i.Jahr i.Alter [iweight = weights], fe vce(robust) irr
    First question: Do you think my code for both regressions is ok?

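    Here is part of my output for the citations. I intend to use the "irr" option to interpret the results; note that the run shown below was made without it, so the table reports raw coefficients rather than incidence-rate ratios: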
    Code:
    . xtpoisson CiteCount_ DummyReal DummyAll i.Jahr i.Alter [iweight = weights], fe vce(robust)
    note: 951 groups (10261 obs) dropped because of all zero outcomes
    
    Iteration 0:   log pseudolikelihood = -229837.47  
    Iteration 1:   log pseudolikelihood = -203398.87  
    Iteration 2:   log pseudolikelihood = -202098.39  
    Iteration 3:   log pseudolikelihood = -202066.86  
    Iteration 4:   log pseudolikelihood = -202066.67  
    Iteration 5:   log pseudolikelihood = -202066.67  
    
    Conditional fixed-effects Poisson regression    Number of obs      =     14998
    Group variable: ID                              Number of groups   =      1392
    
                                                    Obs per group: min =         9
                                                                   avg =      10.8
                                                                   max =        11
    
                                                    Wald chi2(21)      =   3964.54
    Log pseudolikelihood  = -202066.67              Prob > chi2        =    0.0000
    
                                         (Std. Err. adjusted for clustering on ID)
    ------------------------------------------------------------------------------
                 |               Robust
      CiteCount_ |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       DummyReal |   .6479992     .19778     3.28   0.001     .2603576    1.035641
        DummyAll |  -1.110034    .042649   -26.03   0.000    -1.193624   -1.026443
                 |
            Jahr |
           2000  |  -1.289927   .3651744    -3.53   0.000    -2.005656   -.5741984
           2001  |  -.7900851   .2966834    -2.66   0.008    -1.371574   -.2085964
           2002  |  -.3793931   .2921996    -1.30   0.194    -.9520938    .1933075
           2003  |  -.1417532   .2837028    -0.50   0.617    -.6978006    .4142941
           2004  |   .2008781   .2807665     0.72   0.474    -.3494142    .7511704
           2005  |   .1312249   .2811633     0.47   0.641    -.4198449    .6822948
           2006  |  -.3939565    .283458    -1.39   0.165    -.9495241     .161611
           2007  |    .726281   .2811001     2.58   0.010      .175335    1.277227
           2008  |   .3884179    .283314     1.37   0.170    -.1668674    .9437032
           2009  |   .5360209   .2834883     1.89   0.059    -.0196061    1.091648
           2010  |   .7834782   .2984399     2.63   0.009     .1985466     1.36841
           2011  |   .7296166    .290832     2.51   0.012     .1595964    1.299637
           2012  |   .1562271   .2922513     0.53   0.593    -.4165749    .7290291
           2013  |   .1695013   .3025355     0.56   0.575    -.4234575    .7624601
           2014  |  -.5033082   .3087599    -1.63   0.103    -1.108466    .1018501
           2015  |  -1.069772   .3037923    -3.52   0.000    -1.665194   -.4743499
                 |
           Alter |
              1  |   .1143365   .0527683     2.17   0.030     .0109125    .2177605
              2  |      .2474   .1783992     1.39   0.166     -.102256     .597056
              3  |   .3351324   .1624508     2.06   0.039     .0167348      .65353
    ------------------------------------------------------------------------------

    Second question: Am I interpreting this correctly: do treated persons have a number of citations that is 0.65 percent lower than that of untreated persons? I am struggling here, since I am not quite sure how to interpret the IRR with regard to the difference-in-differences design and the definition of DummyReal by Jaravel et al.
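
    For reference, as far as I understand it, the irr option simply reports the exponentiated coefficient, so the incidence-rate ratio can be recomputed by hand from the table above. The lines below are only this arithmetic, using the DummyReal coefficient as printed. The first result is about 1.91; if my understanding is right, that would point to more citations after treatment relative to the controls, not fewer.
    Code:
    * Incidence-rate ratio = exp(coefficient on DummyReal)
    display exp(.6479992)
    * Implied percent change in the expected citation count
    display 100 * (exp(.6479992) - 1)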

    -----------

    For the second part of my analysis (Tenure), I also reshape my data and mark it as panel data using
    Code:
    reshape long Tenure_, i(ID) j(Jahr)
    xtset ID Jahr
    Then I use a logit regression, since my outcome variable is binary. I want to include the same fixed effects as above. This leads me to the following code:
    Code:
    *Creating a macro for the year dummies
    forvalues i = 1996(1)2015 {
    local JahrDummys `JahrDummys' JahrDummy`i'
    }
    
    *Creating a macro for the age dummies
    forvalues i = 0(1)3 {
    local AlterDummys `AlterDummys' AlterDummy`i'
    }
    
    *Regression for Tenure
    clogit Tenure_ DummyReal DummyAll `JahrDummys' `AlterDummys' [iweight = weights], group(ID) vce(robust)
    Here is the output I get for this regression:
    Code:
    note: JahrDummy2015 omitted because of collinearity
    note: AlterDummy3 omitted because of collinearity
    note: multiple positive outcomes within groups encountered.
    note: 107 groups (2140 obs) dropped because of all positive or
          all negative outcomes.
    
    Iteration 0:   log pseudolikelihood =   -195.918  
    Iteration 1:   log pseudolikelihood =  -114.3456  
    Iteration 2:   log pseudolikelihood = -101.83639  
    Iteration 3:   log pseudolikelihood = -99.917826  
    Iteration 4:   log pseudolikelihood = -99.636713  
    Iteration 5:   log pseudolikelihood = -99.569896  
    Iteration 6:   log pseudolikelihood = -99.557089  
    Iteration 7:   log pseudolikelihood = -99.555031  
    Iteration 8:   log pseudolikelihood = -99.554527  
    Iteration 9:   log pseudolikelihood = -99.554423  
    Iteration 10:  log pseudolikelihood = -99.554401  
    Iteration 11:  log pseudolikelihood = -99.554396  
    
    Conditional (fixed-effects) logistic regression   Number of obs   =       1100
                                                      Wald chi2(24)   = 5892735.82
                                                      Prob > chi2     =     0.0000
    Log pseudolikelihood = -99.554396                 Pseudo R2       =     0.8206
    
                                          (Std. Err. adjusted for clustering on ID)
    -------------------------------------------------------------------------------
                  |               Robust
          Tenure_ |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    --------------+----------------------------------------------------------------
        DummyReal |   15.74403   1.079024    14.59   0.000     13.62918    17.85888
         DummyAll |   .7220081   .9697854     0.74   0.457    -1.178736    2.622753
    JahrDummy1996 |  -15.39999   5.159302    -2.98   0.003    -25.51204   -5.287945
    JahrDummy1997 |  -15.64311   5.125531    -3.05   0.002    -25.68897   -5.597253
    JahrDummy1998 |  -16.00921    5.06794    -3.16   0.002    -25.94219   -6.076232
    JahrDummy1999 |  -13.98576   4.843719    -2.89   0.004    -23.47928    -4.49225
    JahrDummy2000 |  -13.19993   4.650074    -2.84   0.005    -22.31391   -4.085955
    JahrDummy2001 |   -13.3446   4.651334    -2.87   0.004    -22.46104   -4.228151
    JahrDummy2002 |  -12.63749   4.520511    -2.80   0.005    -21.49753   -3.777456
    JahrDummy2003 |   -11.1608   4.461463    -2.50   0.012    -19.90511   -2.416496
    JahrDummy2004 |  -10.61971   4.420429    -2.40   0.016     -19.2836   -1.955833
    JahrDummy2005 |  -10.47379   4.541734    -2.31   0.021    -19.37543   -1.572158
    JahrDummy2006 |    -10.729   4.448356    -2.41   0.016    -19.44762   -2.010386
    JahrDummy2007 |  -9.759413   4.186473    -2.33   0.020    -17.96475   -1.554076
    JahrDummy2008 |  -9.213294   3.922282    -2.35   0.019    -16.90083   -1.525763
    JahrDummy2009 |  -8.063055   3.616174    -2.23   0.026    -15.15063   -.9754846
    JahrDummy2010 |  -6.441235   2.870035    -2.24   0.025     -12.0664   -.8160707
    JahrDummy2011 |  -4.453233   2.259879    -1.97   0.049    -8.882514   -.0239515
    JahrDummy2012 |  -3.397512   1.727395    -1.97   0.049    -6.783143   -.0118808
    JahrDummy2013 |  -1.874335   1.419931    -1.32   0.187    -4.657349    .9086792
    JahrDummy2014 |  -.6120955   .9763561    -0.63   0.531    -2.525718    1.301527
    JahrDummy2015 |          0  (omitted)
      AlterDummy0 |    9.14958   3.342895     2.74   0.006     2.597627    15.70153
      AlterDummy1 |   11.50867   3.453237     3.33   0.001      4.74045    18.27689
      AlterDummy2 |   10.82984   3.416407     3.17   0.002     4.133804    17.52587
      AlterDummy3 |          0  (omitted)
    -------------------------------------------------------------------------------

    My problem is that I am not sure how to interpret these results. I searched the forum and found some excellent posts on the problems of using the margins command after clogit with fixed effects. I wanted to use the aextlogit command, but I get a coefficient greater than 1 for DummyReal, which makes no sense.

    Third question: Is there any help you can give me so that I can interpret these results better? What else could I do, or should I just stick to the interpretation that treatment has a positive effect on having tenure?
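
    One variant I could try is the same model written with factor variables and the "or" option, so that odds ratios are reported instead of log odds. This is only a sketch and assumes that the categorical variable Alter is also present in the tenure data (above I used the hand-made dummies AlterDummy0 to AlterDummy3). Stata would then omit the first year and age category instead of the last one, which changes the reference group for the fixed effects but should not change the estimate for DummyReal.
    Code:
    * Sketch: factor-variable notation replaces the dummy macros
    * -or- displays exponentiated coefficients (odds ratios)
    clogit Tenure_ DummyReal DummyAll i.Jahr i.Alter [iweight = weights], group(ID) vce(robust) or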

    Thank you all very much in advance for your help. I hope I have made everything clear. If not, I apologize and will do my best to answer as quickly as I can.

    Best regards!