  • Logit time-varying covariate survival model. Do I adjust the standard errors from the Stata logit command output?

    Hi:

    I am new to this forum so I apologize in advance for any naivety or lack of understanding of rules and regulations. I am also new to survival/hazard rate modeling though not to econometrics in general.

    My problem is as follows. I am analyzing a survival model of how many months newly hired employees remain before they quit, using logistic regression. There are approximately 14,000 employees hired at different times during the study period (about 13 years). I have some non-time-varying covariates (NTVC) that are fixed, e.g., age at hire and gender, and some time-varying covariates (TVC), e.g., wages. The data are arranged with a single observation for each time period (month) in which the employee is “at risk”, so there are approximately 700,000 observations (employee-months) in total. Each employee has only one employment spell in the data. The model looks like this:

    Logit(Pr(Y(it)=1)) = A*TVC(it) + B*NTVC(i) + C*month1_monthT,

    where Y(it) is 1 if employee i quits in month t and zero otherwise; A, B and C are parameter vectors of appropriate dimension; month1_monthT is a set of dummy variables, one for each possible value of t; and i and t denote the individual and the number of months the individual has been at risk so far, respectively. For example, an employee in the 4th month since being hired has t=4. The Stata code is simply something like this:

    logit Y m1-m144 TVC NTVC
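
    A minimal sketch of the same setup on simulated data, for readers who want to try the employee-month layout. Everything here is hypothetical: the sample size, the coefficients, and the variable names id, t, and prior_quits are made up for illustration, and factor-variable notation (i.t) is used in place of hand-made month dummies.

    ```stata
    * Hypothetical simulated data: 500 employees, at most 12 months at risk
    clear
    set seed 12345
    set obs 500
    gen id = _n
    * fixed covariate, e.g. standardized age at hire
    gen NTVC = rnormal()
    * one row per potential employee-month
    expand 12
    bysort id: gen t = _n
    * time-varying covariate, e.g. standardized wage
    gen TVC = rnormal()
    * quit indicator drawn from an assumed logit hazard
    gen Y = runiform() < invlogit(-3 + 0.5*TVC + 0.3*NTVC)
    * keep months up to and including the first quit
    bysort id (t): gen prior_quits = sum(Y) - Y
    drop if prior_quits > 0
    * i.t creates the month dummies; the base month is absorbed by the constant
    logit Y i.t TVC NTVC, nolog
    ```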

    My question is whether there needs to be an adjustment of the standard errors produced by the standard logit command. According to Paul Allison’s book “Survival Analysis Using SAS”, pp. 246-7, and Allison (1982), "Discrete-Time Methods for the Analysis of Event Histories", Sociological Methodology, Vol. 13, pp. 61-98, see:

    http://statisticalhorizons.com/wp-co...lison.SM82.pdf,

    there is no need to adjust the standard errors.

    However, according to Tyler Shumway, “Forecasting Bankruptcy More Accurately: A Simple Hazard Model”, The Journal of Business, Vol. 74, No. 1 (January 2001), pp. 101-124:

    http://www.rcg.ch/resources/Forecast...ard_Model1.pdf,

    it is necessary to adjust the degrees of freedom to approximately equal the number of employees in the model (14,000) rather than the number of employee-months (700,000). This is to be done by dividing the 700,000 by the average number of employee-months per employee (in my case, about 51).

    To adjust or not to adjust the standard errors output by the logit command? That is the question. Do I adjust the Stata output’s standard errors (essentially multiplying them by the square root of 51, in my example) or just leave them as they are?
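
    For concreteness, here is the arithmetic behind the proposed adjustment, using the approximate figures quoted in this post. The symbols N and \bar{T} are introduced here only for illustration.

    ```latex
    \[
    \frac{N_{\text{employee-months}}}{\bar{T}} \approx \frac{700{,}000}{51} \approx 13{,}700,
    \qquad
    \mathrm{SE}_{\text{adjusted}} = \sqrt{\bar{T}}\;\mathrm{SE}_{\text{logit}}
    \approx \sqrt{51}\;\mathrm{SE}_{\text{logit}} \approx 7.14\,\mathrm{SE}_{\text{logit}}.
    \]
    ```

    In other words, the adjustment would make every reported standard error roughly seven times larger.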

    Thanks,

    Michael

  • #2
    Leave your standard errors as they are: Allison is correct. Be aware that having the data in employee-month form and applying a logit model to the data in that form is simply a "trick" that enables easy estimation of this simple discrete time survival analysis model. The correct likelihood function is being maximised. (See my "Easy estimation methods for discrete time hazard models", Oxford Bulletin of Economics and Statistics, 1995, or my "Survival Analysis" website materials)
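
    A sketch of why the "trick" works, in notation introduced here for illustration and assuming non-informative censoring: h_{it} is the hazard for person i in month t, T_i is the last month observed, c_i = 1 if the spell ends in an event and 0 if it is censored, and y_{it} is the binary dependent variable in the expanded data.

    ```latex
    \[
    L_i \;=\; \left[\frac{h_{iT_i}}{1-h_{iT_i}}\right]^{c_i}\prod_{t=1}^{T_i}\bigl(1-h_{it}\bigr)
    \;=\; \prod_{t=1}^{T_i} h_{it}^{\,y_{it}}\bigl(1-h_{it}\bigr)^{1-y_{it}},
    \qquad
    y_{it} = \begin{cases} 1 & \text{if } t = T_i \text{ and } c_i = 1,\\ 0 & \text{otherwise.}\end{cases}
    \]
    ```

    The right-hand side has the same form as the likelihood of T_i independent binary observations, so when logit(h_{it}) is linear in the covariates, an ordinary logit fitted to the person-month data maximises the survival likelihood itself.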


    • #3
      Originally posted by Stephen Jenkins
      Leave your standard errors as they are: Allison is correct. Be aware that having the data in employee-month form and applying a logit model to the data in that form is simply a "trick" that enables easy estimation of this simple discrete time survival analysis model. The correct likelihood function is being maximised. (See my "Easy estimation methods for discrete time hazard models", Oxford Bulletin of Economics and Statistics, 1995, or my "Survival Analysis" website materials)
      Many thanks for your prompt and helpful response. I will obtain a copy of your paper.


      • #4
        Here's an empirical demonstration that Shumway's claims are unfounded. The code is copied from an example in Stephen's Lesson 6, linked to on his Survival Analysis page. The data set has 48 people and is expanded to 744 person-month observations. I show the likelihood-based logit analysis and one in which standard errors are based on the 48 independent ID clusters. If Shumway were correct, the likelihood-based standard errors should be much smaller than the cluster-based standard errors. In fact, they are similar in magnitude, and the cluster-based ones are, if anything, slightly smaller.

        Code:
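        sysuse cancer, clear
        
        * one row per subject: create a subject identifier
        ge id = _n
        lab var id "subject identifier"
        * collapse the two active drugs into a single treatment indicator
        recode drug 1=0 2/3=1
        
        lab var drug "receives drug?"
        lab def drug 0 "placebo" 1 "drug"
        lab val drug drug
        
        * expand to one row per subject-month at risk
        expand studytim
        
        * spell month identifier, by subject
        bysort id: ge j = _n
        lab var j "spell month"
        * dead = 1 only in the subject's final month, and only if the subject died
        bysort id: ge dead = died==1 & _n==_N
        lab var dead "binary depvar for discrete hazard model"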
        
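        * one dummy per spell month: d1-d39
        ta j, ge(d)
        
        * group months into six-month duration bands (the last band covers months 31-39)
        ge dur1 = d1+d2+d3+d4+d5+d6
        ge dur2 = d7+d8+d9+d10+d11+d12
        ge dur3 = d13+d14+d15+d16+d17+d18
        ge dur4 = d19+d20+d21+d22+d23+d24
        ge dur5 = d25+d26+d27+d28+d29+d30
        ge dur6 = d31+d32+d33+d34+d35+d36+d37+d38+d39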
        
        /* Likelihood-based standard errors */
        logit dead drug age dur1 dur2 dur3 dur4 dur5 dur6, nocons nolog
        /* cluster on id */
        logit dead drug age dur1 dur2 dur3 dur4 dur5 dur6, nocons nolog vce(cluster id)
        The results of the logit analyses are:
        Code:
        .
        . /* Likelihood-based standard errors */
        . logit dead drug age dur1 dur2 dur3 dur4 dur5 dur6, nocons nolog
        
        Logistic regression                             Number of obs     =        744
                                                        Wald chi2(8)      =     221.44
        Log likelihood = -111.49006                     Prob > chi2       =     0.0000
        
        ------------------------------------------------------------------------------
                dead |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                drug |  -2.280694   .4565036    -5.00   0.000    -3.175425   -1.385963
                 age |   .1190845   .0389076     3.06   0.002      .042827     .195342
                dur1 |  -9.098389   2.265066    -4.02   0.000    -13.53784   -4.658941
                dur2 |  -8.434023   2.178112    -3.87   0.000    -12.70304   -4.165002
                dur3 |  -8.438919   2.182175    -3.87   0.000     -12.7159   -4.161936
                dur4 |  -7.596855   2.169892    -3.50   0.000    -11.84977   -3.343945
                dur5 |  -7.445229   2.273184    -3.28   0.001    -11.90059    -2.98987
                dur6 |  -7.499636   2.382459    -3.15   0.002    -12.16917   -2.830103
        ------------------------------------------------------------------------------
        
        . /* cluster on id */
        . logit dead drug age dur1 dur2 dur3 dur4 dur5 dur6, nocons nolog vce(cluster id)
        
        Logistic regression                             Number of obs     =        744
                                                        Wald chi2(8)      =     259.03
        Log pseudolikelihood = -111.49006               Prob > chi2       =     0.0000
        
                                            (Std. Err. adjusted for 48 clusters in id)
        ------------------------------------------------------------------------------
                     |               Robust
                dead |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                drug |  -2.280694   .4506122    -5.06   0.000    -3.163878    -1.39751
                 age |   .1190845   .0338969     3.51   0.000     .0526478    .1855212
                dur1 |  -9.098389   1.902411    -4.78   0.000    -12.82705   -5.369731
                dur2 |  -8.434023   1.899325    -4.44   0.000    -12.15663   -4.711414
                dur3 |  -8.438919    1.77671    -4.75   0.000    -11.92121   -4.956632
                dur4 |  -7.596855   1.861652    -4.08   0.000    -11.24563   -3.948084
                dur5 |  -7.445229   1.924729    -3.87   0.000    -11.21763   -3.672831
                dur6 |  -7.499636   2.033321    -3.69   0.000    -11.48487     -3.5144
        ------------------------------------------------------------------------------
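
        A rough check using the numbers in the two tables. The factor \sqrt{744/48} applies the averaging rule described in the original question to this data set; it is an illustrative assumption, not something Stata computes.

        ```latex
        \[
        \sqrt{\frac{744}{48}} = \sqrt{15.5} \approx 3.94,
        \qquad
        \frac{\mathrm{SE}_{\text{cluster}}(\texttt{drug})}{\mathrm{SE}_{\text{ML}}(\texttt{drug})} = \frac{0.4506}{0.4565} \approx 0.99,
        \qquad
        \frac{\mathrm{SE}_{\text{cluster}}(\texttt{age})}{\mathrm{SE}_{\text{ML}}(\texttt{age})} = \frac{0.0339}{0.0389} \approx 0.87.
        \]
        ```

        The observed ratios are close to 1, nowhere near the factor of about 4 that such an adjustment would imply.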
        In future posts, be sure to follow the FAQ; the most important section is FAQ 12.
        Last edited by Steve Samuels; 15 Dec 2015, 16:02.
        Steve Samuels
        Statistical Consulting

        Stata 14.2
