Logit time-varying covariate survival model. Do I adjust the standard errors from the Stata logit command output?

Michael Kendix

Join Date: Dec 2015

Posts: 4
#1

14 Dec 2015, 13:44

Hi:

I am new to this forum so I apologize in advance for any naivety or lack of understanding of rules and regulations. I am also new to survival/hazard rate modeling though not to econometrics in general.

My problem is as follows. I am analyzing a survival model of how many months newly hired employees remain before they quit. I am using a logistic regression model. There are approximately 14,000 employees hired at different times during the study period (about 13 years). I have some non-time-varying covariates (NTVC) that are fixed, e.g., age at hire, gender, etc., and some time-varying covariates (TVC), e.g., wages. The data are arranged such that there is a single observation for each time period (month) that the employee is "at risk", so we have a total of approximately 700,000 observations (employee-months). Each employee hired appears in the data once. The model looks like this:

Logit(Y(it)) = A*TVC(it) + B*NTVC(i) + C*month1_monthT,

Where Y is 1 for quitting and zero elsewhere; A, B and C are parameter vectors of appropriate dimension; month1_monthT is a set of dummy variables for each possible value of t; i and t denote the individual and the number of months the individual has been at risk so far, respectively, so if he/she is in their 4th month of employment since being hired, t=4, and so on. The Stata code is simply something like this:

logit Y m1-m144 TVC NTVC

My question is whether there needs to be an adjustment of the standard errors produced by the standard logit command. According to Paul Allison's book "Survival Analysis Using SAS", pp. 246-7, and Allison (1982), "Discrete-Time Methods for the Analysis of Event Histories", Sociological Methodology, Vol. 13, pp. 61-98, see:

http://statisticalhorizons.com/wp-co...lison.SM82.pdf,

there is no need to adjust the standard errors.

However, according to Tyler Shumway, “Forecasting Bankruptcy More Accurately: A Simple Hazard Model”, The Journal of Business, Vol. 74, No. 1 (January 2001), pp. 101-124:

http://www.rcg.ch/resources/Forecast...ard_Model1.pdf,

it is necessary to adjust the degrees of freedom to approximately equal the number of employees in the model (14,000) rather than the number of employee-months (700,000). This is to be done by dividing the 700,000 by the average number of employee-months per employee (in my case that is about 51).

To adjust or not-adjust the standard errors output from the logit command? That is the question. Do I adjust the Stata output’s standard errors (essentially multiplying them by the square root of 51, in my example) or just leave them as they are?
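
For concreteness, the arithmetic behind the proposed adjustment can be sketched in Stata. The totals are the rounded figures quoted above, the scalar names are illustrative only, and this shows the size of the adjustment being asked about, not a recommendation to apply it.

```stata
* Rounded totals quoted in the post: employee-months and employees
scalar n_obs = 700000
scalar n_emp = 14000

* Average months at risk per employee implied by the rounded totals
* (the post quotes "about 51" from the actual data)
scalar avg_months = n_obs / n_emp
display "average months per employee: " avg_months

* Factor by which the standard errors would be multiplied,
* using the post's figure of 51 months
scalar se_factor = sqrt(51)
display "SE inflation factor: " %6.3f se_factor
```

With these figures the standard errors would grow roughly sevenfold, which is why the choice matters for inference.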

Thanks,

Michael
Stephen Jenkins

Join Date: Apr 2014

Posts: 1435
#2

15 Dec 2015, 02:08

Leave your standard errors as they are: Allison is correct. Be aware that having the data in employee-month form and applying a logit model to the data in that form is simply a "trick" that enables easy estimation of this simple discrete time survival analysis model. The correct likelihood function is being maximised. (See my "Easy estimation methods for discrete time hazard models", Oxford Bulletin of Economics and Statistics, 1995, or my "Survival Analysis" website materials)
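
A sketch of why the correct likelihood is being maximised, in notation that is assumed here for illustration and is not taken from the cited paper: let h_{it} be the hazard of quitting for person i in month t, T_i the last month in which person i is observed, c_i = 1 if person i quits in month T_i (0 if censored), and y_{it} = 1 only in month T_i of a person who quits.

```latex
\begin{aligned}
L_i &= \left[\frac{h_{iT_i}}{1-h_{iT_i}}\right]^{c_i} \prod_{t=1}^{T_i} \bigl(1-h_{it}\bigr), \\
\log L &= \sum_{i} \sum_{t=1}^{T_i} \Bigl[\, y_{it}\log h_{it} + (1-y_{it})\log\bigl(1-h_{it}\bigr) \Bigr].
\end{aligned}
```

The second line has the same form as the log-likelihood of a binary model fitted to the person-month rows, so with a logistic specification for h_{it} the logit command maximises the survival likelihood itself. The sum over months comes from factoring each person's survival probability into conditional monthly probabilities, not from treating the rows as independent subjects.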
Michael Kendix

Join Date: Dec 2015

Posts: 4
#3

15 Dec 2015, 07:47

Originally posted by Stephen Jenkins

Leave your standard errors as they are: Allison is correct. Be aware that having the data in employee-month form and applying a logit model to the data in that form is simply a "trick" that enables easy estimation of this simple discrete time survival analysis model. The correct likelihood function is being maximised. (See my "Easy estimation methods for discrete time hazard models", Oxford Bulletin of Economics and Statistics, 1995, or my "Survival Analysis" website materials)

Many thanks for your prompt and helpful response. I will obtain a copy of your paper.

Steve Samuels

Join Date: Mar 2014
Posts: 1786

15 Dec 2015, 15:56

Here's an empirical demonstration that Shumway's claims are unfounded. The code is copied from an example in Stephen's Lesson 6, linked to on his Survival Analysis page. The data set has 48 people and is expanded to 744 person-month observations. I show the likelihood-based logit analysis and one in which standard errors are based on the 48 independent ID clusters. If Shumway were correct, the likelihood-based standard errors should be much smaller than the cluster-based standard errors. In fact, they are similar in magnitude.

Code:

sysuse cancer, clear

ge id = _n
lab var id "subject identifier"
recode drug 1=0 2/3=1

lab var drug "receives drug?"
lab def drug 0 "placebo" 1 "drug"
lab val drug drug

expand studytim

bysort id: ge j = _n
* spell month identifier, by subject
lab var j "spell month"
bysort id: ge dead = died==1 & _n==_N
lab var dead "binary depvar for discrete hazard model"

ta j, ge(d)

ge dur1 = d1+d2+d3+d4+d5+d6
ge dur2 = d7+d8+d9+d10+d11+d12
ge dur3 = d13+d14+d15+d16+d17+d18
ge dur4 = d19+d20+d21+d22+d23+d24
ge dur5 = d25+d26+d27+d28+d29+d30
ge dur6 = d31+d32+d33+d34+d35+d36+d37+d38+d39

/* Likelihood-based standard errors */
logit dead drug age dur1 dur2 dur3 dur4 dur5 dur6, nocons nolog
/* cluster on id */
logit dead drug age dur1 dur2 dur3 dur4 dur5 dur6, nocons nolog vce(cluster id)

The results of the logit analyses are:

Code:

.
. /* Likelihood-based standard errors */
. logit dead drug age dur1 dur2 dur3 dur4 dur5 dur6, nocons nolog

Logistic regression                             Number of obs     =        744
                                                Wald chi2(8)      =     221.44
Log likelihood = -111.49006                     Prob > chi2       =     0.0000

------------------------------------------------------------------------------
        dead |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        drug |  -2.280694   .4565036    -5.00   0.000    -3.175425   -1.385963
         age |   .1190845   .0389076     3.06   0.002      .042827     .195342
        dur1 |  -9.098389   2.265066    -4.02   0.000    -13.53784   -4.658941
        dur2 |  -8.434023   2.178112    -3.87   0.000    -12.70304   -4.165002
        dur3 |  -8.438919   2.182175    -3.87   0.000     -12.7159   -4.161936
        dur4 |  -7.596855   2.169892    -3.50   0.000    -11.84977   -3.343945
        dur5 |  -7.445229   2.273184    -3.28   0.001    -11.90059    -2.98987
        dur6 |  -7.499636   2.382459    -3.15   0.002    -12.16917   -2.830103
------------------------------------------------------------------------------

. /* cluster on id */
. logit dead drug age dur1 dur2 dur3 dur4 dur5 dur6, nocons nolog vce(cluster id)

Logistic regression                             Number of obs     =        744
                                                Wald chi2(8)      =     259.03
Log pseudolikelihood = -111.49006               Prob > chi2       =     0.0000

                                    (Std. Err. adjusted for 48 clusters in id)
------------------------------------------------------------------------------
             |               Robust
        dead |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        drug |  -2.280694   .4506122    -5.06   0.000    -3.163878    -1.39751
         age |   .1190845   .0338969     3.51   0.000     .0526478    .1855212
        dur1 |  -9.098389   1.902411    -4.78   0.000    -12.82705   -5.369731
        dur2 |  -8.434023   1.899325    -4.44   0.000    -12.15663   -4.711414
        dur3 |  -8.438919    1.77671    -4.75   0.000    -11.92121   -4.956632
        dur4 |  -7.596855   1.861652    -4.08   0.000    -11.24563   -3.948084
        dur5 |  -7.445229   1.924729    -3.87   0.000    -11.21763   -3.672831
        dur6 |  -7.499636   2.033321    -3.69   0.000    -11.48487     -3.5144
------------------------------------------------------------------------------
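
To put a number on "similar in magnitude", here is a small sketch using the drug standard errors copied from the two outputs above. The scalar names are illustrative, and the "implied" factor assumes the adjustment described in #1, i.e. multiplying by the square root of the average number of rows per subject (744/48 here).

```stata
* Standard errors for drug, copied from the two outputs above
scalar se_ml      = .4565036
scalar se_cluster = .4506122

* Ratio of cluster-robust to likelihood-based standard error
scalar se_ratio = se_cluster / se_ml
display "cluster / likelihood-based SE: " %5.3f se_ratio

* Factor implied by the adjustment described in #1:
* sqrt(average number of rows per subject)
scalar implied = sqrt(744 / 48)
display "factor implied by the adjustment: " %5.3f implied
```

The observed ratio is close to 1, whereas the adjustment would call for a factor of nearly 4.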

In future posts, be sure to follow the Statalist FAQ; the most important section is FAQ 12.

Last edited by Steve Samuels; 15 Dec 2015, 16:02.

Steve Samuels
Statistical Consulting

Stata 14.2
