Initial condition/sample selection problem in duration model?

Anny Yu

Join Date: Dec 2017

Posts: 17
#1

24 Feb 2020, 09:13

Hi everyone, I have some general questions about duration models and their implementation in Stata.

(1) A duration model looks at a subgroup of the population. For example, if we study the duration of unemployment, we look only at people who were unemployed during the years of observation. Why is the sample selection problem seldom mentioned in economics papers that use these models? Or is it not a concern at all? This is in contrast to studies that use a Heckman model to correct for the initial condition problem.

(2) If it is a problem, is there a way to implement in Stata a discrete-time duration model that takes care of both unobserved heterogeneity and selection bias?

Many, many thanks indeed for your help!
Stephen Jenkins

Join Date: Apr 2014

Posts: 1424
#2

24 Feb 2020, 10:20

These are good questions, but typically hard to address.

(1) Yes, virtually all models of survival/duration analysis are "single spell" models. What is modelled is the time to event conditional on entry. Note the "conditional" -- selection into the initial state is not examined. To take account of 'selection' as you phrase it, you need to jointly model entries to a state and exits from a state. This becomes complicated as soon as you allow for unobserved heterogeneity ('frailty') because it is reasonable to expect unobserved factors that raise entry hazard rates to be correlated with the unobserved factors that raise exit hazard rates (controlling for observed individual characteristics). And there are 2 approaches to unobserved heterogeneity -- a 'continuous' one (e.g. assume the unobserved factors in the entry and exit hazard rates are bivariate normal) or a 'discrete' one (e.g. assume they have a bivariate discrete mass point distribution, akin to a form of latent class model). And of course one may have continuous survival time data or discrete/interval-censored survival time data. Programming estimators for these joint models is non-trivial, and they are often context-specific.

(2) See answer above.
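
To see why the correlation between the entry and exit unobservables matters, here is a minimal simulation sketch. It is written in Python rather than Stata purely for illustration, and everything in it is an assumption made for the example: the function name, the logistic entry probability, the intercept of -1 and the correlation of -0.5. It draws entry and exit frailties from a standard bivariate normal distribution and compares the average exit frailty among entrants with its population mean of zero.

```python
import math
import random


def mean_exit_frailty_among_entrants(rho, n=100_000, seed=2020):
    """Average exit-side frailty among people who enter the state.

    Hypothetical setup: entry and exit frailties are standard bivariate
    normal with correlation rho, and the entry probability is logistic
    in the entry frailty. In the full population the exit frailty has
    mean zero.
    """
    rng = random.Random(seed)
    total, entrants = 0.0, 0
    for _ in range(n):
        u_entry = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        # Build an exit frailty with correlation rho to the entry frailty.
        u_exit = rho * u_entry + math.sqrt(1.0 - rho**2) * noise
        # Assumed entry probability: logistic with intercept -1.
        p_entry = 1.0 / (1.0 + math.exp(-(-1.0 + u_entry)))
        if rng.random() < p_entry:
            entrants += 1
            total += u_exit
    return total / entrants


independent = mean_exit_frailty_among_entrants(rho=0.0)
correlated = mean_exit_frailty_among_entrants(rho=-0.5)
print(f"rho =  0.0: mean exit frailty among entrants = {independent:.3f}")
print(f"rho = -0.5: mean exit frailty among entrants = {correlated:.3f}")
```

With zero correlation the entrants' exit frailty averages close to zero, so conditioning on entry does not distort the distribution of the exit-side unobservable. With non-zero correlation the entrants are a selected group on that unobservable, which is the feature a joint entry/exit model tries to capture.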
Anny Yu

Join Date: Dec 2017

Posts: 17
#3

27 Feb 2020, 03:14

Thank you very much Stephen for your answer!

I have seen some of your papers on poverty dynamics that use a Markov transition model accounting for initial conditions, sample selection and unobserved heterogeneity, and I found them very helpful and informative. I have a basic question regarding such estimation:

Is initial condition correction using correlated random effects only an option in a dynamic model when the initial condition can be either 0 or 1 for individuals in the sample, and not in a case like this one, where all individuals in the sample have an initial condition of 1 (e.g. poor)? And is an instrument therefore needed in this case?

Last edited by Anny Yu; 27 Feb 2020, 03:35.
Stephen Jenkins

Join Date: Apr 2014

Posts: 1424
#4

27 Feb 2020, 06:54

Is initial condition correction using correlated random effects only an option in a dynamic model when the initial condition can be either 0 or 1 for individuals in the sample, and not in a case like this one, where all individuals in the sample have an initial condition of 1 (e.g. poor)?

Sorry, but your question is not very clear to me. If everyone in your sample is poor when first observed, I don't see how you could model the probability of being poor when first observed. (It's akin to trying to run an ordinary probit regression for which all the observed outcomes == 1 and there are no zeros.)
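
The probit analogy can be checked numerically. The sketch below is in Python, and the helper `probit_loglik`, the sample of 50 ones and the grid of intercepts are all hypothetical choices made for the illustration. It evaluates the log-likelihood of an intercept-only probit when every observed outcome equals 1.

```python
import math


def probit_loglik(intercept, outcomes):
    """Log-likelihood of an intercept-only probit for 0/1 outcomes."""
    # Standard normal CDF evaluated at the intercept.
    p = 0.5 * (1.0 + math.erf(intercept / math.sqrt(2.0)))
    return sum(math.log(p) if y == 1 else math.log(1.0 - p) for y in outcomes)


# Hypothetical sample in which everyone is in the state when first observed.
all_ones = [1] * 50
grid = [0.0, 1.0, 2.0, 3.0, 4.0]
logliks = [probit_loglik(a, all_ones) for a in grid]

for a, ll in zip(grid, logliks):
    print(f"intercept = {a:.1f}  log-likelihood = {ll:.5f}")
```

The log-likelihood keeps rising towards zero as the intercept grows, so no finite maximum likelihood estimate exists. The same lack of variation is what prevents modelling the initial state when everyone in the sample starts in it.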
Anny Yu

Join Date: Dec 2017

Posts: 17
#5

27 Feb 2020, 13:24

But in your 2008 paper with Cappellari, you mention that the sample contains only employees, so everyone's initial employment status is 1, because the focus is on predicting persistence in low pay. And you apply an instrument to correct for this. Is that not the case? Or did the baseline (initial period) include non-employees as well as employees?

Last edited by Anny Yu; 27 Feb 2020, 13:47.
Stephen Jenkins

Join Date: Apr 2014

Posts: 1424
#6

27 Feb 2020, 14:30

I presume you are referring to: ‘Estimating low pay transition probabilities accounting for endogenous selection mechanisms’, Journal of the Royal Statistical Society, Series C (Applied Statistics) 57 (2), 2008, 165–186. (Lorenzo Cappellari; Stephen P. Jenkins). Please give full citations, as recommended by the Forum FAQ. (I had to look up what you meant; I doubt other readers had any clue at all!) That paper considers employees, but the paper is primarily about modelling transitions between low-pay and high-pay. It is not about who is an employee.

Summary. We propose a model of transitions into and out of low paid employment that accounts for non-ignorable panel dropout, employment retention and base year low pay status (‘initial conditions’). The model is fitted to data for men from the British Household Panel Survey. Initial conditions and employment retention are found to be non-ignorable selection processes. Whether panel dropout is found to be ignorable depends on how item non-response on pay is treated. Notwithstanding these results, we also find that models incorporating a simpler approach to accounting for non-ignorable selections provide estimates of covariate effects that differ very little from the estimates from the general model.

The modelling of initial conditions (which involved the use of instruments) is not to do with whether someone is an employee; it's to do with employees' pay status.
Anny Yu

Join Date: Dec 2017

Posts: 17
#7

27 Feb 2020, 16:21

Very sorry, I'll pay attention to that.

And sorry for the wrong phrasing - I understand that it is whether someone's wage is observed in the base year that will affect their selection into the sample. So the sample includes ONLY individuals whose wage was observed in the base year (t=1). And I just realized that perhaps I wasn't understanding your paper correctly before - your correction targets entry into low wage status at t-1, but not whether someone is selected into the sample based on whether their wage was observed in the base year (t=1).

My initial question in this post was whether, if sample selection depends entirely on fulfilling certain criteria, e.g. being unemployed in the base year (or, in the case of your paper, having an observed wage), that base-year status (= the sample selection condition) needs to be corrected for at all in these transition and survival models. For example, if I want to study persistence in unemployment, my sample would be persons unemployed in 2008, because it is the first year with available data. Would that bias the estimated transitions, e.g. because 2008 was a year hit hard by recession? Or is it not a concern?

Many thanks indeed!
Stephen Jenkins

Join Date: Apr 2014

Posts: 1424
#8

27 Feb 2020, 16:30

You're tying yourself in unnecessary knots, I think. You can derive interesting results about unemployment duration for the people who enter a spell of unemployment -- as many others have. Unless you have full (un)employment histories -- with transitions into and out of employment -- you cannot also model the 'selection' into the sample of people starting a spell of unemployment. (I'm repeating myself from an earlier post.)
Yes, maybe the people who start an unemployment spell in a recession period have longer spells than those who start a spell in a boom period. Sometimes researchers have samples that include persons who start spells at different times and look at aspects of this issue.
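
As a rough sketch of what comparing entry cohorts can look like with discrete-time data, the Python snippet below computes a life-table exit hazard separately for spells that start in a "recession" and in a "boom". The spell records, the cohort labels and the function `hazards_by_cohort` are all made up for the illustration; the convention assumed here is that a censored spell stays at risk through its last observed month without exiting.

```python
from collections import defaultdict

# Hypothetical spell records: (entry_cohort, duration_in_months, exited).
# exited = False means the spell was right-censored at that duration.
spells = [
    ("recession", 3, True), ("recession", 6, True),
    ("recession", 6, False), ("recession", 2, True),
    ("boom", 1, True), ("boom", 2, True),
    ("boom", 1, True), ("boom", 4, False),
]


def hazards_by_cohort(spells):
    """Life-table hazard h(t) = exits in month t / spells at risk in month t."""
    at_risk = defaultdict(int)
    exits = defaultdict(int)
    for cohort, duration, exited in spells:
        # Expand each spell into one person-month record per month at risk.
        for t in range(1, duration + 1):
            at_risk[(cohort, t)] += 1
        if exited:
            exits[(cohort, duration)] += 1
    return {key: exits[key] / n for key, n in sorted(at_risk.items())}


hazards = hazards_by_cohort(spells)
for (cohort, month), h in hazards.items():
    print(f"{cohort:9s} month {month}: hazard = {h:.2f}")
```

The person-month expansion inside the loop is the same data layout that a discrete-time hazard regression would use, with the entry cohort entering as a covariate.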
If I were you, I'd stop fetishising "bias", especially because you haven't discussed at length what the "population" is that you wish to generalise to.
Sorry, but I'm done with this topic now.