Censored Regression for GLM

Doug DeMoulin

Join Date: May 2021

Posts: 12
#1

Censored Regression for GLM

18 Feb 2022, 17:45

Hi All,

Is it possible to incorporate a command to allow censoring from below for GLM models?

For Example:

glm depvar indvar covariates, family(nbinomial ml) link(identity) [any command for censoring]
Tags: None

Andrea Discacciati

Join Date: Feb 2016
Posts: 194

19 Feb 2022, 04:51

Not that I'm aware of. But you can program it yourself!
Here's a simple example with a left-censored Poisson distribution.
Observations are left-censored if y<=4 and uncensored otherwise.
I've used the -mlexp- command (help mlexp) which in this simple case is quite handy.

Code:

cap program drop censPoi
program censPoi, rclass
clear
set obs 1000
gen y = rpoisson(5)

gen yc = y
replace yc = 4 if y<= 4
gen d = y > 4

glm y, family(poisson) // original uncensored y
local m1 = exp(_b[_cons])

glm yc, family(poisson) // censored yc - wrong!
local m2 = exp(_b[_cons])

mlexp (d*log(poissonp(exp({theta}), yc)) + (1-d)*log(poisson(exp({theta}), yc))) // censored y - OK
local m3 = exp(_b[/theta])

return scalar m1 = `m1'
return scalar m2 = `m2'
return scalar m3 = `m3'
end


simulate m1 = r(m1) m2 = r(m2) m3 = r(m3), reps(200) seed(987): censPoi
su m*

Code:

. su m*

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          m1 |        200    4.998685    .0701953       4.83      5.133
          m2 |        200    5.436375    .0547442      5.297      5.545
          m3 |        200    4.997207    .0742204   4.813146   5.140038

Last edited by Andrea Discacciati; 19 Feb 2022, 04:55.

Comment

Doug DeMoulin

Join Date: May 2021

Posts: 12
#3

21 Feb 2022, 09:05

thank you for your response. I am trying to conceptualize this as the censored models include those observations being censored but censors at the specified value. I am looking at the observations set at 1000 and ourput of 200. Is this a truncated command? Because truncated regressions omits those observations set a the specified ranges
Comment

Andrea Discacciati

Join Date: Feb 2016
Posts: 194

22 Feb 2022, 01:00

Code:

mlexp (d*log(poissonp(exp({theta}), yc)) + (1-d)*log(poisson(exp({theta}), yc)))

This command fits a simple Poisson model whose parameter lambda is assumed to be constant for all observations (no covariates) = exp(theta) (so that it's constrained to be >0)
It does not exclude left-censored observations (see the output below).
The uncensored observations (d==1) contribute to the log-likelihood with the (Poisson) log-probability distribution function evaluated at yc [d*log(poissonp(exp({theta}), yc)], while the left-censored observations (d==0) contribute to the likelihood with the (Poisson) log-cumulative density function evaluated at yc [(1-d)*log(poisson(exp({theta}), yc)].
With minimal effort, you can make the parameter lambda to depend on covariates (like gender, age, etc) like in a "regular" Poisson regression (glm), see the Examples at the bottom of -help mlexp-.

In #2, I ran a simple simulation to show you that, given that specific data generating mechanism, the model coded with mlexp estimated on average (over 200 simulations) the true value of the parameter lambda (5).

Code:

clear
set seed 1
set obs 1000
gen y = rpoisson(5)

gen yc = y
replace yc = 4 if y<= 4
gen d = y > 4

mlexp (d*log(poissonp(exp({theta}), yc)) + (1-d)*log(poisson(exp({theta}), yc)))
lincom /theta, eform

Code:

initial:       log likelihood = -4968.3986
alternative:   log likelihood = -3502.8649
rescale:       log likelihood = -2029.4556
Iteration 0:   log likelihood = -2029.4556  
Iteration 1:   log likelihood = -1636.3881  
Iteration 2:   log likelihood =  -1633.891  
Iteration 3:   log likelihood = -1633.8901  
Iteration 4:   log likelihood = -1633.8901  

Maximum likelihood estimation

Log likelihood = -1633.8901                     Number of obs     =      1,000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      /theta |   1.614441   .0148087   109.02   0.000     1.585416    1.643465
------------------------------------------------------------------------------

. lincom /theta, eform

 ( 1)  [theta]_cons = 0

------------------------------------------------------------------------------
             |     exp(b)   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   5.025076   .0744147   109.02   0.000     4.881322    5.173064
------------------------------------------------------------------------------

Last edited by Andrea Discacciati; 22 Feb 2022, 01:09.

Comment

Doug DeMoulin

Join Date: May 2021

Posts: 12
#5

08 Mar 2022, 23:07

is this code applicable for identity function instead of on a log-based scale?
Comment

Andrea Discacciati

Join Date: Feb 2016
Posts: 194

10 Mar 2022, 05:31

With the necessary modifications.
Moving away form this toy example, the usual caveats about using an identity link to model a parameter constrained to be >0 apply.

Code:

clear
set seed 12
set obs 1000
gen x = rbinomial(1, 0.5)
gen y = rpoisson(1 + 2*x) // true means: 1 and 3

gen yc = y
replace yc = 4 if y<= 4
gen d = y > 4

glm y i.x, family(poisson) link(identity) // completely observed y
margins i.x

glm yc i.x, family(poisson) link(identity) // left-censored y -- wrong model!
margins i.x

mlexp (d*log(poissonp({theta: i.x _cons}, yc)) + (1-d)*log(poisson({theta:}, yc))) // left-censored y -- correct model
margins i.x

Announcement