extending a Two-part model

Luis Mijares Castaneda

Join Date: Jan 2021

Posts: 79
#1

03 Apr 2025, 17:56

Hello, I posted this question earlier but no one answered, so I am reposting it in the hope that someone can help. I have an open-ended question about two-part modeling. I would like to implement two different two-part models. In the first, I plan to use a logit model in the first part and a quantile regression in the second part. In the second, I plan to use a logit model followed by a beta regression. I’m aware that the twopm command exists in Stata, but it does not support quantile or beta regression for the second part.

The goal of the first model is to examine out-of-pocket dental expenditures, while the second model focuses on out-of-pocket dental expenditures as a percentage of income. If anyone has any advice or suggestions on implementing these models, I would greatly appreciate it.
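Because each part of a two-part model is fitted on its own piece of the data, one way to sketch the two models described above is to run the parts by hand rather than through twopm. This is only a minimal sketch: the variable names (oop_dental, oop_share, x1, x2) are hypothetical placeholders, and whether quantile or beta regression is appropriate for the second part is a separate modeling question.

```stata
* Hypothetical names: oop_dental (expenditure), oop_share (expenditure / income),
* x1 x2 (covariates); none of these come from the actual data set
generate byte any_oop = oop_dental > 0 if !missing(oop_dental)

* Part 1 (shared by both models): logit for any out-of-pocket spending
logit any_oop x1 x2

* Model 1, part 2: median regression on the positive expenditures only
qreg oop_dental x1 x2 if oop_dental > 0, quantile(0.5)

* Model 2, part 2: beta regression on the positive shares
* (betareg needs the outcome strictly between 0 and 1)
betareg oop_share x1 x2 if oop_share > 0 & oop_share < 1
```

Other quantiles can be obtained by changing the value in quantile().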
Joao Santos Silva

Join Date: Apr 2014

Posts: 2993
#2

03 Apr 2025, 22:15

Dear Luis Mijares Castaneda,

I am not sure that I can help you, but why do you want to use quantiles in the first case but not in the second? Also, do you want to use a two-part model because you believe that zero and positive expenditures are driven by different processes?

Best wishes,

Joao
Luis Mijares Castaneda

Join Date: Jan 2021

Posts: 79
#3

04 Apr 2025, 08:04

Hello. I want to use a single two-step model with a logit in the first equation and a quantile regression in the second equation. Yes, I have excess 0s, and I want to account for them while using quantile regression.
Joao Santos Silva

Join Date: Apr 2014

Posts: 2993
#4

04 Apr 2025, 09:08

You have an excess of zeros with respect to what benchmark? Anyway, I do not think quantile regression is a way to deal with zeros.
Luis Mijares Castaneda

Join Date: Jan 2021

Posts: 79
#5

04 Apr 2025, 09:19

Here is the histogram of the log-transformed out-of-pocket dental expenditures. There are excess 0s, and when I try to use the regular quantile regression command I get an error because the 0s cause numerical instability, so I want to use a two-step model: first a logit to model the 0s, and then a quantile regression to look at out-of-pocket dental costs by percentile. I also want a second two-step model in which the second equation is a beta regression; this model will be used for out-of-pocket dental costs as a percentage of income. If you know how I can program these two-step models, that would be greatly appreciated.
Luis Mijares Castaneda

Join Date: Jan 2021

Posts: 79
#6

04 Apr 2025, 11:43

Does the above post make sense? I'm not sure if I'm writing down my problem cogently.
Joao Santos Silva

Join Date: Apr 2014

Posts: 2993
#7

04 Apr 2025, 23:08

Dear Luis Mijares Castaneda,

There are several issues here.

1 - You cannot say that you have excess zeros unless you have a benchmark, which I believe you do not. What you can say is that there is a mass-point at zero, which is normal in expenditures and other corner-solutions data (see Jeff Wooldridge's textbook).

2 - You should not use quantile regression with this type of data because standard quantile regression assumes that the dependent variable is continuous.

3 - You may want to use a two-step model with this kind of data because there is a literature that argues that demand for health care depends on the decision to demand some health care and on a separate process deciding the quantity demanded, given that it is positive. That is, you may want to use a two-part model but not because you have (excess) zeros or because you have problems estimating quantiles.

4 - You did not provide any good reason for wanting to use quantile regression, so I would not do it, at least to start with. I am happy to discuss the use of QR in this context, but I will leave that for later.

5 - Given the above, I suggest you look at the user-written command twopm, which would allow you to use a logit in the first part and Poisson regression in the second part (note that I would use Poisson regression for expenditures, not log-expenditures).

6 - If dental expenditures as a share of income are always well below 1 (as I expect will be the case), you can use the same approach even when you model the shares because the upper bound is irrelevant.
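As a rough illustration of point 5, the logit plus Poisson two-part model can also be fitted by hand, which makes explicit how the two parts combine into the overall conditional mean. This is a minimal sketch under assumed names: expend (expenditure in levels) and x1 x2 (covariates) are hypothetical.

```stata
* Hypothetical names: expend (expenditure in levels, not logs), x1 x2 (covariates)
generate byte any_expend = expend > 0 if !missing(expend)

* Part 1: probability of any expenditure
logit any_expend x1 x2
predict double p_any, pr

* Part 2: Poisson regression on the positive expenditures, with robust
* standard errors because expenditures are not counts
poisson expend x1 x2 if expend > 0, vce(robust)
predict double mu_pos, n

* Overall mean: P(expend > 0 | x) * E(expend | expend > 0, x)
generate double mu_all = p_any * mu_pos
```

The same steps apply when the outcome is the share of income rather than the expenditure level.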

Best wishes,

Joao
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2119
#8

05 Apr 2025, 08:26

I agree with Joao. I don't see good motivation for using quantile regression conditional on expenditures > 0. I think using a model for log(expend) conditional on expend > 0 is fine if you insist on a two-part model. Or, you can model expend conditional on expend > 0 as a truncated normal (with P(expend = 0) modeled as probit) as in one of Cragg's two-part models estimated by tpm. You could use a fractional response model conditional on expend > 0, which would typically mean a binary logit and a fractional logit.
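The last suggestion, a binary logit followed by a fractional logit, could be sketched along the following lines using Stata's official fracreg command. The variable names (share, x1, x2) are hypothetical, and share is assumed to lie between 0 and 1.

```stata
* Hypothetical names: share (out-of-pocket dental spending / income), x1 x2 (covariates)
generate byte any_share = share > 0 if !missing(share)

* Part 1: binary logit for a positive share
logit any_share x1 x2
predict double p_pos, pr

* Part 2: fractional logit on the positive shares only
fracreg logit share x1 x2 if share > 0 & !missing(share)
predict double share_pos

* Unconditional expected share: P(share > 0 | x) * E(share | share > 0, x)
generate double share_hat = p_pos * share_pos
```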

Luis Mijares Castaneda

Join Date: Jan 2021
Posts: 79
#9

05 Apr 2025, 21:03

@Joao Santos Silva Thank you for your response. I have one point of clarification. I am interested in examining how the variable inc_d, which measures whether a respondent had a previous incarceration, varies by quantile of out-of-pocket dental expenditures. The way this question was asked in the survey is somewhat complicated: half of the respondents were asked about previous incarceration in Wave 1, and the other half were asked in Wave 2. To account for this, I take the log of the average out-of-pocket dental expenditures across Waves 1 and 2. The distribution of the variables is listed below. Since out-of-pocket dental expenditures are continuous, I should be able to use a quantile regression, correct?

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input long(oopdental_costs_wave1 oopdental_costs_wave2) float(log_avrg_cost inc_d)
 400   300  5.860786 0
   0     0         0 0
 500   800  6.478509 0
1000  4000  7.824446 0
   0  1200  6.398595 0
3001  9000  8.699764 0
 200   300  5.525453 0
   0     0         0 0
1300   300  6.685861 0
   0  1200  6.398595 0
2000  1200  7.378384 0
   0     0         0 0
   0     0         0 0
   0     0         0 0
   0     0         0 0
2000   550  7.151485 0
   0     0         0 1
   0  1000  6.216606 0
 500   300  5.993961 0
1200  5000   8.03948 0
   0     0         0 0
 250   150  5.303305 0
   0   400  5.303305 0
   0   201  4.620059 0
   0     0         0 0
2000  1000  7.313887 0
 400 14000  8.881975 0
 400   400  5.993961 0
3000  2000  7.824446 0
 500  3200  7.523481 0
1500   300  6.803505 0
   0     0         0 0
   0     0         0 0
   0     0         0 0
   0     0         0 0
1200     0  6.398595 0
   0     0         0 0
   0  2000  6.908755 0
 250   400  5.786897 0
   0   250  4.836282 0
 200   200  5.303305 0
1500   500  6.908755 0
  53     0  3.314186 0
   0  3001   7.31422 0
8000  8000  8.987322 0
   0     0         0 0
  70   200  4.912655 1
 401   300   5.86221 0
 300    75  5.239098 0
   0     0         0 0
 300  1500  6.803505 0
 400  3000  7.438972 0
   0     0         0 0
   0   200 4.6151204 0
 100     0 3.9318256 0
   0     0         0 0
 400    20  5.351858 0
   0     0         0 0
 180  1200   6.53814 0
2000  1000  7.313887 1
 600  6000  8.101981 0
 385   700   6.29803 0
 450   401  6.055613 0
   0     0         0 0
5000   500   7.91972 0
1500   500  6.908755 0
1200  1000  7.003974 0
 600  1130  6.763885 0
   0     0         0 0
   0     0         0 0
  10     0 1.7917595 0
   0     0         0 1
   0     0         0 0
   0     0         0 0
   0     0         0 0
1000   600  6.685861 0
 270   200  5.463832 0
   0     0         0 0
 201     0  4.620059 0
1300   220  6.634634 0
   0   500  5.525453 0
 500   200  5.860786 0
   0     0         0 0
1000    30  6.246107 0
   0     0         0 0
 400   180  5.673323 0
   0     0         0 0
   0     0         0 0
   0     0         0 1
   0     0         0 0
   0    60  3.433987 0
 100     0 3.9318256 1
 350     0  5.170484 0
   0     0         0 0
   0     0         0 0
2000  2000  7.601402 0
   0     0         0 0
   0     0         0 0
   0     0         0 0
   0     0         0 1
end
label values inc_d inc_d
label def inc_d 0 "No", modify
label def inc_d 1 "Yes", modify
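The values of log_avrg_cost in the example appear consistent with ln(1 + average) rather than ln(average): for instance, 400 and 300 give ln(351) = 5.860786, and rows with zero spending in both waves are coded 0. A small check, assuming the -dataex- example above is in memory (avg_cost, check_log, and diff are names introduced here for illustration):

```stata
* Rebuild the average and the apparent transformation from the two wave variables
generate double avg_cost  = (oopdental_costs_wave1 + oopdental_costs_wave2) / 2
generate double check_log = ln(1 + avg_cost)

* Differences should be at the level of float rounding if the guess is right
generate double diff = abs(check_log - log_avrg_cost)
summarize diff

* Size of the mass point at zero
count if avg_cost == 0
```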

Last edited by Luis Mijares Castaneda; 05 Apr 2025, 21:08.


Luis Mijares Castaneda

Join Date: Jan 2021

Posts: 79
#10

05 Apr 2025, 21:23

Jeff Wooldridge Hello, thank you for your response. I used your undergraduate econometrics textbook in my class, by the way. How would I write a two-stage model with a fractional logit as the second stage? The -twopm- command fits a GLM in the second equation, and I'm not sure how I would program the model or whether a command for a two-stage fractional regression model is available in Stata.
Joao Santos Silva

Join Date: Apr 2014

Posts: 2993
#11

07 Apr 2025, 01:04

I do not think that you can just take the average, but I would need to know much more about the survey to be able to advise.
Luis Mijares Castaneda

Join Date: Jan 2021

Posts: 79
#12

07 Apr 2025, 12:53

Joao Santos Silva The data come from the Health and Retirement Study (HRS) conducted by the University of Michigan. Two questions asked participants whether they had previously been incarcerated; these questions were administered in 2012 and 2014, with half of the sampled participants asked in 2012 and the other half in 2014. Attached below is the website with the corresponding incarceration questions. Total out-of-pocket dental expenditures were collected in all five available waves, with participants reporting their total dental spending for each year. Attached is the distribution of the relevant dental expenditure variable from 2012. I averaged the expenditures from 2012 and 2014 because incarceration status was not collected for the entire sample in a single year. If there is a better method to incorporate the data, I would greatly appreciate your advice.

https://hrs.isr.umich.edu/documentat...on-concordance

Questions regarding incarceration: nlb035_b olb033_b
Questions regarding out of pocket dental expenditures: nn168 nn169

Last edited by Luis Mijares Castaneda; 07 Apr 2025, 12:56.
Joao Santos Silva

Join Date: Apr 2014

Posts: 2993
#13

08 Apr 2025, 06:42

I am sorry, I cannot find the exact questions.
Luis Mijares Castaneda

Join Date: Jan 2021

Posts: 79
#14

08 Apr 2025, 12:19

Joao Santos Silva Hello, attached below are the questions as they appear on the HRS website. nlb035_b asks about incarceration in wave 1, olb033_b asks about incarceration in wave 2, and nn168 and nn169 ask about out-of-pocket dental expenditures.

Attached Files

Last edited by Luis Mijares Castaneda; 08 Apr 2025, 12:24.
Joao Santos Silva

Join Date: Apr 2014

Posts: 2993
#15

09 Apr 2025, 09:50

Dear Luis Mijares Castaneda,

My fear is that you cannot use this as a balanced panel. Suppose that someone is asked the incarceration question in the first wave and they say they have never been incarcerated. In the second wave the question is not asked, and therefore you do not know if the person was incarcerated between the first and the second wave. That is, in the second wave you do not know the value of the incarceration variable for those who in the first wave reported never having been incarcerated. For those asked the question in the second wave, you have the reverse problem (you only know their status in the first wave if they say they were never incarcerated). Am I missing something? Anyway, others may have better ideas.
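To make the problem concrete, here is a minimal sketch of wave-specific incarceration indicators that are filled in only where the status is actually known. It assumes the survey question records whether the respondent was ever incarcerated up to the interview, and the variable names (asked_w1, ever_inc) are hypothetical rather than actual HRS variables.

```stata
* Hypothetical names: asked_w1 = 1 if asked in the first wave, 0 if asked in the second;
* ever_inc = 1 if the respondent reported ever being incarcerated when asked, 0 otherwise
generate byte inc_w1 = ever_inc if asked_w1 == 1
replace inc_w1 = 0 if asked_w1 == 0 & ever_inc == 0   // never by wave 2 implies never by wave 1

generate byte inc_w2 = ever_inc if asked_w1 == 0
replace inc_w2 = 1 if asked_w1 == 1 & ever_inc == 1   // ever by wave 1 implies ever by wave 2

* Missing values mark the wave-specific statuses that cannot be recovered
tabulate inc_w1 inc_w2, missing
```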

Best wishes,

Joao