
  • Fixed effects corner solution response model (Tobit)

    Dear Statalist community,

    I would like to request your help with the implementation of a specific econometric model that I want to apply to my data.

    I am investigating the responsiveness of charitable donations to changes in transitory income and available (non-working) hours. For this I exploit within-individual variation from a short, unbalanced panel, and I will have about 500-700 usable individual observations.
    Donations are a corner solution response: they are restricted to be non-negative, which leads to many observations with zero donations.

    Based on chapter 17.8 of Wooldridge (2010), I have established that the unobserved effects Tobit model would be the most suitable of the panel data models for corner solution responses.
    I feel that I still need two stages (a two-part model) to separate the participation decision (donating a positive amount or nothing) from the actual donation amount.
    Would any of you be able to tell me how to implement this type of model in Stata? The standard commands for panel data Tobit models only work with random effects.

    In addition to donating, I will also be looking at the responsiveness of volunteering to changes in transitory income and available (non-working) hours.
    Because of the possible relation between this regression and the donation one, I was thinking of combining them by means of a seemingly unrelated regressions (SUR) model.
    Do any of you have experience with, and suggestions for, using this model in combination with the model described above, which already seems complex enough on its own?

    I look forward to hearing from you!

    Best wishes,

    Lieke

  • #2
    Dear Lieke Holt,

    Jeff Wooldridge often contributes to this forum, so he may be able to help you. Anyway, for what it is worth, here is my view on the problem.

    For corner-solutions data, I would start with Poisson regression, especially if you have a panel and want to use fixed effects. Jeff Wooldridge has a paper showing that the estimator is valid under very general conditions and I view it as the workhorse for corner-solutions data. If you want to do a two-part model, you can use a logit for the first part (with FE) and then again Poisson with FE for the positives. If you want a reference to justify using Poisson regression in this context, please see here.
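
    As a rough sketch of what this could look like in Stata (the names y, y_bin, x1, x2, id, and year are placeholders; y_bin is an indicator for y > 0):
    Code:
    xtset id year
    * one-part: FE Poisson for the level of y, with robust (clustered) standard errors
    xtpoisson y x1 x2 i.year, fe vce(robust)
    * two-part alternative: FE (conditional) logit for participation, FE Poisson for the positives
    xtlogit y_bin x1 x2 i.year, fe
    xtpoisson y x1 x2 i.year if y > 0, fe vce(robust)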

    Best wishes,

    Joao



    • #3
      Dear Joao,

      Thank you very much for your suggestion! I will give that a try.

      Would it be possible to execute this with the twopm command? I am only aware of the OLS and GLM options for the second stage.

      Kind regards,

      Lieke



      • #4
        Dear Lieke Holt,

        You can estimate such a model with the options firstpart(logit) secondpart(glm, family(poisson) link(log)), but I do not think the command allows fixed effects (though I may be wrong).
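
        For concreteness, a minimal sketch of that call (the variable names y, x1, x2 and the clustering on id are just placeholders):
        Code:
        * pooled two-part model: logit participation equation, Poisson-GLM amount equation
        twopm y x1 x2, firstpart(logit) secondpart(glm, family(poisson) link(log)) vce(cluster id)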

        Best wishes,

        Joao



        • #5
          Hi Lieke: First, I agree with Joao that you should first use Poisson fixed effects with vce(robust), as this gives the most robust estimates of the conditional mean. Everything after that imposes assumptions. For example, if you want to use twopm -- and I think it's fine to try it -- you should use the correlated random effects (CRE) approach. I'm assuming your T is relatively small, so that putting individual dummies into a logit, say, is not a good idea.

          But you can easily use twopm with the CRE approach. It's a bit tricky with unbalanced panels because when you include the within-unit averages of the x(i,t), you should only average over the time periods with a complete set of data on y(i,t) and x(i,t). See my 2019 Journal of Econometrics paper for more details, or show your code here after you try. If you have, say, 4 years for person one with no missing data, 6 years for person 2, and so on, then you just average as usual. I recommend also including a set of dummies indicating how many total years you have. (This acts as a control for sample selection.)

          Assuming there are no rows with missing data -- that is, you drop any time period with data missing on either y(i,t) or any x(i,t) -- the following should work. It assumes "id" is the unit identifier. Hopefully I have the syntax right:

          Code:
          egen tobs = total(1), by(id)      // number of time periods observed for each unit
          egen x1bar = mean(x1), by(id)     // within-unit time averages (the CRE/Mundlak terms)
          ...
          egen xkbar = mean(xk), by(id)
          twopm y x1 ... xk i.year i.tobs x1bar ... xkbar, first(logit) second(glm, fam(poisson) link(log)) vce(cluster id)
          You could even add i.tobs#c.x1bar, ..., i.tobs#c.xkbar for extra flexibility, but this will use up a lot of degrees-of-freedom.
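
          As a sketch of that extension, with only two regressors x1 and x2 for brevity (the specification is otherwise the one above):
          Code:
          twopm y x1 x2 i.year i.tobs x1bar x2bar i.tobs#c.x1bar i.tobs#c.x2bar, first(logit) second(glm, fam(poisson) link(log)) vce(cluster id)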

          I hope this helps.
          JW



          • #6
            Dear Joao Santos Silva and Jeff Wooldridge,

            Your comments have been extremely helpful, thank you very much!

            I will compare both the Poisson fixed effects and the two-part model. The CRE approach for the latter is a great suggestion and indeed takes away concerns about biased estimates from using individual dummies.

            The code that you have provided does the job very well (it is also great to be able to control for sample selection like this). To match my theoretical model and test its hypotheses, I would like the second part of the two-part model to differ slightly from the first by having one of the regressors enter only in interaction with the other outcome variable.
            Code:
            twopm (lead_don = income hours i.year i.tobs meanincome meanhours)(lead_don = income 1.lead_vol_bin#c.pwhrs i.year i.tobs meanincome 1.lead_vol_bin#c.meanhours), first(logit) second(glm, fam(poisson) link(log)) vce(cluster id)
            
            twopm (lead_vol = income hours i.year i.tobs meanincome meanhours)(lead_vol = 1.lead_don_bin#c.income pwhrs i.year i.tobs 1.lead_don_bin#c.meanincome meanhours), first(logit) second(glm, fam(poisson) link(log)) vce(cluster id)
            where lead_don denotes my donation outcome variable and lead_vol my volunteering one. income refers to transitory income and hours to paid working hours (so inversely related to available time). lead_don_bin and lead_vol_bin are the binary versions of the main outcome variables that enter the interactions in the second part, capturing whether a respondent donates or volunteers at time t. The results that I am getting out of this seem plausible, but I would just like to check with you whether this is a valid alteration.

            Lastly, I am still wondering whether I need to use some SUR procedure to relax the constraint that the correlation coefficient between the error terms of the two regression equations equals zero. Would you suggest trying to take this into account?

            Kind regards,

            Lieke



            • #7
              Lieke: The model that allows correlation between the unobservables in the two parts is not well identified without an exclusion restriction. The model becomes statistically very similar to a Heckman selection model, and you need a variable that affects participation but not the outcome when it is positive. It's usually hard to justify.
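
              Purely for illustration, this is the kind of selection model that the correlation leads to; here z is a hypothetical variable that affects participation but is excluded from the outcome equation, and y_bin is the participation indicator:
              Code:
              * selection equation includes z; outcome equation excludes it (the exclusion restriction)
              heckman y x1 x2, select(y_bin = x1 x2 z) vce(cluster id)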



              • #8
                Thank you Jeff Wooldridge! I just have one final question for now:

                I was having another look at the code you proposed, which I eventually used (see my last post in this thread), and it left me wondering whether the correlated random effects are properly estimated this way.
                Usually, when using the Chamberlain-Mundlak device of adding the individual means of the explanatory variables to the regression equation, the model is still estimated using random effects, right? For example, for Poisson:
                Code:
                xtpoisson y x1 ... xk x1bar ... xkbar, re vce(cluster id)
                My code for the two-part model does not include this random effects estimation component for either of the two parts. Some manual checks that I did (e.g., estimating the second part of the two-part model separately with CRE using xtpoisson ..., re and comparing it to the second-part results of twopm) suggest that this leads to very different results. Am I right that the twopm command does not accommodate this, even though theoretically I should, given the panel format of my data?

                Best wishes,

                Lieke



                • #9
                  Lieke: The random effects Poisson estimator, unlike the pooled Poisson and fixed effects Poisson, has no (theoretical) robustness properties. Consistency requires the full set of assumptions: the Poisson distribution and independence across time. Both are too strong. Now, as a practical matter it may give similar estimates to FE Poisson when combined with the CRE approach, but there's no expectation that it will. I have a preference for pooled methods because they allow any kind of serial correlation across time. That's why I suggested twopm with the CRE device and clustered standard errors.
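
                  In other words, for the second part the comparison being discussed is roughly the following (hypothetical names, with xbar the within-id mean of x):
                  Code:
                  * pooled CRE Poisson: clustered SEs allow any serial correlation over time
                  poisson y x xbar i.year, vce(cluster id)
                  * RE CRE Poisson: consistency relies on the full Poisson and independence assumptions
                  xtpoisson y x xbar i.year, re vce(cluster id)
                  * FE Poisson benchmark for the same conditional-mean parameters
                  xtpoisson y x i.year, fe vce(robust)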

                  How different are the results from pooled Poisson and RE Poisson in the second part?



                  • #10
                    Dear Jeff Wooldridge,

                    That is very good to know; I was not aware of that.

                    By now I have also noticed that the differences between the estimates from reg and xtreg, re are much smaller than the differences between the estimates from the different Poisson variants.
                    I was very puzzled at first because using xtpoisson, re even made my coefficients change sign. I now see that this sign change also sets it apart from the non-Poisson estimations, so clearly the random effects Poisson is the odd one out here.
                    That being said, I wonder how careful I should be when interpreting my coefficients based on the pooled Poisson, because with the linear panel data models the CRE approach yields estimates that are significant in the pooled case but insignificant when they are generated using random effects.

                    Further, I suspect that the use of an interaction term in the CRE model is not helping to preserve consistency between the different model types. The interaction is necessary to test one of my hypotheses, but the binary variable in the interaction can be an outcome itself, so I do not want to include it on its own (to avoid the bad control problem). As a result, I am not quite sure whether to follow (Schmuck, 2013) and also include the interaction of the individual mean of this binary control with the individual mean of hours. Ideally the interaction would only feature in the second part of the two-part model, but for simplicity I show the code here for the case of a single regression equation for both parts:
                    Code:
                    twopm lead_don income hours 1.lead_vol_bin#c.hours i.year i.tobs meanincome meanhours, first(logit) second(glm, fam(poisson) link(log)) vce(cluster id)
                    Would you say the estimates of the second part of the two-part model with this specification should equal the ones generated by the FE Poisson model restricted to positive outcomes?
                    Code:
                    xtpoisson lead_don income hours 1.lead_vol_bin#c.hours i.year if lead_don>0, fe vce(robust)
                    You mentioned in your first post that this is not a consistent estimator to use for a two-part model, but I am not quite sure what else I can compare my estimates to in order to get a feeling for whether they make sense. The FE Poisson results of interest, for example, switch signs relative to the two-part model ones.

                    Thank you for your consistency in replying, your answers have been very insightful!

                    Best wishes,

                    Lieke



                    • #11
                      By now I am fairly certain that applying the CRE set-up to a non-RE model generates incorrect estimates, especially for the second part. The estimates of the second part are of main interest to me.
                      In your first post you mentioned that individual dummies in a logit would not be a good idea. Using individual dummies in a Poisson model does not seem as problematic to me.

                      Would it be possible to combine a first-part CRE logit model with a second-part FE Poisson model, potentially with an interaction term included in the second part that is not there in the first? Or would it be better to use the same set of regressors for both parts and to incorporate that same interaction term in the first-part CRE set-up by including the product of the individual means of the two interaction variables as a regressor?

                      Best wishes,

                      Lieke



                      • #12
                        Dear Lieke Holt,

                        Just out of curiosity, did you try xtlogit with FE in the first part and xtpoisson with FE in the second part? The second part is equivalent to using Poisson with dummies; the first part is not the same as logit with dummies, but it is consistent with small T.
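
                        With the variable names used earlier in the thread (treating lead_don_bin as the participation indicator), a sketch of that combination would be:
                        Code:
                        * first part: FE (conditional) logit for participation
                        xtlogit lead_don_bin income hours i.year, fe
                        * second part: FE Poisson on the positive outcomes
                        xtpoisson lead_don income hours i.year if lead_don > 0, fe vce(robust)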

                        Best wishes,

                        Joao



                        • #13
                          Dear Joao Santos Silva,

                          Thank you for your reply! I was not aware of the consistency differences between the two types of logit estimation. Using an FE first and second part definitely sounds like a more consistent option.
                          In the case of the logit first part, the estimates are pretty close to the ones generated with the CRE set-up, so that is reassuring.

                          Since twopm mainly (only?) serves to combine the estimates of the two parts, estimating the FE logit and Poisson models manually instead appears to be the best option.
                          One issue with this is that I do not know which standard errors to opt for: xtlogit with FE only has the bootstrap and jackknife options, but for xtpoisson with FE I would prefer to use vce(robust). What would you suggest?

                          Best wishes,

                          Lieke



                          • #14
                            Dear Lieke Holt,

                            Jeff will be able to provide better advice on this, but consider using clogit, which implements the same estimator but allows different standard errors.
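
                            For example (again using the thread's variable names, with cluster-robust standard errors as one possible choice):
                            Code:
                            * conditional (fixed effects) logit for the participation decision
                            clogit lead_don_bin income hours i.year, group(id) vce(cluster id)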

                            Best wishes,

                            Joao



                            • #15
                              Lieke: The problem with using FE Poisson in the second stage is that you need to justify it with an underlying model, and I don't see one. You'd be conditioning on a strictly positive outcome in every period, which is different from modeling the two-part decision period by period. Plus, FE logit requires serial independence, whereas the pooled CRE logit does not. And if you don't care about the first stage, then you should just focus on the second stage anyway.

                              If you write down a model, period by period, with unobserved heterogeneity, I don't see how FE Poisson in the second stage falls out.

                              When you implemented the CRE Poisson using pooled estimation, did you include the time averages of all variables, including the interaction? Also, I'm not very sold on the idea that you can't control for a variable because you think it might be a "bad control" but then include it in an interaction. I think you have to take a stand on whether to include it, and if you do, the main effect should be there.
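
                              As a sketch of what that full set of time averages might look like (the constructed variable names here are hypothetical):
                              Code:
                              * build the interaction explicitly so its within-id mean can be included
                              gen vol_hours = lead_vol_bin * hours
                              egen mean_vol_bin = mean(lead_vol_bin), by(id)
                              egen mean_vol_hours = mean(vol_hours), by(id)
                              twopm lead_don income hours lead_vol_bin vol_hours i.year i.tobs meanincome meanhours mean_vol_bin mean_vol_hours, first(logit) second(glm, fam(poisson) link(log)) vce(cluster id)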

                              Can you show output from the pooled CRE methods where you include a full set of time averages? Seeing output might help me.

