Fractional response regression for panel data

Jessica Berrett

Join Date: Sep 2019

Posts: 57
#1

07 Oct 2024, 14:44

Is anyone familiar with doing fractional response regression for panel data in Stata?

I'm trying to figure out if I'm using the correct code:

fracreg logit DV IV controls mean_of_the_IV mean_of_each_control i.Year, vce(cluster orgID)
Alfonso Sánchez-Peñalver

Join Date: Mar 2014

Posts: 432
#2

07 Oct 2024, 18:13

Hi Jessica, I haven't used fracreg, but I have used glm with the binomial family and the logit link, and that works. From the way you describe your variable list, I see that you are using the Mundlak approach, with time fixed effects as well. That should work. Since you're clustering the variance at orgID, I assume that is the group (case) level at which you are calculating the means of the IV and the controls. You can then test fixed against random effects with a test of joint significance of the mean variables. If you can reject the null in that test, the unobserved effects are correlated with the regressors, so fixed effects estimation is preferred and you should keep the model as specified. If you cannot reject the null, the model without the means is preferred.
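A minimal sketch of the steps just described, run on simulated data so that it is self-contained. The variable names (x1, x2, mean_x1, mean_x2) and the data-generating process are hypothetical stand-ins for the IV, the controls, and their group means; orgID, Year and DV follow the placeholders in the first post. It assumes Stata 14 or later, where fracreg is available.

```stata
* Simulated panel: 200 hypothetical organizations observed for 5 years
clear
set seed 12345
set obs 200
generate orgID = _n
generate double a = rnormal()              // unobserved organization effect
expand 5
bysort orgID: generate Year = 2015 + _n
generate double x1 = 0.5*a + rnormal()     // regressor correlated with the effect
generate double x2 = rnormal()             // regressor unrelated to the effect
generate double DV = invlogit(-0.5 + 0.4*x1 + 0.2*x2 + a + rnormal(0, 0.5))

* Mundlak terms: organization-level means of each time-varying regressor
bysort orgID: egen double mean_x1 = mean(x1)
bysort orgID: egen double mean_x2 = mean(x2)

* Pooled fractional logit with the means and year effects,
* standard errors clustered at the organization level
fracreg logit DV x1 x2 mean_x1 mean_x2 i.Year, vce(cluster orgID)

* Joint Wald test that all coefficients on the means are zero
testparm mean_x1 mean_x2
display "p-value of joint test on the means: " r(p)
```

A small p-value here would point toward keeping the means in the model; a large one would suggest they can be dropped.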

You may want to check the estimation using glm, and compare to the one using fracreg logit. They should be the same.
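One way to run that comparison is sketched below on simulated data. The variable names and the data-generating process are hypothetical; the point is only to store the coefficient from each command and look at the difference. glm may print a note that the dependent variable has noninteger values, which is expected with a fractional outcome.

```stata
* Simulated panel: 200 hypothetical organizations observed for 5 years
clear
set seed 12345
set obs 200
generate orgID = _n
generate double a = rnormal()
expand 5
bysort orgID: generate Year = 2015 + _n
generate double x = 0.5*a + rnormal()
generate double DV = invlogit(-0.5 + 0.4*x + a + rnormal(0, 0.5))

* Fractional logit
fracreg logit DV x i.Year, vce(cluster orgID)
scalar b_frac  = _b[x]
scalar se_frac = _se[x]

* Same mean specification through glm
glm DV x i.Year, family(binomial) link(logit) vce(cluster orgID)
scalar b_glm  = _b[x]
scalar se_glm = _se[x]

* Relative differences; values near zero mean the two commands agree
display "coefficient: " reldif(b_frac, b_glm)
display "std. error:  " reldif(se_frac, se_glm)
```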

Alfonso Sanchez-Penalver
Jessica Berrett

Join Date: Sep 2019

Posts: 57
#3

08 Oct 2024, 11:13

I appreciate your response, Alfonso! I just tried the estimation using both glm and gee. When I looked up how to use these for fractional response, neither says anything about including the means of the IVs in the model.

I get the same results when I use any of the following:

fracreg logit DEAScore UnrestrictedFundsRatio lnASSETS_TOTAL_EOY AGE lnSERVICE_REVENUE HHI3 leverage i.GSA_n lnHomeValueIndex i.Year, vce(cluster EIN)

xtgee DEAScore UnrestrictedFundsRatio lnASSETS_TOTAL_EOY AGE lnSERVICE_REVENUE HHI3 leverage i.GSA_n lnHomeValueIndex, family(binomial) link(logit) corr(independent) robust

glm DEAScore UnrestrictedFundsRatio lnASSETS_TOTAL_EOY AGE lnSERVICE_REVENUE HHI3 leverage i.GSA_n lnHomeValueIndex, family(binomial) link(logit) vce(cluster EIN)

However, as soon as I add the means of the IVs to the model, all of my results become non-significant. I also can't find anything that says to include the averages. I've looked at numerous papers that used fractional response regression for panel data, and none mention or report the coefficients on the averages of the IVs. I'm just trying to figure out whether they need to be included and how to be more confident about that aspect of the model.

Thank you!
Jessica Berrett

Join Date: Sep 2019

Posts: 57
#4

08 Oct 2024, 11:33

I was reading a little more on this topic. Is it correct that you would only use the Mundlak approach for a random effects model?
George Ford

Join Date: Aug 2014

Posts: 3152
#5

08 Oct 2024, 12:11

Mundlak is a mix of FE and RE.

xtgee included fixed effects on the cross section, so the mean terms get eaten.

If you're just interested in mean effects, then reghdfe (accounting for heteroskedasticity) would probably work just fine.

A logit transformation of the DV might also be an option; the transformed variable is suited to OLS (again accounting for heteroskedasticity).
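A rough sketch of the logit-transformation route, on simulated data with hypothetical variable names. reghdfe is a community-contributed command that has to be installed separately, so this sketch swaps in the built-in xtreg, fe with clustered standard errors as a stand-in. Note that the transformation is undefined when the DV is exactly 0 or 1, so any such observations would be dropped.

```stata
* Simulated panel: 200 hypothetical organizations observed for 5 years
clear
set seed 12345
set obs 200
generate orgID = _n
generate double a = rnormal()
expand 5
bysort orgID: generate Year = 2015 + _n
generate double x = 0.5*a + rnormal()
generate double DV = invlogit(-0.5 + 0.4*x + a + rnormal(0, 0.5))

* Log-odds of the fraction: ln(DV/(1-DV)); missing when DV is 0 or 1
generate double logitDV = logit(DV)
count if missing(logitDV)

* Linear fixed-effects regression on the transformed outcome,
* with standard errors clustered at the organization level
xtset orgID Year
xtreg logitDV x i.Year, fe vce(cluster orgID)
```

The coefficients from this regression are on the log-odds scale, so they are not directly comparable to marginal effects on the fraction itself.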
Alfonso Sánchez-Peñalver

Join Date: Mar 2014

Posts: 432
#6

08 Oct 2024, 18:29

Sorry Jessica, I don't understand what you mean by integrating the means. Do you mean demeaning by group? Let us start with some basic concepts of panel data in general, not just fractional response. I believe this will cover some of what George Ford is referring to.

The issue with grouped data, of which panel data is one type, is that there may be case-specific (group-specific) unobserved effects. If these effects are correlated with the explanatory variables, we use what is called fixed effects estimation. If they are uncorrelated with the explanatory variables, pooled estimation is consistent, but random effects is efficient, i.e. it reduces the standard errors because it accounts for the variation of the unobserved effects.

If you want to estimate a fixed effects model in the linear case, you have two equivalent ways: you include dummy variables for all groups and no intercept (or an intercept and dummies for all but one group), or you demean all variables by group, i.e. both the explained and the explanatory variables. This last method should be run without an intercept, but Stata decided to add the overall mean of the explained variable back in, and thus includes an intercept to capture it. Both of these methods remove every explanatory variable that does not vary within a case. That includes the mean variables you created: for the same case they always have the same value, so when you demean them they all go to 0 (or they are perfectly collinear with the dummy variables if you are using that approach).
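The equivalence of the two routes can be checked on a small simulated linear panel. The names below (y, x, mean_x) and the data-generating process are hypothetical and only serve the illustration.

```stata
* Simulated linear panel: 50 hypothetical organizations, 5 years each
clear
set seed 2024
set obs 50
generate orgID = _n
generate double a = rnormal()              // unobserved organization effect
expand 5
bysort orgID: generate Year = 2015 + _n
generate double x = 0.5*a + rnormal()
generate double y = 1 + 0.4*x + a + rnormal()

* Route 1: a dummy for every organization but one, plus an intercept
regress y x i.orgID
scalar b_lsdv = _b[x]

* Route 2: the within (demeaning) estimator
xtset orgID Year
xtreg y x, fe
scalar b_within = _b[x]

* The two slope estimates should agree up to numerical precision
display "relative difference: " reldif(b_lsdv, b_within)

* A group mean does not vary within an organization, so the within
* estimator cannot identify it; Stata should report mean_x as omitted
bysort orgID: egen double mean_x = mean(x)
xtreg y x mean_x, fe
```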

Random effects estimation is more complex, but it is not going to clarify what is going on. So I leave it at that.

Now, the Mundlak approach is not quite the mix of FE and RE that George referred to, although it does allow for a test between them. What you are doing is capturing the variation across groups (between groups) with the mean variables, since these only vary across cases, and thus removing that variation from the regular variables. As a result, the regular variables capture the same variation as in fixed effects estimation (the within-group variation). Notice that the Mundlak approach is just as valid as the other two approaches. You can even combine the Mundlak approach with random effects if the command allows you to model random effects. This is called correlated random effects.
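In a linear model with a balanced panel this can be verified directly: the coefficient on the regular variable in a Mundlak regression matches the fixed effects estimate, whether the Mundlak regression is pooled or uses random effects. A sketch on simulated data, with hypothetical names and data-generating process:

```stata
* Simulated balanced linear panel: 50 hypothetical organizations, 5 years each
clear
set seed 2024
set obs 50
generate orgID = _n
generate double a = rnormal()
expand 5
bysort orgID: generate Year = 2015 + _n
generate double x = 0.5*a + rnormal()
generate double y = 1 + 0.4*x + a + rnormal()
xtset orgID Year

* Group mean of the regressor (the Mundlak term)
bysort orgID: egen double mean_x = mean(x)

* Fixed effects (within) estimator
xtreg y x, fe
scalar b_fe = _b[x]

* Pooled OLS with the Mundlak term
regress y x mean_x, vce(cluster orgID)
scalar b_mundlak = _b[x]

* Random effects with the Mundlak term (correlated random effects)
xtreg y x mean_x, re
scalar b_cre = _b[x]

display "FE vs pooled Mundlak: " reldif(b_fe, b_mundlak)
display "FE vs CRE:            " reldif(b_fe, b_cre)
```

This exact match is a property of the linear case. In a nonlinear model such as a fractional logit there is no within estimator to compare against, which is part of why the means approach is attractive there.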

To sum up: the point of adding the means is to be able to test whether random effects or fixed effects are appropriate, in the way I explained in my last post. If the coefficients on the mean variables are jointly significant, then fixed effects are appropriate, so use either the estimation you ran including the mean variables or an xt command with fixed effects. If they are not jointly significant, then random effects is appropriate. Pooled estimation is also consistent in that case, which is why you can drop all the mean variables, but it is not efficient, for the reason I gave above. Unfortunately, I do not know of any command that does random effects for fractional response models in Stata.

You also mention that you haven't seen any papers reporting the coefficients on the averages. Let me refer you to the seminal paper on this topic by Leslie Papke and Jeff Wooldridge: https://pages.stern.nyu.edu/~wgreene...alResponse.pdf. The paper is a bit technical, but you will see how they specify the average variables in their application. I am not sure why people do not report the coefficients on the mean variables, but that doesn't mean that your approach is wrong. I hope I have illustrated that there are different alternatives for modeling fixed effects.
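As far as I understand that paper, the application uses a pooled fractional probit with the time averages of the regressors added, and reports average partial effects rather than raw coefficients. A rough Stata analogue on simulated data might look like the sketch below; the variable names and data-generating process are hypothetical, and this is not a replication of the paper.

```stata
* Simulated panel: 200 hypothetical organizations observed for 5 years
clear
set seed 12345
set obs 200
generate orgID = _n
generate double a = rnormal()
expand 5
bysort orgID: generate Year = 2015 + _n
generate double x = 0.5*a + rnormal()
generate double DV = invlogit(-0.5 + 0.4*x + a + rnormal(0, 0.5))

* Time average of the regressor within each organization
bysort orgID: egen double mean_x = mean(x)

* Pooled fractional probit with the time average and year effects
fracreg probit DV x mean_x i.Year, vce(cluster orgID)

* Average partial effect of x on the expected fraction
margins, dydx(x)
matrix ape = r(b)
display "average partial effect of x: " ape[1,1]
```

Reporting the average partial effect from margins, rather than the coefficient on x or on mean_x, may be one reason published tables often leave out the coefficients on the averages.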

Last edited by Alfonso Sánchez-Peñalver; 08 Oct 2024, 18:31.

Alfonso Sanchez-Penalver
Jessica Berrett

Join Date: Sep 2019

Posts: 57
#7

09 Oct 2024, 07:27

Thank you very much Alfonso and George, this is starting to make much more sense to me.