fractional regression

nesrine EL imayem

Join Date: Dec 2016

Posts: 11
#1

30 Dec 2016, 05:46

I must estimate a system of equations:
w_alimentation = a_alimentation + b_alimentation ln R
w_habillement = a_habillement + b_habillement ln R
w_habitation = a_habitation + b_habitation ln R
w_sante = a_sante + b_sante ln R
w_transport = a_transport + b_transport ln R
w_divers = a_divers + b_divers ln R
(alimentation = food, habillement = clothing, habitation = housing, sante = health, divers = miscellaneous)

w_i: the i-th budget share, calculated as (expenditure on good i / household's income)
ln R: the logarithm of the household's income

I know that I must use a dirtifit model to ensure that the adding-up condition holds (the budget shares must sum to 1).
But in order to estimate my model, I must estimate each equation separately (and not the 6 equations at the same time).
I must estimate each equation with a logistic fractional model using fracglm (w_i is a proportion). How can I impose the adding-up condition in this case?
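
For reference, a short derivation of what adding up implies for the linear system as written above. It is only a sketch: it assumes ln R varies across households and uses no notation beyond w_i, a_i, b_i and ln R.

```latex
\sum_{i} w_i \;=\; \sum_{i} a_i + \Big(\sum_{i} b_i\Big)\ln R \;=\; 1 \quad \text{for all } R
\quad\Longrightarrow\quad
\sum_{i} a_i = 1, \qquad \sum_{i} b_i = 0 .
```

Both restrictions run across equations, so they involve the parameters of all the budget-share equations at once.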

Can you help me?
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#2

31 Dec 2016, 12:19

You didn't get a quick answer, perhaps because you didn't follow the guidance in the FAQ on asking questions. It helps if you give us the Stata code, Stata output, and a sample of the data (using dataex).

dirtifit and fracglm do not appear in a search from the Stata command line. At least dirtifit doesn't appear in a Google search. I have no idea what terminology you're using, but if it doesn't appear in a Google search it is pretty esoteric. You also don't ask how to estimate the model - you ask how to fix a specific program (which as I said doesn't appear in a Stata search) to solve a particular problem. If fracglm is a single equation model that assumes all explanatory variables are exogenous, then the answer to your question is probably that we can't help.

With the identity, any equation by equation estimation is almost certainly inconsistent. You can probably do this model in cmp (user written) or in SEM or GSEM.
Nick Cox

Join Date: Mar 2014

Posts: 35405
#3

31 Dec 2016, 13:06

Phil gives excellent advice. My guess is that this model requires customized programming.

dirtifit is presumably a misrendering of dirifit (SSC).
Richard Williams

Join Date: Apr 2014

Posts: 4941
#4

31 Dec 2016, 13:58

fracglm is a never-released program of mine. Most of what it does can be done with fracreg if you have Stata 14. If you are condemned to using an earlier version of Stata, info on fracglm can be found at

http://www3.nd.edu/~rwilliam/stats3/...onseModels.pdf

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://www3.nd.edu/~rwilliam
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2121
#5

01 Jan 2017, 08:53

You need to use the multinomial fractional logit model described in J. Mullahy (2015), "Multivariate Fractional Regression Estimation of Econometric Share Models," Journal of Econometric Methods 4, 71-100. I believe he has written Stata code for it. This will ensure that the shares sum to unity. Estimating separate fractional response models won't do it.

It is an easy programming problem because you just use the multinomial log likelihood but with shares rather than discrete outcomes. The justification is essentially the same as in the case with two fractions summing to unity.
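
To make the point about the log likelihood concrete, here is a minimal sketch in Python (not Stata, and not the code by Mullahy or Buis mentioned in this thread). It evaluates the multinomial log likelihood with observed shares in place of 0/1 outcomes. The function names, the data, and the coefficient values are all made up for illustration, and the first category is arbitrarily taken as the reference.

```python
import math

def softmax_shares(x, betas):
    """Predicted shares for one observation.

    x     : list of regressors (including a constant)
    betas : one coefficient list per non-reference category; the
            reference category's coefficients are fixed at zero.
    """
    # Linear index for each category; the reference category's index is 0.
    etas = [0.0] + [sum(b_k * x_k for b_k, x_k in zip(b, x)) for b in betas]
    m = max(etas)  # subtract the maximum for numerical stability
    expo = [math.exp(e - m) for e in etas]
    total = sum(expo)
    return [e / total for e in expo]

def quasi_loglik(shares, X, betas):
    """Multinomial log likelihood with observed shares instead of 0/1 outcomes."""
    ll = 0.0
    for w, x in zip(shares, X):
        p = softmax_shares(x, betas)
        # A share of exactly 0 contributes 0 * log(p) = 0, so zeros are allowed.
        ll += sum(w_j * math.log(p_j) for w_j, p_j in zip(w, p))
    return ll

# Made-up data: 3 households, 3 budget categories, regressors (1, ln R).
X = [[1.0, 9.2], [1.0, 10.1], [1.0, 11.0]]
shares = [[0.6, 0.4, 0.0], [0.5, 0.3, 0.2], [0.4, 0.3, 0.3]]
betas = [[0.5, -0.1], [-4.0, 0.3]]  # arbitrary values, not estimates

print(quasi_loglik(shares, X, betas))
print([round(sum(softmax_shares(x, betas)), 12) for x in X])
```

Whatever the coefficient values, the predicted shares sum to one by construction. Maximizing `quasi_loglik` over `betas` (with any numerical optimizer) would give the point estimates; robust standard errors would still have to be computed separately.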

JW
John Mullahy

Join Date: Dec 2016

Posts: 742
#6

01 Jan 2017, 09:48

Nesrine: I'm happy to send you the code Jeff describes (it's Mata code, but pretty easy to use even if you are not familiar with Mata). Email me at [email protected].

One item for the Stata v.15 wishlist would be to tweak mlogit to accommodate fractional outcomes. Then estimation would be as straightforward as:

mlogit ..., robust ....
Richard Williams

Join Date: Apr 2014

Posts: 4941
#7

01 Jan 2017, 11:16

Does Maarten Buis's fmlogit (available from SSC) already do this? The description reads

fmlogit fits by quasi maximum likelihood a fractional multinomial logit model. Each variable in depvarlist ranges between 0 and 1 and all variables in depvarlist must, for each observation, add up to 1: for example, they may be proportions. It is a multivariate generalization of the fractional logit model proposed by Papke and Wooldridge (1996).

Jeff Wooldridge

Join Date: Apr 2014

Posts: 2121
#8

01 Jan 2017, 11:33

Originally posted by Richard Williams

Does Maarten Buis's fmlogit (available from SSC) already do this? The description reads

Yes, that should do it! That's very good to know. Thanks Richard.
Maarten Buis

Join Date: Mar 2014

Posts: 3426
#9

04 Jan 2017, 03:02

I received the following question from Nesrine:

For some households, the budget share (the dependent variable of my model) is equal to 0. I am aware that I cannot simply delete those observations (households), because that would throw away information.

In the literature, some researchers use a tobit model to resolve this problem (budget shares equal to 0 for some goods). But I am not convinced I can use a tobit model, because it supposes that the error term is normally distributed, which is not the case when we want to model a proportion (the variance is heteroscedastic).

Using a fractional logit model is also not a solution, because it ignores the fact that the dependent variable (budget share) can take values equal to one.

To be clearer, my model must respect three conditions:
the sum of the budget shares must equal 1

the dependent variable must be between 0 and 1

the model must take into account the fact that some budget shares (the dependent variable) are equal to 0

Can the fractional multinomial logit model resolve this problem?

Yes, the fractional multinomial logit is a generalization of the fractional logit, so it does allow for proportions of exactly 0 (or exactly 1). It has a particular way of dealing with it: basically, it assumes that a family has nothing in principle against spending money on that category, but that it has such a low priority that they haven't done it yet. fmlogit is less appropriate when families deliberately choose to spend exactly nothing on a given category, for example if you look at the proportion of the budget spent on meat and you have vegetarians in your sample.

Another thing to consider is that your categories seem so broad that I would be very suspicious about someone reporting 0 percent on any of these. The only case I can imagine would be a pastor whose house is provided by her or his church, and who would thus spend nothing on housing. So my first step would be to question the data. Look at each and every observation with 0 percent and see if you can find a reasonable explanation based on everything you know about that person and his or her family. Also look at the exact wording of the question in the questionnaire and see if a respondent could interpret (or misunderstand) it more narrowly than you might have liked.
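
As an illustration of that first step, here is a minimal Python sketch that lists, per category, the households reporting a share of exactly 0 so they can be inspected one by one. The records, the household ids, and the function name `zero_share_cases` are hypothetical; only three of the categories are shown.

```python
# Hypothetical records: household id and budget shares by category.
households = [
    {"id": 1, "alimentation": 0.45, "habitation": 0.30, "transport": 0.25},
    {"id": 2, "alimentation": 0.60, "habitation": 0.00, "transport": 0.40},
    {"id": 3, "alimentation": 0.50, "habitation": 0.50, "transport": 0.00},
]
categories = ["alimentation", "habitation", "transport"]

def zero_share_cases(rows, cats):
    """Return {category: [ids of households reporting a share of exactly 0]}."""
    return {c: [r["id"] for r in rows if r[c] == 0] for c in cats}

flagged = zero_share_cases(households, categories)
for cat, ids in flagged.items():
    print(cat, ids)
```

The flagged ids are the observations to check against everything else known about the household and against the questionnaire wording.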

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Maarten Buis

Join Date: Mar 2014

Posts: 3426
#10

04 Jan 2017, 08:22

Yet another thing to consider is the correlation structure for your proportions. For example, I would expect a negative correlation between the share spent on housing and the share spent on transportation.

nesrine EL imayem

Join Date: Dec 2016

Posts: 11
#11

06 Jan 2017, 12:47

I have noticed that I forgot to include the budget share for alcohol and tobacco in the system of equations I want to estimate:
w_alcoholtobacco = a_alcoholtobacco + b_alcoholtobacco ln R

In my model, a proportion equal to 0 can have 2 causes:
** the household does not consume the good (this is the case, for example, for alcohol and tobacco). In this case, households deliberately choose not to consume the good, or
** the duration of the survey (a month) is shorter than the interval between purchases of the good (this is the case for housing, for example)

Is fmlogit appropriate in this case?
Maarten Buis

Join Date: Mar 2014

Posts: 3426
#12

20 Jan 2017, 01:37

The alcohol/tobacco category could be a case where 0s are real 0s, but it does not have to be the case. In my family's case, we don't smoke, but we don't have anything in principle against alcohol, we just consume it rarely. So much so that the growth of our stock due to the occasional gifted bottle of wine is larger than our consumption... So in my family's case a 0 for alcohol/tobacco would be perfectly fine for fmlogit.

More generally, the definition of a model is that it is a simplification of reality, and a simplification is just another word for "wrong in some reasonable way". So the fact that fmlogit models some of the 0s wrong is in itself not a problem, as long as it is still "reasonable". That is a tricky and necessarily subjective trade-off you have to make, and then you have to convince your audience.

Additionally, Nesrine El Imayem asked me privately:

Identification of the fractional multinomial model requires normalizing one set of parameters, β_J = 0, for instance. In other words, if I estimate a system of 5 equations, I will be able to recover the estimated parameters for only 4 equations.

I have a question at this level. Is there a way to recover the estimated parameters of the equation that was dropped to make the system identifiable?

By "estimated parameters" I mean the true parameters of the equations and not the APE (average partial effect), which I can easily calculate.

The solution is simple: those parameters are 0. What confuses many about that answer is that they think of beta_j as the parameters for explaining outcome j, and that is incorrect. They are the parameters for explaining the ratio of outcome j to the reference outcome: the relative proportion, or odds.

Some people find that awkward, but remember that it is a necessity to estimate only 4 sets of parameters for 5 proportions if you want to impose the constraint that the proportions add up to 1. Once we have predicted 4 proportions, we have, with that constraint, automatically also predicted the 5th.
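
A small numerical sketch of that point, in Python rather than Stata. The five (intercept, slope) pairs, the value of ln R, and the function name `shares` are made up for illustration; they are not estimates from any data. Subtracting the reference category's coefficients from every set leaves the predicted proportions unchanged, which is why the reference set can be fixed at 0.

```python
import math

def shares(etas):
    """Multinomial-logit shares from one linear index per category."""
    m = max(etas)  # subtract the maximum for numerical stability
    expo = [math.exp(e - m) for e in etas]
    total = sum(expo)
    return [e / total for e in expo]

ln_R = 10.0
# Hypothetical (intercept, slope) pairs for 5 categories, not normalized.
raw = [(1.0, 0.20), (0.5, -0.10), (2.0, 0.05), (-1.0, 0.30), (0.3, 0.10)]

# Normalize on the last category: subtract its coefficients from every set.
ref_a, ref_b = raw[-1]
normalized = [(a - ref_a, b - ref_b) for a, b in raw]

p_raw = shares([a + b * ln_R for a, b in raw])
p_norm = shares([a + b * ln_R for a, b in normalized])

print(normalized[-1])  # coefficients of the reference category: (0.0, 0.0)
# Difference between the two sets of shares: zero up to floating-point rounding.
print(max(abs(u - v) for u, v in zip(p_raw, p_norm)))

# log(p_j / p_ref) equals the normalized linear index for category j.
a_0, b_0 = normalized[0]
print(math.log(p_norm[0] / p_norm[-1]), a_0 + b_0 * ln_R)
```

The last line shows the interpretation given above: the normalized coefficients describe the log of the ratio of category j to the reference category, not the level of category j itself.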

Last edited by Maarten Buis; 20 Jan 2017, 01:42.

Guest
#13

30 Apr 2019, 11:29

Hello all!
I mainly work on rural non-farm sector diversification, and I want to model farmers' diversification strategies. My dependent variable is the share of total household income that comes from a particular category, so I want to estimate households' diversification decisions using a multinomial fractional response model. However, I have panel data. How can the fmlogit command be extended to account for panel data? I am really stuck on the Stata command. Any reply would be appreciated.