Bounded Dependent Variable

Usha Adelina

Join Date: Apr 2016
Posts: 15


28 May 2016, 00:11

Hello everyone, I have a question. I am running a panel regression for 29 cities over a period of 7 years. My dependent variable is CGI, a measure of income segregation that is continuous and strictly positive, taking values between 0 and 1. However, after running the regression, I found that the constant is 1.85 (greater than the maximum possible value of CGI), while the coefficient on over65 (the fraction of the population older than 64) is -1.27 (lower than the minimum possible value of CGI). Do dependent variables with these characteristics need special treatment? I know the latest version of Stata has beta and fractional regression commands, but I do not have access to it, and logistic regression seems implausible to me since the dependent variable takes continuous values between 0 and 1. The regression results are below.

Code:

xtreg   cgi   gini  emp1 lowskill1  logpop   logmed  own hs25 sarjana25 eighteen over65 i.year,  fe  robust

Fixed-effects (within) regression               Number of obs      =       203
Group variable: id                              Number of groups   =        29

R-sq:  within  = 0.5303                         Obs per group: min =         7
       between = 0.1904                                        avg =       7.0
       overall = 0.0000                                        max =         7

                                                F(16,28)           =     32.74
corr(u_i, Xb)  = -0.6414                        Prob > F           =    0.0000

                                    (Std. Err. adjusted for 29 clusters in id)
------------------------------------------------------------------------------
             |               Robust
         cgi |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        gini |   .2098422   .1119539     1.87   0.071    -.0194849    .4391694
        emp1 |  -.0026784   .0016391    -1.63   0.113     -.006036    .0006792
   lowskill1 |   .0852168   .0709025     1.20   0.239    -.0600204     .230454
      logpop |  -.0595715   .0732945    -0.81   0.423    -.2097085    .0905656
      logmed |  -.0617125   .0605833    -1.02   0.317    -.1858117    .0623867
         own |   .1888378   .0879009     2.15   0.040      .008781    .3688946
        hs25 |  -.0774792   .2259464    -0.34   0.734    -.5403094     .385351
   sarjana25 |   .5969337   .3611071     1.65   0.109    -.1427607    1.336628
    eighteen |   .0292786   .4995487     0.06   0.954    -.9940006    1.052558
      over65 |    -1.2718   .7602054    -1.67   0.105    -2.829011    .2854097
             |
        year |
       2006  |   .0143375   .0154605     0.93   0.362    -.0173319     .046007
       2007  |  -.0090962   .0205345    -0.44   0.661    -.0511593    .0329668
       2008  |   -.025067   .0349562    -0.72   0.479    -.0966716    .0465375
       2009  |   .0210048   .0308709     0.68   0.502    -.0422314    .0842409
       2010  |   .0274367   .0366644     0.75   0.461    -.0476669    .1025404
       2011  |   .0174045    .039606     0.44   0.664    -.0637247    .0985337
             |
       _cons |   1.849855   1.663708     1.11   0.276    -1.558097    5.257806
-------------+----------------------------------------------------------------
     sigma_u |  .09637064
     sigma_e |  .03855145
         rho |  .86204928   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Thank you!


Nick Cox

Join Date: Mar 2014

Posts: 35405
#2

28 May 2016, 02:09

There is nothing implausible about logit or logistic models here. Note that the use from about 1940 onwards of logit as (in modern terms) a link function for binary responses postdates by at least a century use of the logistic as a sigmoid curve that ascends or descends continuously from one asymptote to another. If the idea is that the mean segregation is changing smoothly as a function of predictors, then a logit link is the very first thing I would try.

The only question is which xt command is best suited to your specific problem, which I will leave for others to advise on. A fallback position is to transform your response to the logit scale, but you would still have to back-transform to the original scale.
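A minimal sketch of that fallback, for illustration only. It assumes cgi lies strictly inside (0,1), so that the logit is defined for every observation, and that the data are already xtset on id and year. The new variable names (logit_cgi, xbu_logit, cgi_hat) are made up for this example.

```stata
* Sketch only: assumes 0 < cgi < 1 for all observations and xtset id year
generate double logit_cgi = logit(cgi)

* Same specification as in #1, but with the response on the logit scale
xtreg logit_cgi gini emp1 lowskill1 logpop logmed own hs25 sarjana25 ///
    eighteen over65 i.year, fe vce(robust)

* Linear prediction including the estimated city effect, mapped back to (0,1)
* Caveat: invlogit() of the fitted mean is not the fitted mean of cgi itself
predict double xbu_logit, xbu
generate double cgi_hat = invlogit(xbu_logit)
summarize cgi cgi_hat
```

The back-transformed values stay inside (0,1) by construction, but they should be read as rough fitted values rather than estimates of the conditional mean of cgi.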

Last edited by Nick Cox; 28 May 2016, 02:17.
Josh Budlender

Join Date: Dec 2015

Posts: 15
#3

28 May 2016, 12:21

On the issue of your version of Stata:

I'm not familiar with panel regressions in Stata, so don't know if this is helpful, but you can do cross-sectional fractional regressions in the program without the explicit fracreg command.

There is a useful short note on this here: http://www.ats.ucla.edu/stat/stata/faq/proportion.htm, which is an extension of a note that Nick co-wrote.

Last edited by Josh Budlender; 28 May 2016, 12:27.
Stephen Jenkins

Join Date: Apr 2014

Posts: 1424
#4

28 May 2016, 12:22

Usha Adelina: you should report precisely which Stata version you have -- it matters for the advice you wish to receive. (Please read the forum FAQ about this.)

Fractional regression models can be estimated using glm with robust standard errors. See Leslie E. Papke and Jeffrey M. Wooldridge, "Econometric Methods for Fractional Response Variables with an Application to 401(K) Plan Participation Rates", Journal of Applied Econometrics, Vol. 11, No. 6 (Nov. - Dec., 1996), pp. 619-632.
See also Rich Williams's program fracglm, described in https://www3.nd.edu/~rwilliam/stats3...onseModels.pdf
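For illustration, the glm route might look like the sketch below when applied to the variables in #1. This is a pooled fractional logit, not a fixed-effects estimator: clustering on id only adjusts the standard errors and does not remove city effects. The name cgi_glm is made up for this example.

```stata
* Sketch only: pooled fractional logit (Bernoulli quasi-likelihood, logit link)
* Stata may note that cgi has noninteger values; that is expected here
glm cgi gini emp1 lowskill1 logpop logmed own hs25 sarjana25 eighteen over65 ///
    i.year, family(binomial) link(logit) vce(cluster id)

* Fitted conditional means, which lie in (0,1) by construction
predict double cgi_glm, mu
summarize cgi cgi_glm
```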

For the panel version, see Leslie E. Papke and Jeffrey M. Wooldridge, "Panel data methods for fractional response variables with an application to test pass rates", Journal of Econometrics 145 (2008), 121–133.
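
A rough sketch of one way to approximate that panel approach with glm: a pooled fractional probit that adds city-level time averages of the covariates (the Chamberlain-Mundlak device). It assumes a balanced panel, as in #1 (29 cities, 7 years), and that no other variables in the data start with m_. The m_* names are made up for this example, and this is not a substitute for reading the paper.

```stata
* Sketch only: time averages of each covariate within city
foreach v of varlist gini emp1 lowskill1 logpop logmed own hs25 sarjana25 ///
    eighteen over65 {
    bysort id: egen double m_`v' = mean(`v')
}

* Pooled fractional probit with the time averages as extra regressors
glm cgi gini emp1 lowskill1 logpop logmed own hs25 sarjana25 eighteen over65 ///
    m_* i.year, family(binomial) link(probit) vce(cluster id)

* Average partial effects are easier to interpret than the index coefficients
margins, dydx(gini emp1 lowskill1 logpop logmed own hs25 sarjana25 eighteen over65)
```

The average partial effects from margins are on the scale of cgi, so they are the quantities most comparable to the xtreg coefficients in #1.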

Last edited by Stephen Jenkins; 28 May 2016, 12:25.

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17671
#5

29 May 2016, 02:18

Usha:
as an aside to the previous helpful remarks, I would start to worry about the magnitude of -_cons- and -over65- only after -test-ing the null hypothesis that they equal 0 (a hypothesis that, in all likelihood, will not be rejected by your data), as in the following toy example, which focuses on one predictor only:

Code:

. use "http://www.stata-press.com/data/r14/nlswork.dta", clear
(National Longitudinal Survey.  Young Women 14-26 years of age in 1968)

. xtreg ln_wage hours, fe vce(robust)

Fixed-effects (within) regression               Number of obs     =     28,467
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.0001                                         min =          1
     between = 0.0314                                         avg =        6.0
     overall = 0.0074                                         max =         15

                                                F(1,4709)         =       0.81
corr(u_i, Xb)  = 0.0976                         Prob > F          =     0.3696

                             (Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       hours |   .0004474   .0004986     0.90   0.370    -.0005301     .001425
       _cons |   1.658941   .0182299    91.00   0.000     1.623202     1.69468
-------------+----------------------------------------------------------------
     sigma_u |   .4229084
     sigma_e |  .32040339
         rho |  .63532952   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. test _b[hours]=0

 ( 1)  hours = 0

       F(  1,  4709) =    0.81
            Prob > F =    0.3696

If this were the case, as they are not significant, I would not consider their magnitude a matter of concern.
Conversely, I would be more worried that only a very limited number of coefficients in your model seem to explain any variation in your -depvar- when adjusted for the remaining predictors.
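
Applied to the model in #1, the same checks might look like the sketch below. It also counts how many fitted values fall outside the unit interval, which is one way to see whether the coefficient magnitudes actually cause trouble within the sample. The name cgi_fit is made up for this example.

```stata
* Sketch only: re-estimate the model from #1
xtreg cgi gini emp1 lowskill1 logpop logmed own hs25 sarjana25 eighteen ///
    over65 i.year, fe vce(robust)

* Test the two coefficients whose magnitude raised the question
test _b[over65] = 0
test _b[_cons] = 0

* Fitted values including the estimated city effects
predict double cgi_fit, xbu
summarize cgi cgi_fit

* Missing values count as larger than any number, so exclude them explicitly
count if (cgi_fit < 0 | cgi_fit > 1) & !missing(cgi_fit)
```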

Kind regards,
Carlo
(StataNow 18.5)


Panika Jain

Join Date: Mar 2019

Posts: 8
#6

28 Apr 2019, 01:27

Hello everyone, I have a similar question. My dependent variable is a score bounded between zero and one. My panel is unbalanced, and the dependent variable varies little over time within any given cross-sectional unit. I have gone through the papers on fractional regression models in panel settings, and I find the exponential fractional regression model proposed by (Ramalho, 2015) more recent and flexible. They include time dummies in their model. Is it appropriate to add time dummies when the dependent variable has little variation over time? Is there any method that simultaneously takes into account the fractional nature of the dependent variable, the unbalanced panel setting, and endogeneity in a few covariates, and that can also be run in Stata?