  • Bounded Dependent Variable

    Hello everyone, I have a question. I am running a panel regression for 29 cities over a period of 7 years. My dependent variable is CGI, which measures the level of income segregation and is continuous and strictly positive, ranging from 0 to 1. However, after running the regression, I found the constant to be 1.85 (greater than the maximum value of CGI), while the coefficient on over65 (the fraction of the population older than 64) is -1.27, which is lower than the minimum value of CGI. Do dependent variables with such characteristics need special treatment? I know the latest version of Stata has beta and fractional regression commands, but I do not have access to it, and logistic regression seems implausible to me since the dependent variable takes continuous values from 0 to 1. The regression results are below.

    Code:
    xtreg   cgi   gini  emp1 lowskill1  logpop   logmed  own hs25 sarjana25 eighteen over65 i.year,  fe  robust
    
    Fixed-effects (within) regression               Number of obs      =       203
    Group variable: id                              Number of groups   =        29
    
    R-sq:  within  = 0.5303                         Obs per group: min =         7
           between = 0.1904                                        avg =       7.0
           overall = 0.0000                                        max =         7
    
                                                    F(16,28)           =     32.74
    corr(u_i, Xb)  = -0.6414                        Prob > F           =    0.0000
    
                                        (Std. Err. adjusted for 29 clusters in id)
    ------------------------------------------------------------------------------
                 |               Robust
             cgi |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            gini |   .2098422   .1119539     1.87   0.071    -.0194849    .4391694
            emp1 |  -.0026784   .0016391    -1.63   0.113     -.006036    .0006792
       lowskill1 |   .0852168   .0709025     1.20   0.239    -.0600204     .230454
          logpop |  -.0595715   .0732945    -0.81   0.423    -.2097085    .0905656
          logmed |  -.0617125   .0605833    -1.02   0.317    -.1858117    .0623867
             own |   .1888378   .0879009     2.15   0.040      .008781    .3688946
            hs25 |  -.0774792   .2259464    -0.34   0.734    -.5403094     .385351
       sarjana25 |   .5969337   .3611071     1.65   0.109    -.1427607    1.336628
        eighteen |   .0292786   .4995487     0.06   0.954    -.9940006    1.052558
          over65 |    -1.2718   .7602054    -1.67   0.105    -2.829011    .2854097
                 |
            year |
           2006  |   .0143375   .0154605     0.93   0.362    -.0173319     .046007
           2007  |  -.0090962   .0205345    -0.44   0.661    -.0511593    .0329668
           2008  |   -.025067   .0349562    -0.72   0.479    -.0966716    .0465375
           2009  |   .0210048   .0308709     0.68   0.502    -.0422314    .0842409
           2010  |   .0274367   .0366644     0.75   0.461    -.0476669    .1025404
           2011  |   .0174045    .039606     0.44   0.664    -.0637247    .0985337
                 |
           _cons |   1.849855   1.663708     1.11   0.276    -1.558097    5.257806
    -------------+----------------------------------------------------------------
         sigma_u |  .09637064
         sigma_e |  .03855145
             rho |  .86204928   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    Thank you!

  • #2
    There is nothing implausible about logit or logistic models here. Note that the use from about 1940 onwards of logit as (in modern terms) a link function for binary responses postdates by at least a century use of the logistic as a sigmoid curve that ascends or descends continuously from one asymptote to another. If the idea is that the mean segregation is changing smoothly as a function of predictors, then a logit link is the very first thing I would try.

    The only question is which xt command is most suited for your specific problem, which I will leave for others to advise. A fallback position is to transform your response to logit scale, but the back-transformation would remain.
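
    The fallback just mentioned can be sketched roughly as follows. This is only an illustration on simulated data (the variables id, year, x and y are made up, not the poster's), and it assumes the response lies strictly inside (0, 1), because logit() returns missing at exactly 0 or 1. With the real data, y would be cgi and x the list of covariates.

    ```stata
    * Simulated stand-in for a 29-unit, 7-year panel (all names hypothetical)
    clear
    set seed 12345
    set obs 29
    generate id = _n
    generate u = rnormal(0, 0.5)              // unit-specific effect
    expand 7
    bysort id: generate year = 2004 + _n      // years 2005-2011
    generate x = rnormal() + 0.5*u
    generate y = invlogit(-0.5 + 0.8*x + u + rnormal(0, 0.3))  // strictly in (0,1)
    xtset id year

    * Fixed-effects regression with the response on the logit scale
    generate double logit_y = logit(y)
    xtreg logit_y x i.year, fe vce(robust)

    * Naive back-transformation: linear prediction plus unit effect, mapped to (0,1)
    predict double xbu, xbu
    generate double yhat = invlogit(xbu)
    summarize y yhat
    ```

    The back-transformed values are guaranteed to stay inside (0, 1), but invlogit() of a mean on the logit scale is not the mean on the original scale, which is the back-transformation problem noted above.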
    Last edited by Nick Cox; 28 May 2016, 02:17.


    • #3
      On the issue of your version of Stata:

      I'm not familiar with panel regressions in Stata, so don't know if this is helpful, but you can do cross-sectional fractional regressions in the program without the explicit fracreg command.

      There is a useful short note on this here: http://www.ats.ucla.edu/stat/stata/faq/proportion.htm, which is an extension of a note that Nick co-wrote.
      Last edited by Josh Budlender; 28 May 2016, 12:27.


      • #4
        Usha Adelina: you should report precisely which Stata version you have -- it matters for the advice you wish to receive. (Please read the forum FAQ about this.)

        Fractional regression models can be estimated using glm with robust standard errors. See "Econometric Methods for Fractional Response Variables with an Application to 401(K) Plan Participation Rates", Leslie E. Papke and Jeffrey M. Wooldridge, Journal of Applied Econometrics, Vol. 11, No. 6 (Nov. - Dec., 1996), pp. 619-632.
        See also Rich Williams's program fracglm described in https://www3.nd.edu/~rwilliam/stats3...onseModels.pdf
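
        A minimal cross-sectional sketch of the glm route, on simulated data (x and y are made-up names), might look like this. It relies only on glm options available well before fracreg existed; Stata prints a note that the response has noninteger values and then proceeds.

        ```stata
        * Simulated fractional response (hypothetical variables x and y)
        clear
        set seed 2016
        set obs 500
        generate x = rnormal()
        generate y = invlogit(0.3 + 0.7*x + rnormal(0, 0.5))

        * Bernoulli quasi-likelihood with a logit link and robust standard errors
        glm y x, family(binomial) link(logit) vce(robust) nolog

        * Average marginal effect of x on the conditional mean of y
        margins, dydx(x)
        ```

        The robust standard errors matter here because the binomial variance is used only as a working assumption for a response that is a proportion rather than a count of successes.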

        For the panel version, see Leslie E. Papke and Jeffrey M. Wooldridge, "Panel data methods for fractional response variables with an application to test pass rates", Journal of Econometrics 145 (2008) 121–133.
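
        As I understand that paper, the basic estimator for a balanced panel is a pooled fractional probit that adds the unit-level time averages of the covariates and clusters the standard errors by unit. A rough sketch on simulated data (id, year, x, y and xbar are made-up names, not the original poster's variables):

        ```stata
        * Simulated balanced panel: 29 units, 7 years (all names hypothetical)
        clear
        set seed 2008
        set obs 29
        generate id = _n
        generate c = rnormal(0, 0.5)              // unobserved unit effect
        expand 7
        bysort id: generate year = 2004 + _n
        generate x = rnormal() + 0.5*c            // covariate correlated with c
        generate y = normal(-0.3 + 0.6*x + c + rnormal(0, 0.3))

        * Time average of the covariate within each unit (Mundlak/Chamberlain device)
        bysort id: egen xbar = mean(x)

        * Pooled fractional probit with cluster-robust standard errors
        glm y x xbar i.year, family(binomial) link(probit) vce(cluster id) nolog

        * Average partial effect of x
        margins, dydx(x)
        ```

        With the real data, each time-varying covariate would get its own time average, which quickly uses up degrees of freedom when there are only 29 clusters.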
        Last edited by Stephen Jenkins; 28 May 2016, 12:25.


        • #5
          Usha:
          as an aside to the previous helpful remarks, I would start to worry about the magnitude of -_cons- and -over65- only after -test-ing the null hypothesis that each equals 0 (a hypothesis that, in all likelihood, will not be rejected by your data), as in the following toy example, which focuses on one predictor only:
          Code:
          . use "http://www.stata-press.com/data/r14/nlswork.dta", clear
          (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
          
          . xtreg ln_wage hours, fe vce(robust)
          
          Fixed-effects (within) regression               Number of obs     =     28,467
          Group variable: idcode                          Number of groups  =      4,710
          
          R-sq:                                           Obs per group:
               within  = 0.0001                                         min =          1
               between = 0.0314                                         avg =        6.0
               overall = 0.0074                                         max =         15
          
                                                          F(1,4709)         =       0.81
          corr(u_i, Xb)  = 0.0976                         Prob > F          =     0.3696
          
                                       (Std. Err. adjusted for 4,710 clusters in idcode)
          ------------------------------------------------------------------------------
                       |               Robust
               ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                 hours |   .0004474   .0004986     0.90   0.370    -.0005301     .001425
                 _cons |   1.658941   .0182299    91.00   0.000     1.623202     1.69468
          -------------+----------------------------------------------------------------
               sigma_u |   .4229084
               sigma_e |  .32040339
                   rho |  .63532952   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
          
          . test _b[hours]=0
          
           ( 1)  hours = 0
          
                 F(  1,  4709) =    0.81
                      Prob > F =    0.3696
          If this were the case, as they are not significant, I would not consider their magnitude a matter of concern.
          Conversely, I would be more worried by the evidence that only a very limited number of coefficients in your model seem to explain any variation in your -depvar- when adjusted for the remaining predictors.
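
          A related check, sketched here on simulated data (id, year, x and y are made-up names), is to look at the fitted values rather than at individual coefficients: a constant above 1 is the prediction with every covariate set to zero, which may be far outside the data, whereas fitted values outside [0,1] within the sample would be a more direct symptom of the linear model straining against the bounds.

          ```stata
          * Simulated stand-in for a bounded panel response (all names hypothetical)
          clear
          set seed 42
          set obs 29
          generate id = _n
          generate u = rnormal(0, 0.1)
          expand 7
          bysort id: generate year = 2004 + _n
          generate x = runiform()
          generate y = invlogit(-1 + 1.5*x + 5*u + rnormal(0, 0.3))
          xtset id year

          * Linear fixed-effects model, as in the original post
          xtreg y x i.year, fe vce(robust)

          * Fitted values including the estimated unit effects
          predict double yhat, xbu

          * How many fitted values fall outside the unit interval?
          count if !missing(yhat) & (yhat < 0 | yhat > 1)
          display "fitted values outside [0,1]: " r(N)
          ```
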
          Kind regards,
          Carlo
          (StataNow 18.5)


          • #6
            Hello everyone, I have a similar question. My dependent variable is a score bounded between zero and one, my panel is unbalanced, and the dependent variable varies little over time within a given cross-sectional unit. I have gone through the papers on fractional regression models in panel settings. I find the exponential fractional regression model proposed by (Ramalho, 2015) more recent and flexible. They include time dummies in their model. Is it appropriate to add time dummies when the dependent variable has little variation over time? Is there any method that simultaneously takes into account the fractional nature of the dependent variable, the unbalanced panel setting, and endogeneity in a few covariates, and that can also be run in Stata?
