  • Dependent variable in % and GLM model

    Hello, I have a dependent variable (Y), normally distributed, that ranges from 0 to 1. I checked the correlations between Y and my independent variables (pwcorr Y X1 X2..., sig) and found that some are significant. However, when I fit a model (glm Y X1 X2 X3.. , family(binomial) link(logit) robust), only one X is significant and the others are not. I also checked the correlations among the independent variables: some pairs have a correlation of 0.7, others 0.3 or 0.1. Should I drop some of them from the model? How can I fix this model? Which tests should I use?
    Thank you

  • #2
    We can't easily tell. If your response has mean near 0.5 and small variance then a linear probability model might well work fine in practice. If your mean is near 0 or 1 then a model should surely respect the range of the response and use a link function that ensures predictions within (0, 1).

    Why not give a sample of your data or show a plot of your data, e.g., a scatter plot matrix?

    Best to say that responses between 0 and 1 are proportions, fractions or probabilities, not percents.
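
    A minimal sketch of those checks, assuming the data are already in memory and using the placeholder names Y, X1, X2 and X3 from the first post; yhat_lin is a hypothetical name for the fitted values:

    ```stata
    * Location and spread of the response: is the mean close to 0 or 1?
    summarize Y, detail

    * Scatter plot matrix of the response and the predictors (lower triangle only)
    graph matrix Y X1 X2 X3, half

    * Linear model fitted to the proportion, then the range of its fitted values;
    * values below 0 or above 1 suggest switching to a link that respects (0, 1)
    regress Y X1 X2 X3, vce(robust)
    predict double yhat_lin if e(sample), xb
    summarize yhat_lin
    ```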


    • #3
      Good evening Nick,
      Thank you very much for your answer! The mean of my DV is .69, with a standard deviation of .18, so in this case should I avoid a linear probability model? Which link function should I use?
      I'm attaching a sample of my data. I've been using glm with the IVs, but only "women in manager position" is significant. However, when I run separate regressions I find significance for some of the IVs, and most of them are also significantly correlated with the DV.

      Thank you
      Attached Files


      • #4
        A logit link sounds a good idea. I stop short of trying to read a spreadsheet file, for reasons we explain in section 12.5 of https://www.statalist.org/forums/help#stata. As every new message prompt reminds you, posters are asked to read that first.
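
        For reference, a logit link for a proportion can be requested in at least two ways. This is only a sketch with the placeholder names Y, X1, X2 and X3 from the first post, assuming the data are in memory; fracreg needs Stata 14 or later, and mu_hat is a hypothetical variable name:

        ```stata
        * Quasi-likelihood fit: binomial family, logit link, robust standard errors
        glm Y X1 X2 X3, family(binomial) link(logit) vce(robust)

        * Fitted means are on the response scale, so they stay inside (0, 1)
        predict double mu_hat if e(sample), mu
        summarize mu_hat

        * Fractional logit regression: the same mean model, with robust
        * standard errors by default; coefficients should match the glm fit
        fracreg logit Y X1 X2 X3
        ```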


        • #5
          Hello Nick, thanks for the link. I didn't know; I'm new here and this is my first time using the forum.
          Anyway, I'll try the logit link as you advise. Thank you.

          Just to post what I did in the right way (I hope): my DV is MenComp, which lies in (0, 1) with mean 0.69. Some of my IVs also lie in (0, 1), but others are measured on other continuous scales. Should I transform all of them to (0, 1) as well? Only one variable is significant in the model, yet all of them are significantly correlated with my DV. I guess I'm doing something wrong.
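
          On rescaling: dividing a predictor by a constant changes the units of its coefficient and standard error but not its z statistic or p-value, so nothing requires every IV to lie in (0, 1). A sketch of how to confirm this, assuming the full dataset is in memory and that WomenManagerPosition is recorded in percent (an assumption); WomenManagerShare, original and rescaled are hypothetical names:

          ```stata
          * Hypothetical rescaled copy of one predictor (percent -> proportion)
          generate double WomenManagerShare = WomenManagerPosition / 100

          * Fit with the predictor on its original scale
          glm MenComp GlobalGenderGap WomenManagerPosition Adolescentbirthrate HumanCapitalIndexFemale, family(binomial) link(logit) vce(robust)
          estimates store original

          * Fit with the rescaled predictor
          glm MenComp GlobalGenderGap WomenManagerShare Adolescentbirthrate HumanCapitalIndexFemale, family(binomial) link(logit) vce(robust)
          estimates store rescaled

          * Side-by-side coefficients, standard errors and p-values
          estimates table original rescaled, b(%9.4f) se(%9.4f) p(%5.3f)
          ```

          Only the rescaled variable's coefficient and standard error should differ between the two columns, each by a factor of 100.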

          Sample:
          Code:
          input double(MenComp GlobalGenderGap WomenManagerPosition Adolescentbirthrate HumanCapitalIndexFemale)
          .11 .68   32 45.9 .643
          .14  .7 26.5 45.6 .682
          .18   .    .    .    .
           .2  .69    0   46    .
          .23   .    .    .    .
          .33   . 27.2 27.3    .
          .35 .76 46.3 13.5 .773
          .37 .72 56.7 52.8 .558
          .37 .68 20.7  4.6 .759
          .38   .    .    .    .
          end
          Listed 10 out of 175 observations

          Command used:
          Code:
          glm MenComp GlobalGenderGap WomenManagerPosition Adolescentbirthrate HumanCapitalIndexFemale, family(binomial) link(logit) robust
          
          note: MenComp has noninteger values
          
          Iteration 0:   log pseudolikelihood = -33.035298 
          Iteration 1:   log pseudolikelihood = -33.018227 
          Iteration 2:   log pseudolikelihood = -33.018218 
          Iteration 3:   log pseudolikelihood = -33.018218 
          
          Generalized linear models                          No. of obs      =        76
          Optimization     : ML                              Residual df     =        71
                                                             Scale parameter =         1
          Deviance         =  12.06759915                    (1/df) Deviance =  .1699662
          Pearson          =  10.92649627                    (1/df) Pearson  =  .1538943
          
          Variance function: V(u) = u*(1-u/1)                [Binomial]
          Link function    : g(u) = ln(u/(1-u))              [Logit]
          
                                                             AIC             =  1.000479
          Log pseudolikelihood = -33.01821785                BIC             = -295.4145
          
          -----------------------------------------------------------------------------------
                            |               Robust
                    MenComp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          ------------------+----------------------------------------------------------------
            GlobalGenderGap |   -2.38153   1.628387    -1.46   0.144    -5.573109    .8100495
          WomenManagerPos~n |  -.0090151   .0063643    -1.42   0.157    -.0214888    .0034587
          Adolescentbirth~e |  -.0054699   .0035937    -1.52   0.128    -.0125135    .0015736
          HumanCapitalInd~e |   -1.78425   .8346752    -2.14   0.033    -3.420183   -.1483164
                      _cons |   3.988976   1.026239     3.89   0.000     1.977585    6.000366
          -----------------------------------------------------------------------------------
          
          .
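
          The data excerpt reports 175 observations, but the model uses only 76. glm drops every observation with a missing value in any model variable, whereas pwcorr uses all observations available for each pair, so the correlations and the model may rest on different samples. A sketch of how to check this, to be run directly after the glm command above (nmiss is a hypothetical variable name):

          ```stata
          * Missing values per variable
          misstable summarize MenComp GlobalGenderGap WomenManagerPosition Adolescentbirthrate HumanCapitalIndexFemale

          * Number of observations that are complete on all five variables
          egen byte nmiss = rowmiss(MenComp GlobalGenderGap WomenManagerPosition Adolescentbirthrate HumanCapitalIndexFemale)
          count if nmiss == 0

          * Pairwise correlations on all available data, with the count for each pair
          pwcorr MenComp GlobalGenderGap WomenManagerPosition Adolescentbirthrate HumanCapitalIndexFemale, sig obs

          * The same correlations restricted to the estimation sample of the glm fit
          pwcorr MenComp GlobalGenderGap WomenManagerPosition Adolescentbirthrate HumanCapitalIndexFemale if e(sample), sig
          ```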


          • #6
            Sorry:
            Code:
             glm MenComp GlobalGenderGap WomenManagerPosition Adolescentbirthrate HumanCapitalI
            > ndexFemale, family(binomial) link(logit) robust
            note: MenComp has noninteger values
            
            Iteration 0:   log pseudolikelihood = -33.035298 
            Iteration 1:   log pseudolikelihood = -33.018227 
            Iteration 2:   log pseudolikelihood = -33.018218 
            Iteration 3:   log pseudolikelihood = -33.018218 
            
            Generalized linear models                          No. of obs      =        76
            Optimization     : ML                              Residual df     =        71
                                                               Scale parameter =         1
            Deviance         =  12.06759915                    (1/df) Deviance =  .1699662
            Pearson          =  10.92649627                    (1/df) Pearson  =  .1538943
            
            Variance function: V(u) = u*(1-u/1)                [Binomial]
            Link function    : g(u) = ln(u/(1-u))              [Logit]
            
                                                               AIC             =  1.000479
            Log pseudolikelihood = -33.01821785                BIC             = -295.4145
            
            -----------------------------------------------------------------------------------
                              |               Robust
                      MenComp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            ------------------+----------------------------------------------------------------
              GlobalGenderGap |   -2.38153   1.628387    -1.46   0.144    -5.573109    .8100495
            WomenManagerPos~n |  -.0090151   .0063643    -1.42   0.157    -.0214888    .0034587
            Adolescentbirth~e |  -.0054699   .0035937    -1.52   0.128    -.0125135    .0015736
            HumanCapitalInd~e |   -1.78425   .8346752    -2.14   0.033    -3.420183   -.1483164
                        _cons |   3.988976   1.026239     3.89   0.000     1.977585    6.000366
            -----------------------------------------------------------------------------------
            
            .
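
            Two follow-up checks that may help in reading this output, sketched under the assumption that the glm fit above is the most recent estimation. The coefficients are on the logit scale; margins converts them to average changes in the proportion itself. The variance inflation factors come from an auxiliary linear regression on the same predictors, because estat vif is available after regress, and are only a rough guide to how much the correlations among the IVs inflate the standard errors.

            ```stata
            * Average marginal effects on the scale of the proportion
            margins, dydx(*)

            * Auxiliary linear regression on the glm estimation sample, used only for VIFs
            regress MenComp GlobalGenderGap WomenManagerPosition Adolescentbirthrate HumanCapitalIndexFemale if e(sample)
            estat vif
            ```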
