  • (logistic) regression output does not reflect descriptive statistics

    Hi everyone,

    I am working on a study where I aim to understand the influence of relative wages on job quits.
    However, I seem to have a problem reconciling the descriptive statistics and the logistic regression output.

    For illustration, the descriptive statistics below show the employees who quit their jobs by relative wage position (0 to 1), where 1 is the highest relative wage position and 0 the lowest.

    Year 0-0.1 0.1-0.2 0.2-0.3 0.3-0.4 0.4-0.5 0.5-0.6 0.6-0.7 0.7-0.8 0.8-0.9 0.9-1
    1998 108 255 381 386 409 531 504 480 646 800
    2.2% 5.361 8:009 8.114 8.598 11,162 10.595 10.090 13.580 22.010
    1999 100 288 400 432 420 490 548 422 542 750
    2.48% 6.291 8.737 9.436 9.174 10,703 11.970 9.218 11.839 18.09%
    For example, in 1998, 2.2% of the employees in the position between 0-0.1 quit their jobs, while 22% in the position between 0.9-1 quit their jobs. Essentially, the higher the position, the greater the likelihood of quitting.

    However, when I perform the logistic regression (the dependent variable is binary, where 0 represents staying in the firm and 1 represents quitting), I get the opposite effect. In short, the coefficient for the relative wage position is negative. The marginal plot output is below.

    [Image: marginal plot, not available]

    The marginal plot suggests that those with lower relative wage positions are more likely to quit than those with a higher relative wage position.

    The data seems to be OK. I have three questions.

    1. Can anyone help me with this mismatch between the descriptive statistics and the marginal plot?
    2. Is it just a matter of how the logistic regression output is interpreted?
    3. Is there any other descriptive statistic (or maybe a graph) that I could present in my study and that would be consistent with the logistic regression?


    Thanks in advance,

    J

  • #2
    Why do you not follow the FAQ "Advice on Posting to Statalist"?

    1) You should not attach files other than .png (FAQ #12.4 & 12.5).
    2) You should show the commands you actually used (FAQ 12.1).
    3) You should present an example of the data used (FAQ 12.2).
    4) You should use code delimiters (also when showing your results) (FAQ 12.3)

    5) When showing output, it should be the output you actually got, not some edited version. For example, some "percentages" in your table show percentage signs, some do not. One cell uses the colon (":") as the decimal point. The percentages don't add up to 100.

    6) Your table is showing row percentages, but if "1998" and "1999" are the labels of the values of your dependent variable, you should use column percentages.
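
    To illustrate point 6 with individual-level data, here is a minimal sketch on simulated data. The variable names (quit, wagebin) and the quit probabilities are made up for illustration and are not taken from your study. The column percentages give the share of quitters within each wage category, which is the quantity that a logistic regression of quit on the wage categories describes:

    Code:
    clear
    set seed 2022
    set obs 5000
    * hypothetical wage category 1..10 (1 = lowest relative wage)
    generate byte wagebin = runiformint(1, 10)
    * hypothetical quit indicator: quit probability falls as the wage category rises
    generate byte quit = runiform() < 0.35 - 0.02*wagebin
    
    * share of quitters WITHIN each wage category (column percentages)
    tabulate quit wagebin, column nofreq
    * the same shares as a bar chart
    graph bar (mean) quit, over(wagebin) ytitle("Share quitting")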

    Using your data (with the above-mentioned problems), the following shows that the regression output matches the descriptive statistics in your table exactly:

    Code:
    clear
    input quit wage freq
       0  1 108
       0  2 255
       0  3 381
       0  4 386
       0  5 409
       0  6 531
       0  7 504
       0  8 480
       0  9 646
       0 10 800
       1  1 100
       1  2 288
       1  3 400
       1  4 432
       1  5 420
       1  6 490
       1  7 548
       1  8 422
       1  9 542
       1 10 750
    end
    
    lab def quit 0 "1998" 1 "1999"
    lab val quit quit
    
    lab def wage 1 "0-0.1"   2 "0.1-0.2" 3 "0.2-0.3" 4 "0.3-0.4"  5 "0.4-0.5" ///
                 6 "0.5-0.6" 7 "0.6-0.7" 8 "0.7-0.8" 9 "0.8-0.9" 10 "0.9-1.0"
    lab val wage wage
    
    tab2 quit wage [fw=freq], row   // wrong to use row percentages!
    tab2 quit wage [fw=freq], col   // correct
    
    logistic quit i.wage [fw=freq]
    margins wage
    The results of -margins- correspond exactly to the column percentages of the two-way table:
    Code:
    . tab2 quit wage [fw=freq], row   // wrong to use row percentages!
    
    -> tabulation of quit by wage  
    
    +----------------+
    | Key            |
    |----------------|
    |   frequency    |
    | row percentage |
    +----------------+
    
               |                                                     wage
          quit |     0-0.1    0.1-0.2    0.2-0.3    0.3-0.4    0.4-0.5    0.5-0.6    0.6-0.7    0.7-0.8    0.8-0.9    0.9-1.0 |     Total
    -----------+--------------------------------------------------------------------------------------------------------------+----------
          1998 |       108        255        381        386        409        531        504        480        646        800 |     4,500
               |      2.40       5.67       8.47       8.58       9.09      11.80      11.20      10.67      14.36      17.78 |    100.00
    -----------+--------------------------------------------------------------------------------------------------------------+----------
          1999 |       100        288        400        432        420        490        548        422        542        750 |     4,392
               |      2.28       6.56       9.11       9.84       9.56      11.16      12.48       9.61      12.34      17.08 |    100.00
    -----------+--------------------------------------------------------------------------------------------------------------+----------
         Total |       208        543        781        818        829      1,021      1,052        902      1,188      1,550 |     8,892
               |      2.34       6.11       8.78       9.20       9.32      11.48      11.83      10.14      13.36      17.43 |    100.00
    
    . tab2 quit wage [fw=freq], col   // correct
    
    -> tabulation of quit by wage  
    
    +-------------------+
    | Key               |
    |-------------------|
    |     frequency     |
    | column percentage |
    +-------------------+
    
               |                                                     wage
          quit |     0-0.1    0.1-0.2    0.2-0.3    0.3-0.4    0.4-0.5    0.5-0.6    0.6-0.7    0.7-0.8    0.8-0.9    0.9-1.0 |     Total
    -----------+--------------------------------------------------------------------------------------------------------------+----------
          1998 |       108        255        381        386        409        531        504        480        646        800 |     4,500
               |     51.92      46.96      48.78      47.19      49.34      52.01      47.91      53.22      54.38      51.61 |     50.61
    -----------+--------------------------------------------------------------------------------------------------------------+----------
          1999 |       100        288        400        432        420        490        548        422        542        750 |     4,392
               |     48.08      53.04      51.22      52.81      50.66      47.99      52.09      46.78      45.62      48.39 |     49.39
    -----------+--------------------------------------------------------------------------------------------------------------+----------
         Total |       208        543        781        818        829      1,021      1,052        902      1,188      1,550 |     8,892
               |    100.00     100.00     100.00     100.00     100.00     100.00     100.00     100.00     100.00     100.00 |    100.00
    
    .
    . logistic quit i.wage [fw=freq]
    
    Logistic regression                                     Number of obs =  8,892
                                                            LR chi2(9)    =  22.15
                                                            Prob > chi2   = 0.0084
    Log likelihood = -6151.7348                             Pseudo R2     = 0.0018
    
    ------------------------------------------------------------------------------
            quit | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
            wage |
        0.1-0.2  |   1.219765   .1991359     1.22   0.224     .8857495    1.679737
        0.2-0.3  |   1.133858    .177056     0.80   0.421     .8349119    1.539845
        0.3-0.4  |   1.208705   .1878933     1.22   0.223     .8912526    1.639229
        0.4-0.5  |   1.109046   .1721174     0.67   0.505      .818179    1.503319
        0.5-0.6  |   .9966102   .1517445    -0.02   0.982     .7394704    1.343166
        0.6-0.7  |   1.174286   .1783531     1.06   0.290     .8719497    1.581452
        0.7-0.8  |      .9495   .1462115    -0.34   0.736     .7021344    1.284014
        0.8-0.9  |     .90613   .1363786    -0.65   0.513     .6746507    1.217032
        0.9-1.0  |     1.0125   .1496398     0.08   0.933     .7578686    1.352683
                 |
           _cons |   .9259259   .1284979    -0.55   0.579     .7054211    1.215358
    ------------------------------------------------------------------------------
    Note: _cons estimates baseline odds.
    
    . margins wage
    
    Adjusted predictions                                     Number of obs = 8,892
    Model VCE: OIM
    
    Expression: Pr(quit), predict()
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
            wage |
          0-0.1  |   .4807692   .0346431    13.88   0.000       .41287    .5486685
        0.1-0.2  |   .5303867   .0214174    24.76   0.000     .4884094    .5723641
        0.2-0.3  |   .5121639   .0178861    28.63   0.000     .4771078      .54722
        0.3-0.4  |   .5281174   .0174544    30.26   0.000     .4939073    .5623274
        0.4-0.5  |   .5066345   .0173642    29.18   0.000     .4726013    .5406677
        0.5-0.6  |   .4799216   .0156353    30.69   0.000      .449277    .5105663
        0.6-0.7  |   .5209125   .0154022    33.82   0.000     .4907248    .5511002
        0.7-0.8  |   .4678492   .0166137    28.16   0.000     .4352869    .5004115
        0.8-0.9  |    .456229   .0144508    31.57   0.000      .427906     .484552
        0.9-1.0  |    .483871   .0126934    38.12   0.000     .4589924    .5087496
    ------------------------------------------------------------------------------
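    As a quick arithmetic check (a sketch that uses only the frequencies from the table above), the baseline odds, one odds ratio, and one margin can be reproduced by hand. With wage as the only (categorical) predictor the model is saturated, so each predicted probability is simply the corresponding cell proportion:
    Code:
    * baseline odds for category 0-0.1: compare with _cons (.9259259)
    display 100/108
    * odds ratio for 0.1-0.2 relative to 0-0.1: compare with 1.219765
    display (288/255) / (100/108)
    * predicted probability for 0.1-0.2: compare with the margin .5303867
    * and with the column percentage 53.04
    display 288 / (255 + 288)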
    By the way: Your description of the intervals is ambiguous. For example, does the interval 0-0.1 include 0.1 or not? Does the interval 0.1-0.2 include 0.1 or not? ...
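    One way to make the boundaries explicit, assuming a continuous variable (here hypothetically called relwage) with values between 0 and 1, is to build the categories yourself and state the convention used. A minimal sketch of the two obvious conventions:
    Code:
    clear
    input double relwage
    0
    0.1
    0.25
    1
    end
    * left-closed bins [0,0.1), [0.1,0.2), ..., with 1 assigned to the last bin
    generate byte bin_left = min(floor(relwage*10) + 1, 10) if !missing(relwage)
    * right-closed bins (0,0.1], (0.1,0.2], ..., with 0 assigned to the first bin
    generate byte bin_right = max(ceil(relwage*10), 1) if !missing(relwage)
    list
    Values lying exactly on a boundary (such as 0.1) end up in different categories under the two conventions. If the variable is stored as float, such boundary cases can additionally be affected by rounding, so it is worth checking them explicitly.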
    Last edited by Dirk Enzmann; 19 Jan 2022, 04:39.


    • #3
      This is really helpful, Dirk! Thank you so much!
      And yes, I understand the protocols for posting on the group. I will make sure to follow them the next time I post.
