
  • Errors in Calculating Predicted Probabilities for Firthlogit


    1.

    firthlogit dv1 city base territory border height pop ln_63 mod1r_c loss dv1_lag

    initial: penalized log likelihood = -563.45943
    rescale: penalized log likelihood = -563.45943
    Iteration 0: penalized log likelihood = -563.45943 (not concave)
    Iteration 1: penalized log likelihood = -560.84998 (not concave)
    Iteration 2: penalized log likelihood = -556.98981
    Iteration 3: penalized log likelihood = -545.34792
    Iteration 4: penalized log likelihood = -544.1852
    Iteration 5: penalized log likelihood = -543.9907
    Iteration 6: penalized log likelihood = -543.99019
    Iteration 7: penalized log likelihood = -543.99019

                                                    Number of obs     =    104,821
                                                    Wald chi2(10)     =      43.84
    Penalized log likelihood = -543.99019           Prob > chi2       =     0.0000

    ------------------------------------------------------------------------------
             dv1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            city |   .9560908   .2845525     3.36   0.001     .3983782    1.513803
            base |  -.2890522   .2242761    -1.29   0.197    -.7286253    .1505209
       territory |   .1109483   .0907729     1.22   0.222    -.0669634    .2888599
          border |   .3606393   .1794106     2.01   0.044     .0090009    .7122778
          height |  -.0388977   .0217395    -1.79   0.074    -.0815063     .003711
             pop |   .1534958   .1372472     1.12   0.263    -.1155037    .4224953
           ln_63 |  -.9431281   .4695627    -2.01   0.045    -1.863454   -.0228021
         mod1r_c |  -.1244403   .1046199    -1.19   0.234    -.3294916     .080611
            loss |   .0000518   .0000161     3.22   0.001     .0000202    .0000835
         dv1_lag |   2.746286   1.431237     1.92   0.055    -.0588876    5.551459
           _cons |   -7.84713   3.080853    -2.55   0.011    -13.88549   -1.808769
    ------------------------------------------------------------------------------

    . margins, at ( loss = (0 (100000) 334855)) atmeans expression(invlogit(predict(xb)))

    Adjusted predictions                            Number of obs     =    104,821
    Model VCE    : OIM

    Expression   : invlogit(predict(xb))

    1._at        : city      = .5411797 (mean)
                   base      = 2.956026 (mean)
                   territory = 3.146936 (mean)
                   border    = 11.05219 (mean)
                   height    = 5.028224 (mean)
                   pop       = 6.826342 (mean)
                   ln_63     = 4.371147 (mean)
                   mod1r_c   = 3.105943 (mean)
                   loss      =        0
                   dv1_lag   = .0004675 (mean)

    2._at        : city      = .5411797 (mean)
                   base      = 2.956026 (mean)
                   territory = 3.146936 (mean)
                   border    = 11.05219 (mean)
                   height    = 5.028224 (mean)
                   pop       = 6.826342 (mean)
                   ln_63     = 4.371147 (mean)
                   mod1r_c   = 3.105943 (mean)
                   loss      =   100000
                   dv1_lag   = .0004675 (mean)

    3._at        : city      = .5411797 (mean)
                   base      = 2.956026 (mean)
                   territory = 3.146936 (mean)
                   border    = 11.05219 (mean)
                   height    = 5.028224 (mean)
                   pop       = 6.826342 (mean)
                   ln_63     = 4.371147 (mean)
                   mod1r_c   = 3.105943 (mean)
                   loss      =   200000
                   dv1_lag   = .0004675 (mean)

    4._at        : city      = .5411797 (mean)
                   base      = 2.956026 (mean)
                   territory = 3.146936 (mean)
                   border    = 11.05219 (mean)
                   height    = 5.028224 (mean)
                   pop       = 6.826342 (mean)
                   ln_63     = 4.371147 (mean)
                   mod1r_c   = 3.105943 (mean)
                   loss      =   300000
                   dv1_lag   = .0004675 (mean)

    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             _at |
               1 |   .0005502   .0000833     6.61   0.000      .000387    .0007134
               2 |   .0894836    .131297     0.68   0.496    -.1678539    .3468211
               3 |    .946078   .1642868     5.76   0.000     .6240819    1.268074
               4 |   .9996808   .0015415   648.49   0.000     .9966595    1.002702
    ------------------------------------------------------------------------------



    2.

    firthlogit dv3 city base territory border height pop ln_63 mod1r_c success dv3_lag

    initial: penalized log likelihood = -958.84283
    rescale: penalized log likelihood = -958.84283
    Iteration 0: penalized log likelihood = -958.84283 (not concave)
    Iteration 1: penalized log likelihood = -956.48163 (not concave)
    Iteration 2: penalized log likelihood = -951.33286
    Iteration 3: penalized log likelihood = -951.07602 (not concave)
    Iteration 4: penalized log likelihood = -949.00891 (not concave)
    Iteration 5: penalized log likelihood = -948.02678
    Iteration 6: penalized log likelihood = -940.91648 (not concave)
    Iteration 7: penalized log likelihood = -940.17687
    Iteration 8: penalized log likelihood = -939.84826
    Iteration 9: penalized log likelihood = -939.84762
    Iteration 10: penalized log likelihood = -939.84762

                                                    Number of obs     =    130,505
                                                    Wald chi2(10)     =      49.57
    Penalized log likelihood = -939.84762           Prob > chi2       =     0.0000

    ------------------------------------------------------------------------------
             dv3 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            city |  -.1029971   .1925142    -0.54   0.593    -.4803181    .2743238
            base |  -.5766928   .1560966    -3.69   0.000    -.8826365    -.270749
       territory |   .1774427   .0671192     2.64   0.008     .0458914     .308994
          border |  -.1745738    .079129    -2.21   0.027    -.3296638   -.0194839
          height |   .0089205    .005544     1.61   0.108    -.0019456    .0197866
             pop |  -.0586204   .0956289    -0.61   0.540    -.2460495    .1288087
           ln_63 |   .1637436    .392236     0.42   0.676    -.6050248     .932512
         mod1r_c |   .0310071   .0806966     0.38   0.701    -.1271553    .1891696
         success |   .0001233    .000041     3.01   0.003     .0000429    .0002037
         dv3_lag |   1.388634   1.420496     0.98   0.328    -1.395486    4.172755
           _cons |  -4.364281   2.083713    -2.09   0.036    -8.448284   -.2802783
    ------------------------------------------------------------------------------

    . margins, at (success = (0 (100000) 693654)) atmeans expression(invlogit(predict(xb)))

    Adjusted predictions                            Number of obs     =    130,505
    Model VCE    : OIM

    Expression   : invlogit(predict(xb))

    1._at        : city      = .5424543 (mean)
                   base      = 2.961185 (mean)
                   territory = 3.146799 (mean)
                   border    = 11.05412 (mean)
                   height    = 5.074115 (mean)
                   pop       = 6.830019 (mean)
                   ln_63     = 4.376774 (mean)
                   mod1r_c   = 3.107196 (mean)
                   success   =        0
                   dv3_lag   = .0011111 (mean)

    2._at        : city      = .5424543 (mean)
                   base      = 2.961185 (mean)
                   territory = 3.146799 (mean)
                   border    = 11.05412 (mean)
                   height    = 5.074115 (mean)
                   pop       = 6.830019 (mean)
                   ln_63     = 4.376774 (mean)
                   mod1r_c   = 3.107196 (mean)
                   success   =   100000
                   dv3_lag   = .0011111 (mean)

    3._at        : city      = .5424543 (mean)
                   base      = 2.961185 (mean)
                   territory = 3.146799 (mean)
                   border    = 11.05412 (mean)
                   height    = 5.074115 (mean)
                   pop       = 6.830019 (mean)
                   ln_63     = 4.376774 (mean)
                   mod1r_c   = 3.107196 (mean)
                   success   =   200000
                   dv3_lag   = .0011111 (mean)

    4._at        : city      = .5424543 (mean)
                   base      = 2.961185 (mean)
                   territory = 3.146799 (mean)
                   border    = 11.05412 (mean)
                   height    = 5.074115 (mean)
                   pop       = 6.830019 (mean)
                   ln_63     = 4.376774 (mean)
                   mod1r_c   = 3.107196 (mean)
                   success   =   300000
                   dv3_lag   = .0011111 (mean)

    5._at        : city      = .5424543 (mean)
                   base      = 2.961185 (mean)
                   territory = 3.146799 (mean)
                   border    = 11.05412 (mean)
                   height    = 5.074115 (mean)
                   pop       = 6.830019 (mean)
                   ln_63     = 4.376774 (mean)
                   mod1r_c   = 3.107196 (mean)
                   success   =   400000
                   dv3_lag   = .0011111 (mean)

    6._at        : city      = .5424543 (mean)
                   base      = 2.961185 (mean)
                   territory = 3.146799 (mean)
                   border    = 11.05412 (mean)
                   height    = 5.074115 (mean)
                   pop       = 6.830019 (mean)
                   ln_63     = 4.376774 (mean)
                   mod1r_c   = 3.107196 (mean)
                   success   =   500000
                   dv3_lag   = .0011111 (mean)

    7._at        : city      = .5424543 (mean)
                   base      = 2.961185 (mean)
                   territory = 3.146799 (mean)
                   border    = 11.05412 (mean)
                   height    = 5.074115 (mean)
                   pop       = 6.830019 (mean)
                   ln_63     = 4.376774 (mean)
                   mod1r_c   = 3.107196 (mean)
                   success   =   600000
                   dv3_lag   = .0011111 (mean)

    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             _at |
               1 |   .0008755   .0000853    10.26   0.000     .0007083    .0010428
               2 |   .9950002      .0204    48.77   0.000      .955017    1.034983
               3 |          1   2.20e-07  4.5e+06   0.000     .9999995           1
               4 |          1   1.73e-09  5.8e+08   0.000            1           1
               5 |          1          .        .       .            .           .
               6 |          1          .        .       .            .           .
               7 |          1          .        .       .            .           .
    ------------------------------------------------------------------------------


    Hello, I am trying to calculate predicted probabilities for 'loss' and 'success' in each of the models above. However, in the first case, the results indicate that as 'loss' increases, the probability of DV1 rises by approximately 181,573%, with all other independent variables held constant at their means. I am uncertain whether this result is correct, because the percentage increase seems unusually high.

    In the second case, the margins command fails to provide complete predicted probabilities: for the higher values of 'success' the standard errors and confidence intervals are missing. Could anyone please advise me on what steps I should take to address these issues in both cases?
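For reference, a figure of that size appears to come from comparing the last prediction in the first margins table (loss = 300,000) with the first one (loss = 0). Using the rounded margins shown, the relative change is roughly:

```latex
\frac{\hat p(\text{loss} = 300000) - \hat p(\text{loss} = 0)}{\hat p(\text{loss} = 0)} \times 100
  = \frac{0.9996808 - 0.0005502}{0.0005502} \times 100
  \approx 1.816 \times 10^{5}\ \%
```

Because the baseline probability is so close to zero, the relative change is enormous even though the absolute change in probability, about 0.999, can never exceed one.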

    Thank you very much for your help in advance!

    Last edited by April Kimm; 23 Dec 2024, 23:31.

  • #2
    Originally posted by April Kimm
    . . . I am trying to calculate predicted probabilities for 'loss' and 'success' in each model below. However, in the first case, the results indicate that as 'loss' increases, the probability of DV1 rises by approximately 181,573% . . . I am uncertain whether this result is correct because the percentage increase seems unusually high.

    In the second case, the margins command fails to provide the predicted probabilities.
    I don't quite follow what you mean by "the probability of DV1 rises by approximately 181,573%", but the results in both cases appear to be just about what is expected, given the precision shown and the limits on the ranges of the arguments of the functions called.
    Code:
    version 18.0
    
    clear *
    
    *
    * First case
    *
    
    local bcity .9560908
    local bbase -.2890522
    local bterritory .1109483
    local bborder .3606393
    local bheight -.0388977
    local bpop .1534958
    local bln_63 -.9431281
    local bmod1r_c -.1244403
    local bloss .0000518
    local bdv1_lag 2.746286
    local b_cons -7.84713
    
    local city .5411797
    local base 2.956026
    local territory 3.146936
    local border 11.05219
    local height 5.028224
    local pop 6.826342
    local ln_63 4.371147
    local mod1r_c 3.105943
    local dv1_lag .0004675
    
    * Linear predictor with every covariate except loss held at its mean
    scalar define xb = `b_cons'
    foreach var in city base territory border height pop ln_63 mod1r_c dv1_lag {
            scalar define xb = xb + ``var'' * `b`var''
    }
    
    * Predicted probability (inverse logit) at each requested value of loss
    forvalues loss = 0(100000)334855 {
        scalar define proportion = invlogit(xb + `loss' * `bloss')
        display in smcl as text "loss = `loss', proportion = " proportion
    }
    
    *
    * Second case
    *
    
    local bcity -.1029971
    local bbase -.5766928
    local bterritory .1774427
    local bborder -.1745738
    local bheight .0089205
    local bpop -.0586204
    local bln_63 .1637436
    local bmod1r_c .0310071
    local bsuccess .0001233
    local bdv3_lag 1.388634
    local b_cons -4.364281
    
    local city .5424543
    local base 2.961185
    local territory 3.146799
    local border 11.05412
    local height 5.074115
    local pop 6.830019
    local ln_63 4.376774
    local mod1r_c 3.107196
    local dv3_lag .0011111
    
    * Linear predictor with every covariate except success held at its mean
    scalar define xb = `b_cons'
    foreach var in city base territory border height pop ln_63 mod1r_c dv3_lag {
            scalar define xb = xb + ``var'' * `b`var''
    }
    
    * Linear prediction and predicted probability at each requested value of success
    forvalues success = 0(100000)693654 {
        scalar define xb_success = xb + `success' * `bsuccess'
        scalar define proportion = invlogit(xb_success)
        display in smcl as text "success = `success', linear prediction = " xb_success ///
            ", proportion = " proportion
    }
    
    exit
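The code above reproduces the margins by hand: it builds the linear predictor with the other covariates at their means and then applies invlogit(). A rough worked version of the first case follows, using the rounded coefficients from the output, so the last digits can differ slightly from margins, which uses the unrounded estimates. Here \bar z stands for the linear predictor at the covariate means, excluding the loss term:

```latex
\begin{aligned}
\hat p(\text{loss}) &= \operatorname{invlogit}\bigl(\bar z + \hat\beta_{\text{loss}}\,\text{loss}\bigr)
  = \frac{1}{1 + \exp\bigl\{-\bigl(\bar z + \hat\beta_{\text{loss}}\,\text{loss}\bigr)\bigr\}} \\[4pt]
\bar z &= \hat\beta_{\text{cons}} + \sum_{k \neq \text{loss}} \hat\beta_k\,\bar x_k \approx -7.505,
  \qquad 100000 \times \hat\beta_{\text{loss}} = 100000 \times 0.0000518 = 5.18 \\[4pt]
\hat p(0) &\approx \operatorname{invlogit}(-7.505) \approx 0.00055 \\
\hat p(100000) &\approx \operatorname{invlogit}(-2.325) \approx 0.089 \\
\hat p(200000) &\approx \operatorname{invlogit}(2.855) \approx 0.946 \\
\hat p(300000) &\approx \operatorname{invlogit}(8.035) \approx 0.9997
\end{aligned}
```

Each step of 100,000 in loss adds about 5.18 to the log odds, so three steps take the prediction from about 0.0006 to within about 0.0003 of one.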
    Originally posted by April Kimm
    Could anyone please advise me on what steps I should take to address these issues in both cases?
    It's been mentioned before on the list a few times, e.g., here, here and as recently as here, that the user-written command firthlogit really isn't intended to be used with margins. Moreover, you seem to have a lagged outcome variable as a predictor, and I'm not sure that the assumptions underlying firthlogit are compatible with its usage in such circumstances.

    My recommendation is that if you're interested in using margins, then try to avoid firthlogit, and if you're in circumstances where you must resort to firthlogit, then don't go into it expecting to be able to cleanly use margins afterward.

    You have well over one hundred thousand observations in both cases. Is there some reason why you cannot use logit or some other more suitable official Stata estimation command to fit your model?



    • #3
      Thank you for the answer. My data suffer from separation, and the model converges only when I use firthlogit. I used the margins command as described here: https://www3.nd.edu/~rwilliam/stats3/rareevents.pdf.

      If firthlogit is not intended to be used with margins, is there an alternative way to obtain predicted probabilities? Also, I am not sure what your commands show. Any guidance would be greatly appreciated. Thank you!
      Last edited by April Kimm; 25 Dec 2024, 01:20.



      • #4
        Originally posted by April Kimm
        My data suffers from separation issues, and the model only converges when I use Firthlogit.
        With 104,821 and 130,505 observations I would not expect separation to result from sampling happenstance. I would instead suspect model misspecification as the culprit. I would be on the lookout for coding errors, the presence of structural features of the dataset that I'm not aware of and haven't accommodated, and the possibility that there's something about the data-generating process that the modeling approach doesn't take into account. Perhaps you've already done all that.

        . . . I am unsure about what your commands indicate.
        The predictions rapidly reach probability one well before the upper end of the range of values for loss and success, and for success almost immediately. If that is credible, then it seems that there's not a whole lot of variation in the outcomes to study throughout most of the range of the predictors. Do you see this in the data?
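To put rough numbers on this for success: with the rounded coefficient .0001233 and a linear predictor of about -7.04 at the covariate means (consistent with the first margin, .0008755), each step of 100,000 adds about 12.33 to the log odds:

```latex
\begin{aligned}
z(\text{success}) &\approx -7.04 + 0.0001233 \times \text{success} \\
z(0),\ z(100000),\ z(200000),\ z(300000),\ z(400000) &\approx -7.04,\ 5.29,\ 17.62,\ 29.95,\ 42.28 \\
1 - \hat p = \frac{1}{1 + e^{z}} \approx e^{-z}
  &\approx 2 \times 10^{-8},\ 1 \times 10^{-13},\ 4 \times 10^{-19}
  \quad (z = 17.62,\ 29.95,\ 42.28) \\
4 \times 10^{-19} &\ll 2^{-53} \approx 1.1 \times 10^{-16}
\end{aligned}
```

The gap between 1 and the largest double-precision number below it is 2^{-53}, so from roughly success = 400,000 onward the predicted probability is stored as exactly 1, and its derivative with respect to the coefficients is numerically zero. That is presumably why margins reports missing standard errors, test statistics and confidence limits in rows 5 to 7, while row 4 (1 - p̂ of about 10^{-13}, still distinguishable from 1) gets a tiny standard error.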



        • #5
          Apologies for the delayed response. Yes, the dependent variables represent rare events with limited variation.

          When the predictions quickly approach a probability of one, what is the best way to report the probabilities derived from the model?
          Last edited by April Kimm; 05 Jan 2025, 02:00.

