Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Pseudo R squared for a Population-averaged Model

    Hi Statalisters,

    I am a novice user in Stata. I'm working with Stata.14 and Windows 7.

    I'm working on a Panel Data Set for all commerical banks in the U.S. for the period 1995 - 2018 (time variable). So I have data on a bank-year level. I created the ID Variable with the variables bank name and cert I already calculated four bank risk proxies: Z-Score, NPA (non-performing assets), LLP (loan loss provisions) and LLR (loan loss reserves) on a bank-year level.
    I calculated the Risk Proxy Z_score and I would like to run the binary probability model explaining the occurrence of a bank failure ( Failure = 1, Active = 0) with the risk proxy (lagged by one year). My "Guiding Paper" reports marginal effects of probit regressions.

    This is the dataset with 172 431 observations and the Variable "status" if a banke failed = 1 or if it is still active = 0

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(year id status Z_score)
    1995   7 1 -1.4038005
    1995  10 0  -1.434213
    1995  11 0 -1.5771302
    1995  14 0  -1.758422
    1995  16 0 -1.4295077
    1995  21 0  -1.329172
    1995  27 0 -1.3730284
    1995  32 0 -1.3627455
    1995  34 0  -1.908463
    1995  38 0 -1.8048723
    1995  41 0 -1.5905398
    1995  46 0  -1.533159
    1995  47 0  -1.663955
    1995  48 0  -1.518417
    1995  52 0 -1.3701818
    1995  53 0  -1.485621
    1995  56 0   -1.63249
    1995  59 0  -1.476241
    1995  76 0 -1.3577138
    1995  82 0 -1.3661845
    1995  84 0 -1.3885205
    1995  85 0 -1.5949416
    1995  87 0 -2.0597448
    1995  99 0 -2.2821965
    1995 101 0 -1.5937258
    1995 104 0 -1.6237373
    end
    Now I have to figure out the correct probability model:

    - As I use data for all U.S. commercial banks between 1995 and 2018 I have Panel Data, so I have to use xtprobit
    - As my model is Pr(Failure = 1) = F(it-1 ß) and my "Guiding Paper" is saying "we report marginal effects of probit regressions" I think I have to use the Population-averaged (PA) model instead of the Random-effects (RE) model. After the literature review I came to the conclusion, that I would like to have a model which should explain the occurence of a bank failure of the entire "banking population" and not of a specific bank (e.g. Deutsche Bank). [Please correct my if I am wrong with the Population-averaged model].

    So I was running the command with the correct correction of the standard error with vce(robust):

    Code:
    xtset id year
           panel variable:  id (unbalanced)
            time variable:  year, 1995 to 2018, but with gaps
                    delta:  1 unit
    Code:
    xtprobit status L.Z_score, pa  vce(robust)
    And the Output:
    Code:
    Iteration 1: tolerance = .86103329
    Iteration 2: tolerance = .503125
    Iteration 3: tolerance = .29113738
    Iteration 4: tolerance = .22620098
    Iteration 5: tolerance = .14532103
    Iteration 6: tolerance = .10679244
    Iteration 7: tolerance = .06929776
    Iteration 8: tolerance = .05079473
    Iteration 9: tolerance = .03304497
    Iteration 10: tolerance = .02339874
    Iteration 11: tolerance = .01565962
    Iteration 12: tolerance = .01090075
    Iteration 13: tolerance = .0073888
    Iteration 14: tolerance = .00510128
    Iteration 15: tolerance = .00347811
    Iteration 16: tolerance = .00239194
    Iteration 17: tolerance = .00163531
    Iteration 18: tolerance = .00112255
    Iteration 19: tolerance = .00076844
    Iteration 20: tolerance = .00052703
    Iteration 21: tolerance = .000361
    Iteration 22: tolerance = .00024749
    Iteration 23: tolerance = .00016957
    Iteration 24: tolerance = .00011623
    Iteration 25: tolerance = .00007964
    Iteration 26: tolerance = .00005459
    Iteration 27: tolerance = .00003741
    Iteration 28: tolerance = .00002564
    Iteration 29: tolerance = .00001757
    Iteration 30: tolerance = .00001204
    Iteration 31: tolerance = 8.251e-06
    Iteration 32: tolerance = 5.655e-06
    Iteration 33: tolerance = 3.875e-06
    Iteration 34: tolerance = 2.656e-06
    Iteration 35: tolerance = 1.820e-06
    Iteration 36: tolerance = 1.247e-06
    Iteration 37: tolerance = 8.548e-07
    
    GEE population-averaged model                   Number of obs     =    156,147
    Group variable:                         id      Number of groups  =     14,692
    Link:                               probit      Obs per group:
    Family:                           binomial                    min =          1
    Correlation:                  exchangeable                    avg =       10.6
                                                                  max =         23
                                                    Wald chi2(1)      =      24.18
    Scale parameter:                         1      Prob > chi2       =     0.0000
    
                                         (Std. Err. adjusted for clustering on id)
    ------------------------------------------------------------------------------
                 |             Semirobust
          status |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         Z_score |
             L1. |   .0335066   .0068144     4.92   0.000     .0201506    .0468626
                 |
           _cons |  -1.658208   .0214663   -77.25   0.000    -1.700282   -1.616135
    ------------------------------------------------------------------------------
    Seems this correct for you?

    In my "Guiding Paper" they assess the Z-Score Model on its Pseudo R2. I know that the "normal" probit regeression Output shows me the Pseudo R2 and there is a way to calculate the Pseudo R2 for the xtprobit Panel Data Probit Regression. I know that the pseudo R2 is stored with e(r2 p).
    How can I calculate the Pseudo R Squared for the Population Averaged Model?

    Thank you very much for your support!
    Last edited by Katharina Maier; 03 Jul 2019, 08:43.

  • #2
    The default Pseudo R-squared (i.e. McFadden's Pseudo R-squared) reported by Stata commands is based on maximum likelihood estimation whereas generalized estimating equations (GEE) is a nonlikelihood based estimation method.

    Comment

    Working...
    X