ROC curve after xtlogit

Rodrigo Primor

Join Date: Jan 2018

Posts: 27
#1


17 Jan 2018, 08:28

Hi,

I would like to know if it is possible to obtain an ROC curve after xtlogit, because when I try to do it, the following error appears:

Code:

. lroc
last estimates not found
r(301);

Thanks!
Clyde Schechter

Join Date: Apr 2014

Posts: 29201
#2

17 Jan 2018, 09:11

-lroc- is written to run only after -logit-, -logistic-, or -probit-, not -xtlogit-. You can still trick Stata into doing an ROC curve by running -predict xb- after -xtlogit- and then applying the -roctab- command.

But be careful. When Stata has a command that only works after certain kinds of estimation, there is usually a good reason for that. You should be cautious in tricking or forcing Stata to get around that limitation. That is true here. This approach ignores the fixed or random effects that were part of the panel logistic regression model. So it isn't clear what an ROC curve calculated in this way actually means. Worse still, if you ran a fixed effects logit, then you have to remember that you are not modeling the probability of the outcome. You are modeling the probability that, of all the observations on this particular panel unit, it will be these particular ones that have a non-zero outcome. I have trouble even conceptualizing what an ROC curve means in that context.
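A minimal sketch of the workaround described above, subject to all the caveats in the preceding paragraph. The names `default`, `V2`, `V7`, `V16`, `V18`, and `id` are borrowed from the output posted later in this thread; `xb_hat` is a hypothetical name for the new variable.

```stata
* Sketch only: the resulting curve ignores the panel-level effects u_i.
xtset id
xtlogit default V2 V7 V16 V18, re

* Linear prediction from the fixed portion of the model only
predict xb_hat, xb

* Nonparametric ROC analysis of the outcome against that linear predictor
roctab default xb_hat, graph summary

* Sensitivity and specificity at each candidate cutpoint
roctab default xb_hat, detail
```

The `xb` statistic is requested explicitly so that it is clear which prediction feeds `roctab`.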
Rodrigo Primor

Join Date: Jan 2018

Posts: 27
#3

17 Jan 2018, 09:24

Thanks Clyde, I need the ROC curve to find the optimal cut-off point in my regression with xtlogit. But even if I could do that, when I run a regression (xtlogit) with some particular group of independent variables, there is always the message "backed up" after some iterations. I guess I could specify the number of iterations (iterate()), but when I do that, Stata simply ignores it and continues.
Clyde Schechter

Join Date: Apr 2014

Posts: 29201
#4

17 Jan 2018, 09:36

when I run a regression (xtlogit) with some particular group of independent variables, there is always the message "backed up" after some iterations.

This is not necessarily a problem. If the log likelihood is continuing to change (even only minimally every several iterations) it is possible the model will eventually converge. If it says "backed up" on the very last iteration, then the estimation has failed. But having "backed up" along the way is not necessarily a problem.

I guess I could specify the number of iterations (iterate()), but when I do that, Stata simply ignores it and continues.

The -iterate()- option causes Stata to stop the maximization process after the specified number of iterations. That enables you to see what is going on along the way. It is typically used when an estimation has failed to converge and you want to see which coefficients or standard errors seem to be causing the problem. But, in any case, you should never use the results of an estimation that was terminated by the -iterate()- option. Even if the estimation is going along without difficulties to that point, the results you are looking at reflect a premature termination of the maximization and they are not the estimates that maximize the likelihood.

If you are having trouble getting your -xtlogit- to converge, I suggest identifying the iteration at which it gets stuck. Then restart it using the -iterate()- option with a number of iterations that gets you just a little beyond that point. Examine those outputs. Often you will find some coefficient that is unreasonably large, or a standard error that is unreasonably large or unreasonably close to zero. Those statistics indicate the variable(s) likely making convergence difficult. Then try re-running the model without those variable(s). If this direct approach doesn't work, a trial-and-error approach of starting with a simple model having one predictor, and then adding more predictors one at a time until convergence fails, will help you identify the source of the difficulty.
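The diagnostic step above might look like the following sketch. The iteration count of 20 is a placeholder for "a little beyond where it gets stuck", and the variable names are borrowed from the output posted later in this thread.

```stata
* Sketch only: stop the maximization early to inspect the stuck model.
xtset id

* capture noisily shows the output but keeps a do-file running even if
* Stata exits with a nonzero return code when the cap is reached
capture noisily xtlogit default V2 V7 V16 V18, re iterate(20)
display "return code = " _rc

* Look in the table for very large coefficients, or standard errors that
* are very large or nearly zero. Do not report these estimates: the
* maximization was cut short.
```

A nonzero return code here (typically 430, "convergence not achieved") only confirms that the cap was hit; the coefficient table is what points to the problem variable.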
Rodrigo Primor

Join Date: Jan 2018

Posts: 27
#5

17 Jan 2018, 09:48

The problem is that I tried a trial-and-error approach and found that some variables cause this error even when they are the only ones in the model. But I will try it once again, paying close attention to the standard errors.
I guess that using logit instead of xtlogit with my type of data is a big mistake, right? Some convergence problems seem to disappear.
Clyde Schechter

Join Date: Apr 2014

Posts: 29201
#6

17 Jan 2018, 10:06

-logit- and -xtlogit- are not simply more or less convenient alternatives. They are different models. There are very few situations where you have a choice between them. Usually only one or the other is valid for the kind of data you are working with.

It is true that -logit- models seldom exhibit convergence problems, whereas convergence problems with -xtlogit- aren't all that unusual. But you can't just substitute -logit- for -xtlogit- to escape those problems.

If a variable causes convergence problems even when it is the only variable in the model, it's a pretty safe bet that you simply cannot use that variable in your model. (It is occasionally possible that a variable that leads to a non-convergent estimation when used alone will nevertheless participate nicely in a convergent model with other variables. But this seldom happens and you are probably best off assuming you won't be that lucky.) So confine your model building to working with variables that at least allow convergence when used alone. Once you have built the best model you can with those variables, you are probably done. If some of the variables you had to leave out seem really important, you can try adding them, one at a time, to the model you arrived at--but you would have to be really lucky for those extensions of the model to converge. More likely you will just have to abandon those variables.
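One way to screen variables for convergence when used alone is sketched below. The variable list is taken from the output posted later in this thread, and the cap of 100 iterations is an assumption chosen only so that a stuck model does not run indefinitely.

```stata
* Sketch only: fit one single-predictor model per candidate variable.
xtset id
foreach v of varlist V2 V7 V16 V18 {
    capture noisily xtlogit default `v', re iterate(100)
    if _rc == 0 & e(converged) == 1 {
        display as text "`v': converged on its own"
    }
    else {
        display as error "`v': did not converge (rc = " _rc ")"
    }
}
```

Variables flagged by the second branch are the candidates to leave out of model building. The fits here are only a screen; any model kept for reporting should be re-estimated without the iteration cap.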

Last edited by Clyde Schechter; 17 Jan 2018, 10:09.
Rodrigo Primor

Join Date: Jan 2018

Posts: 27
#7

17 Jan 2018, 11:08

Right, I will remove those variables and see what happens. I was asking about the difference between those two models because in the papers I have read that use data identical to mine, ROC analysis is always present.

Thanks Clyde!
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#8

18 Jan 2018, 05:25

You've not shown us your commands, as FAQ 12 requests, but are you trying xtlogit, fe on the data set you work on in this thread? If so, every panel with no failures will drop out of the analysis. I don't know if that is causing your difficulties, but it does mean that an ROC curve would not distinguish panels that failed from those that didn't.

Steve Samuels
Statistical Consulting

Stata 14.2

Rodrigo Primor

Join Date: Jan 2018
Posts: 27
#9

18 Jan 2018, 06:40

Yes Steve, I was trying to do that, but now I am using xtlogit, re as the Hausman test indicates. I just don't know how to obtain a confusion matrix from this and, just like Clyde said, an ROC curve does not make much sense in this case, right?

The code from one of the estimated models is below:

Code:

xtlogit default V2 V7 V16 V18

Random-effects logistic regression              Number of obs     =     56,855
Group variable: id                              Number of groups  =     12,239

Random effects u_i ~ Gaussian                   Obs per group:
                                                              min =          1
                                                              avg =        4.6
                                                              max =          5

Integration method: mvaghermite                 Integration pts.  =         12

                                                Wald chi2(4)      =     109.07
Log likelihood  = -3123.0286                    Prob > chi2       =     0.0000

------------------------------------------------------------------------------
     default |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          V2 |  -.3937993   .0596425    -6.60   0.000    -.5106965    -.276902
          V7 |   1.061673   .1500214     7.08   0.000      .767636    1.355709
         V16 |   -.428911   .1091569    -3.93   0.000    -.6428546   -.2149674
         V18 |   .5347579    .150536     3.55   0.000     .2397128    .8298031
       _cons |  -5.915267   .4082821   -14.49   0.000    -6.715485   -5.115049
-------------+----------------------------------------------------------------
    /lnsig2u |    .767155   .4417714                      -.098701    1.633011
-------------+----------------------------------------------------------------
     sigma_u |   1.467525   .3241553                      .9518474    2.262579
         rho |   .3956335   .1056309                      .2159292    .6087744
------------------------------------------------------------------------------
LR test of rho=0: chibar2(01) = 3.41                   Prob >= chibar2 = 0.032
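As a reading aid for the last three rows of the table: in a random-effects logit, sigma_u and rho are both transformations of /lnsig2u, with rho = sigma_u^2 / (sigma_u^2 + pi^2/3). The sketch below recomputes them from the reported /lnsig2u value, so the results should agree with the sigma_u and rho rows above up to rounding.

```stata
* Sketch only: recompute sigma_u and rho from the reported /lnsig2u.
scalar lnsig2u = .767155
display "sigma_u = " sqrt(exp(lnsig2u))
display "rho     = " exp(lnsig2u) / (exp(lnsig2u) + _pi^2/3)
```

Here rho is the share of the total latent variance attributable to the panel-level effect, which is the quantity the LR test of rho=0 examines.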

Thanks


Steve Samuels

Join Date: Mar 2014

Posts: 1786
#10

18 Jan 2018, 07:45

Thanks for showing code and results. I agree that ROC analysis doesn't make much sense in your case, whether for re, fe, or pa.

To get ROCs, you could run an ordinary logit model, with firm as the unit, with event being failure at period on study, and with covariates at baseline.

You could also do logit followed by ROC at every period among those still at risk for failure, again with company as the unit and covariates known at the start of the period.
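A rough sketch of the first suggestion, under one possible reading of it: one record per firm, with the outcome being whether the firm ever failed during the study and the covariates taken from its first observed period. The time variable `year`, the new variable `ever_default`, and the 0.05 cutoff are all hypothetical.

```stata
* Sketch only: collapse the panel to one baseline record per firm.
preserve

* Outcome: did the firm ever default during follow-up?
bysort id (year): egen byte ever_default = max(default)

* Keep each firm's first observation so covariates are at baseline
bysort id (year): keep if _n == 1

logit ever_default V2 V7 V16 V18
lroc                                  // ROC curve and area under it
estat classification, cutoff(0.05)    // confusion matrix at an assumed cutoff

restore
```

With rare outcomes, the default cutoff of 0.5 in `estat classification` would classify almost every firm as a non-failure, which is why a lower cutoff is shown; the right value depends on the application.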


Rodrigo Primor

Join Date: Jan 2018
Posts: 27

#11

18 Jan 2018, 08:28

OK, I will follow your suggestion, but when I add the macroeconomic variables to the model ("m" variables), the LR test of rho=0 becomes non-significant. These variables vary within a period of 6 years, but they always repeat for all the companies within a country.
Is this rho suggesting that I should use logit instead of xtlogit? Because there is no variation in the panel, right?

Code:

Random-effects logistic regression              Number of obs     =     41,069
Group variable: id                              Number of groups  =      9,602

Random effects u_i ~ Gaussian                   Obs per group:
                                                              min =          1
                                                              avg =        4.3
                                                              max =          5

Integration method: mvaghermite                 Integration pts.  =         12

                                                Wald chi2(9)      =     268.34
Log likelihood  = -2457.0008                    Prob > chi2       =     0.0000

------------------------------------------------------------------------------
     default |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          V2 |   -.436586   .0779906    -5.60   0.000    -.5894447   -.2837272
          V7 |   1.001503   .1592026     6.29   0.000     .6894719    1.313535
          V8 |   .0755773   .0414478     1.82   0.068    -.0056588    .1568135
         V16 |  -.5531951   .1163924    -4.75   0.000      -.78132   -.3250701
         V18 |   .4224832   .1674425     2.52   0.012      .094302    .7506645
          m3 |   .1402127   .0394842     3.55   0.000      .062825    .2176004
          m5 |   .0165315   .0256787     0.64   0.520    -.0337979    .0668609
          m6 |   .0772603   .0087842     8.80   0.000     .0600436     .094477
          m8 |  -.0135515   .0019882    -6.82   0.000    -.0174483   -.0096547
       _cons |  -27.79891   5.403935    -5.14   0.000    -38.39043    -17.2074
-------------+----------------------------------------------------------------
    /lnsig2u |  -10.09615   15.49418                     -40.46418    20.27187
-------------+----------------------------------------------------------------
     sigma_u |   .0064217   .0497492                      1.63e-09    25233.71
         rho |   .0000125   .0001942                      8.12e-19           1
------------------------------------------------------------------------------
LR test of rho=0: chibar2(01) = 2.8e-04                Prob >= chibar2 = 0.493


Steve Samuels

Join Date: Mar 2014

Posts: 1786
#12

18 Jan 2018, 10:37

I suggested logit so that you could get an ROC analysis.
