Is there any way in Stata to determine relative importance (explained variation) of variables in nonlinear models?

Tom Kisters

Join Date: May 2020

Posts: 48
#1

03 May 2021, 12:18

Dear all,

I have two independent variables which are correlated. I want to determine which of these variables explains more of the variation in y, where I am modelling y with fracreg. In addition, OLS does not seem to be a reasonable approximation of the fracreg results.

I'm not looking for a perfect solution, just the best I can do.

To make it slightly more complicated, one of my independent variables is ordinal.

Any comments, commands or references would be greatly appreciated!
Rich Goldstein

Join Date: Mar 2014

Posts: 4439
#2

03 May 2021, 12:45

You might want to check out the community-contributed -domin- command; use -search- or -findit- to find, download, and install it.

Added in edit: actually, I am not sure that -fitstat- will recognize -fracreg- (domin refers to -fitstat-).
Tom Kisters

Join Date: May 2020

Posts: 48
#3

03 May 2021, 13:09

Rich Goldstein

Thank you very much for your suggestion!

I will check it out. I would already settle for a glm (the estimates there are much closer to those from fracreg).

I assume that domin accepts glm, right?
Tom Kisters

Join Date: May 2020

Posts: 48
#4

04 May 2021, 01:15

For future users who stumble on this question: here is an overview of what the domin package recommended by Rich Goldstein can do, with examples: https://github.com/jluchman/domin

It does cover the glm command, but indeed not fracreg.

Code:

ssc install domin

Last edited by Tom Kisters; 04 May 2021, 01:24.

Joseph Luchman

Join Date: Mar 2014

Posts: 114
#5

04 May 2021, 07:09

Hi Tom,

domin can work with fracreg. A reproducible example of just such an analysis is given below.

Code:

. sysuse nlsw88
(NLSW, 1988 extract)

. collapse (mean) smsa, by(age married collgrad)

. fracreg logit smsa age married collgrad

Iteration 0:   log pseudolikelihood = -29.314705  
Iteration 1:   log pseudolikelihood = -28.827056  
Iteration 2:   log pseudolikelihood = -28.826122  
Iteration 3:   log pseudolikelihood = -28.826122  

Fractional logistic regression                  Number of obs     =         50
                                                Wald chi2(3)      =       2.76
                                                Prob > chi2       =     0.4308
Log pseudolikelihood = -28.826122               Pseudo R2         =     0.0045

------------------------------------------------------------------------------
             |               Robust
        smsa |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0212472   .0437547    -0.49   0.627    -.1070049    .0645106
     married |    -.10789   .1940918    -0.56   0.578     -.488303    .2725229
    collgrad |   .2747233   .2201005     1.25   0.212    -.1566657    .7061123
       _cons |    1.78102   1.669919     1.07   0.286     -1.49196    5.054001
------------------------------------------------------------------------------

. bro

. domin smsa age married collgrad, reg(fracreg logit) fitstat(e(r2_p))

Total of 7 regressions

General dominance statistics: Fractional logistic regression
Number of obs             =                      50
Overall Fit Statistic     =                  0.0045

            |      Dominance      Standardized      Ranking
 smsa       |      Stat.          Domin. Stat.
------------+------------------------------------------------------------------------
 age        |         0.0009      0.2062            2 
 married    |         0.0004      0.0968            3 
 collgrad   |         0.0032      0.6970            1 
-------------------------------------------------------------------------------------
Conditional dominance statistics
-------------------------------------------------------------------------------------

           #indepvars:  #indepvars:  #indepvars:
                    1            2            3
     age       0.0009       0.0009       0.0010
 married       0.0004       0.0004       0.0005
collgrad       0.0032       0.0032       0.0032
-------------------------------------------------------------------------------------
Complete dominance designation
-------------------------------------------------------------------------------------

                      dominated?:  dominated?:  dominated?:
                             age      married     collgrad
     dominates?:age            0            1           -1
 dominates?:married           -1            0           -1
dominates?:collgrad            1            1            0
-------------------------------------------------------------------------------------

Strongest dominance designations

collgrad completely dominates age
age completely dominates married
collgrad completely dominates married

As the help file and the GitHub page note, domin can be used with any model that has a depvar indepvars structure (like fracreg). In this case, fracreg also returns a fit statistic (i.e., e(r2_p)) that domin can use directly.
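As a rough cross-check of how the three tables in the output above relate to one another, here is a minimal sketch in Python (not Stata, and not part of domin). It uses only the rounded figures printed above, so small discrepancies from the reported values are expected. It relies on two properties of dominance analysis: a predictor's general dominance statistic is the average of its conditional dominance statistics, and the general dominance statistics sum to the overall fit statistic.

```python
# Rounded values copied from the domin output above
# (conditional dominance statistics for models with 1, 2 and 3 predictors).
conditional = {
    "age":      [0.0009, 0.0009, 0.0010],
    "married":  [0.0004, 0.0004, 0.0005],
    "collgrad": [0.0032, 0.0032, 0.0032],
}
overall_fit = 0.0045  # e(r2_p) of the full fracreg model, as printed

# General dominance = mean of the conditional dominance statistics.
general = {name: sum(vals) / len(vals) for name, vals in conditional.items()}

# The general dominance statistics add up to the overall fit statistic
# (only approximately here, because the inputs are rounded to 4 decimals).
total = sum(general.values())

# Standardized statistics are each predictor's share of that total.
standardized = {name: g / total for name, g in general.items()}

# Rank predictors from most to least important.
ranking = sorted(general, key=general.get, reverse=True)

# Every non-empty subset of the predictors is estimated once.
n_models = 2 ** len(conditional) - 1

print(ranking)
print(n_models)
print(round(total, 4), overall_fit)
```

The ranking and the count of regressions agree with the output above; the standardized shares differ in the third decimal place only because the inputs were rounded.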

glm also works, but in my view it does not return fit statistics that are as useful as those from more specific/focused implementations of GLMs like logit, probit, poisson, nbreg, etc.

- joe

Joseph Nicholas Luchman, Ph.D., PStat® (American Statistical Association)
----
Research Fellow
Fors Marsh
----
Version 18.0 MP


Tom Kisters

Join Date: May 2020

Posts: 48
#6

04 May 2021, 09:59

Joseph Luchman

Thank you very much for your detailed response, Joseph. That's really helpful! I actually posted an "issue" on your GitHub today, which was more of a question (and one more suitable to be asked here), so I will repeat it.

I found plenty of literature on relative importance in linear models (including Grömping 2007, which I believe you also refer to), but I did not find anything pertaining to nonlinear models. Is there any literature defending the application of domin to (for example) fracreg? Or is there an intuitive reason why this is not an issue?

Last edited by Tom Kisters; 04 May 2021, 10:04.
Joseph Luchman

Join Date: Mar 2014

Posts: 114
#7

04 May 2021, 11:24

Hi Tom,

There is published research that examines the use of dominance analysis in the context of generalized linear models in particular: logit (Azen & Traxel, 2009), and ologit and mlogit (Luchman, 2014).

I, personally, see domin as a post-estimation command akin to margins. It is a re-organization of information about a model that supports the interpretation of the base model. So long as it is provided something sensible (i.e., a useful fit metric describing the model; more extensive discussion is available in Azen & Traxel, 2009) it will provide useful results for any model (even machine learning ones; indeed the dominance analysis method, in a slightly different form, underlies the SHAP approach for "interpretable machine learning").

Thus, if the model used is a trusted implementation of the intended predictive approach and the fit metric is a trusted representation of predictive value in that model, the methodology underlying domin should produce a result that is in line with the precepts in linear models, insofar as it is a decomposition of the fit metric into shares that can be used to infer importance. I recommend looking at Shapley value decomposition for a more technical treatment of the basic methodology, though, as that is its conceptual groundwork. Dominance analysis is a specific implementation/sub-field focused on importance.
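To make the "decomposition of the fit metric into shares" concrete, here is a minimal sketch of a Shapley value decomposition in Python. It is not domin's actual code. The function name `shapley_shares` and the fit values for the two correlated predictors `x1` and `x2` are hypothetical, invented purely for illustration; in practice each value would come from estimating the model (e.g., fracreg) on that subset of predictors.

```python
from itertools import combinations
from math import factorial


def shapley_shares(predictors, fit):
    """Split fit(all predictors) into one additive share per predictor.

    `fit` maps a frozenset of predictor names to a fit statistic;
    fit(frozenset()) is the baseline with no predictors.
    """
    p = len(predictors)
    shares = {}
    for x in predictors:
        others = [z for z in predictors if z != x]
        share = 0.0
        for k in range(p):
            # Shapley weight for subsets of size k that exclude x.
            weight = factorial(k) * factorial(p - k - 1) / factorial(p)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Increment in fit from adding x to this subset.
                share += weight * (fit(s | {x}) - fit(s))
        shares[x] = share
    return shares


# Hypothetical fit statistics (made up for illustration only).
toy_fit = {
    frozenset(): 0.00,
    frozenset({"x1"}): 0.30,
    frozenset({"x2"}): 0.20,
    frozenset({"x1", "x2"}): 0.40,
}

shares = shapley_shares(["x1", "x2"], toy_fit.__getitem__)
print(shares)
```

With these made-up numbers, `x1` receives a share of about 0.25 and `x2` about 0.15, which together add up to the full-model fit of 0.40. The overlap between the correlated predictors is split between them rather than credited to whichever one enters first.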

- joe

Last edited by Joseph Luchman; 04 May 2021, 11:26. Reason: typos

Tom Kisters

Join Date: May 2020

Posts: 48
#8

05 May 2021, 01:32

Thank you very much for all your help! This is great!