
  • Is there any way in Stata to determine the relative importance (explained variation) of variables in nonlinear models?

    Dear all,

    I have two correlated independent variables, and I want to determine which of them explains more of the variation in y. I am estimating y with fracreg, and OLS does not seem to be a reasonable approximation of the fracreg results.

    I'm not looking for a perfect solution, just the best I can do.

    To make it slightly more complicated, one of my independent variables is ordinal.

    Any comments, commands or references would be greatly appreciated!

  • #2
    You might want to check out the community-contributed -domin- command; use -search- or -findit- to find, download, and install it.

    Added in edit: actually, I am not sure that -fitstat- will recognize -fracreg- (domin refers to -fitstat-).



    • #3
      Rich Goldstein

      Thank you very much for your suggestion!

      I will check it out. I would already settle for glm (its estimates are much closer).

      I assume domin accepts glm, right?



      • #4
        For future users stumbling on this question: here is an overview, with examples, of what the domin package recommended by Rich Goldstein can do: https://github.com/jluchman/domin

        It does support the glm command, but indeed not fracreg.

        Code:
        ssc install domin
        Last edited by Tom Kisters; 04 May 2021, 01:24.



        • #5
          Hi Tom,

          domin can work with fracreg. A reproducible example of just such an analysis is given below.

          Code:
          . sysuse nlsw88
          (NLSW, 1988 extract)
          
          . collapse (mean) smsa, by(age married collgrad)
          
          . fracreg logit smsa age married collgrad
          
          Iteration 0:   log pseudolikelihood = -29.314705  
          Iteration 1:   log pseudolikelihood = -28.827056  
          Iteration 2:   log pseudolikelihood = -28.826122  
          Iteration 3:   log pseudolikelihood = -28.826122  
          
          Fractional logistic regression                  Number of obs     =         50
                                                          Wald chi2(3)      =       2.76
                                                          Prob > chi2       =     0.4308
          Log pseudolikelihood = -28.826122               Pseudo R2         =     0.0045
          
          ------------------------------------------------------------------------------
                       |               Robust
                  smsa |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   age |  -.0212472   .0437547    -0.49   0.627    -.1070049    .0645106
               married |    -.10789   .1940918    -0.56   0.578     -.488303    .2725229
              collgrad |   .2747233   .2201005     1.25   0.212    -.1566657    .7061123
                 _cons |    1.78102   1.669919     1.07   0.286     -1.49196    5.054001
          ------------------------------------------------------------------------------
          
          . bro
          
          . domin smsa age married collgrad, reg(fracreg logit) fitstat(e(r2_p))
          
          Total of 7 regressions
          
          General dominance statistics: Fractional logistic regression
          Number of obs             =                      50
          Overall Fit Statistic     =                  0.0045
          
                      |      Dominance      Standardized      Ranking
           smsa       |      Stat.          Domin. Stat.
          ------------+------------------------------------------------------------------------
           age        |         0.0009      0.2062            2 
           married    |         0.0004      0.0968            3 
           collgrad   |         0.0032      0.6970            1 
          -------------------------------------------------------------------------------------
          Conditional dominance statistics
          -------------------------------------------------------------------------------------
          
                     #indepvars:  #indepvars:  #indepvars:
                              1            2            3
               age       0.0009       0.0009       0.0010
           married       0.0004       0.0004       0.0005
          collgrad       0.0032       0.0032       0.0032
          -------------------------------------------------------------------------------------
          Complete dominance designation
          -------------------------------------------------------------------------------------
          
                                dominated?:  dominated?:  dominated?:
                                       age      married     collgrad
               dominates?:age            0            1           -1
           dominates?:married           -1            0           -1
          dominates?:collgrad            1            1            0
          -------------------------------------------------------------------------------------
          
          Strongest dominance designations
          
          collgrad completely dominates age
          age completely dominates married
          collgrad completely dominates married
          As the help file and the GitHub page note, domin can be used with any model that has a depvar indepvars structure (like fracreg). In this case, fracreg also returns a fit statistic (namely e(r2_p), the pseudo-R2) that domin can use directly.
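As a rough illustration of what domin does with those 7 regressions (all 2^3 - 1 nonempty subsets of three predictors), here is a minimal Python sketch; it is not domin's actual implementation. It assumes the fit statistic for every subset has already been obtained (in Stata, by re-estimating fracreg on each subset and reading e(r2_p)) and takes those values as a dictionary. The conditional dominance statistic of a predictor is its average increment in fit over all subsets of a given size of the other predictors, the general dominance statistic is the mean of those, and complete dominance asks whether one predictor adds more fit than another to every subset of the remaining predictors. The helper names and the x1/x2/x3 weights are hypothetical.

```python
from itertools import combinations


def all_subset_fits(predictors, fit_fn):
    """Evaluate fit_fn on every subset of predictors, including the empty one."""
    fits = {}
    for k in range(len(predictors) + 1):
        for subset in combinations(predictors, k):
            fits[frozenset(subset)] = fit_fn(frozenset(subset))
    return fits


def dominance_statistics(predictors, fits):
    """Conditional and general dominance statistics from all-subset fits.

    fits maps a frozenset of predictor names to the fit statistic of the
    model containing exactly those predictors; fits[frozenset()] is the
    fit of the model with no predictors (0 for a pseudo-R2).
    """
    p = len(predictors)
    conditional = {}
    for x in predictors:
        others = [z for z in predictors if z != x]
        row = []
        # k = how many of the other predictors are already in the model,
        # so row[k] corresponds to domin's "#indepvars: k + 1" column
        for k in range(p):
            increments = [
                fits[frozenset(s) | {x}] - fits[frozenset(s)]
                for s in combinations(others, k)
            ]
            row.append(sum(increments) / len(increments))
        conditional[x] = row
    # General dominance: the mean of the conditional dominance statistics
    general = {x: sum(row) / p for x, row in conditional.items()}
    return conditional, general


def completely_dominates(a, b, predictors, fits):
    """True if a adds more fit than b to every subset of the other predictors."""
    rest = [z for z in predictors if z not in (a, b)]
    for k in range(len(rest) + 1):
        for s in combinations(rest, k):
            base = frozenset(s)
            if fits[base | {a}] <= fits[base | {b}]:
                return False
    return True


if __name__ == "__main__":
    # Hypothetical additive fit function; in practice each value would come
    # from re-estimating the model on that subset of predictors.
    weights = {"x1": 0.001, "x2": 0.0005, "x3": 0.003}
    preds = list(weights)
    fits = all_subset_fits(preds, lambda s: sum(weights[x] for x in s))
    conditional, general = dominance_statistics(preds, fits)
    total = sum(general.values())
    for x in preds:
        print(x, conditional[x], general[x], general[x] / total)
    print(completely_dominates("x3", "x1", preds, fits))
```

Because the increments telescope across subsets, the general dominance statistics sum to the full-model fit whenever the empty model's fit is 0, which is why in the fracreg output above 0.0009 + 0.0004 + 0.0032 matches the overall 0.0045 and the standardized column sums to 1.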

          glm also works, but in my view it does not return fit statistics as useful as those from more specific/focused implementations of GLMs such as logit, probit, poisson, and nbreg.
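One way to see what a fit statistic like e(r2_p) measures, and to build a comparable one when a command does not report it, is McFadden's pseudo-R2, 1 - ll/ll0, computed from the Bernoulli quasi-log-likelihood of the fitted model and of a constant-only model. That fracreg's Pseudo R2 is defined exactly this way is an assumption here, not something stated in this thread. A minimal Python sketch, with hypothetical function names, taking an outcome in [0, 1] and fitted means from whatever model was estimated:

```python
import math


def bernoulli_quasi_loglik(y, mu):
    """Bernoulli quasi-log-likelihood used by fractional logit/probit.

    y holds outcomes in [0, 1]; mu holds fitted means strictly inside (0, 1).
    """
    return sum(yi * math.log(mi) + (1 - yi) * math.log(1 - mi)
               for yi, mi in zip(y, mu))


def mcfadden_pseudo_r2(y, mu):
    """McFadden pseudo-R2: 1 - ll(model) / ll(constant-only model).

    The constant-only fractional logit fits every observation with the
    sample mean of y, so ll0 needs no extra estimation.
    """
    ybar = sum(y) / len(y)
    if not 0 < ybar < 1:
        raise ValueError("mean of y must lie strictly between 0 and 1")
    ll_model = bernoulli_quasi_loglik(y, mu)
    ll_null = bernoulli_quasi_loglik(y, [ybar] * len(y))
    return 1 - ll_model / ll_null


if __name__ == "__main__":
    # Hypothetical outcomes and fitted means, for illustration only
    y = [0.1, 0.35, 0.5, 0.8, 0.9]
    mu = [0.2, 0.3, 0.55, 0.7, 0.85]
    print(mcfadden_pseudo_r2(y, mu))
```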

          - joe
          Joseph Nicholas Luchman, Ph.D., PStat® (American Statistical Association)
          ----
          Research Fellow
          Fors Marsh

          ----
          Version 18.0 MP



          • #6
            Joseph Luchman

            Thank you very much for your elaborate response, Joseph. That's really helpful! I actually posted an "issue" on your GitHub today, which was more of a question (and one more suitable for this forum), so I will repeat it here.

            I found plenty of literature on relative importance in linear models (you also refer to Grömping, 2007, I believe), but I did not find anything on nonlinear models. Is there any literature supporting the application of domin to, for example, fracreg? Or is there an intuitive reason why this is not an issue?

            Last edited by Tom Kisters; 04 May 2021, 10:04.



            • #7
              Hi Tom,

              There is published research that examines the use of dominance analysis with generalized linear models in particular: logit (Azen & Traxel, 2009), and ologit and mlogit (Luchman, 2014).

              Personally, I see domin as a postestimation command akin to margins: it reorganizes information about a model to support the interpretation of the base model. So long as it is given something sensible (i.e., a useful fit metric describing the model; Azen & Traxel, 2009, discuss this more extensively), it will provide useful results for any model, even machine learning ones. Indeed, the dominance analysis method, in a slightly different form, underlies the SHAP approach to "interpretable machine learning".

              Thus, if the model is a trusted implementation of the intended predictive approach and the fit metric is a trusted representation of predictive value in that model, the methodology underlying domin should produce results in line with the precepts for linear models, insofar as it decomposes the fit metric into shares that can be used to infer importance. For a more technical treatment of the basic methodology, I recommend looking at Shapley value decomposition, as that is its conceptual groundwork; dominance analysis is a specific implementation/subfield focused on importance.
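To make the link to Shapley value decomposition concrete: the Shapley value of a predictor is its marginal contribution to the fit statistic averaged over every order in which the predictors could be entered. Averaging over orders weights each subset size equally, which is the same as averaging conditional dominance statistics, so general dominance statistics are Shapley values of the fit metric. Below is a brute-force Python sketch under that framing; the function name and the toy fit function (with a made-up joint contribution of x1 and x2) are hypothetical and only show how a shared contribution gets split.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by averaging over every ordering of the players.

    value maps a frozenset of players (here: predictors) to a number (here:
    a model fit statistic). The loop visits all p! orderings, so this is
    only practical for a handful of predictors.
    """
    totals = {x: 0.0 for x in players}
    for order in permutations(players):
        entered = frozenset()
        for x in order:
            # Marginal contribution of x given the predictors entered before it
            totals[x] += value(entered | {x}) - value(entered)
            entered = entered | {x}
    n_orders = factorial(len(players))
    return {x: t / n_orders for x, t in totals.items()}


def toy_fit(s):
    """Hypothetical fit: additive parts plus 0.05 earned only by x1 and x2 jointly."""
    additive = 0.1 * ("x1" in s) + 0.2 * ("x2" in s) + 0.3 * ("x3" in s)
    return additive + (0.05 if {"x1", "x2"} <= s else 0.0)


if __name__ == "__main__":
    phi = shapley_values(["x1", "x2", "x3"], toy_fit)
    print(phi)
```

In this toy case the 0.05 that x1 and x2 earn only jointly is split equally between them, giving 0.125, 0.225, and 0.3, which add up to the full-model value of 0.65. This equal splitting of shared explained variation is what makes the approach attractive for correlated predictors like the ones in the original question.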

              - joe
              Last edited by Joseph Luchman; 04 May 2021, 11:26. Reason: typos


              • #8
                Thank you very much for all your help! This is great!

                Comment
