  • Shapley Value for Dummy Variable

    Hi, everyone,

    When I run shapley2 after fitting a logistic regression model with dummy variables, I get the following message:

    . shapley2,stat(r2_p)
    Factor variables are not supported
    Please create the variables manually


    I have spent a lot of time on this but still do not know what to do. If you know how to handle it, please give me a hand.

    Thanks a lot!

    Best,

    Tony





  • #2
    You don't show us your estimation command, but presumably it includes factor-variable notation, and the help file explicitly says, "Factor variables (fvvarlist) such as i.var are currently not (yet) supported." So you need to re-estimate your model without factor-variable notation; if your "dummy variables" are each coded 0/1, just remove the "i." prefix and all should be fine.
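
    If a predictor has more than two categories, dropping the "i." prefix is not enough, and the indicators have to be created by hand, as the error message asks. A minimal sketch, assuming a hypothetical outcome y, a hypothetical 0/1 variable female, and a hypothetical three-category variable region (none of these names come from the original post):

    ```stata
    * Hypothetical variable names: y, female, region (assumed to have 3 categories)
    tabulate region, generate(region_)   // creates region_1, region_2, region_3

    * Refit without factor-variable notation; region_1 is left out as the base category
    logit y female region_2 region_3

    * shapley2 now sees only plain variables
    shapley2, stat(r2_p)
    ```

    Note that this treats region_2 and region_3 as two separate factors. If they should be decomposed as one block, check help shapley2 for a grouping option before relying on the per-indicator values.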



    • #3
      Dear Goldstein,

      I am sorry about my carelessness! I am new to Stata.

      I will try your suggestion and see whether my program works. If you are willing, I would like to ask for your advice again in the future when necessary.

      Many thanks!

      Tony



      • #4
        Dear Goldstein,

        Please find attached the Stata output and the data file.

        The Shapley values for the 2 variables, gender and degree, are both equal to zero. I don't know why.

        Please advise me if you have any idea. Thank you.

        Tony


        • #5
          The problem appears to be that the shapley2 command does not work when the estimates come from a weighted regression, even when the weights are simple frequency weights. Fortunately, that can be worked around by expanding the data to the full number of observations.
          Code:
          . import excel "~/Downloads/test_dat2.xlsx", sheet("Sheet1") firstrow
          (4 vars, 8 obs)
          
          . list, clean
          
                 gender   degree   effect   count  
            1.        0        0        1      21  
            2.        0        0        0       6  
            3.        0        1        1       9  
            4.        0        1        0       9  
            5.        1        0        1       8  
            6.        1        0        0      10  
            7.        1        1        1       4  
            8.        1        1        0      11  
          
          . logistic effect gender degree [fweight = count]
          
          Logistic regression                                     Number of obs =     78
                                                                  LR chi2(2)    =  11.77
                                                                  Prob > chi2   = 0.0028
          Log likelihood = -47.949799                             Pseudo R2     = 0.1093
          
          ------------------------------------------------------------------------------
                effect | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                gender |   .2788851   .1388838    -2.56   0.010     .1050824    .7401519
                degree |   .3483667   .1734728    -2.12   0.034      .131272    .9244879
                 _cons |   3.179667   1.283162     2.87   0.004     1.441708    7.012713
          ------------------------------------------------------------------------------
          Note: _cons estimates baseline odds.
          
          . shapley2, stat(r2_p)
          Factor     | Shapley value |  Per cent 
                     |  (estimate)   | (estimate)
          -----------+---------------+-----------+
          gender     |  0.00000      |    0.00 % |
          degree     |  0.00000      |    0.00 % |
          -----------+---------------+-----------+
          TOTAL      |  0.10931      |  100.00 % |
          -----------+---------------+-----------+
          
          . expand count
          (70 observations created)
          
          . logistic effect gender degree
          
          Logistic regression                                     Number of obs =     78
                                                                  LR chi2(2)    =  11.77
                                                                  Prob > chi2   = 0.0028
          Log likelihood = -47.949799                             Pseudo R2     = 0.1093
          
          ------------------------------------------------------------------------------
                effect | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                gender |   .2788851   .1388838    -2.56   0.010     .1050824    .7401519
                degree |   .3483667   .1734728    -2.12   0.034      .131272    .9244879
                 _cons |   3.179667   1.283162     2.87   0.004     1.441708    7.012713
          ------------------------------------------------------------------------------
          Note: _cons estimates baseline odds.
          
          . shapley2, stat(r2_p)
          Factor     | Shapley value |  Per cent 
                     |  (estimate)   | (estimate)
          -----------+---------------+-----------+
          gender     |  0.06523      |   59.67 % |
          degree     |  0.04408      |   40.33 % |
          -----------+---------------+-----------+
          TOTAL      |  0.10931      |  100.00 % |
          -----------+---------------+-----------+
          
          .
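
          As a cross-check on the second shapley2 table, the two-variable Shapley decomposition can be computed by hand: each variable's value is the average of its marginal contribution to the pseudo R2 over the two possible orderings. This is only a sketch of the general Shapley definition, not a description of shapley2's internals; it assumes the data have already been expanded as above, and the scalar names are my own.

          ```stata
          * Assumes the expanded data (after -expand count-) are in memory.
          * Scalar names (r2_full, r2_g, r2_d, sv_gender, sv_degree) are illustrative only.
          quietly logit effect gender degree
          scalar r2_full = e(r2_p)

          quietly logit effect gender
          scalar r2_g = e(r2_p)

          quietly logit effect degree
          scalar r2_d = e(r2_p)

          * The constant-only model has a pseudo R2 of 0 by construction.
          scalar sv_gender = 0.5*(r2_g - 0) + 0.5*(r2_full - r2_d)
          scalar sv_degree = 0.5*(r2_d - 0) + 0.5*(r2_full - r2_g)

          display "gender: " sv_gender "   degree: " sv_degree
          display "sum:    " sv_gender + sv_degree "   full model: " r2_full
          ```

          By construction the two values sum to the full-model pseudo R2. If shapley2 uses the same definition, they should also be close to the per-variable values in its table.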
          Before your next post, please take a few moments to review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question. It is particularly helpful to copy commands and output from your Stata Results window and paste them into your Statalist post using code delimiters [CODE] and [/CODE], and to use the dataex command to provide sample data, as described in section 12 of the FAQ.

          The more you help others understand your problem, the more likely others are to be able to help you solve your problem.

          In particular, the FAQ will tell you that presenting data and output with Excel and Word is not considered helpful, because a substantial number of members are not interested in opening files from sources they do not know and trust that are capable of containing and distributing malware.



          • #6
            I have been traveling but am glad to see that William Lisowski has provided an answer. Regarding your #4, note that I will not download binary files from people I don't know, so I will never look at the attachments there. Please read the FAQ and follow its advice.



            • #7
              Dear William,

              Many thanks for your prompt assistance! It is really helpful to me.


              Dear Goldstein,

              Thanks a lot for your reminder about the FAQ.

              Best,

              Tony
