
  • Marginal effects for categorical variables with three possible values

    Dear community,

    I am puzzled about which specification of marginal effects to use when the independent variable is categorical and has three possible values.


    I am doing an analysis with survey data in which I want to measure the effect of firms' subjective realization of past economic events (in this case, the availability of bank loans) on their expectation/forecast for the future. I am using an ordered logit model, as the dependent variable, the expectation, takes on the values -1 (situation will deteriorate), 0 (...remain stable), or 1 (...improve). The independent variable, the realization, can also take on three values: -1 (situation has deteriorated), 0 (...remained stable), or 1 (...improved). I also include firm controls such as size, sector and country. Because factor variables do not allow negative values, I recode my independent variable to 1, 2, 3.

    Code:
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
    exp_bankloan |     98,802     .011771    .6245559         -1          1
    real_bankl~n |     90,806   -.0258683     .622418         -1          1
    real_bankl~e |     90,806    1.974132     .622418          1          3


    As interpreting the ologit coefficients is not straightforward, I am considering using marginal effects. However, I am not sure which margins specification to use, given that my independent variable is neither continuous nor binary (0 or 1).
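To see why the raw ologit coefficients are hard to read, here is a minimal sketch in plain Python (not Stata) of how a three-category ordered logit turns a linear index into outcome probabilities. It assumes the standard parameterization with two cutpoints and no constant, which is how Stata's ologit reports /cut1 and /cut2. The cutpoints and index values are hypothetical, chosen only for illustration.

```python
import math


def logistic_cdf(z: float) -> float:
    """Standard logistic CDF: F(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def ologit_probs(xb: float, cut1: float, cut2: float) -> dict:
    """Outcome probabilities for a three-category ordered logit.

    xb is the linear index (the model has no constant); cut1 < cut2 are
    the cutpoints. Outcomes are coded -1, 0, 1 as in exp_bankloan.
    """
    p_low = logistic_cdf(cut1 - xb)                             # P(y = -1)
    p_mid = logistic_cdf(cut2 - xb) - logistic_cdf(cut1 - xb)   # P(y = 0)
    p_high = 1.0 - logistic_cdf(cut2 - xb)                      # P(y = 1)
    return {-1: p_low, 0: p_mid, 1: p_high}


# Hypothetical cutpoints and index values, not estimates from this thread.
for xb in (-1.0, 0.0, 1.0):
    probs = ologit_probs(xb, cut1=-0.5, cut2=1.5)
    print(xb, {k: round(v, 3) for k, v in probs.items()})
```

A larger index lowers P(y = -1) and raises P(y = 1), but by how much depends on where xb sits, which is why margins is used to summarize effects on the probability scale.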

    I tried three different specifications, but I cannot figure out which one is right, or what the differences between them are.

    1) First, I use the factor specification for the independent variable:
    Code:
    // 1 with factor specification i.
    
     qui ologit exp_bankloan i.real_bankloan_recode i.size i.sector, vce(cluster permid)
     margins , dydx(i.real_bankloan_recode) predict(outcome(-1))  atmeans post
    
    --------------------------------------------------------------------------------------
                         |            Delta-method
                         |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ---------------------+----------------------------------------------------------------
    real_bankloan_recode |
                      2  |  -.2923866   .0048704   -60.03   0.000    -.3019324   -.2828408
                      3  |  -.3933965   .0047711   -82.45   0.000    -.4027478   -.3840453
    --------------------------------------------------------------------------------------
    Here, the interpretation is clear: if the realized bank-loan situation "remained stable" (recoded value 2), the firm's probability of expecting a deterioration is 29.2 percentage points lower than if it had deteriorated (recoded value 1, the base level). If it "improved" (recoded value 3), the probability is 39.3 percentage points lower.
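For a factor variable, dydx(i.real_bankloan_recode) is a discrete change: the predicted probability at each level minus the prediction at the base level, with the other covariates held fixed (here at their means, as atmeans requests). A minimal plain-Python sketch of that arithmetic, using hypothetical coefficients and a hypothetical cutpoint rather than the estimates above:

```python
import math


def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))


def p_deteriorate(xb, cut1):
    """P(exp_bankloan = -1) in an ordered logit: F(cut1 - xb)."""
    return logistic_cdf(cut1 - xb)


def discrete_changes(b_level, other_xb, cut1):
    """Change in P(y = -1) at each level versus base level 1.

    b_level maps each recoded level (1, 2, 3) to its factor coefficient;
    the base level's coefficient is 0. other_xb is the contribution of the
    remaining covariates, held fixed (e.g. at their means).
    """
    base = p_deteriorate(other_xb + b_level[1], cut1)
    return {lvl: p_deteriorate(other_xb + b, cut1) - base
            for lvl, b in b_level.items() if lvl != 1}


# Hypothetical estimates (NOT the results reported above):
# 1 = deteriorated (base), 2 = remained stable, 3 = improved.
changes = discrete_changes({1: 0.0, 2: 1.3, 3: 2.1}, other_xb=0.2, cut1=-0.5)
print({lvl: round(d, 4) for lvl, d in changes.items()})
```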

    2) However, one could technically also leave real_bankloan_recode as an ordinary (continuous) variable instead of declaring it a factor variable:
    Code:
    // 2 without factor specification i.:
     qui ologit exp_bankloan real_bankloan_recode i.size i.sector, vce(cluster permid)
     margins real_bankloan_recode, predict(outcome(-1))  atmeans post // -> does not work
    
    .factor 'real_bankloan_recode' not found in list of covariates
    // this does not work.
    
     qui ologit exp_bankloan real_bankloan_recode i.size i.sector, vce(cluster permid)
     margins , dydx(real_bankloan_recode) predict(outcome(-1))  atmeans post
    
    
    --------------------------------------------------------------------------------------
                         |            Delta-method
                         |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ---------------------+----------------------------------------------------------------
    real_bankloan_recode |  -.1848426   .0020565   -89.88   0.000    -.1888733    -.180812
    --------------------------------------------------------------------------------------

    But I am puzzled about how to interpret this finding, since computing margins over real_bankloan_recode is not possible with this specification of the variable in the first place.
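With continuous coding, the single dydx number is a derivative: the slope of P(y = -1) with respect to real_bankloan_recode, evaluated (because of atmeans) with every covariate at its mean, including real_bankloan_recode itself at about 1.97, a value no firm actually reports. It is an instantaneous rate of change, not the effect of moving between categories. The plain-Python sketch below, with a hypothetical slope and cutpoint, computes that slope analytically and checks it against a finite difference.

```python
import math


def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_pdf(z):
    """Derivative of the logistic CDF: f(z) = F(z) * (1 - F(z))."""
    F = logistic_cdf(z)
    return F * (1.0 - F)


def p_low(x, b, other_xb, cut1):
    """P(y = -1) when x enters the index linearly as b * x."""
    return logistic_cdf(cut1 - other_xb - b * x)


def dydx_low(x, b, other_xb, cut1):
    """Analytic slope of P(y = -1) with respect to a continuous x."""
    return -b * logistic_pdf(cut1 - other_xb - b * x)


# Hypothetical slope, cutpoint and covariate contribution; x is set to
# about 1.974, the mean of real_bankloan_recode in the summary table.
b, cut1, other_xb, x_mean = 1.1, 1.0, 0.2, 1.974

analytic = dydx_low(x_mean, b, other_xb, cut1)
h = 1e-6  # step for a central finite-difference check
numeric = (p_low(x_mean + h, b, other_xb, cut1)
           - p_low(x_mean - h, b, other_xb, cut1)) / (2 * h)
print(round(analytic, 6), round(numeric, 6))
```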



    3) One could calculate the margins at certain values of the independent variable:

    Code:
     // or:
     qui ologit exp_bankloan real_bankloan_recode i.size i.sector, vce(cluster permid)
     margins , at(real_bankloan_recode=(1 2 3)) predict(outcome(-1))  atmeans post
    
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             _at |
              1  |   .4230018   .0037166   113.81   0.000     .4157173    .4302862
              2  |    .157907   .0013481   117.13   0.000     .1552648    .1605492
              3  |   .0457687   .0008784    52.11   0.000     .0440471    .0474903
    ------------------------------------------------------------------------------
    
    
    
     qui ologit exp_bankloan real_bankloan_recode i.size i.sector, vce(cluster permid)
     margins , at(real_bankloan_recode=(1)) predict(outcome(-1))  atmeans post  
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _cons |   .4230018   .0037166   113.81   0.000     .4157173    .4302862
    ------------------------------------------------------------------------------
    But here, I get totally different values. Given all these different specifications, I am a bit confused about which one is most appropriate. Also, for the last specification in 3), what does the constant tell me?
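The three rows of the at() output are predicted probabilities, not marginal effects; differencing them gives discrete changes. A small plain-Python sketch using only the numbers reported in the at(real_bankloan_recode=(1 2 3)) output (nothing is re-estimated):

```python
# Predicted P(exp_bankloan = -1) from the at(real_bankloan_recode=(1 2 3))
# output (continuous coding, other covariates at their means).
margins_at = {1: 0.4230018, 2: 0.157907, 3: 0.0457687}

# Discrete changes relative to "deteriorated" (recode 1).
changes_vs_1 = {lvl: margins_at[lvl] - margins_at[1] for lvl in (2, 3)}

# Changes between adjacent categories (1 -> 2 and 2 -> 3).
adjacent = {(lvl - 1, lvl): margins_at[lvl] - margins_at[lvl - 1]
            for lvl in (2, 3)}

print({k: round(v, 4) for k, v in changes_vs_1.items()})
print({k: round(v, 4) for k, v in adjacent.items()})
```

The changes relative to recode 1 (about -.265 and -.377) are similar in spirit to the factor-variable results in 1) (-.292 and -.393), but they come from a model that forces equal steps on the linear index, so they are not expected to match. The single-value run returns the same .4230018 as the first row; with only one at() scenario, that row is labelled _cons in the output.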

    Could anybody give a quick overview of the differences between these three specifications?


    Thanks for taking the time to read my post, I really appreciate your help!

    Carsten


  • #2
    This is not a direct answer to your question, but perhaps will lead to one.

    Richard Williams, a member here, has made the course notes for his course in Categorical Data Analysis available online at

    https://www3.nd.edu/~rwilliam/xsoc73994/

    If you have not already done so, you could follow that link and find and download the PDF for "Adjusted predictions and marginal effects for multiple outcome commands and models". Perhaps it will help.

    I didn't give a direct link to the PDF because those course notes are often cited on Statalist, and readers new to his work may well find other lectures of interest and assistance. Really, all five lectures on Adjusted Predictions and Marginal Effects, taken together, constitute the best possible introduction to marginal effects and to using margins.



    • #3
      Thanks William! Indeed, I have gone through Richard's lectures. They read very well and are easy to understand!

      However, my question concerns the specification of the independent variable. As it is neither continuous (such as age) nor binary (such as black and white), I am not sure which marginal-effects approach to follow.

      1) Define the independent variable as a factor variable (i.var)? This gives me two marginal effects, each relative to the base value.
      2) Treat it as an ordinary variable (i.e. leave it as it is, -1 0 1)? This produces only one marginal effect, and I am unsure how to interpret it.
      3) If I run margins at just one value of the independent variable, what does the constant tell me?

      Again, thanks a lot for your help!



      • #4
        I have not tried it with ologit, but given the "issue" with the variable specification, you could recode it as a factor (from 0 onwards) and then use contrasts (such as -1, 0, 1, etc.) in postestimation. As I said, I didn't try it out, but hopefully it will work well. Good luck!
        Best regards,

        Marcos



        • #5
          In 2), since you did not define real_bankloan_recode as a factor variable, ologit treated it as a continuous variable, as we would have seen had you not suppressed the ologit output. You then issued the command
          Code:
          margins real_bankloan_recode, ...
          but the output of help margins tells us

          Syntax

          margins [marginlist] [if] [in] [weight] [, response_options options]

          where marginlist is a list of factor variables or interactions that appear in the current
          estimation results.
          Thus margins reports to you that it cannot find a factor variable based on real_bankloan_recode.

          In any event, the approach you take in 2) is inappropriate. By treating real_bankloan_recode as a continuous variable, you impose, down in the depths of the ologit formulation, a linear effect for real_bankloan_recode, so that the difference (on the linear index) between -1 and 0 is forced to be identical to that between 0 and +1. That seems difficult to justify.

          Stick with the approach in 1); it is correct.
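The linearity restriction can be made concrete: under continuous coding, each one-category step adds the same amount b to the linear index, while factor coding gives each level its own coefficient. A minimal plain-Python sketch with hypothetical coefficients:

```python
def level_index_linear(b):
    """Continuous coding: level k (1, 2, 3) contributes b * k to the index."""
    return {k: b * k for k in (1, 2, 3)}


def level_index_factor(b2, b3):
    """Factor coding with level 1 as base: each level has its own coefficient."""
    return {1: 0.0, 2: b2, 3: b3}


def steps(index):
    """Index change for the moves 1 -> 2 and 2 -> 3."""
    return (index[2] - index[1], index[3] - index[2])


# Hypothetical coefficients, for illustration only.
print(steps(level_index_linear(1.1)))       # both steps equal b (up to rounding)
print(steps(level_index_factor(1.6, 2.0)))  # a large first step, a small second
```

If the factor estimates happen to satisfy b3 close to 2*b2, the continuous coding costs little; otherwise it distorts the category effects.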



          • #6
            Thanks Marcos. I did this in 1) of my original post: I recoded. However, I was asking myself which of the specifications above is the most appropriate/correct one.

            Also, thanks again William. That's what I suspected. Using a continuous variable in the margins marginlist is not possible, yet the marginal effect (dydx()) can somehow be computed. But I suppose that number is misleading, then. Thanks for the clarification on that issue; I would have gone with 1) in the first place as well.

            The question about 3) remains: what does the constant tell me if I choose to compute margins at real_bankloan_recode=1?
            Does anybody know?

            Thanks heaps!

            Last edited by Carsten Preuss; 19 Apr 2018, 10:19.



            • #7
              A kind reminder about my last question. Maybe someone can help:

              In point 3) of my original post I was wondering what the constant is telling me if I calculate marginal effects at real_bankloan_recode=1 and at atmeans for the rest of the variables.

              Thanks a lot!



              • #8
                Originally posted by Carsten Preuss
                A kind reminder about my last question. Maybe someone can help:

                In point 3) of my original post I was wondering what the constant is telling me if I calculate marginal effects at real_bankloan_recode=1 and at atmeans for the rest of the variables.

                Thanks a lot!
                The constant is the predicted probability of the outcome you asked the model to predict. In that output, you asked the model to predict the outcome -1 (which I think means that respondents expect bank loan availability to deteriorate). The model predicts a 42.3% probability of this outcome, but only under the assumption that all respondents had real bank loan = 1 and that all other variables were held at their sample means. In another specification of the margins command, you asked Stata to imagine that all respondents had real bank loan = 1, then = 2, etc. Compare the _cons value with the margin for real bank loan = 1: they're identical.

                In doing all this, you're engaging in counterfactual thinking. People do this all the time. I can't really say whether it's appropriate in this case; that has to depend on your theoretical knowledge. I know this is cryptic, but doing good research is hard, blah blah blah. I will say this: suppose you had used this command (note that I dropped -atmeans-!):

                Code:
                margins , over(real_bankloan_recode) predict(outcome(-1)) post
                Stata would then instead give you the average predicted probability of outcome = -1 among the respondents who had each value of real_bankloan_recode. You're not resetting any of their covariates to the mean (although you can do so).
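The difference between atmeans and averaging (as over() does without atmeans) comes from the nonlinearity of the logistic function: the prediction at the average covariate profile is generally not the average of the individual predictions. A minimal plain-Python sketch with hypothetical linear indexes for one group of firms:

```python
import math
from statistics import mean


def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical linear indexes for five firms in one real_bankloan_recode group.
xb_group = [-2.0, -0.5, 0.0, 0.8, 2.5]
cut1 = 0.0  # hypothetical first cutpoint

# atmeans-style: evaluate P(y = -1) once, at the group's average index
# (the index is linear in the covariates, so this equals the index at the means).
p_at_means = logistic_cdf(cut1 - mean(xb_group))

# over()-style without atmeans: predict for each firm, then average.
p_averaged = mean(logistic_cdf(cut1 - xb) for xb in xb_group)

print(round(p_at_means, 4), round(p_averaged, 4))
```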
                Last edited by Weiwen Ng; 23 Apr 2018, 09:13.
                Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

                When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



                • #9
                  Thanks a lot, all of you! Your replies helped me understand everything much better!



                  • #10
                    I probably wouldn't do it in this case, but you can sometimes justify treating an ordinal variable as continuous. See

                    https://www3.nd.edu/~rwilliam/xsoc73...ndependent.pdf

                    I am also a big fan of the spost13 commands in general and the mtable command in particular. I think mtable is a good way to approach what you are trying to do in 3). For one thing, mtable has nicer formatting than margins. Since you are using ologit, consider

                    https://www3.nd.edu/~rwilliam/xsoc73994/Margins05.pdf
                    -------------------------------------------
                    Richard Williams, Notre Dame Dept of Sociology
                    StataNow Version: 19.5 MP (2 processor)

                    WWW: https://www3.nd.edu/~rwilliam



                    • #11
                      Thanks very much for your help, Richard!



                      • #12
                        Hello,

                        thank you very much for your comments, which also helped me with my own issue. However, I need some additional assistance. Any help is greatly, greatly appreciated!
                        I have a logit model with many categorical independent variables, and I also included an interaction between some of those categorical variables. I am not sure how to obtain the most suitable marginal effects.

                        I agree that it does not make sense to use the atmeans option, as there is no natural interpretation for the mean of my categorical variables. However, I was not able to get output that displays the marginal effects of all my variables under the condition that I fix certain categories, e.g. margins, at(cat_var1=0 cat_var2=(0(1)7))

                        My question:
                        How do I get a neat overview of the marginal effects of all my independent variables, like the one margins, dydx(_all) gives, but with certain categories of the factor variables fixed?

                        My case:
                        I want to predict whether a member of an organization makes use of a teleworking policy (userafter=1).

                        My Code:
                        logit userafter userbefore i.tarifgruppe i.bu akademiker tz potenzial sex i.alter i.entfernung junge_eltern i.junge_eltern##i.tz##i.sex share_user_oe_gen share_user_oe1_gen oesize oe1size, vce(cluster oe1)
                        margins, at(userbefore=0 tarifgruppe=0 bu=(0(1)7) akademiker=1 potenzial=0 alter=2 entfernung=0) atmeans post

                        // this just gave me the predicted margins for each value of bu. Also, I was not able to specify the tz, sex, or junge_eltern variables, and I guess it's because they are included in the interaction.


                        . margins, dydx(_all)

                        // this gave me an overview of the marginal effects of all variables, but as I said, the interpretation does not make sense.
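With an interaction such as i.tz##i.sex, the effect of one factor depends on the level of the other, so there is no single marginal effect of tz to report without saying which sex category is meant; margins, dydx(_all) without at() averages over the sample instead. A minimal plain-Python sketch for a logit with a hypothetical two-way interaction (coefficients invented for illustration; the model above has a three-way interaction and more covariates):

```python
import math


def logit_p(xb):
    """Logistic CDF: P(y = 1) = 1 / (1 + exp(-xb))."""
    return 1.0 / (1.0 + math.exp(-xb))


# Hypothetical coefficients for a two-way interaction tz##sex.
coef = {"_cons": -1.0, "tz": 0.5, "sex": 0.3, "tz#sex": 0.8}


def p_use(tz, sex, other_xb=0.0):
    """Predicted P(userafter = 1) for given 0/1 values of tz and sex."""
    xb = (coef["_cons"] + coef["tz"] * tz + coef["sex"] * sex
          + coef["tz#sex"] * tz * sex + other_xb)
    return logit_p(xb)


def tz_effect(sex, other_xb=0.0):
    """Discrete change in P(userafter = 1) from tz = 0 to tz = 1 at a fixed sex."""
    return p_use(1, sex, other_xb) - p_use(0, sex, other_xb)


print(round(tz_effect(sex=0), 4), round(tz_effect(sex=1), 4))
```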


                        Thank you very much for any help!
                        Hanna



                        • #13
                          Hanna "user": Please register with a real family name, as requested in the FAQ Advice.



                          • #14
                            Also, while I might link to an older related thread, I don't recommend adding on to it. I don't know about others, but when I see that a thread has several responses, I tend to assume the problem is already being taken care of or is so difficult that I don't want to bother.

                            Also, seeing the code and output could help. I am not sure why you think the results don't make sense. Use code tags. See point 12 of the FAQ.


                            • #15
                              Also, I often find the mtable command (part of spost13_ado) more useful than margins when I want to plug in various values. See

                              https://www3.nd.edu/~rwilliam/stats3/Margins04.pdf
