  • How to use standard deviation to interpret results?

    Hello everyone,

    Recently I noticed that many papers use the standard deviation to interpret their results. For example, in one paper, a table uses firms' leverage as the dependent variable, and for the main explanatory variable, state corruption, the coefficient is 0.172 (significant at the 10% level), the standard error is 0.098, and the sample size is 110,094. The paper then says that "a one standard deviation increase in state corruption implies an increase in leverage equal to 12.29% of mean leverage."

    I really don't understand how to use the SD to interpret results like that. This may be a silly question, but I would appreciate it if anyone could help me.

    Chen

  • #2
    This usually arises in a context where the explanatory variable is entered into a regression model after it is standardized to a mean of zero and a standard deviation of 1. In that case, a 1 standard deviation increase in the explanatory variable is the same thing as a unit increase in the standardized version used in regression, and the effect on the outcome variable being reported is just the marginal effect or elasticity of that standardized explanatory variable.

    When the explanatory variable has no natural metric or scale this may be an appropriate way to present results. Unfortunately, it is sometimes also seen in conjunction with variables which have obvious natural metrics such as age, or even with dichotomous variables. In that situation the effect, if not the intent, is merely obfuscatory. After all, who knows how much of an increase in age corresponds to 1 standard deviation in the study sample, or what the standard deviation of a dichotomous variable in some data sample is?

    • #3
      Thank you very much for your reply. But taking the example in my question, how did the authors calculate the marginal effect or elasticity of that standardized explanatory variable, e.g. the 12.29%?
      And how can we standardize a dependent variable and enter it into a regression? Many thanks.

      Chen
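A figure such as "12.29% of mean leverage" is most plausibly obtained as coefficient × SD(x) / mean(y), with no need to standardize anything before estimation. The thread does not report the SD of state corruption or the mean of leverage, so the paper's number cannot be reproduced here. The following is only a minimal sketch of the arithmetic, using Stata's shipped auto.dta as a stand-in (price and mpg are placeholders for leverage and corruption, not the paper's variables):

```stata
* Illustration only: auto.dta stands in for the paper's data
sysuse auto, clear
regress price mpg

* SD of x and mean of y, both computed within the estimation sample
quietly summarize mpg if e(sample)
scalar sd_x = r(sd)
quietly summarize price if e(sample)
scalar mean_y = r(mean)

* Change in y for a one-SD increase in x, in y's units and as a % of mean y
scalar effect      = _b[mpg] * scalar(sd_x)
scalar pct_of_mean = 100 * scalar(effect) / scalar(mean_y)
display "One-SD effect = " %9.3f scalar(effect)
display "As % of mean y = " %6.2f scalar(pct_of_mean)
```

Restricting `summarize` to `e(sample)` keeps the SD and the mean consistent with the observations actually used in the regression.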

      • #4
        You may wish to take a look at the user-written command listcoef, which is part of the SPost13 package.

        Below is an example of how to use it:

        Code:
        . sysuse auto, clear
        (1978 Automobile Data)
        
        . tabstat mpg length foreign, statistics( sd )
        
           stats |       mpg    length   foreign
        ---------+------------------------------
              sd |  5.785503  22.26634  .4601885
        ----------------------------------------
        
        . regress price c.length c.mpg i.foreign
        
              Source |       SS           df       MS      Number of obs   =        74
        -------------+----------------------------------   F(3, 70)        =     12.14
               Model |   217367689         3  72455896.3   Prob > F        =    0.0000
            Residual |   417697707        70   5967110.1   R-squared       =    0.3423
        -------------+----------------------------------   Adj R-squared   =    0.3141
               Total |   635065396        73  8699525.97   Root MSE        =    2442.8
        
        ------------------------------------------------------------------------------
               price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
              length |   59.61193   23.90525     2.49   0.015     11.93442    107.2894
                 mpg |  -139.0814   82.20966    -1.69   0.095    -303.0434    24.88062
                     |
             foreign |
            Foreign  |   2644.771   761.8912     3.47   0.001     1125.227    4164.315
               _cons |  -2861.984     6026.6    -0.47   0.636    -14881.66     9157.69
        ------------------------------------------------------------------------------
        
        . listcoef, help
        
        regress (N=74): Unstandardized and standardized estimates 
        
          Observed SD:  2.9e+03
          SD of error:  2.4e+03
        
        -------------------------------------------------------------------------------
                     |         b        t    P>|t|    bStdX    bStdY   bStdXY     SDofX
        -------------+-----------------------------------------------------------------
              length |   59.6119    2.494    0.015  1327.340    0.020    0.450    22.266
                 mpg | -139.0814   -1.692    0.095  -804.656   -0.047   -0.273     5.786
                     |
             foreign |
            Foreign  | 2644.7712    3.471    0.001  1217.093    0.897    0.413     0.460
            constant | -2.86e+03   -0.475    0.636        .        .        .         .
        -------------------------------------------------------------------------------
               b = raw coefficient
               t = t-score for test of b=0
           P>|t| = p-value for t-test
           bStdX = x-standardized coefficient
           bStdY = y-standardized coefficient
          bStdXY = fully standardized coefficient
           SDofX = standard deviation of X
        Best,

        Marcos


        • #5
          Thank you very much for this. Actually, I noticed that finance papers on corruption all use the standard deviation to interpret their results. This is interesting; I think I need to look into the issue carefully.

          • #6
            A slightly more primitive way to do this is to think about a one standard deviation change in x as simply a number. So you estimate the standard deviation of x in the estimation sample using the summarize command. Then you use margins to generate the predicted y for two values of x one standard deviation apart.

            So, if the SD is 2, and everything is linear, you want margins to give you the predicted y for x=0 and x=2. The difference is the change in y for a one SD change in x.

            Phil
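The approach described in #6 could be sketched as follows. This is one possible implementation rather than the only one: it reuses the model from #4 on auto.dta, evaluates mpg at its sample mean and at one SD above the mean (any two values one SD apart would do in a linear model), and the local macro names `m` and `hi` are purely illustrative.

```stata
sysuse auto, clear
regress price c.length c.mpg i.foreign

* Mean and SD of mpg within the estimation sample
quietly summarize mpg if e(sample)
local m  = r(mean)
local hi = r(mean) + r(sd)

* Predicted price at the mean of mpg and at one SD above it
margins, at(mpg = (`m' `hi')) post

* Difference between the two predictions: effect of a one-SD increase in mpg
lincom _b[2._at] - _b[1._at]
```

Because the model is linear, the difference reported by `lincom` should equal the raw coefficient times the SD of mpg, which is the bStdX value for mpg in the listcoef output in #4 (-804.656).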

            • #7
              The "fully standardized coefficients" are also known as beta coefficients (in case you want to read more about this in an econometrics textbook).
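As a small illustration of the point in #7, the `beta` option of `regress` reports these coefficients directly, and they can be checked by hand as b × SD(x) / SD(y). The sketch below uses auto.dta with the regressors from #4 entered as plain variables; the scalar names are hypothetical.

```stata
sysuse auto, clear

* The Beta column replaces the confidence intervals in the output
regress price length mpg foreign, beta

* Manual check for length: b * SD(x) / SD(y) in the estimation sample
quietly summarize price if e(sample)
scalar sd_y = r(sd)
quietly summarize length if e(sample)
scalar beta_length = _b[length] * r(sd) / scalar(sd_y)
display "beta for length = " %6.3f scalar(beta_length)
```

The value for length should agree with the bStdXY entry in the listcoef table in #4 (0.450).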

              • #8
                Hello all,

                I found this thread because I have the same problem as Chen Huang. I looked into the listcoef command and it actually suits my needs perfectly, since I'm interested in bStdX.

                My understanding of bStdX: These are the regression coefficients with the x-variables (the independent variables) in standard deviations and the y-variable (the dependent variable) in its original units.

                However, I'm using a user-written regression command called xtfmb (Fama-MacBeth two-step panel regression), and that doesn't work with listcoef.

                Do you have any idea how I could still get the results? Maybe you even have a code example.

                Many thanks in advance,
                Jan
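One possible workaround for the question in #8 is to compute bStdX by hand, since it is just the raw coefficient multiplied by the SD of the regressor in the estimation sample. The sketch below uses `regress` on auto.dta as a stand-in; it assumes that the estimation command in use (for example xtfmb) leaves its coefficients in `_b[]` and marks the sample in `e(sample)`, which should be verified for that command. The scalar names `bstdx_*` are hypothetical.

```stata
sysuse auto, clear
regress price length mpg foreign      // stand-in for xtfmb or another estimator

* x-standardized coefficient = b * SD(x), with SD(x) taken over e(sample)
foreach v of varlist length mpg foreign {
    quietly summarize `v' if e(sample)
    scalar bstdx_`v' = _b[`v'] * r(sd)
    display "`v'" _col(12) "b = " %10.4f _b[`v'] "   bStdX = " %10.3f scalar(bstdx_`v')
}
```

With this stand-in model the results should match the bStdX column of the listcoef output in #4.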

                • #9
                  Hello all,
                  I read in a paper that the coefficient estimate of the independent variable (IV) (coefficient −0.00821, t-statistic 9.04***) is significant at the 1% level and negatively associated with the dependent variable (DV) in the regression. Specifically, a one-unit increase in the IV reduces the DV by 0.00821, which represents 29% of the average DV.
                  In this case the standard deviation (SD) of the IV is 0.325 and the SD of the DV is 0.033.

                  Q1: I don't know how they calculated this 29%. Was it done manually or through Stata?

                  Normally, for economic significance, we use this formula: (coefficient of the independent variable * standard deviation of the independent variable) / standard deviation of the dependent variable,
                  as in the example below from Li et al. (2017), "Trust and Stock Price Crash Risk: Evidence from China":

                  This negative relationship between crash risk and social trust is both statistically and economically significant. For example, the coefficient of TRUST1t (column 1) is -0.0193,
                  which means that a one-standard-deviation increase in the social trust of a firm location is associated with a decrease of 1.94% (=0.0193*0.6866/0.6843) of a standard deviation in
                  future crash risk as measured by NCSKEW, ceteris paribus. (The standard deviation of TRUST1t is 0.6866 and that of NCSKEW is 0.6843.)
                  Q2: I want to calculate the economic significance or predictive margin for my study.
                  Many Thanks
                  Ayub
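The arithmetic in #9 can be checked with a few `display` statements. The sketch below uses only the numbers quoted in the post; the scalar names are hypothetical. It suggests that the 29% figure does not come from the SD-based formula, since that formula gives roughly 8% of one SD of the DV.

```stata
* Li et al. (2017) example: |b| * SD(x) / SD(y)
scalar li_effect = 0.0193 * 0.6866 / 0.6843
display "Li et al.: " %6.4f scalar(li_effect) " of one SD of NCSKEW"

* Same formula with the first paper's numbers: b = -0.00821, SD(IV) = 0.325, SD(DV) = 0.033
scalar sd_effect = -0.00821 * 0.325 / 0.033
display "First paper: " %6.4f scalar(sd_effect) " of one SD of the DV"
```

The first line reproduces the 1.94% quoted from Li et al.; the second shows that the same formula would give about 8.1%, not 29%, for the first paper.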

                  • #10
                    Hello Statalist members,
                    I would like to ask my previous question (#9) again, but this time I can add some more values. The mean of the dependent variable is 0.028 and the coefficient of the IV is -0.00821, and they get 29% of the average DV. So can I divide the coefficient of the IV by the mean of the DV to get 29% (-0.00821/0.028 = -0.29), which represents 29% of the average DV?
                    Q4. Another paper calculates it in a different way, so could you please suggest some relevant links on this formula: (coefficient estimate on CEO power * one standard deviation change in CEO power) / average board diversity for the sample = (-0/0436*0.586)/13.1 = 1.95%. That is, a one standard deviation increase in CEO power (SD = 0.586) is associated with a decrease in board diversity of 1.95%. A decrease of 1.95% in board heterogeneity is equivalent to replacing one domestic director with a foreign-national director. But when I look for the value 0.586, I cannot find it in the descriptive statistics table, so maybe there is some other code to calculate it.
                    Looking forward to your kind suggestions.
                    Last edited by Ayub UOM; 19 Jun 2020, 10:23.
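The two calculations in #10 can be checked the same way. The coefficient on CEO power is garbled in the post, so the sketch below tries two readings of it, -0.0436 and -0.436; both are assumptions, and only the paper itself can settle which one is right. The scalar names are hypothetical.

```stata
* One-unit effect as a share of the mean DV: b / mean(y)
scalar share_mean = -0.00821 / 0.028
display "b / mean(DV) = " %6.4f scalar(share_mean)

* One-SD effect as a share of mean board diversity: b * SD(x) / mean(y)
scalar ceo_small = -0.0436 * 0.586 / 13.1     // assumed reading: b = -0.0436
scalar ceo_large = -0.436  * 0.586 / 13.1     // assumed reading: b = -0.436
display "b = -0.0436: " %7.5f scalar(ceo_small)
display "b = -0.436:  " %7.5f scalar(ceo_large)
```

The first result (about -0.293) is consistent with the 29% reported for the first paper. For the second paper, only the -0.436 reading gives about 1.95%; with -0.0436 the result is about 0.195%. If the SD of 0.586 is not in the descriptive statistics table, it cannot be recovered from the regression output alone; with the data in hand it would come from `summarize` restricted to `e(sample)`.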
