
  • How to interpret beta (b value) of an OLS regression where the dependent variable is square root transformed

    Hello Clyde Schechter, Bruce Weaver, George Ford, lorenabarberia, and Noor Sethi,

    Thank you so much for your continuous support in my educational endeavour. I am a bit confused about the correct way of interpreting the results of my OLS regression models. In these models I had to apply a square root transformation to my dependent variable, positive mental health (a higher value means a higher level of positive mental health). One of the results, for the three-category immigrant variable (Canadian born, recent immigrant, and long residing immigrant), is as follows:

    Immigration Status (Ref. Cat.: Canadian Born)    b
    Recent Immigrant                                 0.25**
    Long Residing Immigrant                          -0.67***
    Constant                                         12.00***

    I am not sure how to interpret this result. Can you please help me out with your knowledge and understanding of these types of results?

    Thank you so so much again and again.

    Iqbal

  • #2
    Why are you using the square root? Does y sometimes take the value zero? What is its range? My hunch is that the measure has only ordinal and not quantitative meaning. Is it roughly continuous or does it take on a handful of values?

    If y > 0 and it has quantitative meaning, you can use log(y) to obtain a percentage change interpretation -- and that's more straightforward than if you use sqrt(y).
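
    A minimal sketch of the difference in interpretation, using Stata's shipped auto data rather than the poster's survey: price and foreign are stand-ins (assumptions) for the outcome and a group indicator. With a log outcome the coefficient converts to a percentage difference that holds at any baseline. With a square-root outcome the difference in original units depends on the baseline, and squaring a predicted root ignores retransformation bias, so the second figure should be read as approximate.

    ```stata
    * Assumption: auto data, with price and foreign standing in for the poster's variables
    sysuse auto, clear
    generate lprice = ln(price)
    generate sprice = sqrt(price)

    * Log outcome: convert the dummy coefficient to a percentage difference
    regress lprice i.foreign
    display "Percent difference: " 100*(exp(_b[1.foreign]) - 1)

    * Square-root outcome: back-transform by squaring predictions at the baseline
    regress sprice i.foreign
    display "Approx. difference in price units: " (_b[_cons] + _b[1.foreign])^2 - _b[_cons]^2
    ```

    With other covariates in the model, the baseline term would be the full linear prediction for the reference group, not just the constant.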


    • #3
      Originally posted by Jeff Wooldridge (#2)

      Thank you so much, Jeff Wooldridge.

      I applied the square root transformation because the positive mental health scale, constructed from 14 items, is moderately positively skewed, and the literature says that for a moderately positively skewed variable the square root transformation is the ideal one. What do you think?


      • #4
        Hello Jeff Wooldridge,
        The range is from 14 to 84, and y is > 0. Sorry for skipping this part in the last reply.


        • #5
          Wages/income are skewed right, but never have I seen a sqrt transformation used. I'm with Jeff, use ln(y). The results will be nearly the same.

          But, I suspect you're following normal procedure in your field and will get harassed if you do something else.

          This may be of some help, and maybe Jeff will correct me if I'm wrong.

          Code:
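          * Load example wage data from Stata Press
          use https://www.stata-press.com/data/r18/wageed , clear

          * Compare the log and square-root transforms of wage
          hist wage
          g lwage = ln(wage)
          g swage = sqrt(wage)

          * -center- is community-contributed (ssc install center)
          center lwage , g(lwagec)
          center swage , g(swagec)

          twoway kdensity lwagec || kdensity swagec

          g lage = ln(age)
          summ lage

          * Log outcome: predicted wage at two values of ln(age),
          * scaled by the lognormal retransformation factor exp(rmse^2/2)
          reg lwage lage
          margins, at(lage=(3.6 3.8))  expression(exp(predict(xb))*exp((`e(rmse)'^2)/2)) post
          * Proportional change in predicted wage between the two points
          di e(b)[1,2] / e(b)[1,1] - 1

          * Square-root outcome: square the linear prediction to return to the wage scale
          reg swage lage
          margins, at(lage=(3.6 3.8)) expression(predict(xb)^2) post
          di e(b)[1,2] / e(b)[1,1] - 1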


          • #6
            Re the -center- command George used in #5:
            Code:
            ssc describe center
            --
            Bruce Weaver
            Version: Stata/MP 18.5 (Windows)


            • #7
              Hello Jeff Wooldridge, George Ford, and Bruce Weaver, thank you so much for your valuable direction. Would it be possible for you to share one or two references so that I have support when arguing this with my committee? Take care.


              • #8
                Originally posted by George Ford (#5)

                Hello @George Ford,
                Thank you. If I want to calculate margins for an interaction between two variables, say logGDP and immigration status, what would the Stata code be? Thank you once again.
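
                One possible pattern for this kind of question, sketched on Stata's shipped auto data because the poster's variables are not available: weight stands in for logGDP and foreign for immigration status (both are assumptions for illustration only). Factor-variable notation lets margins account for the interaction.

                ```stata
                * Assumption: weight plays the role of logGDP, foreign the role of immigration status
                sysuse auto, clear

                * c. marks a continuous variable, i. a categorical one;
                * ## includes both main effects and their interaction
                regress price c.weight##i.foreign

                * Slope of the continuous variable within each category
                margins foreign, dydx(weight)

                * Predicted outcome for each category at chosen values of the continuous variable
                margins foreign, at(weight=(2000 3000 4000))
                marginsplot
                ```

                With a transformed outcome, the expression() option shown in #5 could be combined with these margins calls to report predictions on the original scale.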


                • #9
                  I have used square root scales for visualization, usually for showing counted data where the detail that the root of zero is also zero is profoundly helpful.

                  For regression, I am less convinced that rooting the outcome variable is ever the best solution.

                  The marginal distribution of the outcome is itself not important and only pertinent if associated with massive outliers or nonlinearity that throw doubt on whether the particular y and X that appear in y = Xb imply a well-chosen functional form.

                  With a square root transformation, you can't rule out negative predictions somewhere -- and the fact that negative values can be squared is utterly unhelpful in this context.

                  As already pointed out, a log scale for the outcome is consistent with thinking in terms of percentage change, and indeed with always predicting positive outcomes.

                  In practice, that means Poisson regression or equivalently a generalized linear model with log link, and robust errors. The glm framework for some reason appears unpopular or even unfamiliar to many economists in particular.

                  A plot

                  Code:
                  twoway function log(x), ra(14 84)
                  shows that over the observed range this is a mild transformation, so you can have it both ways: it is worth trying, and it doesn't warp your data much.
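
                  A hedged sketch of the two equivalent specifications mentioned above, again on the auto data with price and foreign as stand-in (assumed) variables; the poster's own outcome and predictors would replace them.

                  ```stata
                  * Assumption: price and foreign stand in for the outcome and the group variable
                  sysuse auto, clear

                  * Poisson regression with robust standard errors
                  poisson price i.foreign, vce(robust)

                  * The same model written as a GLM with a log link
                  glm price i.foreign, family(poisson) link(log) vce(robust)

                  * Replay the GLM results with exponentiated coefficients,
                  * read as ratios of expected outcomes between groups
                  glm, eform
                  ```

                  The outcome stays on its original scale here, so no back-transformation of predictions is needed.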


                  • #10
                    I doubt there is a citation saying to always use log and not sqrt. But most textbooks will propose log with skewed data.


                    • #11
                      George Ford and Nick Cox thank you so much.
