
  • Regression with percentage changes

    Hello,

    I run a regression where the dependent variable is the fraction of individuals in a certain category that voted for a party (the fractions are between 0 and 1). My explanatory variable is the change in the gender wage ratio (women/men; the ratios are larger than 0 and most of the time smaller than 1) between the year in which the outcome variable is observed and some previous year. Thus, for a certain category, this could for example be 0.87-0.83=0.04. Since my variables are already in percent and percentage point changes, respectively, does it make sense to apply a log transformation to the outcome variable, or to both the outcome and the explanatory variable, for ease of interpretation? How would the coefficients be interpreted in those two cases? How do I interpret the coefficients without applying log transformations?
    Thank you.

  • #2
    It sounds like you would probably create confusion by applying log transformations here.

    If you compare two observation units whose values of the change in gender wage ratio differ by 1 percentage point (i.e. by 0.01), and a straight linear-linear regression gives you a coefficient of b for that gender wage ratio change variable, then the expected value of the dependent variable is higher by 0.01*b in the unit with the larger value. Because the dependent variable is a fraction, this corresponds to a difference of b percentage points. So the interpretation is straightforward.
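
    The arithmetic behind that interpretation can be written out as below. The symbols b_0 and x and the value b = 0.5 are illustrative assumptions, not estimates from any data in this thread.

    ```latex
    \begin{align*}
    E[y \mid x] &= b_0 + b\,x \\
    \Delta x = 0.01 \;&\Rightarrow\; \Delta E[y \mid x] = 0.01\,b \\
    \text{in percentage points: } 100 \times 0.01\,b &= b \\
    \text{e.g. } b = 0.5 \;&\Rightarrow\; \Delta E[y \mid x] = 0.005 = 0.5 \text{ percentage points}
    \end{align*}
    ```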

    That said, while it is certainly common for researchers to decide whether to log transform variables based on whether they want to phrase their results in terms of relative or absolute differences, it is scientifically wrong to do that. The logarithm function is very non-linear when applied over wide ranges of numbers. Consequently, if the relationship y = b0 + b1*x is reasonable, then the relationships between y and log x, or log y and x, or log x and log y must be non-linear, so that these alternative regressions are inherently mis-specified. And if one of those other relationships is a good specification, then the linear relationship between y and x is mis-specified. At most one of the four possibilities can be a good specification.

    On the other hand, if the ranges of values of x and y are narrow, the degree of non-linearity of the logarithm doesn't matter so much. But when the range of either variable is large enough, the non-linearity of log will lead to the specification problem of the preceding paragraph.

    Really the decisions about log transformations should be made based on proper model specification. Exploring scatter plots of the data, with and without transformations, before modeling can be very helpful in this regard.
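
    A minimal sketch of that kind of exploration, assuming hypothetical variable names voteshare (the fraction voting for the party) and d_wageratio (the change in the women/men wage ratio). Only the outcome is logged here, because the change variable can be zero or negative:

    ```stata
    * Hypothetical variable names; substitute your own.
    * Untransformed outcome, with a lowess smoother to show the shape.
    twoway (scatter voteshare d_wageratio) (lowess voteshare d_wageratio), name(linear, replace)

    * Logged outcome; ln() returns missing wherever voteshare is 0.
    generate ln_voteshare = ln(voteshare)
    twoway (scatter ln_voteshare d_wageratio) (lowess ln_voteshare d_wageratio), name(logy, replace)

    * Show both panels side by side for comparison.
    graph combine linear logy
    ```

    Whichever panel looks closer to a straight line is the better candidate for a linear specification.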



    • #3
      Thank you a lot, Clyde, that's very helpful!



      • #4
        Beta-regression followed by -margins- seems the way to go.
        Code:
        help betareg



        • #5
          or fracreg.

          Common practice is not to log a ratio/fraction, though doing so does not affect the general interpretation (but might make it more confusing). Theoretically, the share might be 0, so the natural log is not a conceptually sound transformation in general. The X variable could likewise be zero or negative in principle, even if it isn't in your data, in which case its log is undefined.



          • #6
            fracreg is in many cases more robust than betareg, so unless you need an estimate for the entire conditional distribution, I would prefer fracreg.
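
            A rough sketch of the two approaches mentioned in this thread, again assuming the hypothetical variable names voteshare and d_wageratio. Note that betareg requires the outcome to lie strictly between 0 and 1, while fracreg also accepts values of exactly 0 and 1:

            ```stata
            * Hypothetical variable names; substitute your own.
            * Fractional logit for an outcome in [0,1].
            fracreg logit voteshare d_wageratio
            * Average marginal effect of d_wageratio on the expected vote share.
            margins, dydx(d_wageratio)

            * Beta regression for comparison; needs 0 < voteshare < 1.
            betareg voteshare d_wageratio
            margins, dydx(d_wageratio)
            ```

            The marginal effect reported by -margins- is per unit of d_wageratio, so multiplying it by 0.01 gives the approximate change in the expected vote share for a 1 percentage point change in the wage ratio.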
            ---------------------------------
            Maarten L. Buis
            University of Konstanz
            Department of history and sociology
            box 40
            78457 Konstanz
            Germany
            http://www.maartenbuis.nl
            ---------------------------------



            • #7
              Hi, Maarten. What does "more robust" mean here?
