  • Accounting for missing variables due to taking logs

    The variable 'population growth (annual %)' generates quite a few missing values when I take its log, because my panel data contain quite a few negative growth rates.

    I have been told to control for the missing values as there might be a mistake in the computation - how should I do this?

    Also, if I left this variable unlogged while all the other variables are logged, would this make a substantial difference to my estimates?

    Many thanks

  • #2
    Hi Avnish,
    My personal preference is that some types of variables, due to their nature, do not need to be or should not be "logged". If you can get hold of Introductory Econometrics by Wooldridge, the end of Chapter 6 has some considerations and suggestions on when to use and when not to use a log transformation.
    My suggestion is that growth rates (of almost any kind) should not be logged.
    HTH
    Fernando
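
A minimal sketch of why the log produces missing values here. The growth rates and the helper `safe_log` are hypothetical and only for illustration; the point is that the natural log is defined only for positive numbers, so every zero or negative growth rate drops out.

```python
import math

# Hypothetical annual population growth rates in percent (illustrative only).
growth = [1.2, 0.4, -0.3, 2.1, -1.5, 0.0]

def safe_log(x):
    """Return ln(x) for x > 0, otherwise None to mark a missing value."""
    return math.log(x) if x > 0 else None

logged = [safe_log(g) for g in growth]

# Count how many observations would be lost by logging.
n_missing = sum(v is None for v in logged)
print(f"{n_missing} of {len(growth)} observations are lost after logging")
```

Counting the lost observations this way before estimating gives a sense of how much of the panel the log transformation would discard.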

    • #3
      Hi Fernando,

      Thanks so much for your speedy response. The problem is that when I don't take the log of this variable my t value loses significance (increases from 0.058 to 0.103) and the coefficient on my main explanatory variable falls (only slightly though)

      Do you still think it's worth leaving in this case?

      • #4
        Originally posted by Avnish Sethi:
        The problem is that when I don't take the log of this variable my t value loses significance (increases from 0.058 to 0.103) and the coefficient on my main explanatory variable falls (only slightly though)
        In my opinion, your goal should be to find a model that reasonably approximates the data-generating process; you should definitely not hunt for statistical significance. By the way, a larger t value (in absolute terms) comes with a smaller p-value, and is therefore more likely to be significant.

        Best
        Daniel
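
A rough sketch of the relationship between the t value and the p-value. It assumes a standard normal approximation (a regression would normally use the t distribution with the appropriate degrees of freedom, but the direction is the same); the function name `two_sided_p` is made up for this illustration.

```python
import math

def two_sided_p(t):
    """Two-sided p-value for a test statistic, using a standard normal approximation."""
    return math.erfc(abs(t) / math.sqrt(2))

# As |t| grows, the p-value shrinks.
for t in (1.0, 1.64, 1.96, 2.58):
    print(f"|t| = {t:.2f}  ->  p = {two_sided_p(t):.3f}")
```

So a move from 0.058 to 0.103 can only describe a p-value getting larger, which corresponds to a smaller t value in absolute terms.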

        • #5
          Apologies Daniel, I meant to say the p value increases from 0.058 to 0.103, and thus significance is reduced.

          I see what you're saying about not going after statistical significance and will definitely take this on board. However, I'm still unsure whether logging population growth would give a better approximation for my model. Would you be able to advise?

          Warm Regards
          Avnish

          • #6
            Usually, you log variables in a linear model to get closer to a linear relationship. Have you looked at the relationship between your variables? Another reason for logging variables is that others in your field have done so and you want your results to be easier to compare to theirs; in this case, you would need to see how others have dealt with negative or zero values.

            I guess I am with Fernando here; if the log of some values is not defined (as for negative values) or is negative infinity (as for zero), then it probably does not make sense to take logs. You could go for a generalized linear model (glm, in Stata).

            Best
            Daniel
