
  • Logging independent variables

    Hello!

    I am trying to run a panel data regression where two of my independent variables (in two different regressions of the same DV) are human impacts and economic impacts of certain climatic extremes. My problem is that I am not sure whether I should log the variables or use them as they are. Not logging gives me more significant results, but there is higher variability. Also, since I usually use the logged versions of most economic variables such as GDP, I am not sure whether I should do the same in this case. Here are my code lines for each of the regressions:
    xtreg env_conc_u econ_impact_floods econ_impact_storms econ_impact_droughts econ_impact_wildfires econ_impact_extreme_temps log_active_pop log_gdp perc_sectA urban_pop secondary_educ avg_irrigated, fe robust
    xtreg env_conc_u log_econ_impact_droughts log_econ_impact_extreme_temps log_econ_impact_floods log_econ_impact_storms log_econ_impact_wildfires log_active_pop log_gdp perc_sectA urban_pop secondary_educ avg_irrigated, fe robust
    xtreg env_conc_u human_impact_floods human_impact_storms human_impact_droughts human_impact_wildfires human_impact_extreme_temps log_active_pop log_gdp perc_sectA urban_pop secondary_educ avg_irrigated, fe robust
    xtreg env_conc_u log_human_impact_droughts log_human_impact_extreme_temps log_human_impact_floods log_human_impact_storms log_human_impact_wildfires log_active_pop log_gdp perc_sectA urban_pop secondary_educ avg_irrigated, fe robust

    Thank you!

  • #2
    As you can imagine, the answer is: "it depends". How are the "impacts" measured? Is it on a scale like "severe", "mild", "none", or is it measured in dollars, or something else (if so, what)?
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      The economic impact is measured in thousands of dollars, while the human impact is the number of people affected (injured + homeless).



      • #4
        Does the value 0 appear?


        • #5
          Yes, quite often (depending on whether there was a climate extreme or not at the time)



          • #6
            So, observations with value zero on those predictors will drop out of the regression if you take logarithms. That isn't usually a good idea.
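A minimal sketch of why the zeros drop out, using a tiny made-up dataset (the variable names `impact` and `log_impact` are hypothetical): Stata returns missing for the logarithm of zero, and estimation commands then exclude observations with missing values on any regressor.

```stata
* Toy data, invented for illustration only
clear
input impact
0
10
250
end

* ln(0) evaluates to missing (.), so this row would be dropped by xtreg
generate double log_impact = ln(impact)
count if missing(log_impact)
```

With many zero-impact country-years, the logged regressions are therefore estimated only on the subset of observations where a disaster actually occurred.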



            • #7
              I tried transforming them using this: gen log_human_impact_wildfires = log(human_impact_wildfires + 1) so my 0 observations wouldn't drop out.



              • #8
                No, no, no! Do not do that. Now you have lost the value of logarithms.

                The bigger issue you need to consider is that there is a qualitative difference between 0 and any positive value (0=no disaster, positive value=disaster). So your variable should change gradually, but allow for a discrete change at 0. Look at this Stata Tip on how to implement that: https://journals.sagepub.com/doi/ful...urnalCode=stja

                If you need additional nonlinearity you can enter linear splines, see help mkspline
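One way the "gradual change plus a discrete change at 0" idea might be set up is sketched below. This is an illustration on simulated data, not necessarily the exact recipe in the linked tip: `any_floods` and `log_floods_pos` are hypothetical names, the panel is randomly generated, and only `econ_impact_floods` and `env_conc_u` echo the thread.

```stata
* Simulated panel, for illustration only: 20 units x 10 years
clear
set seed 12345
set obs 200
generate id = ceil(_n/10)
bysort id: generate year = 2000 + _n
generate econ_impact_floods = cond(runiform() < 0.6, 0, exp(rnormal(5, 1)))
generate env_conc_u = rnormal()
xtset id year

* Indicator for "a disaster occurred" (the discrete change at 0)
generate byte any_floods = econ_impact_floods > 0 if !missing(econ_impact_floods)

* Log of the impact where positive, 0 otherwise (the gradual change)
generate double log_floods_pos = cond(econ_impact_floods > 0, ln(econ_impact_floods), 0) if !missing(econ_impact_floods)

xtreg env_conc_u any_floods log_floods_pos, fe vce(robust)
```

Under this coding no observation is lost. The coefficient on `log_floods_pos` describes how the outcome changes with the log of the impact among disaster observations, and the coefficient on `any_floods` is the difference between no disaster and a disaster with an impact of 1 unit (where the log is 0).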


                • #9
                  Thank you



                  • #10
                    I'd agree with Maarten Buis that that fudge is far from unproblematic. There are recent papers in economics that are now very negative (pun intended) about it. Cue John Mullahy.

                    First off, for visualization, log(y + 1), or more generally log(y + c) where c is big enough to ensure that y + c is always positive, is sometimes a way to ensure that zeros (or even small negative values) can be plotted, but there you can see what the scale does.

                    Sometimes using log(y + c) creates outliers! log(y + smidgen), for smidgen very, very small, is close to log y for large y but not at all close for y near zero.
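The outlier effect can be seen on a few made-up values. In this sketch the data and the constant 1e-6 are assumptions chosen only to make the point visible.

```stata
* Toy data, invented for illustration only
clear
input y
0
1
10
1000
end

* A tiny constant sends the zero far below everything else on the log scale
generate double log_small = ln(y + 1e-6)

* Adding 1 instead keeps the zero at 0 but distorts small positive values
generate double log_one = ln(y + 1)

list
```

With the tiny constant, the zero lands near -13.8, while the values 1 to 1000 span only about 6.9 log units, so the zero becomes an extreme point created purely by the choice of constant.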

                    For modelling you lose the easy interpretation that goes with logarithms.

                    Why 1 anyway? For counted zeros, that is a little arbitrary. For measured zeros, it is a lot arbitrary.

                    What to do instead? Sometimes square or cube roots help.

                    A two-step device:

                    * replace zeros with the smallest observed positive value.

                    * use logarithms of that variable together with an indicator for which observations were zero
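The two-step device could be sketched as follows, taking the smallest positive value in the data as the replacement for zeros. The dataset is invented, and `was_zero` and `log_impact` are hypothetical names.

```stata
* Toy data, invented for illustration only
clear
input human_impact
0
0
35
120
4000
end

* Step 2's indicator: flag the original zeros before touching the values
generate byte was_zero = human_impact == 0 if !missing(human_impact)

* Step 1: find the smallest positive value and use it in place of zeros
summarize human_impact if human_impact > 0, meanonly
generate double log_impact = ln(cond(human_impact == 0, r(min), human_impact))

list
```

Both `log_impact` and `was_zero` would then enter the regression together, so the indicator absorbs the difference between "no disaster" and "smallest observed disaster" while the logged variable keeps its usual interpretation among the positive values.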
