  • Hope to get help for dealing with outliers

    Dear all, I’m facing an issue and would like your advice. In my regression analysis, county-level population density is the key explanatory variable, and county area is a control variable. I log-transformed both variables, and after the transformation both are approximately normally distributed.

    The extreme values for county population, land area, and population density in my data are not errors; they reflect actual characteristics of certain counties. Since my focus is population density, I think the log transformation can reduce the influence of extreme values. I also find no significant difference between running the regression directly on log-transformed population density and first removing extreme values of this variable before the log transformation.

    My co-author thinks that it is necessary to first remove extreme values of total population and area, and then remove extreme values of population density, before running the regression. I think that even though population density is derived from population and area, it is meaningful as an independent variable in its own right. This is similar to GDP per capita: you might exclude extreme GDP per capita values, but it seems excessive to first remove extreme GDP and population values before analyzing GDP per capita. I even think that if I use log-transformed population density in the regression, it is not necessary to remove extreme values of population density before taking the log.

    I’d appreciate your thoughts on this approach. Thank you very much for your time and help!
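As a small illustration of the claim in the opening post that a log transformation reduces the influence of extreme values, the sketch below compares how far the largest value sits from the median before and after taking logs. The density values are hypothetical and chosen only for illustration; they are not the poster's data, and the max-to-median ratio is just one rough way to gauge how extreme a value is.

```python
import math
import statistics

# Hypothetical population densities (people per km^2); illustrative only,
# not taken from the thread.
densities = [12.0, 35.0, 48.0, 60.0, 85.0, 110.0, 150.0, 27_000.0]


def max_to_median(values):
    """Ratio of the largest value to the median: a rough gauge of extremeness."""
    return max(values) / statistics.median(values)


# On the raw scale the largest county dwarfs the typical one.
raw_ratio = max_to_median(densities)

# On the log scale the same county is only a few times the median.
log_ratio = max_to_median([math.log(d) for d in densities])

print(f"raw max/median: {raw_ratio:.1f}")
print(f"log max/median: {log_ratio:.2f}")
```

With these made-up numbers the raw ratio is in the hundreds while the log-scale ratio is below three, which is the sense in which the transformation pulls extreme but genuine values toward the rest of the sample.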

  • #2
    I agree with your reasoning, both in terms of not dropping observations and log-transforming your variables. In fact, in macroeconomic studies, variables such as population, land area, and GDP are often log-transformed. The way your colleague defines outliers is also inconsistent with how I would define them, i.e., as impossible values that are obvious data errors. For your main analysis, dropping extreme values would likely bias your sample. However, as a robustness test, if your colleague wanted to check whether the observed results were driven by large countries, for example, it might be reasonable to run a sub-sample regression that excludes the countries so defined. Still, that cannot be your main analysis if you claim that the results are representative of all countries.
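The robustness check described in this reply can be sketched as follows: fit the model on the full sample, refit on a sub-sample that excludes the most extreme unit, and compare the estimates. This is a minimal sketch under stated assumptions: the data are hypothetical, the model is a one-regressor least-squares fit, and the exclusion rule (drop the single largest regressor value) is an illustrative choice rather than anything proposed in the thread.

```python
def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Hypothetical (log_density, outcome) pairs; illustrative values only.
rows = [(2.0, 1.1), (3.0, 1.4), (4.0, 2.1), (5.0, 2.4), (6.0, 3.0), (10.0, 5.2)]

# Main analysis: every observation is kept.
full_slope = ols_slope([r[0] for r in rows], [r[1] for r in rows])

# Robustness check: drop the observation with the largest regressor value
# (an assumed exclusion rule) and refit on the remaining sub-sample.
cutoff = max(r[0] for r in rows)
subsample = [r for r in rows if r[0] < cutoff]
sub_slope = ols_slope([r[0] for r in subsample], [r[1] for r in subsample])

print(f"full sample slope: {full_slope:.2f}")
print(f"sub-sample slope:  {sub_slope:.2f}")
```

If the two slopes are close, the full-sample result is not being driven by the excluded unit; the full-sample fit remains the main analysis either way.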


    • #3
      Thank you so much for your suggestions, Dr. Musau. Regarding the robustness tests, I agree that it could be useful to check whether the results are driven by outliers. However, I believe it would be more appropriate to directly address extreme values in population density, rather than in population or area. Extreme cases in population or area do not necessarily correspond to extreme cases in population density, as these two factors interact to determine the latter. By focusing on population density directly, we can better align the robustness checks with the variable of interest while avoiding unnecessary exclusions. Hope to get more suggestions from you and from other experts here.
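The point that extreme population or area need not mean extreme density can be checked with a toy example. The counties below are hypothetical (names, populations, and areas are invented for illustration): one is extreme in both population and area yet ordinary in density, while another is unremarkable in population but extreme in density.

```python
# Hypothetical counties: (name, population, area in km^2). Illustrative only.
counties = [
    ("A", 5_000_000, 50_000.0),  # very large population and very large area
    ("B", 40_000, 400.0),
    ("C", 60_000, 500.0),
    ("D", 30_000, 350.0),
    ("E", 200_000, 20.0),        # moderate population, very small area
]


def name_of_max(rows, value):
    """Return the name of the row that maximizes the given value function."""
    return max(rows, key=value)[0]


largest_population = name_of_max(counties, lambda r: r[1])
largest_area = name_of_max(counties, lambda r: r[2])
densest = name_of_max(counties, lambda r: r[1] / r[2])

print("largest population:", largest_population)
print("largest area:", largest_area)
print("highest density:", densest)
```

Trimming on population or area would remove county A, whose density matches the typical county, and would keep county E, the only one that is extreme on the variable of interest.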


      • #4
        I note that log(population density) = log(population) - log(area)

        so watch out. Using log population and log area as predictors may get most of what you want but you may need an interaction term too.

        I would not advocate omitting outliers at all.
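The identity above has a direct consequence for the regression, worked through here as a short derivation. The notation is an assumption introduced for illustration: $y_i$ is the outcome, $P_i$ population, $A_i$ area, and $D_i = P_i / A_i$ density for county $i$.

```latex
\begin{align*}
y_i &= \beta_0 + \beta_1 \log D_i + \beta_2 \log A_i + \varepsilon_i \\
    &= \beta_0 + \beta_1 \left( \log P_i - \log A_i \right) + \beta_2 \log A_i + \varepsilon_i \\
    &= \beta_0 + \beta_1 \log P_i + \left( \beta_2 - \beta_1 \right) \log A_i + \varepsilon_i
\end{align*}
```

So a model with log density and log area as predictors is a reparameterization of a model with log population and log area: the fitted values are the same, and only the coefficient on log area changes meaning. Neither form includes an interaction between the two, which would have to be added as a separate term.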


        • #5
          Yi:
          The extreme values for county population, land area and population density in my data are not errors but reflect actual characteristics of certain counties.
          This is a sound reason to avoid deleting outliers.
          Kind regards,
          Carlo
          (StataNow 18.5)


          • #6
            To omit the Vatican would be a Cardinal error.


            • #7
              Dear Dr. Cox and Dr. Lazzaro, thank you so much for your insightful and constructive suggestions! I also enjoyed the humor in your response. Thank you again for taking the time to share your expertise and for doing so in such a delightful way!


              • #8
                Yi:
                please call me Carlo, as all on (and many more off) this list do. Thank you.
                Kind regards,
                Carlo
                (StataNow 18.5)


                • #9
                  Originally posted by Nick Cox
                  To omit the Vatican would be a Cardinal error.

                  --
                  Bruce Weaver
                  Email: [email protected]
                  Version: Stata/MP 18.5 (Windows)


                  • #10
                    Yes, Carlo. Thank you.

                    Originally posted by Carlo Lazzaro
                    Yi:
                    please call me Carlo, as all on (and many more off) this list do. Thank you.
