  • Academic acceptability of grouping large values in regression analysis

    Hello!

    I am new to Stata. I am using GDP per capita (GDPpc) and CO2 variables, which have large values, in my regression. Instead of transforming them into natural logarithms, I generate a unique number for each distinct GDPpc value within each year and country ID group.

    In Stata, I use these codes:
    Code:
    bys year(id): gen lGDPpc = group(GDPpc)
    bys year(id): gen lCO2 = group(CO2)
    Is this acceptable in academic work? Is this a common practice? I observe different results when I use natural logs compared to these variables.

  • #2
    No; this is in my view utterly wrong, if only for a different reason. There is an undocumented group() function in Stata which should not be used unless you know what it does and are confident that you want it. It is not a useful alternative to logarithms, or any other transformation, and only rarely will readers of your work -- whether very fluent in Stata or not -- be able to follow your work even if they have access to your code.

    See https://www.statalist.org/forums/for...ith-entropyetc for a recent example in which a user was bitten by this and especially the linked posts in #2 of that thread.

    Following that discussion I've suggested to StataCorp that group() should only be accessible under version control.

    https://www.stata.com/statalist/arch.../msg00406.html still seems to be the fullest story.

    It's hard to say whether you should be working with log of either variable, but I've often seen log of GDP per head used for the usual reasons. What precisely are your carbon dioxide variables?
    Last edited by Nick Cox; 03 Jul 2024, 02:57.



    • #3
      Prof. Nick,

      Thank you for providing insights into the issues with the group() function in Stata. I am currently working with CO2 emissions data (measured in million metric tonnes of CO2) from the US EIA.



      • #4
        I was just unclear whether carbon dioxide meant, e.g., a concentration in ppm. As you are talking about absolute amounts, it seems quite likely that logarithms might help, noting that getting closer to linearity and additivity, and then homoscedasticity, are bigger deals than, say, marginal or even conditional normality. But sometimes subduing or taming outliers is also an effect of transformations, and to that extent the goals may be consistent.
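
        As a minimal sketch of the log transformation (not necessarily the right specification for your model), assuming GDPpc and CO2 are strictly positive, and with ln_GDPpc and ln_CO2 as purely illustrative names for the new variables:
        Code:
        * ln() returns missing for zero or negative values, so count those first
        count if GDPpc <= 0 | CO2 <= 0
        generate ln_GDPpc = ln(GDPpc)
        generate ln_CO2 = ln(CO2)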
