  • Adding .01 to all values of one variable

    Hi,

    I'd like to add .01 to all values of a variable because I need to log-transform it to get a normal distribution, and there are zeros in the raw variable. What is the code for that?

    Best,
    Tess

  • #2
    It would be best not to overwrite the original variable. Assuming the variable is called var1:

    Code:
    gen new_var1 = var1 + 0.01
    But hear me out first. What you propose is generally not a good idea, for a few reasons. Check whichever apply to your case:
    1. Many people mistakenly believe that for a regression to be valid, all continuous variables involved must be normally distributed. They transform everything thinking it will help, end up with a model full of log-transformed variables, and then find they lack the skills to interpret it.
    2. Log-transforming a variable with many zeros seldom improves the distribution. At best it shortens the right tail, but the spike at the original zeros will still be there.
    3. Better techniques are available (e.g. robust standard errors, robust regression, etc.) that retain ease of interpretation.
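
    To make points 2 and 3 concrete, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration only: the data-generating process, the 30% share of zeros, the predictor x, and the choice of var1 as the outcome are all hypothetical and not taken from the original question.

    ```stata
    * Hypothetical simulated data: var1 is right-skewed with a block of exact zeros
    clear
    set seed 12345
    set obs 1000
    generate x = rnormal()
    generate var1 = cond(runiform() < 0.3, 0, exp(1 + 0.5*x + rnormal()))

    * Point 2: every zero maps to the same value, ln(0.01), so the spike only moves
    generate new_var1 = var1 + 0.01
    generate log_var1 = ln(new_var1)
    summarize log_var1 if var1 == 0      // min and max coincide: one constant value
    histogram log_var1, name(h_log, replace)

    * Point 3: keep var1 on its original scale and use robust alternatives
    regress var1 x, vce(robust)          // OLS with robust standard errors
    rreg var1 x                          // robust regression
    ```

    The histogram should show the former zeros as an isolated column at the far left rather than a bell shape. Whether either robust option suits a real analysis depends on the research question.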

    • #3
      I'm going to give you an answer to your question, but I expect it is not the right thing to do - see the questions below:
      Code:
      gen newvar=oldvar+.01
      Replace "newvar" with the name you want for your new variable, and replace "oldvar" with the name of the variable you want to add .01 to.

      Why do you think you need a normally distributed variable? This sounds like you have the wrong idea about something, but you don't really present anything about the actual goal of your research, so I can't be completely sure. However, virtually no form of analysis needs a normally distributed variable, and, in the real world, virtually no variable is actually log-normally distributed; yours clearly is not. Further, where did you come up with ".01"? If you added a different amount, would your results change? Quite possibly, which is why you need to defend your choice of .01. There has been lots of discussion on Statalist about these issues; you might want to do a search and check these out.
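
      One way to check how much the choice of constant matters is to rerun the same model with several constants and compare the estimates. The sketch below does this on simulated data; the data-generating process, the predictor x, the list of constants, and the use of the logged variable as the outcome are all hypothetical choices made for illustration.

      ```stata
      * Hypothetical simulated data: oldvar is right-skewed with many exact zeros
      clear
      set seed 2024
      set obs 1000
      generate x = rnormal()
      generate oldvar = cond(runiform() < 0.3, 0, exp(1 + 0.5*x + rnormal()))

      * Refit the same regression for several added constants
      foreach c of numlist 0.001 0.01 0.1 1 {
          quietly generate double logvar = ln(oldvar + `c')
          quietly regress logvar x
          display "constant = " %6.3f `c' "   coefficient on x = " %9.4f _b[x]
          drop logvar
      }
      ```

      If the coefficient moves noticeably across constants, the result is partly an artifact of an arbitrary choice, which is the reason .01 would need defending.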

      Added in edit: crossed with #2, which makes many of the same points.
