
  • Replacing missings with mean

    Hello,
    I am working with the CSES dataset, which covers elections and voting behavior.

    My goal is to generate a variable that holds different means, depending on the country and the party respondents belong to.
    I know this sounds a bit chaotic, so let me show you an example of the code that comes closest to what I want to do:

    Code:
    egen myvar =mean(B3033_A) if B1006_NAM == "Germany"
    The problem with that code is that it leaves myvar missing for every observation outside Germany. I have tried to replace the missings using code like
    Code:
    replace myvar = mean(B3033_A) if B1006 == "France"
    but the syntax is incorrect and I get an error message.

    Is there any way I can replace the missings with different 'means'? Or am I completely on the wrong track?

    I am just starting out with Stata, so I am sorry if there is a very obvious answer.

    Kind regards

    Paula

  • #2
    It's not clear to me what you want, but

    Code:
    egen myvar = mean(B3033_A), by(B1006_NAM)
    will put country means in every observation, including those with missing values on B3033_A, so long as there is at least one non-missing value to play with.
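
    A minimal sketch of that behavior on invented toy data (the names country, rating and country_mean are made up for illustration and are not CSES variables):

    ```stata
    * Toy data: one Germany observation has a missing rating
    clear
    input str10 country rating
    "Germany" 4
    "Germany" 8
    "Germany" .
    "France"  2
    "France"  6
    end

    * One mean per country, copied into every observation of that country
    egen country_mean = mean(rating), by(country)

    * country_mean is 6 in all three Germany rows (including the one with
    * missing rating) and 4 in both France rows
    list, sepby(country)
    ```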

    Your replace statement was illegal as mean() can only be used with egen.
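
    If you did want to stay with replace, the mean would first have to come from summarize, which leaves it in r(mean). A rough sketch of that route, which automates the "summarize, then replace" steps with a loop over countries. The data here are toy values invented for illustration; only the variable names are taken from this thread:

    ```stata
    * Toy stand-in for the CSES variables (values invented)
    clear
    input str10 B1006_NAM byte myparty B3033_A
    "Germany" 1 8
    "Germany" 1 6
    "Germany" 2 3
    "France"  1 5
    "France"  1 7
    end

    generate myvar = .
    levelsof B1006_NAM, local(countries)
    foreach c of local countries {
        * summarize stores the mean of the selected observations in r(mean)
        summarize B3033_A if myparty == 1 & B1006_NAM == "`c'", meanonly
        replace myvar = r(mean) if myparty == 1 & B1006_NAM == "`c'"
    }

    * myvar is 7 for the Germany partisans, 6 for the France partisans,
    * and stays missing for the Germany observation with myparty == 2
    list, sepby(B1006_NAM)
    ```

    The egen command with by() does the same job in one line, so the loop is mainly useful for seeing why replace needs r(mean) and cannot call mean() itself.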
    Last edited by Nick Cox; 06 Apr 2022, 06:27.



    • #3
      Dear Nick,

      thank you for your help. Let me try to explain more clearly what I want: in my dataset, respondents rate the relevant political parties of many countries on a scale from 0 to 10. So a given party variable (B3033_A, B3033_B or B3033_C) refers to a different real party in each country included in the dataset.

      I now want to calculate the mean rating for every party in every country and ideally put those means in one new variable.

      My workaround thus far has been calculating the mean for every party with the following code:

      Code:
      sum B3037_A if myparty == 1 & B1006_NAM == "Germany"
      (the if condition of myparty is included because I only want to calculate the mean for those who are partisans)

      Afterwards I generated a new variable (called myvar) and then more or less manually filled in the results I got from my sum command:

      Code:
      replace myvar= 7.89 if B1006_NAM == "Germany" & myparty == 1
      By repeating this process for every country and every party I do get the result that I want (--> a variable with the mean rating for every party) but it takes a long time and I cannot help but wonder if there is a faster way.

      I hope this makes more sense now.



      • #4
        It's the same answer from me, modified slightly.

        Code:
        by(B1006_NAM)
        could so far as Stata is concerned be
        Code:
        by(B1006_NAM party)  



        • #5
          Thank you Nick. Unfortunately I am still not where I want to be. When I use the code you suggest:
          Code:
          egen myvar = mean(B3033_A), by(B1006_NAM)
          The variable myvar appears with a different mean of B3033_A for each country. The issue is that I do not want the mean over all observations in each country, only over those who are partisans, which is what I specified in the code I showed earlier using the myparty variable:
          Code:
          sum B3033_A if myparty == 1 & B1006_NAM == "Germany"
          Is there any way I can include the myparty variable in the code you suggested? Note that the myparty numbers and the party codes of B3033_X differ from country to country.
          Last edited by Paula Landmesser; 06 Apr 2022, 08:11.



          • #6
            Not what I said in #4. If you want to subdivide by party too, put that in the by() option. See #4.
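
            Spelled out with the variable names used in this thread, and assuming the CSES data are already in memory, the options look roughly like this (the new variable names are arbitrary):

            ```stata
            * One mean per country-by-myparty cell, stored in every observation of that cell
            egen mean_by_cell = mean(B3033_A), by(B1006_NAM myparty)

            * Mean over partisans only (myparty == 1); left missing for everyone else
            egen mean_partisans = mean(B3033_A) if myparty == 1, by(B1006_NAM)

            * Same partisan mean, but copied to every observation in the country
            egen mean_partisans_all = mean(cond(myparty == 1, B3033_A, .)), by(B1006_NAM)
            ```

            The last form works because mean() in egen accepts an expression, and cond() turns the ratings of non-partisans into missing values that the mean ignores.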

            But if the parties you want are different numbers in different countries, there is no fix for that beyond (1) spelling out the different codes or (2) making a new variable with codes the same for what you consider the same.
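
            A sketch of route (2), assuming the data are in memory. The Germany code is the one quoted earlier in the thread; the France code 3 and the name partisan_A are purely hypothetical and would have to be replaced by the real codes from the codebook:

            ```stata
            * Harmonized indicator: 1 if the respondent is a partisan of party A
            * in his or her own country, whatever number myparty uses there
            generate byte partisan_A = 0
            replace partisan_A = 1 if B1006_NAM == "Germany" & myparty == 1
            replace partisan_A = 1 if B1006_NAM == "France"  & myparty == 3   // hypothetical code

            * Country-specific mean rating of party A among its own partisans
            egen myvar = mean(cond(partisan_A == 1, B3033_A, .)), by(B1006_NAM)
            ```

            The different codes still have to be spelled out once, but only in the replace lines; the mean itself is then a single egen call.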



            • #7
              Thank you for your help - I guess I'll take the long route and spell out the different codes. At least I now know that I am not missing something.



              • #8
                Maybe someone should say it: Replacing missing values with the mean is generally not a good idea (it will tend to bias analyses). Much better is multiple imputation.
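
                For orientation only, a bare-bones sketch of what multiple imputation can look like in Stata. It assumes the data are in memory, that missing ratings are already stored as Stata missing values, and that country is the only predictor; country_id, the number of imputations and the seed are arbitrary choices for illustration, and a real imputation model would need more thought:

                ```stata
                * mi impute needs a numeric country identifier
                encode B1006_NAM, generate(country_id)

                * Declare the mi style and the variable to be imputed
                mi set wide
                mi register imputed B3033_A

                * Predictive mean matching: imputed values are drawn from observed ratings
                mi impute pmm B3033_A i.country_id, add(20) knn(5) rseed(12345)

                * Analyses are then run across imputations and combined
                mi estimate: regress B3033_A i.country_id
                ```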
