  • Plug a mean into the missing values

    Hi, I have a variable called father's education (see below) which includes some missing values. I want to replace the missing values with the average father's education, calculated from respondents who do not have missing values in father's education. My code is as follows, but it does not work. Can anyone help me? Thanks!

    Code:
    egen mean_feduyear=mean(feduyear) if feduyear!=.
    tab mean_feduyear
    
    replace feduyear = mean_feduyear if missing(feduyear) 
    
    or
    replace feduyear = mean_feduyear if feduyear==.

    [Image: QQ20181017-203121@2x.png]

    [Image: QQ20181017-201824@2x.png]

  • #2
    Code:
    egen mean_feduyear=mean(feduyear) if feduyear!=.
    only places the value of mean_feduyear in those observations where feduyear is not missing. So when you then try to copy it into those observations where feduyear is missing, there is nothing to copy. What you want is to just take off the -if feduyear != .- part of that code. That won't change the actual calculation of the mean, because missing values are never included anyway. But it will allow that value to be placed in every observation.

    Or, even simpler, just do:

    Code:
    summ feduyear, meanonly
    replace feduyear = r(mean) if missing(feduyear)
    All of that said, be cautious about doing this at all. Using the mean for missing values has some real limitations associated with it.
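
A minimal sketch of the first fix described above (dropping the -if- qualifier from -egen-), assuming toy data in place of the poster's dataset. The values are made up, and feduyear_filled is a hypothetical name used here so that the original variable is left untouched.

```stata
* Toy data standing in for the poster's variable (values are made up)
clear
input feduyear
6
9
12
.
16
.
end

* With no -if- qualifier, egen writes the mean into every observation;
* missing values are ignored when the mean itself is computed
egen mean_feduyear = mean(feduyear)

* Fill a copy rather than overwriting the original variable
generate feduyear_filled = feduyear
replace feduyear_filled = mean_feduyear if missing(feduyear)

list
```

Filling a copy keeps the observed and filled values distinguishable later, which matters given the caution above about mean substitution.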

    • #3
      Originally posted by Clyde Schechter
      I see. Thank you very much!

      • #4
        Yapeng:
        echoing Clyde's helpful advice, I would not recommend mean substitution as a valid approach to dealing with missing data (at best, the variance of your variable will be biased, probably downward).
        Why not consider a methodologically sounder strategy, such as -mi-?
        Kind regards,
        Carlo
        (Stata 19.0)
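
The point about the variance can be checked on a small example. This is only a sketch on made-up values, with feduyear_filled, sd_observed, and sd_filled as hypothetical names; it compares the standard deviation of the observed values with the standard deviation after a mean fill.

```stata
* Toy data (made-up values) with two missing observations
clear
input feduyear
6
9
12
.
16
.
end

* Spread of the observed values only
summarize feduyear
scalar sd_observed = r(sd)

* Mean-fill into a copy (feduyear_filled is a hypothetical name)
generate feduyear_filled = feduyear
summarize feduyear, meanonly
replace feduyear_filled = r(mean) if missing(feduyear)

* Spread after the fill: the filled points sit exactly at the mean
summarize feduyear_filled
scalar sd_filled = r(sd)

display "SD, observed only:   " sd_observed
display "SD, after mean fill: " sd_filled
```

The filled observations add nothing to the sum of squared deviations but do increase the sample size, so the standard deviation after the fill is smaller than the one computed from the observed values.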

        • #5
          Originally posted by Carlo Lazzaro
          Hi Carlo, can you explain more? I am not familiar with that. Thanks! I am also trying to use multiple imputation, and am still learning how to do it in Stata.

          • #6
            The first handout explains the problems with mean substitution and some other commonly used methods. The second handout discusses Multiple Imputation and FIML, which are generally superior.

            https://www3.nd.edu/~rwilliam/stats3/MD01.pdf

            https://www3.nd.edu/~rwilliam/stats3/MD02.pdf
            -------------------------------------------
            Richard Williams, Notre Dame Dept of Sociology
            StataNow Version: 19.5 MP (2 processor)

            EMAIL: [email protected]
            WWW: https://www3.nd.edu/~rwilliam
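
For readers new to -mi-, the basic workflow can be sketched as follows. This is an illustration under stated assumptions, not a recommended model: it uses the auto dataset shipped with Stata, with rep78 standing in for a variable that has missing values. The predictors, the five imputations, and the seed are arbitrary choices, and the linear imputation model treats rep78 as continuous purely for simplicity.

```stata
* Example dataset shipped with Stata; rep78 has some missing values
sysuse auto, clear

* Declare the data as mi data and register the variable to be imputed
mi set wide
mi register imputed rep78

* Impute rep78 from mpg and weight (illustrative model only),
* creating 5 imputed datasets with a fixed seed for reproducibility
mi impute regress rep78 mpg weight, add(5) rseed(12345)

* Fit the analysis model in each imputed dataset and combine the results
mi estimate: regress price rep78 mpg weight
```

Unlike a single mean fill, the combined standard errors from -mi estimate- reflect the uncertainty about the imputed values.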

            • #7
              Originally posted by Richard Williams
              Richard, thank you very much!

              • #8
                Yapeng:
                in addition to Richard's excellent handouts, see also https://missingdata.lshtm.ac.uk/file...guidelines.pdf
                Kind regards,
                Carlo
                (Stata 19.0)

                • #9
                  Originally posted by Carlo Lazzaro
                  Carlo, thank you very much!
