  • Delete records with missing values

    How can I delete all observations that have missing data in any of my variables?

    I have more than 61,000 observations and 3,809 variables, and I need to keep only the observations that are complete.

  • #2
    Code:
    egen int mcount = rowmiss(_all)
    drop if mcount > 0
    Note that while this is very easy to do, it may not be a good idea. Have you considered other approaches to missing data that might introduce less bias?
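
    Before dropping anything, it may help to see how much would be lost. A minimal sketch, assuming the auto dataset shipped with Stata as a stand-in (it is not the poster's data):

    ```stata
    * Illustration only: auto.dta is a stand-in for the real dataset
    sysuse auto, clear
    * Per-variable summary of missing values (numeric variables)
    misstable summarize
    * Number of missing values in each observation, across all variables
    egen int mcount = rowmiss(_all)
    * How many observations have 0, 1, 2, ... missing values
    tabulate mcount
    * Number of observations that "drop if mcount > 0" would remove
    count if mcount > 0
    ```

    If the last count is a large share of the sample, that is a sign that dropping incomplete observations could change the sample substantially.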


    • #3
      Hossam:
      I share Clyde's wise advice: by eliminating all the observations with missing values, you're implicitly making up your data.
      The resulting sample is, in all likelihood, miles away from the original one, and the bias induced in your results cannot be forecast in either direction or magnitude.
      When you deal with missing data (which is a ubiquitous nuisance), you should first diagnose the mechanism underlying the missingness (basically: is it ignorable or not?).
      The comprehensive -mi- section of the Stata .pdf manual sheds light on many missingness-related issues and provides very good references.
      One of my favourite textbooks on this topic is: https://www.crcpress.com/Flexible-Im.../9781138588318.
      Kind regards,
      Carlo
      (Stata 19.0)


      • #4
        Thank you so much for your replies, @Clyde Schechter and @Carlo Lazzaro. I greatly appreciate it.
        The thing is, I want to report the characteristics of the data as counts and percentages. However, I have missing values, which don't even exceed 1%: the frequencies range from 1 to 45. I'm struggling with whether to report the missing values in my paper, since I want to report the counts, or whether I should report only percentages, considering the small amount of missing data.
        I thought that deleting the missing values would solve it, but unfortunately it didn't.


        • #5
          A fair presentation would show counts and percents including missing as one of the categories for which the count and percent are given. Or you can give the counts and percents of the non-missing values along with a count of the number of missing observations. If it is true that you have less than 1% missing data, those two approaches won't even look very different.
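
          Both presentations can be produced with tabulate. A minimal sketch, assuming the auto dataset shipped with Stata and its variable rep78 as stand-ins for the real data:

          ```stata
          * Illustration only: auto.dta and rep78 are stand-ins
          sysuse auto, clear
          * Counts and percents with missing shown as its own category
          tabulate rep78, missing
          * Counts and percents of the nonmissing values only ...
          tabulate rep78
          * ... reported alongside the number of missing observations
          count if missing(rep78)
          ```

          With the missing option the percents are computed over all observations; without it they are computed over the nonmissing observations only, which is why the two tables differ slightly.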


          • #6
            Thank you so much, @Clyde Schechter. You have been a great help with everything. I will report both and attach in an appendix a list of missing values by variable. Thank you so much.


            • #7
              Originally posted by Clyde Schechter
              Code:
              egen int mcount = rowmiss(_all)
              drop if mcount > 0
              Note that while this is very easy to do, it may not be a good idea. Have you considered other approaches to missing data that might introduce less bias?
              Is it the same function as
              Code:
              dropmiss
              ?


              • #8
                -dropmiss- is a command written by Nick Cox. The code shown in #2 and repeated in #7 uses only native Stata commands. (Historically, though, egen started out as unofficial commands written by Nick Cox that were subsequently adopted into official Stata.) I don't use -dropmiss- myself, so I am not sure just what it does, but I believe it does this, perhaps among other things.


                • #9
                  Originally posted by Sonnen Blume

                  Is it the same function as
                  Code:
                  dropmiss
                  ?
                  The command was not recognized when I tried it.


                  • #10
                    #8 is a nice post, but not quite right. egen existed as an official command before I wrote any egen functions. What is true is that I wrote several egen functions (also with @Rich Goldstein) around 1999/2000 that were folded into official Stata:

                    Code:
                    STB-57  dm70.1  . . . . . . . .  Extensions to generate, extended: corrections
                            (help egenodd if installed) . . . . . . . . . . . . . . . .  N. J. Cox
                            9/00    p.2; STB Reprints Vol 10, p.9
                            correction to the eqany, neqany, tag, and rmed egen functions;
                            many of the egen functions added to Stata 7
                    
                    STB-52  dm72.1  . . . . . . . . . . . . Alternative ranking procedures: update
                            (help altrank if installed) . . . . . . . . N. J. Cox and R. Goldstein
                            11/99   p.2; STB Reprints Vol 9, p.51
                            incorporated into Stata 7.0 egen rank() function
                    
                    STB-51  dm72  . . . . . . . . . . . . . . . . . Alternative ranking procedures
                            (help altrank, lbleqrnk if installed) . . . N. J. Cox and R. Goldstein
                            9/99    pp.5--7; STB Reprints Vol 9, pp.48--51
                            incorporated into Stata 7.0 egen rank() function
                    
                    STB-50  dm70  . . . . . . . . . . . . . . . . Extensions to generate, extended
                            (help egenodd if installed) . . . . . . . . . . . . . . . .  N. J. Cox
                            7/99    pp.9--17; STB Reprints Vol 9, pp.34--45
                            24 additional egen functions presented; includes various string,
                            data management, and statistical functions;
                            many of the egen functions added to Stata 7

                    -- and several more that have not been folded into official Stata, most but not all of which are visible in egenmore (SSC). Other egen functions in official Stata were also originally community-contributed.

                    dropmiss is a command (not a function) that was always community-contributed. So, you would need to install it before you can use it.

                    Code:
                    SJ-15-4 dm89_2  . . . . . . . . . . . . . . . . . Software update for dropmiss
                            (help dropmiss if installed)  . . . . . . . . . . . . . . .  N. J. Cox
                            Q4/15   SJ 15(4):1186--1187
                            dropmiss command has been superseded by a new command, missings,
                            which offers various utilities for managing variables that may
                            have missing values
                    
                    SJ-8-4  dm89_1  . . . . Dropping variables or observations with missing values
                            (help dropmiss if installed)  . . . . . . . . . . . . . . .  N. J. Cox
                            Q4/08   SJ 8(4):594
                            update in style and content; added a new force option
                    
                    STB-60  dm89  . . . . . Dropping variables or observations with missing values
                            (help dropmiss if installed)  . . . . . . . . . . . . . . .  N. J. Cox
                            3/01    pp.7--8; STB Reprints Vol 10, pp.44--46
                            drops variables or observations with all values (optionally
                            any values) missing
                    It's now deprecated by its author (c'est moi) as making it too easy for people to do what was asked in #1, which is (a) not necessarily a good idea and (b) far from the only possibility, as already emphasised.

                    Having been personally involved is immaterial really, as the quoted results are all available by use of search within Stata.
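
                    For anyone following along, the lookup and the successor command mentioned above might be tried roughly as follows. This is a sketch, assuming that missings can be installed from SSC and using the auto dataset shipped with Stata as a stand-in for real data:

                    ```stata
                    * Reproduce the search results quoted above
                    search dropmiss
                    * Assumption: the missings package is available from SSC
                    ssc install missings
                    * Illustration only: auto.dta is a stand-in dataset
                    sysuse auto, clear
                    * Variables with missing values and how many each has
                    missings report
                    * Drops only observations that are missing on ALL variables
                    missings dropobs
                    ```

                    Note that missings dropobs targets observations with every value missing, which is a much narrower rule than the code in #2, where a single missing value is enough to drop an observation.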
                    Last edited by Nick Cox; 02 Jan 2020, 05:46.


                    • #11
                      I wasn't aware that -dropmiss- was a user-written command; I just got it installed somehow after a lot of research. I originally started using Stata just for data cleaning, and -dropmiss- was my biggest attraction. To this day, it's the first command I use after opening a dataset. Thanks, Nick Cox, for this fine contribution!


                      • #12
                        Originally posted by Hossam Ali

                        The command was not recognized when I tried it.
                        Type -findit dropmiss-; it'll lead you there.
