Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to count the number of missing values?

    How to count the number of missing values?

    I have a simple question I would like to know asap. How I can count the number of missing (or NULL) values in distinct from 0 values?

    Note Added

    I have tried to use some functions listed below, but I don't know if these functions could distinguish the missing values from 0 values.
    • mdesc
    • missings
    Please kindly give me any answers or comments to me!

  • #2
    To give a quick reply: Stata has an internal coding scheme for missing values that are ., .a, .b, .c, ... I assume from the post that you want to count all kinds of missingness. Then its simply:

    Code:
    count if missing(VAR)
    Zero is then not a missing, of course. However, I assume you want to have a deeper look into the issue and you need to prepare your data. For example, some institutes code missing values with negatives, like -999. Clearly, you need to recode them beforehand, for example:

    Code:
    replace VAR = .a if VAR == -999 //Not at home
    replace VAR = .b if VAR == -9999 //Refusal
    So Stata knows what defines a missing.

    For more information see

    https://stats.idre.ucla.edu/stata/mo...issing-values/

    https://www.stata.com/manuals/u12.pdf#u12.2.1
    Best wishes

    (Stata 16.1 MP)

    Comment


    • #3
      The commands (not functions) mdesc and missings don't include zeros because to Stata zero is not considered missing, as Felix Bittmann explains clearly.

      mdesc is from SSC. missings is from the Stata Journal. The name of the latter is so close to the keyword missing that a more focused search is recommended through


      Code:
      . search dm0085, entry
      
      Search of official help files, FAQs, Examples, and Stata Journals
      
      SJ-20-4 dm0085_2  . . . . . . . . . . . . . . . . Software update for missings
              (help missings if installed)  . . . . . . . . . . . . . . .  N. J. Cox
              Q4/20   SJ 20(4):1028--1030
              sorting has been extended for missings report
      
      SJ-17-3 dm0085_1  . . . . . . . . . . . . . . . . Software update for missings
              (help missings if installed)  . . . . . . . . . . . . . . .  N. J. Cox
              Q3/17   SJ 17(3):779
              identify() and sort options have been added
      
      SJ-15-4 dm0085  Speaking Stata: A set of utilities for managing missing values
              (help missings if installed)  . . . . . . . . . . . . . . .  N. J. Cox
              Q4/15   SJ 15(4):1174--1185
              provides command, missings, as a replacement for, and extension
              of, previous commands nmissing and dropmiss
      
      (end of search)
      where dm0085 is naturally not predictable without your being told about it. From the output just given, it is evident that missings has just been updated, and it may be downloaded regardless of whether you or your workplace subscribes to the Stata Journal (unless you can't install anything at all because of a local firewall or other security restrictions).

      Some example output:
      .
      Code:
       
      .. webuse nlswork
      (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
      
      . missings report
      
      Checking missings in all variables:
      15082 observations with missing values
      
      -------------------
                |      #
      ----------+--------
            age |     24
            msp |     16
        nev_mar |     16
          grade |      2
       not_smsa |      8
         c_city |      8
          south |      8
       ind_code |    341
       occ_code |    121
          union |   9296
         wks_ue |   5704
         tenure |    433
          hours |     67
       wks_work |    703
      -------------------
      
      . missings report, sort(missings desc)
      
      Checking missings in all variables:
      15082 observations with missing values
      
      -------------------
                |      #
      ----------+--------
          union |   9296
         wks_ue |   5704
       wks_work |    703
         tenure |    433
       ind_code |    341
       occ_code |    121
          hours |     67
            age |     24
            msp |     16
        nev_mar |     16
       not_smsa |      8
         c_city |      8
          south |      8
          grade |      2
      -------------------
      I've toyed with the idea of letting a user specify extra values they regard as missing and always retreated from it.

      Comment


      • #4
        Counting zeros (or equality with any other interesting constant) across a range of variables is a problem that might often arise. Here is some technique.


        Code:
        . webuse nlswork
        (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
        
        . ds, has(type numeric)
        idcode    age       nev_mar   not_smsa  ind_code  wks_ue    hours
        year      race      grade     c_city    occ_code  ttl_exp   wks_work
        birth_yr  msp       collgrad  south     union     tenure    ln_wage
        
        
        foreach v of var `r(varlist)' {
            quietly count if `v' == 0
            if r(N) > 0 di "`v'{col 16}" %6.0f r(N)
        }
        
        msp             11324
        nev_mar         21968
        grade              21
        collgrad        23739
        not_smsa        20469
        c_city          18336
        south           16843
        union           14728
        wks_ue          17394
        ttl_exp            21
        tenure           1248
        wks_work          478
        ln_wage             2

        Comment


        • #5
          Thank you so much for your fruitful comments.

          In fact, sometimes the original raw data itself, which is obtained from our clients, have some problem. For some parts of the raw data, the NULL's are fulfilled or replaced with zeros. In that case, it is difficult to count the number of NULL's in the data. Then, I cannot provide a reliable information to tell which data tables are practically useful or not.

          Comment


          • #6
            See https://www.statalist.org/forums/for...lable-from-ssc for another way to count zeros. Here is #4 revisited:


            Code:
            . webuse nlswork
            (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
            
            . countvalues, values(0)
            
              +------------------+
              |     name       0 |
              |------------------|
              |   idcode       0 |
              |     year       0 |
              | birth_yr       0 |
              |      age       0 |
              |     race       0 |
              |      msp   11324 |
              |  nev_mar   21968 |
              |    grade      21 |
              | collgrad   23739 |
              | not_smsa   20469 |
              |   c_city   18336 |
              |    south   16843 |
              | ind_code       0 |
              | occ_code       0 |
              |    union   14728 |
              |   wks_ue   17394 |
              |  ttl_exp      21 |
              |   tenure    1248 |
              |    hours       0 |
              | wks_work     478 |
              |  ln_wage       2 |
              +------------------+
            
            . countvalues, values(0) rowspositive
            
              +------------------+
              |     name       0 |
              |------------------|
              |      msp   11324 |
              |  nev_mar   21968 |
              |    grade      21 |
              | collgrad   23739 |
              | not_smsa   20469 |
              |   c_city   18336 |
              |    south   16843 |
              |    union   14728 |
              |   wks_ue   17394 |
              |  ttl_exp      21 |
              |   tenure    1248 |
              | wks_work     478 |
              |  ln_wage       2 |
              +------------------+
            
            . countvalues, values(0) rowspositive sort(desc)
            
              +------------------+
              |     name       0 |
              |------------------|
              | collgrad   23739 |
              |  nev_mar   21968 |
              | not_smsa   20469 |
              |   c_city   18336 |
              |   wks_ue   17394 |
              |    south   16843 |
              |    union   14728 |
              |      msp   11324 |
              |   tenure    1248 |
              | wks_work     478 |
              |  ttl_exp      21 |
              |    grade      21 |
              |  ln_wage       2 |
              +------------------+
            . Meanwhile, what do you mean in Stata terms by NULL? Is that a string value?

            Comment


            • #7
              In terms of Stata language, what I meant by NULL is ".", that is, missing value.

              Comment


              • #8
                Thanks, so #3 shows some ways to count missings, and countvalues would work too. You could go

                Code:
                countvalues, values(0 .)

                Comment

                Working...
                X