Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Listing for each ID, all the variables with missing data

    Hi statalist,

    I've got a very wide dataset, with quite a lot of missing values, each line is identified with a unique ID. My goal is to display, for each ID, all the variable with missing values.

    I usually use the code below when dealing with missing data. However it gives me the ID associated with a missing observation for a given variable instead of giving me all the variable with missing observations for a given ID.
    Code:
    mdesc
    list ID if missing(var)
    Here's an example of what i'd like obtain

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str18 make int(price mpg rep78)
    "AMC Concord"   . 22 3
    "AMC Pacer"     4749 17 3
    "AMC Spirit"    . 22 .
    "Buick Century" 4816 20 3
    "Buick Electra" . . .
    end
    Code:
    ID | Missing_variable
    "AMC Concord" price
    ------------------------------
    "AMC Spirit"  price
    "AMC Spirit" rep78
    ------------------------------
    "Buick Electra" price
    Any help would be greatly apperciated,

    Bilel

  • #2
    What you want is programmable but the code would be awkward and slow. It also raises the key practical question of what you intend to do with the results. For data as you describe you will get a long report and how are you going to process that?

    This sketches something that may be more manageable. For the auto dataset the result looks trivial but for a larger dataset with many missing values the result will not be.

    Code:
    sysuse auto, clear 
    
    unab all : * 
    gen hasmissing = "" 
    
    quietly foreach v of var `all' { 
        replace hasmissing = hasmissing + "`v' " if missing(`v')
    }
    
    list make hasmissing if !missing(hasmissing)
    
         +--------------------------+
         | make            hasmis~g |
         |--------------------------|
      3. | AMC Spirit        rep78  |
      7. | Buick Opel        rep78  |
     45. | Plym. Sapporo     rep78  |
     51. | Pont. Phoenix     rep78  |
     64. | Peugeot 604       rep78  |
         +--------------------------+

    Comment


    • #3
      ,

      Thank you very much for the quick answer Nick, your code works perfectly for my intended use.

      My goal was to export the result to excel. People would then filter the data by the ID they were in charge of and see if the missing data could be retrieved.
      Last edited by Bilel Well; 26 Apr 2022, 06:31.

      Comment


      • #4
        I'd use a reshape if you wish to export to Excel, that way you can directly export the data with File > Export.

        Code:
        foreach x of varlist price-rep78{
            rename `x' var`x'
        }
        
        gen id = _n
        reshape long var, i(id) j(varname, string)
        
        keep if var == .
        
        list make varname, sepby(id)
        This is the dataset:

        Code:
             +-------------------------+
             |          make   varname |
             |-------------------------|
          1. |   AMC Concord     price |
             |-------------------------|
          2. |    AMC Spirit     price |
          3. |    AMC Spirit     rep78 |
             |-------------------------|
          4. | Buick Electra       mpg |
          5. | Buick Electra     price |
          6. | Buick Electra     rep78 |
             +-------------------------+
        You can then export this directly, data to Excel.

        Comment


        • #5
          I think my result is entirely compatible with the goal revealed in #3. In practice I would split the new variable first so that the results were spread across several spreadsheet columns, but I don't use MS Excel unless I have to.

          Comment

          Working...
          X