Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Show number of variables with specific number of (non-)missings

    Short version
    I want one information and one information only: in each and every variable in the dataset, is there only one non-missing observation, yes or no? Is there a neat way to achieve that?

    Long version
    I have a mapping table which connects IDs (JANIS_ID and BORID) from two datasets which I want to merge and between those IDs there can be m:n relations. So I have to identify the networks of IDs that belong together. The code below does that by:
    1. creating two datasets which show each BORID per JANIS_ID and each JANIS_ID per BORID.
    2. iteratively merging the datasets from step 1 until all IDs that belong to each other are in one row, so it holds that: row = network
    The job is done when each variable only contains one non-missing observation. In this simple example the missings report command is perfectly fine. In the actual dataset, however, I have over 8,000 variables which makes the output of missings report over 8,000 lines long. That is also why solutions that involve creating additional variables (e.g. through missings tab) are limited.

    What I am ultimately aiming for is a solution in which the number of iterations depends on whether the condition on the missings is fulfilled. I.e. "run the two mergers and duplicates drop commands until there are no variables with more than one non-missing".


    Code:
    cd ""
    
    use mapping, clear
    
    gen x = _n
    
    // all JANIS_ID per BORID
    
    preserve
    reshape wide JANIS_ID, i(BORID) j(x)
    save mapping_1, replace
    describe
    global n_janisid = r(k) - 1
    restore
    
    // all BORID per JANIS_ID
    
    preserve
    reshape wide BORID, i(JANIS_ID) j(x)
    save mapping_2, replace
    describe
    global n_borid = r(k) - 1
    restore
    
    use mapping_1, clear
    drop BORID
    
    local n_janisid = ${n_janisid}
    local n_borid = ${n_borid}
    
    
    // ITERATION 1
    
    forvalues n = 1(1)`n_janisid' {
        rename JANIS_ID`n' JANIS_ID
        display `n'
        qui merge m:1 JANIS_ID using mapping_2, update replace keep(1 3 4 5) nogen
        rename JANIS_ID JANIS_ID`n'
    }
    
    duplicates drop JANIS_ID* BORID*, force
    
    forvalues n = 1(1)`n_borid' {
        rename BORID`n' BORID
        display `n'
        qui merge m:1 BORID using mapping_1, update replace keep(1 3 4 5) nogen
        rename BORID BORID`n'
    }
    
    duplicates drop JANIS_ID* BORID*, force
    
    
    // ITERATION 2
    
    forvalues n = 1(1)`n_janisid' {
        rename JANIS_ID`n' JANIS_ID
        display `n'
        qui merge m:1 JANIS_ID using mapping_2, update replace keep(1 3 4 5) nogen
        rename JANIS_ID JANIS_ID`n'
    }
    
    duplicates drop JANIS_ID* BORID*, force
    
    forvalues n = 1(1)`n_borid' {
        rename BORID`n' BORID
        display `n'
        qui merge m:1 BORID using mapping_1, update replace keep(1 3 4 5) nogen
        rename BORID BORID`n'
    }
    
    duplicates drop JANIS_ID* BORID*, force
    
    
    // ITERATION 3
    
    forvalues n = 1(1)`n_janisid' {
        rename JANIS_ID`n' JANIS_ID
        display `n'
        qui merge m:1 JANIS_ID using mapping_2, update replace keep(1 3 4 5) nogen
        rename JANIS_ID JANIS_ID`n'
    }
    
    duplicates drop JANIS_ID* BORID*, force
    
    forvalues n = 1(1)`n_borid' {
        rename BORID`n' BORID
        display `n'
        qui merge m:1 BORID using mapping_1, update replace keep(1 3 4 5) nogen
        rename BORID BORID`n'
    }
    
    duplicates drop JANIS_ID* BORID*, force
    
    
    // check for missings
    
    missings report
    
    
    // Reshape to long format
    
    gen NETWID = _n
    
    reshape long JANIS_ID BORID, i(NETWID) j(n)
    drop if JANIS_ID==. & BORID==.
    drop n
    Attached Files

  • #2
    Code:
    quietly foreach v of var  * {
          count if !missing(`v')
          if r(N) == 1 local wanted `wanted' `v'
    }
    
    di "`wanted'"
    Incidentally, missings report from the Stata Journal does have a minimum() option. I guess you've just exemplified grounds for a maximum() option too.

    As that command has a name often used for its other more direct meaning, please note that


    Code:
    search dm0085, entry
    is a way to find out more.
    Last edited by Nick Cox; 06 Mar 2024, 09:47.

    Comment


    • #3
      Thanks for the quick reply. Yeah, I also thought that a maximum()option would really be helpful.

      I also kept playing around and actually managed to come up with a solution for counting the missings:
      Code:
        
      insobs 1
          foreach var of varlist JANIS_ID* BORID* {
              egen nonmis`var' = count(`var')
              replace `var' = nonmis`var' if _n == _N
              drop nonmis`var'
          }
          egen nonmissings = rowmax(_all) if _n == _
          local nonmissings = nonmissings[_N]
          dis "Maximum number of nonmissings: " `nonmissings'
          drop if _n == _N
          drop nonmissings

      Even solved the conditional loop thing:
      Code:
      cd ""
      
      use mapping, clear
      
      gen x = _n
      
      // all JANIS_ID per BORID
      
      preserve
      reshape wide JANIS_ID, i(BORID) j(x)
      save mapping_1, replace
      describe
      global n_janisid = r(k) - 1
      restore
      
      // all BORID per JANIS_ID
      
      preserve
      reshape wide BORID, i(JANIS_ID) j(x)
      save mapping_2, replace
      describe
      global n_borid = r(k) - 1
      restore
      
      use mapping_1, clear
      drop BORID
      
      local n_janisid = ${n_janisid}
      local n_borid = ${n_borid}
      
      // check for missings
      
      local nonmissings = 2
      dis `nonmissings'
      
      
      // Merging
      
      while `nonmissings' > 1 {
      
          qui forvalues n = 1(1)`n_janisid' {
              rename JANIS_ID`n' JANIS_ID
              display `n'
              qui merge m:1 JANIS_ID using mapping_2, update replace keep(1 3 4 5) nogen
              rename JANIS_ID JANIS_ID`n'
          }
      
          duplicates drop JANIS_ID* BORID*, force
      
          qui forvalues n = 1(1)`n_borid' {
              rename BORID`n' BORID
              display `n'
              qui merge m:1 BORID using mapping_1, update replace keep(1 3 4 5) nogen
              rename BORID BORID`n'
          }
      
          duplicates drop JANIS_ID* BORID*, force
          
          insobs 1
          foreach var of varlist JANIS_ID* BORID* {
              egen nonmis`var' = count(`var')
              replace `var' = nonmis`var' if _n == _N
              drop nonmis`var'
          }
          egen nonmissings = rowmax(_all) if _n == _
          local nonmissings = nonmissings[_N]
          dis "Maximum number of nonmissings: " `nonmissings'
          drop if _n == _N
          drop nonmissings
      
      }
      
      
      // Reshape to long format
      
      gen NETWID = _n
      
      reshape long JANIS_ID BORID, i(NETWID) j(n)
      drop if JANIS_ID==. & BORID==.
      drop n

      Comment

      Working...
      X