Show number of variables with specific number of (non-)missings

Sebastian Eiblmeier

Join Date: Mar 2017
Posts: 15

Show number of variables with specific number of (non-)missings

06 Mar 2024, 08:20

Short version
I want one information and one information only: in each and every variable in the dataset, is there only one non-missing observation, yes or no? Is there a neat way to achieve that?

Long version
I have a mapping table which connects IDs (JANIS_ID and BORID) from two datasets which I want to merge and between those IDs there can be m:n relations. So I have to identify the networks of IDs that belong together. The code below does that by:

creating two datasets which show each BORID per JANIS_ID and each JANIS_ID per BORID.
iteratively merging the datasets from step 1 until all IDs that belong to each other are in one row, so it holds that: row = network

The job is done when each variable only contains one non-missing observation. In this simple example the missings report command is perfectly fine. In the actual dataset, however, I have over 8,000 variables which makes the output of missings report over 8,000 lines long. That is also why solutions that involve creating additional variables (e.g. through missings tab) are limited.

What I am ultimately aiming for is a solution in which the number of iterations depends on whether the condition on the missings is fulfilled. I.e. "run the two mergers and duplicates drop commands until there are no variables with more than one non-missing".

Code:

cd ""

use mapping, clear

gen x = _n

// all JANIS_ID per BORID

preserve
reshape wide JANIS_ID, i(BORID) j(x)
save mapping_1, replace
describe
global n_janisid = r(k) - 1
restore

// all BORID per JANIS_ID

preserve
reshape wide BORID, i(JANIS_ID) j(x)
save mapping_2, replace
describe
global n_borid = r(k) - 1
restore

use mapping_1, clear
drop BORID

local n_janisid = ${n_janisid}
local n_borid = ${n_borid}


// ITERATION 1

forvalues n = 1(1)`n_janisid' {
    rename JANIS_ID`n' JANIS_ID
    display `n'
    qui merge m:1 JANIS_ID using mapping_2, update replace keep(1 3 4 5) nogen
    rename JANIS_ID JANIS_ID`n'
}

duplicates drop JANIS_ID* BORID*, force

forvalues n = 1(1)`n_borid' {
    rename BORID`n' BORID
    display `n'
    qui merge m:1 BORID using mapping_1, update replace keep(1 3 4 5) nogen
    rename BORID BORID`n'
}

duplicates drop JANIS_ID* BORID*, force


// ITERATION 2

forvalues n = 1(1)`n_janisid' {
    rename JANIS_ID`n' JANIS_ID
    display `n'
    qui merge m:1 JANIS_ID using mapping_2, update replace keep(1 3 4 5) nogen
    rename JANIS_ID JANIS_ID`n'
}

duplicates drop JANIS_ID* BORID*, force

forvalues n = 1(1)`n_borid' {
    rename BORID`n' BORID
    display `n'
    qui merge m:1 BORID using mapping_1, update replace keep(1 3 4 5) nogen
    rename BORID BORID`n'
}

duplicates drop JANIS_ID* BORID*, force


// ITERATION 3

forvalues n = 1(1)`n_janisid' {
    rename JANIS_ID`n' JANIS_ID
    display `n'
    qui merge m:1 JANIS_ID using mapping_2, update replace keep(1 3 4 5) nogen
    rename JANIS_ID JANIS_ID`n'
}

duplicates drop JANIS_ID* BORID*, force

forvalues n = 1(1)`n_borid' {
    rename BORID`n' BORID
    display `n'
    qui merge m:1 BORID using mapping_1, update replace keep(1 3 4 5) nogen
    rename BORID BORID`n'
}

duplicates drop JANIS_ID* BORID*, force


// check for missings

missings report


// Reshape to long format

gen NETWID = _n

reshape long JANIS_ID BORID, i(NETWID) j(n)
drop if JANIS_ID==. & BORID==.
drop n

Attached Files

mapping.dta (1.8 KB, 1 view)

Tags: None

Nick Cox

Join Date: Mar 2014

Posts: 35696
#2

06 Mar 2024, 08:31

Code:

quietly foreach v of var * { count if !missing(`v') if r(N) == 1 local wanted `wanted' `v' } di "`wanted'"

Incidentally, missings report from the Stata Journal does have a minimum() option. I guess you've just exemplified grounds for a maximum() option too.

As that command has a name often used for its other more direct meaning, please note that

Code:

search dm0085, entry

is a way to find out more.

Last edited by Nick Cox; 06 Mar 2024, 08:47.
Comment

Sebastian Eiblmeier

Join Date: Mar 2017
Posts: 15

07 Mar 2024, 04:56

Thanks for the quick reply. Yeah, I also thought that a maximum()option would really be helpful.

I also kept playing around and actually managed to come up with a solution for counting the missings:

Code:

  
insobs 1
    foreach var of varlist JANIS_ID* BORID* {
        egen nonmis`var' = count(`var')
        replace `var' = nonmis`var' if _n == _N
        drop nonmis`var'
    }
    egen nonmissings = rowmax(_all) if _n == _
    local nonmissings = nonmissings[_N]
    dis "Maximum number of nonmissings: " `nonmissings'
    drop if _n == _N
    drop nonmissings

Even solved the conditional loop thing:

Code:

cd ""

use mapping, clear

gen x = _n

// all JANIS_ID per BORID

preserve
reshape wide JANIS_ID, i(BORID) j(x)
save mapping_1, replace
describe
global n_janisid = r(k) - 1
restore

// all BORID per JANIS_ID

preserve
reshape wide BORID, i(JANIS_ID) j(x)
save mapping_2, replace
describe
global n_borid = r(k) - 1
restore

use mapping_1, clear
drop BORID

local n_janisid = ${n_janisid}
local n_borid = ${n_borid}

// check for missings

local nonmissings = 2
dis `nonmissings'


// Merging

while `nonmissings' > 1 {

    qui forvalues n = 1(1)`n_janisid' {
        rename JANIS_ID`n' JANIS_ID
        display `n'
        qui merge m:1 JANIS_ID using mapping_2, update replace keep(1 3 4 5) nogen
        rename JANIS_ID JANIS_ID`n'
    }

    duplicates drop JANIS_ID* BORID*, force

    qui forvalues n = 1(1)`n_borid' {
        rename BORID`n' BORID
        display `n'
        qui merge m:1 BORID using mapping_1, update replace keep(1 3 4 5) nogen
        rename BORID BORID`n'
    }

    duplicates drop JANIS_ID* BORID*, force
    
    insobs 1
    foreach var of varlist JANIS_ID* BORID* {
        egen nonmis`var' = count(`var')
        replace `var' = nonmis`var' if _n == _N
        drop nonmis`var'
    }
    egen nonmissings = rowmax(_all) if _n == _
    local nonmissings = nonmissings[_N]
    dis "Maximum number of nonmissings: " `nonmissings'
    drop if _n == _N
    drop nonmissings

}


// Reshape to long format

gen NETWID = _n

reshape long JANIS_ID BORID, i(NETWID) j(n)
drop if JANIS_ID==. & BORID==.
drop n

Announcement

Show number of variables with specific number of (non-)missings

Comment

Comment