Short version
I want one information and one information only: in each and every variable in the dataset, is there only one non-missing observation, yes or no? Is there a neat way to achieve that?
Long version
I have a mapping table which connects IDs (JANIS_ID and BORID) from two datasets which I want to merge and between those IDs there can be m:n relations. So I have to identify the networks of IDs that belong together. The code below does that by:
What I am ultimately aiming for is a solution in which the number of iterations depends on whether the condition on the missings is fulfilled. I.e. "run the two mergers and duplicates drop commands until there are no variables with more than one non-missing".
I want one information and one information only: in each and every variable in the dataset, is there only one non-missing observation, yes or no? Is there a neat way to achieve that?
Long version
I have a mapping table which connects IDs (JANIS_ID and BORID) from two datasets which I want to merge and between those IDs there can be m:n relations. So I have to identify the networks of IDs that belong together. The code below does that by:
- creating two datasets which show each BORID per JANIS_ID and each JANIS_ID per BORID.
- iteratively merging the datasets from step 1 until all IDs that belong to each other are in one row, so it holds that: row = network
What I am ultimately aiming for is a solution in which the number of iterations depends on whether the condition on the missings is fulfilled. I.e. "run the two mergers and duplicates drop commands until there are no variables with more than one non-missing".
Code:
cd "" use mapping, clear gen x = _n // all JANIS_ID per BORID preserve reshape wide JANIS_ID, i(BORID) j(x) save mapping_1, replace describe global n_janisid = r(k) - 1 restore // all BORID per JANIS_ID preserve reshape wide BORID, i(JANIS_ID) j(x) save mapping_2, replace describe global n_borid = r(k) - 1 restore use mapping_1, clear drop BORID local n_janisid = ${n_janisid} local n_borid = ${n_borid} // ITERATION 1 forvalues n = 1(1)`n_janisid' { rename JANIS_ID`n' JANIS_ID display `n' qui merge m:1 JANIS_ID using mapping_2, update replace keep(1 3 4 5) nogen rename JANIS_ID JANIS_ID`n' } duplicates drop JANIS_ID* BORID*, force forvalues n = 1(1)`n_borid' { rename BORID`n' BORID display `n' qui merge m:1 BORID using mapping_1, update replace keep(1 3 4 5) nogen rename BORID BORID`n' } duplicates drop JANIS_ID* BORID*, force // ITERATION 2 forvalues n = 1(1)`n_janisid' { rename JANIS_ID`n' JANIS_ID display `n' qui merge m:1 JANIS_ID using mapping_2, update replace keep(1 3 4 5) nogen rename JANIS_ID JANIS_ID`n' } duplicates drop JANIS_ID* BORID*, force forvalues n = 1(1)`n_borid' { rename BORID`n' BORID display `n' qui merge m:1 BORID using mapping_1, update replace keep(1 3 4 5) nogen rename BORID BORID`n' } duplicates drop JANIS_ID* BORID*, force // ITERATION 3 forvalues n = 1(1)`n_janisid' { rename JANIS_ID`n' JANIS_ID display `n' qui merge m:1 JANIS_ID using mapping_2, update replace keep(1 3 4 5) nogen rename JANIS_ID JANIS_ID`n' } duplicates drop JANIS_ID* BORID*, force forvalues n = 1(1)`n_borid' { rename BORID`n' BORID display `n' qui merge m:1 BORID using mapping_1, update replace keep(1 3 4 5) nogen rename BORID BORID`n' } duplicates drop JANIS_ID* BORID*, force // check for missings missings report // Reshape to long format gen NETWID = _n reshape long JANIS_ID BORID, i(NETWID) j(n) drop if JANIS_ID==. & BORID==. drop n
Comment