"Compress" lines when missing values

Edgar Goldemer

Join Date: Oct 2019

Posts: 24
#1

"Compress" lines when missing values

12 Nov 2019, 14:55

Dear Statalists,

for every id in my dataset there are var1 to var7 that represent different values. Some of the values in var1 to var7 are missing.
Now I would like to "compress" each observation/each line, so that first the non-missing values occur and then the missing values.

For example, for id=1, 3 of the 7 values are missing: var1, var3 and var7. I would like to replace var1 by var 2, var2 by var4, var3 by var5 and var4 by var 6, so that the 3 missing values are "at the end"

Any ideas how to do this systematically for a large dataset?

Code:

clear input byte(id var1 var2 var3 var4 var5 var6 var7) 1 . 15 . 17 18 19 . 2 25 26 . 30 . . 12 end
Tags: None
Rich Goldstein

Join Date: Mar 2014

Posts: 4408
#2

12 Nov 2019, 15:09

not completely clear, but I think that -rowsort- (from the Stata Journal) will do what you want; use -search- to find and download
Comment
Edgar Goldemer

Join Date: Oct 2019

Posts: 24
#3

12 Nov 2019, 15:27

Thanks, Rich!

rowsort would work, but I would like to keep the order of the numbers instead of putting them in ascending or descending order. I just want the missing values to be at the end,

Do you know a similar command to rowsort that keeps the order of the non-missing values?
Comment

Wouter Wakker

Join Date: Nov 2018
Posts: 621

12 Nov 2019, 16:31

I don't know about a command, but here is one technique.

Code:

clear
input byte(id var1 var2 var3 var4 var5 var6 var7)
1  . 15 . 17 18 19  .
2 25 26 . 30  .  . 12
end

reshape long var, i(id) j(j)
bys id: gen tosort = sum(!missing(var)) if !missing(var)
bys id (tosort): gen jj = _n
drop j tosort
reshape wide var, i(id) j(jj)
list


     +-----------------------------------------------------+
     | id   var1   var2   var3   var4   var5   var6   var7 |
     |-----------------------------------------------------|
  1. |  1     15     17     18     19      .      .      . |
  2. |  2     25     26     30     12      .      .      . |
     +-----------------------------------------------------+

Comment

Edgar Goldemer

Join Date: Oct 2019

Posts: 24
#5

12 Nov 2019, 17:34

I found another way, but there is still one problem:

Imagine I only care about var1 to var6. There are also var7 to var11, and if var1 to var6 have missing values, I would like to "refill" them by the values from var7-var10, while keeping the order of non-missing values. To do so, I have another variable called "start_missing". In my dataset, ID1 has missing values from var3 on, so that "start_missing" has a value of 3.

The problem with my code is, that within the forvalues loop, my data gets overwritten.

Can you help me out on this?

Code:

clear input byte(id var1 var2 var3 var4 var5 var6 var7 var8 var9 var10 var11 start_missing) 1 13 15 . . . . 1 2 3 . . 3 2 25 26 30 54 . . 4 5 . . . 5 end forvalues i = 1/6 { replace var`i' = var7 if `i'== start_missing replace var`i+1' = var8 if `i'== start_missing replace var`i+2' = var8 if `i'== start_missing replace var`i+3' = var9 if `i'== start_missing replace var`i+4' = var10 if `i'== start_missing replace var`i+5' = var11 if `i'== start_missing }
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35209

12 Nov 2019, 18:32

Your data examples are all integers when they are not missing. So you can concatenate all the values into a string, remove the missings and then read them out again.

Code:

clear
input byte(id var1 var2 var3 var4 var5 var6 var7 var8 var9 var10 var11 start_missing)
1 13 15  .  . . . 1 2 3 . . 3
2 25 26 30 54 . . 4 5 . . . 5
end

egen all = concat(var*), p(" ") 
replace all = trim(itrim(subinstr(all, ".", "", .))) 

quietly forval j = 1/11 { 
    replace var`j' = real(word(all, `j')) 
}

list all var* 

     +------------------------------------------------------------------------------------------------+
     |             all   var1   var2   var3   var4   var5   var6   var7   var8   var9   var10   var11 |
     |------------------------------------------------------------------------------------------------|
  1. |     13 15 1 2 3     13     15      1      2      3      .      .      .      .       .       . |
  2. | 25 26 30 54 4 5     25     26     30     54      4      5      .      .      .       .       . |
     +------------------------------------------------------------------------------------------------+

Comment

Edgar Goldemer

Join Date: Oct 2019

Posts: 24
#7

13 Nov 2019, 09:01

Nick, thank you so much! This is really what I was looking for.
Comment

Announcement

"Compress" lines when missing values

Comment

Comment

Comment

Comment

Comment

Comment