Create new file based on data in another

L Rainen

#1


31 Mar 2022, 05:15

I am stuck on how to create a new file based on the data in another.

The pseudo-algorithm for how I am trying to get this to work would be roughly:
- start a new file or frame
- for each row in the existing data/frame: look for the value 1 in the variables pneumo-nausea
- for each value in that row that equals 1, copy patid, when, arm, and the variable name or label to the new dataset/frame (this could happen several times per row)
- when done, save file/frame
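The steps above translate almost literally into a loop that posts rows to a new frame. This is a minimal sketch, assuming Stata 16 or later (where frames exist); the frame name issues and the str32 width for the issue variable are arbitrary choices for illustration, not anything from the thread.

```stata
* Sketch: build the long list row by row in a new frame (needs Stata 16+)
clear
input long patid str7 when float arm byte(pneumo diarr colitis muco nausea)
1 "scr" 1 0 0 0 0 0
2 "scr" 1 1 0 1 0 0
3 "scr" 1 0 0 0 0 0
4 "scr" 1 0 0 0 0 0
5 "scr" 1 1 0 0 0 0
6 "scr" 1 0 0 0 0 0
1 "wk1" 1 0 1 1 0 0
2 "wk1" 1 0 0 0 0 0
3 "wk1" 1 0 0 0 0 0
4 "wk1" 2 0 0 0 0 0
5 "wk1" 2 0 0 0 0 0
6 "wk1" 2 0 0 0 0 1
end

* start a new, empty frame with the target variables ("issues" is a hypothetical name)
capture frame drop issues
frame create issues long patid str7 when float arm str32 issue

* for each row, and for each indicator equal to 1 in that row, post one observation
forvalues i = 1/`=_N' {
    foreach v of varlist pneumo-nausea {
        if `v'[`i'] == 1 {
            frame post issues (patid[`i']) (when[`i']) (arm[`i']) ("`v'")
        }
    }
}

* inspect the result; to keep it on disk, something like: frame issues: save issues, replace
frame issues: list, sepby(patid)
```

Because the outer loop runs over observations, the rows come out in the same order as the expected result below. Explicit loops over observations are usually slower than a single reshape on large datasets, so this is mainly useful for seeing the logic.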

Given the sample data below, the final result should look like:

patid  when  arm  issue
2      scr   1    pneumo
2      scr   1    colitis
5      scr   1    pneumo
1      wk1   1    diarr
1      wk1   1    colitis
6      wk1   2    nausea

While this would be a fairly easy task in a standard programming language, I am stumped on how to do it in Stata.

I looked at using frames, or replacing the 1s with the value-label string and then collapsing, but I am not sure this is the right approach.

Any hints on how to do this would be appreciated.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input long patid str7 when float arm byte(pneumo diarr colitis muco nausea)
1 "scr" 1 0 0 0 0 0
2 "scr" 1 1 0 1 0 0
3 "scr" 1 0 0 0 0 0
4 "scr" 1 0 0 0 0 0
5 "scr" 1 1 0 0 0 0
6 "scr" 1 0 0 0 0 0
1 "wk1" 1 0 1 1 0 0
2 "wk1" 1 0 0 0 0 0
3 "wk1" 1 0 0 0 0 0
4 "wk1" 2 0 0 0 0 0
5 "wk1" 2 0 0 0 0 0
6 "wk1" 2 0 0 0 0 1
end
label values pneumo pneumo6_
label values diarr diarr6_
label values colitis colitis6_
label values muco muco6_
label values nausea nausea6_

Nick Cox


31 Mar 2022, 05:28

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input long patid str7 when float arm byte(pneumo diarr colitis muco nausea)
1 "scr" 1 0 0 0 0 0
2 "scr" 1 1 0 1 0 0
3 "scr" 1 0 0 0 0 0
4 "scr" 1 0 0 0 0 0
5 "scr" 1 1 0 0 0 0
6 "scr" 1 0 0 0 0 0
1 "wk1" 1 0 1 1 0 0
2 "wk1" 1 0 0 0 0 0
3 "wk1" 1 0 0 0 0 0
4 "wk1" 2 0 0 0 0 0
5 "wk1" 2 0 0 0 0 0
6 "wk1" 2 0 0 0 0 1
end
label values pneumo pneumo6_
label values diarr diarr6_
label values colitis colitis6_
label values muco muco6_
label values nausea nausea6_

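* flag observations with at least one 1 among pneumo-nausea
egen wanted = anymatch(pneumo - nausea), values(1)
keep if wanted

* give the indicators a common stub, then reshape to one row per issue
rename (pneumo-nausea) whatever=
reshape long whatever, i(patid when arm) j(issue) string
* keep only the issues actually flagged, then tidy up
keep if whatever
drop whatever wanted

list , sepby(patid)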

     +------------------------------+
     | patid   when   arm     issue |
     |------------------------------|
  1. |     1    wk1     1   colitis |
  2. |     1    wk1     1     diarr |
     |------------------------------|
  3. |     2    scr     1   colitis |
  4. |     2    scr     1    pneumo |
     |------------------------------|
  5. |     5    scr     1    pneumo |
     |------------------------------|
  6. |     6    wk1     2    nausea |
     +------------------------------+



L Rainen

#3

31 Mar 2022, 06:04

That works perfectly. Thanks loads. It looks easier to accomplish in Stata in the end! Amazing!
L Rainen

#4

31 Mar 2022, 10:32

If you have time, could I add one small question: how can the code be changed to keep the matching data value?

For example, if the values in pneumo-nausea were in the range 1-5, is there a way to edit the reshape to include the value as well? The data would then have an additional val column:

patid  when  arm  issue    val
2      scr   1    pneumo   1
2      scr   1    colitis  2
5      scr   1    pneumo   1
1      wk1   1    diarr    2
1      wk1   1    colitis  1
6      wk1   2    nausea   4

For example, instead of:
reshape long whatever, i(patid when arm) j(issue) string

Something like:
reshape long whatever, i(patid when arm `value') j(issue) string

The trick is how to get the value into this i() list. Can it reference what is being reshaped, something like `var'[i]?

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte patid str3 when byte(arm pneumo diarr colitis muco nausea)
1 "scr" 1 0 0 0 0 0
2 "scr" 1 1 0 2 0 0
3 "scr" 1 0 0 0 0 0
4 "scr" 1 0 0 0 0 0
5 "scr" 1 1 0 0 0 0
6 "scr" 1 0 0 0 0 0
1 "wk1" 1 0 2 1 0 0
2 "wk1" 1 0 0 0 0 0
3 "wk1" 1 0 0 0 0 0
4 "wk1" 2 0 0 0 0 0
5 "wk1" 2 0 0 0 0 0
6 "wk1" 2 0 0 0 0 4
end

I have already changed this line to match the new values:
egen wanted = anymatch(pneumo - nausea), values(1,2,3,4,5)

Thanks for any suggestions or pointers!
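One possible approach, offered as a sketch: the value cannot go in i(), because it differs across the variables being reshaped. After reshape long, though, the stub variable already holds each cell's original value, so it can be kept and filtered instead of dropped. This assumes, as in the question, that codes 1-5 mark an issue and 0 means none; the stub name val is arbitrary.

```stata
* Sketch: keep each cell's value by reshaping on a stub named val (name is arbitrary)
clear
input byte patid str3 when byte(arm pneumo diarr colitis muco nausea)
1 "scr" 1 0 0 0 0 0
2 "scr" 1 1 0 2 0 0
3 "scr" 1 0 0 0 0 0
4 "scr" 1 0 0 0 0 0
5 "scr" 1 1 0 0 0 0
6 "scr" 1 0 0 0 0 0
1 "wk1" 1 0 2 1 0 0
2 "wk1" 1 0 0 0 0 0
3 "wk1" 1 0 0 0 0 0
4 "wk1" 2 0 0 0 0 0
5 "wk1" 2 0 0 0 0 0
6 "wk1" 2 0 0 0 0 4
end

* prefix the indicators so reshape can find them: valpneumo, valdiarr, ...
rename (pneumo-nausea) val=

* one row per patid/when/arm/issue; val carries the original cell value
reshape long val, i(patid when arm) j(issue) string

* keep only cells coded 1-5 (0 means no issue; missing values are excluded too)
keep if inrange(val, 1, 5)

list, sepby(patid)
```

With this version the egen/anymatch pre-filter is optional: it only reduces the number of observations passed to reshape, and the final keep if does the actual selection.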