Duplicate prescriptions

Katie Holzer

Join Date: Jan 2018

Posts: 65
#1

21 Feb 2020, 11:32

I am using Stata 16 to analyze a dataset with prescription information. Each person has multiple prescriptions and their unique ID shows up for each prescription. I would like to remove duplicates, in that I would like to create columns for prescription 1, prescription 2, etc. instead of having a new sample ID for each prescription. I believe this may be changing from long form to short form, but that terminology may be wrong.
I appreciate any guidance and apologize if anything is unclear.
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

21 Feb 2020, 11:34

I recommend using -dataex- to provide a clear display of the data.

Best regards,

Marcos
Adrien Bouguen

Join Date: Jul 2014

Posts: 85
#3

21 Feb 2020, 11:41

You are correct, Katie: you need to transform your dataset to a wide format. The command should be "reshape wide". See -help reshape- for more guidance.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#4

21 Feb 2020, 11:48

Katie:
Others provided helpful insights.
That said, my take on that is a bit different: multiple observations for the same IDs are usually better analyzed in -long- format (hence, I would not -reshape-).
As medications may differ because they target different diseases affecting the same patient, you can add a categorical variable (one level per therapeutic area) and -label- it as convenient.

Kind regards,
Carlo
(Stata 19.0)
Katie Holzer

Join Date: Jan 2018

Posts: 65
#5

21 Feb 2020, 12:36

Thank you all. When I attempted to reshape, I got the error "Data are already wide". Any advice on how to continue?
William Lisowski

Join Date: Dec 2014

Posts: 10150
#6

21 Feb 2020, 13:49

The suggestion from post #2 is pertinent to solving the problem in post #5. If you cannot show us what your data are like, and cannot tell us the exact command you used, we can only guess what you did wrong.
Katie Holzer

Join Date: Jan 2018

Posts: 65
#7

21 Feb 2020, 14:01

Thank you, William. I apologize. I attempted to use dataex and got an error telling me to specify fewer variables. I then specified the ID variable and the result was "input long IDvar" followed by 100 ID values. Unfortunately my sample ID numbers (DOC #s) are identifiable and I cannot share them.

The reshape command I used was: reshape long DrugName, i(DOCID) j(time)
William Lisowski

Join Date: Dec 2014

Posts: 10150
#8

21 Feb 2020, 15:42

You did not get the error "Data are already wide" as a result of issuing a command "reshape long ...".
Katie Holzer

Join Date: Jan 2018

Posts: 65
#9

21 Feb 2020, 16:01

I'm sorry, I appreciate your patience. I was experimenting and forgot I had changed the code. I created a new unique ID so that I could share my dataex results, shown below. Basically each person has multiple prescriptions. Instead of duplicate IDs, I would like to have multiple drugname variables (drugname1, drugname2) for each person.

clear
input long customerid str38 drugname
8769212 "DULOXETINE HCL DR 60MG CAP"
8769211 "HYDROXYZINE PAMOATE 25MG CAP"
8766843 "LEVOTHYROXINE 150MCG TAB"
8769211 "MIRTAZAPINE 45MG TAB"
8766843 "RANITIDINE 150MG TAB"
8769364 "ALLOPURINOL 100MG TAB"
8769360 "AMLODIPINE 10MG TAB"
8769348 "AMLODIPINE 10MG TAB"
8769344 "AMLODIPINE 10MG TAB"
8769316 "AMLODIPINE 5MG TAB"
8769348 "ATORVASTATIN CALCIUM 10MG TAB"
8767105 "ATORVASTATIN CALCIUM 20MG TAB"
8769324 "ATORVASTATIN CALCIUM 20MG TAB"
8769316 "ATORVASTATIN CALCIUM 20MG TAB"
end

William Lisowski

Join Date: Dec 2014
Posts: 10150

#10

21 Feb 2020, 16:31

Here is sample code that does what you ask, I believe.

Code:

. sort customerid, stable

. by customerid: generate pnum = _n

. reshape wide drugname, i(customerid) j(pnum)
(note: j = 1 2)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                       14   ->      10
Number of variables                   3   ->       3
j variable (2 values)              pnum   ->   (dropped)
xij variables:
                               drugname   ->   drugname1 drugname2
-----------------------------------------------------------------------------

. list, clean noobs

  customerid                       drugname1                       drugname2  
     8766843        LEVOTHYROXINE 150MCG TAB            RANITIDINE 150MG TAB  
     8767105   ATORVASTATIN CALCIUM 20MG TAB                                  
     8769211    HYDROXYZINE PAMOATE 25MG CAP            MIRTAZAPINE 45MG TAB  
     8769212      DULOXETINE HCL DR 60MG CAP                                  
     8769316              AMLODIPINE 5MG TAB   ATORVASTATIN CALCIUM 20MG TAB  
     8769324   ATORVASTATIN CALCIUM 20MG TAB                                  
     8769344             AMLODIPINE 10MG TAB                                  
     8769348             AMLODIPINE 10MG TAB   ATORVASTATIN CALCIUM 10MG TAB  
     8769360             AMLODIPINE 10MG TAB                                  
     8769364           ALLOPURINOL 100MG TAB

With that said, the experienced users here generally agree that, with few exceptions, Stata makes it much more straightforward to accomplish complex analyses using a long layout of your data rather than a wide layout of the same data. You should try to achieve what you need with the data organized as it currently is, and seek the help of Statalist in doing so. The sort of problems you will encounter trying to use your reshaped data will almost certainly be solved by reshaping the data back to a long layout. It is much easier, for example, to determine who received a particular drug by searching in a single variable on multiple observations for each customerid than it is to search in multiple variables on a single observation for each customerid.
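
As a rough illustration of that last point, here is a minimal sketch of such a search on the long data, assuming the variables customerid and drugname from post #9. The drug chosen (AMLODIPINE) and the new variable names is_amlo, any_amlo, and one_per_id are made up for this example and are not part of the original data.

```stata
* Sketch only: run on the long layout, before any reshape.
* Flag each prescription whose name starts with AMLODIPINE (illustrative choice)
generate byte is_amlo = strpos(drugname, "AMLODIPINE") == 1

* Carry the flag to every observation of a customer with at least one match
bysort customerid: egen byte any_amlo = max(is_amlo)

* Tag one observation per customer so each person is counted once
egen byte one_per_id = tag(customerid)

* Number of distinct customers who received the drug
count if any_amlo & one_per_id

* All prescriptions for those customers, grouped by customer
list customerid drugname if any_amlo, sepby(customerid)
```

In the wide layout the same question would need a loop or a condition over drugname1, drugname2, and however many further variables the reshape created.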


Katie Holzer

Join Date: Jan 2018

Posts: 65
#11

21 Feb 2020, 16:35

I'm sure I don't completely understand the issues with reshaping in Stata and I appreciate your warning. I am running VERY simple descriptives. For example, if I want to know the number of females in the dataset and I tab the gender variable, the count will include the duplicate observations, correct?
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#12

21 Feb 2020, 16:59

I am running VERY simple descriptives. For example, if I want to know the number of females in the dataset and I tab the gender variable, the count will include the duplicate observations, correct?

Correct, but there is a very simple workaround for that:

Code:

egen flag = tag(customerid)
tab gender if flag
Katie Holzer

Join Date: Jan 2018

Posts: 65
#13

21 Feb 2020, 17:00

Wonderful, thank you!