  • Duplicate prescriptions

    I am using Stata 16 to analyze a dataset with prescription information. Each person has multiple prescriptions, and their unique ID appears once for each prescription. I would like to remove the duplicate IDs by creating columns for prescription 1, prescription 2, etc., instead of having a new row with the same ID for each prescription. I believe this may be changing from long form to short form, but that terminology may be wrong.
    I appreciate any guidance and apologize if anything is unclear.

  • #2
    I recommend using -dataex- to provide a clear example of your data.

    Best regards,

    Marcos


    • #3
      You are correct, Katie: you need to transform your dataset to a wide format. The command is -reshape wide-. See -help reshape- for more guidance.


      • #4
        Katie:
        Others provided helpful insights.
        That said, my take on that is a bit different: multiple observations for the same IDs are usually better analyzed in -long- format (hence, I would not -reshape-).
        Since medications may differ because they target different diseases affecting the same patient, you can add a categorical variable (one level per therapeutic area) and -label- it as convenient.
        Kind regards,
        Carlo
        (Stata 19.0)
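
        The suggestion in #4 could be sketched roughly as follows. This is only an illustration: it assumes a string variable named drugname holding the product name, the names area and area_lbl are hypothetical, and the groupings are placeholders rather than a clinical classification.

        ```stata
        * area and area_lbl are hypothetical names; the groupings are illustrative only
        generate byte area = 3
        replace area = 1 if strpos(drugname, "AMLODIPINE") > 0 | strpos(drugname, "ATORVASTATIN") > 0
        replace area = 2 if strpos(drugname, "DULOXETINE") > 0 | strpos(drugname, "MIRTAZAPINE") > 0

        * attach value labels so tabulations show names instead of codes
        label define area_lbl 1 "Cardiovascular" 2 "Antidepressant" 3 "Other"
        label values area area_lbl
        tabulate area
        ```

        The data stay in long layout, so each prescription keeps its own row and the new variable can be tabulated or used in -if- conditions directly.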


        • #5
          Thank you all. When I attempted to reshape, I got the error "Data are already wide". Any advice on how to continue?


          • #6
            The suggestion from post #2 is pertinent to solving the problem in post #5. If you cannot show us what your data are like, and cannot tell us the exact command you used, we can only guess what you did wrong.


            • #7
              Thank you, William. I apologize. I attempted to use dataex and got an error telling me to specify fewer variables. I then specified only my ID variable, and the result was "input long IDvar" followed by 100 ID values. Unfortunately my sample IDs (DOC #s) are identifiable and I cannot share them.

              The reshape command I used was: reshape long DrugName, i(DOCID) j(time)
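
              One possible way around the identifiable IDs, sketched under the assumption that the variables are named DOCID and DrugName as in the command above (anon_id is a hypothetical name), is to generate an arbitrary group number and pass only that and the drug name to dataex:

              ```stata
              * anon_id is a hypothetical name; DOCID and DrugName come from the reshape command above
              * group() numbers the distinct values of DOCID 1, 2, 3, ... without revealing them
              egen long anon_id = group(DOCID)

              * list only the two relevant variables for the first 20 observations
              dataex anon_id DrugName in 1/20
              ```

              Naming the variables explicitly also keeps the dataex output narrow, which may avoid the "specify fewer variables" error.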


              • #8
                You did not get the error "Data are already wide" as a result of issuing a command "reshape long ...".


                • #9
                  I'm sorry, and I appreciate your patience. I was experimenting and forgot that I had changed the code. I created a new unique ID so that I could share my dataex results, which are below. Basically, each person has multiple prescriptions. Instead of duplicate IDs, I would like to have multiple drugname variables (drugname1, drugname2) for each person.

                  input long customerid str38 drugname
                  8769212 "DULOXETINE HCL DR 60MG CAP"
                  8769211 "HYDROXYZINE PAMOATE 25MG CAP"
                  8766843 "LEVOTHYROXINE 150MCG TAB"
                  8769211 "MIRTAZAPINE 45MG TAB"
                  8766843 "RANITIDINE 150MG TAB"
                  8769364 "ALLOPURINOL 100MG TAB"
                  8769360 "AMLODIPINE 10MG TAB"
                  8769348 "AMLODIPINE 10MG TAB"
                  8769344 "AMLODIPINE 10MG TAB"
                  8769316 "AMLODIPINE 5MG TAB"
                  8769348 "ATORVASTATIN CALCIUM 10MG TAB"
                  8767105 "ATORVASTATIN CALCIUM 20MG TAB"
                  8769324 "ATORVASTATIN CALCIUM 20MG TAB"
                  8769316 "ATORVASTATIN CALCIUM 20MG TAB"
                  end


                  • #10
                    Here is sample code that does what you ask, I believe.
                    Code:
                    . sort customerid, stable
                    
                    . by customerid: generate pnum = _n
                    
                    . reshape wide drugname, i(customerid) j(pnum)
                    (note: j = 1 2)
                    
                    Data                               long   ->   wide
                    -----------------------------------------------------------------------------
                    Number of obs.                       14   ->      10
                    Number of variables                   3   ->       3
                    j variable (2 values)              pnum   ->   (dropped)
                    xij variables:
                                                   drugname   ->   drugname1 drugname2
                    -----------------------------------------------------------------------------
                    
                    . list, clean noobs
                    
                      customerid                       drugname1                       drugname2  
                         8766843        LEVOTHYROXINE 150MCG TAB            RANITIDINE 150MG TAB  
                         8767105   ATORVASTATIN CALCIUM 20MG TAB                                  
                         8769211    HYDROXYZINE PAMOATE 25MG CAP            MIRTAZAPINE 45MG TAB  
                         8769212      DULOXETINE HCL DR 60MG CAP                                  
                         8769316              AMLODIPINE 5MG TAB   ATORVASTATIN CALCIUM 20MG TAB  
                         8769324   ATORVASTATIN CALCIUM 20MG TAB                                  
                         8769344             AMLODIPINE 10MG TAB                                  
                         8769348             AMLODIPINE 10MG TAB   ATORVASTATIN CALCIUM 10MG TAB  
                         8769360             AMLODIPINE 10MG TAB                                  
                         8769364           ALLOPURINOL 100MG TAB
                    With that said, the experienced users here generally agree that, with few exceptions, Stata makes it much more straightforward to accomplish complex analyses using a long layout of your data rather than a wide layout of the same data. You should try to achieve what you need with the data organized as it currently is, and seek the help of Statalist in doing so. The sort of problems you will encounter trying to use your reshaped data will almost certainly be solved by reshaping the data back to a long layout. It is much easier, for example, to determine who received a particular drug by searching in a single variable on multiple observations for each customerid than it is to search in multiple variables on a single observation for each customerid.
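
                    As a rough illustration of that last point, here is a sketch of the same question ("who ever received amlodipine?") in each layout. The variable names ever_amlo, one_per_id, and ever_amlo_w are hypothetical, and the two halves are meant to be run on the long and the wide version of the data respectively, not one after the other.

                    ```stata
                    * Long layout (data as posted in #9): one condition on one variable
                    egen byte ever_amlo = max(strpos(drugname, "AMLODIPINE") > 0), by(customerid)
                    egen byte one_per_id = tag(customerid)
                    count if ever_amlo & one_per_id

                    * Wide layout (after -reshape wide-): loop over every drugname# variable
                    generate byte ever_amlo_w = 0
                    foreach v of varlist drugname* {
                        replace ever_amlo_w = 1 if strpos(`v', "AMLODIPINE") > 0
                    }
                    count if ever_amlo_w
                    ```

                    On the 14 example observations, both counts should come to 4 customers. The long version does not depend on how many prescriptions any one person has, whereas the wide version has to visit every drugname# column.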


                    • #11
                      I'm sure I don't completely understand the issues with reshaping in Stata and I appreciate your warning. I am running VERY simple descriptives. For example, if I want to know the number of females in the dataset and I tab the gender variable, the count will include the duplicate observations, correct?


                      • #12
                        I am running VERY simple descriptives. For example, if I want to know the number of females in the dataset and I tab the gender variable, the count will include the duplicate observations, correct?
                        Correct, but there is a very simple workaround for that:
                        Code:
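                        * flag exactly one observation per customerid
                        egen flag = tag(customerid)
                        * tabulate gender on the flagged observations only, so each person counts once
                        tab gender if flag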
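
                        The same one-observation-per-person idea extends to other simple descriptives without reshaping. A minimal sketch, assuming the long layout from #9 (nrx and person are hypothetical names):

                        ```stata
                        * number of prescriptions each person has, repeated on each of their rows
                        bysort customerid: generate nrx = _N

                        * mark one observation per person
                        egen person = tag(customerid)

                        * number of distinct people in the data
                        count if person

                        * distribution of prescriptions per person, each person counted once
                        tabulate nrx if person
                        ```

                        Because tag() marks exactly one row per customerid, any person-level variable that is constant within customerid can be summarized with the same -if- condition.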


                        • #13
                          Wonderful, thank you!
