Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reshaping long to wide

    I am trying to reshape a dataset from long to wide. Initially I was getting the error that the variable I had indicated for "j" was not unique within the id (study_id). Therefore I created a new variable especially for this purpose that would be unique within STUDY_ID using the following:

    Code:
    egen newid = group(study_id)
    Despite this I am now receiving the error message that newid is not unique within study_id. Any advice? A sample of my dataset (dummy data for confidentiality reasons) is be below.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str2 study_id str10 aerptdat str8 aerptime str26 aeterm str10 aestdat byte(aeout aesev) str10 aeendat byte aeser str2 aeinit int aespid str10 eventstartdateheader byte(eventrepeatkey newid)
    "AA" "09/01/2023" "18:17:48" "Headache"                   "24/12/2021" 5 2 ""           1 "HW" 378 "09/01/2023"  1 1
    "AA" "09/01/2023" "18:17:48" "Headache"                   ""           1 2 "26/12/2021" 1 "HW" 378 "09/01/2023"  2 1
    "AA" "09/01/2023" "18:17:48" "Diarrhoea"                  "25/12/2021" 5 2 ""           1 "HW" 379 "09/01/2023"  3 1
    "AA" "09/01/2023" "18:17:48" "Diarrhoea"                  ""           1 2 "26/12/2021" 1 "HW" 379 "09/01/2023"  4 1
    "AA" "09/01/2023" "18:17:48" "Generally feeling unwell"   "25/12/2021" 5 2 ""           1 "HW" 377 "09/01/2023"  5 1
    "AA" "09/01/2023" "18:17:48" "Generally feeling unwell"   ""           1 2 "27/12/2021" 1 "HW" 377 "09/01/2023"  6 1
    "AA" "09/01/2023" "18:17:48" "Abdominal pain"             "25/12/2021" 5 2 ""           1 "HW" 380 "09/01/2023"  7 1
    "AA" "09/01/2023" "18:17:48" "Abdominal pain"             ""           1 2 "26/12/2021" 1 "HW" 380 "09/01/2023"  8 1
    "AA" "09/01/2023" "18:17:48" "Abdominal pain"             "29/12/2021" 5 2 ""           1 "HW" 380 "09/01/2023"  9 1
    "AA" "09/01/2023" "18:17:48" "Abdominal pain"             ""           1 2 "30/12/2021" 1 "HW" 380 "09/01/2023" 10 1
    "BB" "09/01/2023" "18:17:48" "Headache"                   "15/03/2022" 5 3 ""           1 "HW" 387 "09/01/2023"  1 2
    "BB" "09/01/2023" "18:17:48" "Headache"                   ""           1 3 "16/03/2022" 1 "HW" 387 "09/01/2023"  2 2
    "BB" "09/01/2023" "18:17:48" "Headache"                   "17/03/2022" 5 3 ""           1 "HW" 387 "09/01/2023"  3 2
    "BB" "09/01/2023" "18:17:48" "Loss of smell and or taste" "15/03/2022" 5 2 ""           1 "HW" 392 "09/01/2023"  4 2
    "BB" "09/01/2023" "18:17:48" "Loss of smell and or taste" ""           1 2 "16/03/2022" 1 "HW" 392 "09/01/2023"  5 2
    "CC" "09/01/2023" "18:17:48" "Headache"                   "16/11/2021" 5 2 ""           1 "HW" 403 "09/01/2023"  1 3
    "CC" "09/01/2023" "18:17:48" "Headache"                   ""           1 2 "21/11/2021" 1 "HW" 403 "09/01/2023"  2 3
    "CC" "09/01/2023" "18:17:48" "Headache"                   "22/11/2021" 5 2 ""           1 "HW" 403 "09/01/2023"  3 3
    "CC" "09/01/2023" "18:17:48" "Fever"                      "16/11/2021" 5 3 ""           1 "HW" 399 "09/01/2023"  4 3
    "CC" "09/01/2023" "18:17:48" "Fever"                      ""           1 3 "21/11/2021" 1 "HW" 399 "09/01/2023"  5 3
    "CC" "09/01/2023" "18:17:48" "Fever"                      "22/11/2021" 5 3 ""           1 "HW" 399 "09/01/2023"  6 3
    "CC" "09/01/2023" "18:17:48" "Loss of smell and or taste" "16/11/2021" 5 3 ""           1 "HW" 404 "09/01/2023"  7 3
    "CC" "09/01/2023" "18:17:48" "Loss of smell and or taste" ""           1 3 "21/11/2021" 1 "HW" 404 "09/01/2023"  8 3
    "CC" "09/01/2023" "18:17:48" "Loss of smell and or taste" "22/11/2021" 5 3 ""           1 "HW" 404 "09/01/2023"  9 3
    end



  • #2
    as per the request in the FAQ, what is the exact command you are giving Stata (please show in CODE blocks) along with the exact response you are getting from Stata

    Comment


    • #3
      What reshape command are you trying?

      Comment


      • #4
        Apologies for not providing this previously, the reshape command I have tried is:
        Code:
        reshape wide AERPTDAT AERPTIME AETERM AESTDAT AEOUT AESEV AEENDAT AESER AEINIT AESPID, i(STUDY_ID) j(newid)

        Comment


        • #5
          I don't understand the error report in #1. Your two identifier variables map onto each other one to one, which is the point of group() here.

          To reshape wide you need to distinguish different observations for the same identifier.

          Backing up, why you want to reshape wide here any way?

          1. Specifically, there are empty values in two date variables that are best cleaned up before any change of structure.

          2. Generally, what analyses do you have in mind that require that wide layout?

          In Stata, you're almost always better off with long layout.

          Comment


          • #6
            Hi Jenna Grabey, maybe this is what you wanted:
            Code:
            bysort study_id: gen newid=_n
            reshape wide aerptdat -eventrepeatkey , i(study_id) j(newid)
            Code:
            . reshape wide aerptdat - eventrepeatkey, i(study_id) j(newid)
            (note: j = 1 2 3 4 5 6 7 8 9 10)
            
            Data                               long   ->   wide
            -----------------------------------------------------------------------------
            Number of obs.                       24   ->       3
            Number of variables                  14   ->     121
            j variable (10 values)             newid   ->   (dropped)
            xij variables:
                                           aerptdat   ->   aerptdat1 aerptdat2 ... aerptdat10
                                           aerptime   ->   aerptime1 aerptime2 ... aerptime10
                                             aeterm   ->   aeterm1 aeterm2 ... aeterm10
                                            aestdat   ->   aestdat1 aestdat2 ... aestdat10
                                              aeout   ->   aeout1 aeout2 ... aeout10
                                              aesev   ->   aesev1 aesev2 ... aesev10
                                            aeendat   ->   aeendat1 aeendat2 ... aeendat10
                                              aeser   ->   aeser1 aeser2 ... aeser10
                                             aeinit   ->   aeinit1 aeinit2 ... aeinit10
                                             aespid   ->   aespid1 aespid2 ... aespid10
                               eventstartdateheader   ->   eventstartdateheader1 eventstartdateheader2 ... eventstartdateheader10
                                     eventrepeatkey   ->   eventrepeatkey1 eventrepeatkey2 ... eventrepeatkey10
            -----------------------------------------------------------------------------

            Comment


            • #7
              Try
              Code:
              . reshape wide  aerptdat aerptime aeterm aestdat aeout aesev aeendat aeser aeinit aespid, i(study_id) j(eventrepeatkey)
              (j = 1 2 3 4 5 6 7 8 9 10)
              
              Data                               Long   ->   Wide
              -----------------------------------------------------------------------------
              Number of observations               24   ->   3          
              Number of variables                  14   ->   103        
              j variable (10 values)   eventrepeatkey   ->   (dropped)
              xij variables:
                                             aerptdat   ->   aerptdat1 aerptdat2 ... aerptdat10
                                             aerptime   ->   aerptime1 aerptime2 ... aerptime10
                                               aeterm   ->   aeterm1 aeterm2 ... aeterm10
                                              aestdat   ->   aestdat1 aestdat2 ... aestdat10
                                                aeout   ->   aeout1 aeout2 ... aeout10
                                                aesev   ->   aesev1 aesev2 ... aesev10
                                              aeendat   ->   aeendat1 aeendat2 ... aeendat10
                                                aeser   ->   aeser1 aeser2 ... aeser10
                                               aeinit   ->   aeinit1 aeinit2 ... aeinit10
                                               aespid   ->   aespid1 aespid2 ... aespid10
              -----------------------------------------------------------------------------
              Last edited by Hemanshu Kumar; 10 Jan 2023, 07:47.

              Comment


              • #8
                That's why I'm so confused that it does not work. I want to use wide layout as this is what I need to upload my data to our clinical database. In order to do this there should be one row per study_id. The date variables where it looks like some are missing they are not always missing the AESTDAT - the start date and the AEENDAT is the end date of the event/symptoms.

                Comment


                • #9
                  In reshape wide, i() is study_id which denote an observation (for example, AA), and j() is a variable which denote every subobservation (for example, AA recorded/observed in time 1, AA recorded/observed in time 2 ...). Stata warns that the newid you create "is not unique within study_id", so make sure it is unique within study_id. Hemanshu Kumar is right, the variable -eventrepeatkey- is just the j().

                  Comment

                  Working...
                  X