Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Difficulty reorganizing my data into a wide and condensed format: reshape or another method of organization?

    Hello everyone,

    Thank you in advance for your guidance and advice. I am a new Stata user working with a large medical database. I have data where each patient has an "inc_key" as their unique identifier. This subset of data provides multiple observations for each inc_key to provide an injury score, description, and body region. I want to have in the end one observation with the injury score for each body region for that one specific inc_key (i.e. the variables would be inc_key, aisseverity of issregion __, aisseverity of issregion ___, etc. with the values of each "aisseverity ofissregion__" being the average aisseverity score for that body region). There are a total of 6 issregions.

    My original, and naive, approach was to use "reshape wide aisseverity, i(inc_key) j(issregion)" but this failed, obviously, because inc_key was not providing unique results (i.e. one inc_key could have multiple observations with the same issregion due to varying aispredot values).

    My idea now was to try and have Stata create a mean aisseverity value by issregion for each inc_key. However I am having trouble telling Stata to do just that. Also, I am worried I do not know how to have it store that mean in a specific new data frame.

    I have attached below a subset of my data for reference.
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double inc_key str8 aisversion long aispredot byte aisseverity strL aisdescription byte issregion
    220066260699 "AIS 1998" 110202 1 "Scalp, Abrasion"                                                 6
    220069795312 "AIS 1998" 110202 1 "Scalp, Abrasion"                                                 6
    220072298312 "AIS 1998" 110402 1 "Scalp, Contusion (includes subgaleal hematoma)"                  6
    220069860051 "AIS 1998" 110402 1 "Scalp, Contusion (includes subgaleal hematoma)"                  6
    220072298525 "AIS 1998" 110402 1 "Scalp, Contusion (includes subgaleal hematoma)"                  6
    220064485938 "AIS 1998" 110402 1 "Scalp, Contusion (includes subgaleal hematoma)"                  6
    220067924505 "AIS 1998" 110402 1 "Scalp, Contusion (includes subgaleal hematoma)"                  6
    220072298488 "AIS 1998" 110402 1 "Scalp, Contusion (includes subgaleal hematoma)"                  6
    220064507567 "AIS 1998" 110402 1 "Scalp, Contusion (includes subgaleal hematoma)"                  6
    220064632104 "AIS 1998" 110600 1 "Scalp, Laceration, NFS"                                          6
    220064611195 "AIS 1998" 110600 1 "Scalp, Laceration, NFS"                                          6
    220068517163 "AIS 1998" 110600 1 "Scalp, Laceration, NFS"                                          6
    220072298381 "AIS 1998" 110600 1 "Scalp, Laceration, NFS"                                          6
    220066315040 "AIS 1998" 110602 1 "Scalp, Laceration, Minor"                                        6
    220072298548 "AIS 1998" 110602 1 "Scalp, Laceration, Minor"                                        6
    220072298350 "AIS 1998" 110604 2 "Scalp, Laceration, Major (> 10 cm long and subcateneous tissue)" 6
    220064507353 "AIS 1998" 115099 9 "Closed Head Injury, NFS"                                         1
    220069746308 "AIS 1998" 121002 5 "Internal Carotid Artery, Laceration"                             1
    220069716757 "AIS 1998" 122804 3 "Vertebral Artery, Thrombosis"                                    1
    220068276145 "AIS 1998" 140202 5 "Brain Stem, Compression"                                         1
    220068517100 "AIS 1998" 140418 4 "Cerebellum, Hematoma Epi/Extradural, Small"                      1
    220068517254 "AIS 1998" 140442 4 "Cerebellum, Hematoma, Subdural, Small"                           1
    220064632738 "AIS 1998" 140602 3 "Cerebrum, Contusion, NFS"                                        1
    220068517100 "AIS 1998" 140604 3 "Cerebrum, Contusion, Single, NFS"                                1
    220069746308 "AIS 1998" 140624 4 "Cerebrum, Contusions, Mult, Bilat, Large"                        1
    220068517061 "AIS 1998" 140629 4 "Cerebrum, Hematoma, NFS"                                         1
    220066261210 "AIS 1998" 140632 4 "Cerebrum, Hematoma, Epi/Extradural, Small"                       1
    220070098047 "AIS 1998" 140632 4 "Cerebrum, Hematoma, Epi/Extradural, Small"                       1
    220064632738 "AIS 1998" 140636 5 "Cerebrum, Hematoma, Epi/Extradural, Large"                       1
    220069716757 "AIS 1998" 140638 4 "Cerebrum, Hematoma, Intracerebral, NFS"                          1
    220066260699 "AIS 1998" 140640 4 "Cerebrum, Hematoma, Intracerebral, Small"                        1
    220068276145 "AIS 1998" 140648 5 "Cerebrum, Hematoma, Intracerebral, Large"                        1
    220068276370 "AIS 1998" 140652 4 "Cerebrum, Hematoma, Subdural, Small, Unilateral"                 1
    220068276145 "AIS 1998" 140652 4 "Cerebrum, Hematoma, Subdural, Small, Unilateral"                 1
    220066260699 "AIS 1998" 140660 3 "Cerebrum, Brain Swelling, NFS"                                   1
    220072298252 "AIS 1998" 140662 3 "Cerebrum, Brain Swelling, Mild"                                  1
    220069716757 "AIS 1998" 140662 3 "Cerebrum, Brain Swelling, Mild"                                  1
    220069864323 "AIS 1998" 140678 4 "Cerebrum, Intraventricular Hemorrhage"                           1
    220069860051 "AIS 1998" 140678 4 "Cerebrum, Intraventricular Hemorrhage"                           1
    220068266891 "AIS 1998" 140678 4 "Cerebrum, Intraventricular Hemorrhage"                           1
    220069716757 "AIS 1998" 140684 3 "Cerebrum, Subarachnoid Hemorrhage"                               1
    end
    label values issregion issReg2
    label def issReg2 1 "Head or neck", modify
    label def issReg2 6 "External", modify

  • #2
    You can do this with
    Code:
    sort inc_key, stable
    by inc_key: gen _j = _n
    reshape wide ais* issregion, i(inc_key) j(_j)
    format inc_key %12.0f
    But why do you want to do this? The resulting wide data layout is not very useful in Stata. The vast majority of data management and analysis commands in Stata will work better, or only, with your original long layout. Unless you specifically know of something you will be doing with your data that requires a wide layout, you should leave things as they are.

    Comment


    • #3
      Thank you Mr. Schechter for this feedback. The suggested method works, but there is there a way to now reorganize the data so that each body region has its own column?

      The reason we planned to do this is because we have a main database with 1 observation (row) per inc_key to represent data from a singular patient's hospital visit. To merge the data together we thought our best option would be to reshape this dataset into a format that matches that. Is there a better way to go about this? We greatly appreciate your expertise!

      Comment


      • #4
        but there is there a way to now reorganize the data so that each body region has its own column?
        I'm having trouble making sense of this. Can you create a small fake data set that shows what this would look like?

        The reason we planned to do this is because we have a main database with 1 observation (row) per inc_key to represent data from a singular patient's hospital visit. To merge the data together we thought our best option would be to reshape this dataset into a format that matches that.
        Doesn't the original data already have one observation per single patient's hospital visit?
        Last edited by Clyde Schechter; 09 Apr 2024, 14:39.

        Comment


        • #5
          Of course! We realize that averaging all of the aisseverity scores for a specific region per inc_key is required in order to get the following data table structure as our end goal. The averages would be the values for columns 2-7.

          Essentially we have a Main database and this Subset database (one shown briefly in the original post). It might be easier to reference them this way. The Subset database provided in the original post ends up having the same inc_key in multiple observations(rows). The Main database we want to merge the Subset database to later is the one that has one observation per single patient's hospital visit. I apologize for the miscommunication.
          inc_key issregion 1 (head and neck) issregion 2 (face) issregion 3 (chest) issregion 4 (abdomen) issregion 5 (extremities) issregion 5 (external)
          1234 1 2 2 4 3 1
          4321 4 2 3 1 0 0
          Last edited by Zoltan Nemeth; 09 Apr 2024, 14:52.

          Comment


          • #6
            OK, it is impossible to get exactly what you show. The problem is that variable names in Stata cannot contain embedded blanks, nor special characters. What you can have is a layout similar to this, but the variables will jsutbe named issregion1, issregion2, etc., and the names of the regions can be assigned as variable labels. So, like this:
            Code:
            levelsof issregion, local(regions)
            tempfile labeler
            label save issReg2 using `labeler'
            collapse (mean) aisseverity, by(inc_key issregion)
            reshape wide aisseverity, i(inc_key) j(issregion)
            rename aisseverity* aisseverity_issregion*
            run `labeler'
            foreach r of local regions {
                label var aisseverity_issregion`r' "`:label issReg2 `r''"
            }
            format inc_key %12.0f
            Moreover, I think it is not a good programming practice to name these variables issregion1, etc. The reason is that the values they contain are not region numbers, they contain aisseverity scores within those regions. So this code names them aisseverity_issregion1, etc. That way, the name says what it is.

            Comment


            • #7
              Thank you so much for your help! This helped us out tremendously. The labels were temporary for our variables and we agree that more descriptive names are more appropriate. Thank you once again.

              Comment

              Working...
              X