Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Long to wide creates too many variables

    My long dataset has between 1 and 3 observations on about 100,000 unique individuals. There is a unique identifier for each individual that I used as the i variable. The j variable has 35 values, but for any given individual there are no more than 3 of these values. Nevertheless, the long to wide command creates 35 new variables for each varying observation for each individual. Most of the new variables are blank for any given individual. The j variables are numeric dates.
    How can I transform the data so that only the relevant new variables are in the dataset?
    Thanks for any advice.
    John

  • #2
    Hello John. I see that was your first post, so welcome to the forum. I think it would help other members to help you if you provided a small dataset to illustrate the problem. See the FAQ, especially item 12, for some guidance on how to do that. Meanwhile, I wonder if you would get a wide dataset more to your liking if you created a sequential record number (with values 1-3) to use as the j variable. I'm imagining a final wide file that has the ID variable, v1, v2, and v3 for each variable of interest, plus date1, date2 and date3 (assuming those dates might be needed for whatever you want to do with the wide file). HTH.
    --
    Bruce Weaver
    Email: [email protected]
    Version: Stata/MP 18.5 (Windows)

    Comment


    • #3
      Bruce, many thanks for the quick reply. You are correct about what I would like the result to be. Any advice on how to create a sequential record number for each individual? Here is a sample of the data, with identifiers such as names and other variables removed. Note that each request number (second column) is associated with a unique date, so either could be used to create a sequential record number for each ID. I left out the 4 stub variables.
      John

      Unique ID req# date as string date as number
      100036452 1211 "9/26/20 21:57" 22184
      100086315 1211 "9/26/20 21:57" 22184
      100238008 1254 "10/7/20 1:59" 22195
      100238898 1211 "9/26/20 21:57" 22184
      100259426 1253 "10/6/20 23:31" 22194
      100270412 1174 "9/23/20 16:00" 22181
      100273396 1040 "9/11/20 13:15" 22169
      100280965 1174 "9/23/20 16:00" 22181
      100288906 1211 "9/26/20 21:57" 22184
      100288906 1254 "10/7/20 1:59" 22195
      100299427 1174 "9/23/20 16:00" 22181
      100307168 1211 "9/26/20 21:57" 22184
      100307168 1253 "10/6/20 23:31" 22194
      100308025 1254 "10/7/20 1:59" 22195
      100310434 1040 "9/11/20 13:15" 22169
      100310434 1254 "10/7/20 1:59" 22195
      100313828 1211 "9/26/20 21:57" 22184
      100316484 1040 "9/11/20 13:15" 22169
      100322169 1254 "10/7/20 1:59" 22195
      100325005 1211 "9/26/20 21:57" 22184
      100325005 1254 "10/7/20 1:59" 22195
      100338914 1040 "9/11/20 13:15" 22169
      100341133 1211 "9/26/20 21:57" 22184
      100421033 1254 "10/7/20 1:59" 22195


      Comment


      • #4
        You can create a variable specifying the temporal order. What do you want to do with the data in wide layout? A long layout is almost always preferred for analysis.

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input double UniqueID float req str30 strdate float date
        100036452 1211 "9/26/20 21:57" 22184
        100086315 1211 "9/26/20 21:57" 22184
        100238008 1254 "10/7/20 1:59"  22195
        100238898 1211 "9/26/20 21:57" 22184
        100259426 1253 "10/6/20 23:31" 22194
        100270412 1174 "9/23/20 16:00" 22181
        100273396 1040 "9/11/20 13:15" 22169
        100280965 1174 "9/23/20 16:00" 22181
        100288906 1211 "9/26/20 21:57" 22184
        100288906 1254 "10/7/20 1:59"  22195
        100299427 1174 "9/23/20 16:00" 22181
        100307168 1211 "9/26/20 21:57" 22184
        100307168 1253 "10/6/20 23:31" 22194
        100308025 1254 "10/7/20 1:59"  22195
        100310434 1040 "9/11/20 13:15" 22169
        100310434 1254 "10/7/20 1:59"  22195
        100313828 1211 "9/26/20 21:57" 22184
        100316484 1040 "9/11/20 13:15" 22169
        100322169 1254 "10/7/20 1:59"  22195
        100325005 1211 "9/26/20 21:57" 22184
        100325005 1254 "10/7/20 1:59"  22195
        100338914 1040 "9/11/20 13:15" 22169
        100341133 1211 "9/26/20 21:57" 22184
        100421033 1254 "10/7/20 1:59"  22195
        end
        
        bys UniqueID (date): gen which=_n
        reshape wide req strdate date , i(UniqueID) j(which)
        Res.:

        Code:
        . l, sep(0)
        
             +-------------------------------------------------------------------------+
             |  UniqueID   req1        strdate1   date1   req2        strdate2   date2 |
             |-------------------------------------------------------------------------|
          1. | 1.000e+08   1211   9/26/20 21:57   22184      .                       . |
          2. | 1.001e+08   1211   9/26/20 21:57   22184      .                       . |
          3. | 1.002e+08   1254    10/7/20 1:59   22195      .                       . |
          4. | 1.002e+08   1211   9/26/20 21:57   22184      .                       . |
          5. | 1.003e+08   1253   10/6/20 23:31   22194      .                       . |
          6. | 1.003e+08   1174   9/23/20 16:00   22181      .                       . |
          7. | 1.003e+08   1040   9/11/20 13:15   22169      .                       . |
          8. | 1.003e+08   1174   9/23/20 16:00   22181      .                       . |
          9. | 1.003e+08   1211   9/26/20 21:57   22184   1254    10/7/20 1:59   22195 |
         10. | 1.003e+08   1174   9/23/20 16:00   22181      .                       . |
         11. | 1.003e+08   1211   9/26/20 21:57   22184   1253   10/6/20 23:31   22194 |
         12. | 1.003e+08   1254    10/7/20 1:59   22195      .                       . |
         13. | 1.003e+08   1040   9/11/20 13:15   22169   1254    10/7/20 1:59   22195 |
         14. | 1.003e+08   1211   9/26/20 21:57   22184      .                       . |
         15. | 1.003e+08   1040   9/11/20 13:15   22169      .                       . |
         16. | 1.003e+08   1254    10/7/20 1:59   22195      .                       . |
         17. | 1.003e+08   1211   9/26/20 21:57   22184   1254    10/7/20 1:59   22195 |
         18. | 1.003e+08   1040   9/11/20 13:15   22169      .                       . |
         19. | 1.003e+08   1211   9/26/20 21:57   22184      .                       . |
         20. | 1.004e+08   1254    10/7/20 1:59   22195      .                       . |
             +-------------------------------------------------------------------------+
        
        .

        Comment


        • #5
          Andrew, that is a beautiful thing! I am going to use the wide file for analysis as we did that previously for a similar dataset.
          Thanks for your help.
          John

          Comment

          Working...
          X