Long to wide creates too many variables

John Loewenstein

Join Date: Mar 2022

Posts: 3
#1

Long to wide creates too many variables

25 Mar 2022, 08:51

My long dataset has between 1 and 3 observations on about 100,000 unique individuals. There is a unique identifier for each individual that I used as the i variable. The j variable has 35 values, but for any given individual there are no more than 3 of these values. Nevertheless, the long to wide command creates 35 new variables for each varying observation for each individual. Most of the new variables are blank for any given individual. The j variables are numeric dates.
How can I transform the data so that only the relevant new variables are in the dataset?
Thanks for any advice.
John
Tags: None
Bruce Weaver

Join Date: May 2014

Posts: 1133
#2

25 Mar 2022, 09:07

Hello John. I see that was your first post, so welcome to the forum. I think it would help other members to help you if you provided a small dataset to illustrate the problem. See the FAQ, especially item 12, for some guidance on how to do that.
https://www.statalist.org/forums/help

Meanwhile, I wonder if you would get a wide dataset more to your liking if you created a sequential record number (with values 1-3) to use as the j variable. I'm imagining a final wide file that has the ID variable, v1, v2, and v3 for each variable of interest, plus date1, date2 and date3 (assuming those dates might be needed for whatever you want to do with the wide file). HTH.

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 19.5 (Windows)
Comment
John Loewenstein

Join Date: Mar 2022

Posts: 3
#3

25 Mar 2022, 09:44

Bruce, many thanks for the quick reply. You are correct about what I would like the result to be. Any advice on how to create a sequential record number for each individual? Here is a sample of the data, with identifiers such as names and other variables removed. Note that each request number (second column) is associated with a unique date, so either could be used to create a sequential record number for each ID. I left out the 4 stub variables.
John

Unique ID req# date as string date as number
100036452 1211 "9/26/20 21:57" 22184
100086315 1211 "9/26/20 21:57" 22184
100238008 1254 "10/7/20 1:59" 22195
100238898 1211 "9/26/20 21:57" 22184
100259426 1253 "10/6/20 23:31" 22194
100270412 1174 "9/23/20 16:00" 22181
100273396 1040 "9/11/20 13:15" 22169
100280965 1174 "9/23/20 16:00" 22181
100288906 1211 "9/26/20 21:57" 22184
100288906 1254 "10/7/20 1:59" 22195
100299427 1174 "9/23/20 16:00" 22181
100307168 1211 "9/26/20 21:57" 22184
100307168 1253 "10/6/20 23:31" 22194
100308025 1254 "10/7/20 1:59" 22195
100310434 1040 "9/11/20 13:15" 22169
100310434 1254 "10/7/20 1:59" 22195
100313828 1211 "9/26/20 21:57" 22184
100316484 1040 "9/11/20 13:15" 22169
100322169 1254 "10/7/20 1:59" 22195
100325005 1211 "9/26/20 21:57" 22184
100325005 1254 "10/7/20 1:59" 22195
100338914 1040 "9/11/20 13:15" 22169
100341133 1211 "9/26/20 21:57" 22184
100421033 1254 "10/7/20 1:59" 22195
Comment

Andrew Musau

Join Date: Oct 2014
Posts: 10195

25 Mar 2022, 10:12

You can create a variable specifying the temporal order. What do you want to do with the data in wide layout? A long layout is almost always preferred for analysis.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input double UniqueID float req str30 strdate float date
100036452 1211 "9/26/20 21:57" 22184
100086315 1211 "9/26/20 21:57" 22184
100238008 1254 "10/7/20 1:59"  22195
100238898 1211 "9/26/20 21:57" 22184
100259426 1253 "10/6/20 23:31" 22194
100270412 1174 "9/23/20 16:00" 22181
100273396 1040 "9/11/20 13:15" 22169
100280965 1174 "9/23/20 16:00" 22181
100288906 1211 "9/26/20 21:57" 22184
100288906 1254 "10/7/20 1:59"  22195
100299427 1174 "9/23/20 16:00" 22181
100307168 1211 "9/26/20 21:57" 22184
100307168 1253 "10/6/20 23:31" 22194
100308025 1254 "10/7/20 1:59"  22195
100310434 1040 "9/11/20 13:15" 22169
100310434 1254 "10/7/20 1:59"  22195
100313828 1211 "9/26/20 21:57" 22184
100316484 1040 "9/11/20 13:15" 22169
100322169 1254 "10/7/20 1:59"  22195
100325005 1211 "9/26/20 21:57" 22184
100325005 1254 "10/7/20 1:59"  22195
100338914 1040 "9/11/20 13:15" 22169
100341133 1211 "9/26/20 21:57" 22184
100421033 1254 "10/7/20 1:59"  22195
end

bys UniqueID (date): gen which=_n
reshape wide req strdate date , i(UniqueID) j(which)

Res.:

Code:

. l, sep(0)

     +-------------------------------------------------------------------------+
     |  UniqueID   req1        strdate1   date1   req2        strdate2   date2 |
     |-------------------------------------------------------------------------|
  1. | 1.000e+08   1211   9/26/20 21:57   22184      .                       . |
  2. | 1.001e+08   1211   9/26/20 21:57   22184      .                       . |
  3. | 1.002e+08   1254    10/7/20 1:59   22195      .                       . |
  4. | 1.002e+08   1211   9/26/20 21:57   22184      .                       . |
  5. | 1.003e+08   1253   10/6/20 23:31   22194      .                       . |
  6. | 1.003e+08   1174   9/23/20 16:00   22181      .                       . |
  7. | 1.003e+08   1040   9/11/20 13:15   22169      .                       . |
  8. | 1.003e+08   1174   9/23/20 16:00   22181      .                       . |
  9. | 1.003e+08   1211   9/26/20 21:57   22184   1254    10/7/20 1:59   22195 |
 10. | 1.003e+08   1174   9/23/20 16:00   22181      .                       . |
 11. | 1.003e+08   1211   9/26/20 21:57   22184   1253   10/6/20 23:31   22194 |
 12. | 1.003e+08   1254    10/7/20 1:59   22195      .                       . |
 13. | 1.003e+08   1040   9/11/20 13:15   22169   1254    10/7/20 1:59   22195 |
 14. | 1.003e+08   1211   9/26/20 21:57   22184      .                       . |
 15. | 1.003e+08   1040   9/11/20 13:15   22169      .                       . |
 16. | 1.003e+08   1254    10/7/20 1:59   22195      .                       . |
 17. | 1.003e+08   1211   9/26/20 21:57   22184   1254    10/7/20 1:59   22195 |
 18. | 1.003e+08   1040   9/11/20 13:15   22169      .                       . |
 19. | 1.003e+08   1211   9/26/20 21:57   22184      .                       . |
 20. | 1.004e+08   1254    10/7/20 1:59   22195      .                       . |
     +-------------------------------------------------------------------------+

.

Comment

John Loewenstein

Join Date: Mar 2022

Posts: 3
#5

25 Mar 2022, 12:12

Andrew, that is a beautiful thing! I am going to use the wide file for analysis as we did that previously for a similar dataset.
Thanks for your help.
John
Comment

Announcement

Long to wide creates too many variables

Comment

Comment

Comment

Comment