  • Reshaping wide to long data

    Hi!

    I am working with a fairly large dataset that contains both "fixed" variables (such as ID, gender, date of birth, date of hospitalization, initial treatment, previous diseases) and longitudinal variables (such as LDL cholesterol, blood pressure, employment status) assessed at set time points (baseline, two weeks, 3 months, etc.). An example from the dataset:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str6 Gender str19 Treatment double(d00ldl d14ldl m03ldl m06ldl)    float(d00arbeid    d14arbeid    m03arbeid    m06arbeid)
    "Mann"   "PCI-medikamentstent"   7 3.1 2.8 2.2 1 0 0 0
    "Mann"   "PCI-medikamentstent" 2.2   2 2.2 2.2 1 0 0 1
    "Mann"   "PCI-metallstent"     3.2 3.3 3.3 2.9 1 0 1 1
    "Kvinne" "PCI-metallstent"     3.3 2.4 2.3 2.2 1 1 1 1
    "Mann"   "PCI-medikamentstent" 2.4   .   .   . 1 . . .
    "Mann"   "PCI-medikamentstent" 4.7 2.3 2.1 2.2 0 0 0 0
    "Mann"   "POBA"                  3 2.1 1.8 1.7 0 0 0 0
    "Mann"   "PCI-medikamentstent" 4.1   .   .   . 0 . . .
    "Mann"   "PCI-medikamentstent" 2.6 2.9 2.4 2.6 0 0 0 0
    "Mann"   "PCI-medikamentstent" 2.3   .   .   . 1 . . .
    end
    label values d00arbeid neija
    label values d14arbeid neija
    label values m03arbeid neija
    label values m06arbeid neija
    label def neija 0 "Nei", modify
    label def neija 1 "Ja", modify
    I have read other posts on reshaping data from wide to long, but the advice differs from one situation to another, so I decided to ask to be sure I apply the right commands. I first renamed all the time-dependent variables to be compatible with the long format using the rename command, as in the examples below:

    rename d00* *00
    rename d14* *01
    rename m06* *06

    The dataset is quite large, and I wonder if there is a better way to do this than the standard command (reshape long varname, i(ID) j(time)).

    Thanks for your attention!

  • #2
    I am unaware of better options, but you could consider partitioning the data, e.g.,
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str6 Gender str19 Treatment double(d00ldl d14ldl m03ldl m06ldl) float(d00arbeid d14arbeid m03arbeid m06arbeid)
    "Mann"   "PCI-medikamentstent"   7 3.1 2.8 2.2 1 0 0 0
    "Mann"   "PCI-medikamentstent" 2.2   2 2.2 2.2 1 0 0 1
    "Mann"   "PCI-metallstent"     3.2 3.3 3.3 2.9 1 0 1 1
    "Kvinne" "PCI-metallstent"     3.3 2.4 2.3 2.2 1 1 1 1
    "Mann"   "PCI-medikamentstent" 2.4   .   .   . 1 . . .
    "Mann"   "PCI-medikamentstent" 4.7 2.3 2.1 2.2 0 0 0 0
    "Mann"   "POBA"                  3 2.1 1.8 1.7 0 0 0 0
    "Mann"   "PCI-medikamentstent" 4.1   .   .   . 0 . . .
    "Mann"   "PCI-medikamentstent" 2.6 2.9 2.4 2.6 0 0 0 0
    "Mann"   "PCI-medikamentstent" 2.3   .   .   . 1 . . .
    end
    
    gen id = _n
    
    rename d00* *1
    rename d14* *2
    rename m03* *3
    rename m06* *4
    
    tempfile data
    save `data'
    
    use `data' if id<=5, clear
    reshape long ldl arbeid, i(id) j(tid)
    tempfile data1
    save `data1'
    
    use `data' if id>5, clear
    reshape long ldl arbeid, i(id) j(tid)
    tempfile data2
    save `data2'
    
    clear
    append using `data1'
    append using `data2'
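
    As an aside, the renaming step may not be needed at all. Official -reshape- accepts an @ placeholder in the stub to mark where the time code sits in the wide names, and the string option keeps non-numeric codes. The following is only a minimal sketch on three rows from the example above; the variable name visit is an illustrative choice, not something from the original data.

    ```stata
    * Three rows from the thread's example, kept in the original wide naming
    clear
    input double(d00ldl d14ldl m03ldl m06ldl) float(d00arbeid d14arbeid m03arbeid m06arbeid)
    7   3.1 2.8 2.2 1 0 0 0
    2.2 2   2.2 2.2 1 0 0 1
    2.4 .   .   .   1 . . .
    end
    gen id = _n

    * @ marks the position of the time code in each wide name, so no -rename-
    * is needed; -string- stores the codes d00, d14, m03, m06 as text in tid
    reshape long @ldl @arbeid, i(id) j(tid) string

    * Optional: numeric visit counter (hypothetical name). -encode- numbers the
    * codes alphabetically, which here happens to match chronological order.
    encode tid, gen(visit)
    list id tid visit ldl arbeid, sepby(id)
    ```

    With other naming schemes the alphabetical order of the codes may not be chronological, so the encoded counter should be checked before it is used as a time variable.
    
    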


    • #3
      I'm not sure what you mean by a "better" alternative. With very large data sets, -reshape- can be very slow, and people often look for something that will do the job faster. There are a few user-written programs that can do that.

      There is the -greshape- command, which is part of Mauricio Caceres' -gtools- package, available at github.com/mcaceresb/stata-gtools.

      And there is -tolong-, by Rafal Raciborski, available from SSC.

      -greshape- will do both wide to long and long to wide reshaping. -tolong-, as its name suggests, does only wide to long. Both are much faster than Stata's native -reshape- command in large data sets. And both have syntax that is largely similar to that of -reshape-, so learning to use them is simple enough.
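
      To illustrate how close the syntax is, here is a hedged sketch on two rows of the example data, already renamed as in #2. It assumes that both commands accept the same i() and j() options as -reshape-; check each command's help file before relying on that. Each call runs only if the command is installed, and the result is compared with the official -reshape- output.

      ```stata
      * Two rows from the thread's example, renamed to stub + numeric suffix
      clear
      input double(ldl1 ldl2 ldl3 ldl4) float(arbeid1 arbeid2 arbeid3 arbeid4)
      7   3.1 2.8 2.2 1 0 0 0
      2.2 2   2.2 2.2 1 0 0 1
      end
      gen id = _n
      tempfile wide native
      save `wide'

      * Baseline: official -reshape-
      reshape long ldl arbeid, i(id) j(tid)
      sort id tid
      save `native'

      * -greshape- (gtools): skipped unless installed; options assumed
      * to mirror -reshape-
      capture which greshape
      if _rc == 0 {
          use `wide', clear
          greshape long ldl arbeid, i(id) j(tid)
          sort id tid
          cf id tid ldl arbeid using `native'
      }

      * -tolong- (ssc install tolong): skipped unless installed; options
      * assumed to mirror -reshape-
      capture which tolong
      if _rc == 0 {
          use `wide', clear
          tolong ldl arbeid, i(id) j(tid)
          sort id tid
          cf id tid ldl arbeid using `native'
      }

      use `native', clear
      ```

      -cf- stops with an error if any value differs, so the do-file only runs through when the faster commands reproduce the official result.
      
      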


      • #4
        Thanks a lot for your input, I will try your suggestions!
