  • Use duplicate observations to create variable

    Hi,

    I am very new to Stata (a few weeks in) and have run into a difficulty. (I have read a lot of forum posts and the Stata help documentation looking for a simple answer, but have yet to find one.)

    I have a data set with 1900 observations and 20 variables.
    I'm looking at people's supplement use. Unfortunately, the dataset was created with each ID having multiple observations (up to 10), one for each supplement reported, instead of 10 variables.
    (Background info: supplements have been given a code, e.g., iron supplement = 2, multivitamins = 8, etc.)

    E.g., a section of the data set (Supplement holds the supplement code):

    ID  Gender  Age  Supplement
    1   1       26   2
    1   1       26   4
    1   1       26   8
    1   1       26   1
    2   0       27   3
    2   0       27   6
    2   0       27   2
    3   1       23   7
    3   1       23   6
    3   1       23   1
    4   0       29   3
    4   0       29   2

    As this example shows, each ID appears in several observations. ID, Gender and Age stay the same across a person's observations, but the supplement code changes because each person reported more than one supplement.

    Ideally I would like:

    ID  Gender  Age  Supplement1  Supplement2  Supplement3  Supplement4
    1   1       26   2            4            8            1
    2   0       27   3            6            2            .
    3   1       23   7            6            1            .
    4   0       29   3            2            .            .



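    That is, there would be one observation per ID, carrying that person's Gender and Age, and each supplement that is currently listed as a separate observation would become its own variable. So participants who listed 10 supplements would no longer have 10 observations; instead there would be 10 supplement variables in total, and people who reported fewer supplements would have missing values in the extra ones.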
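A minimal Stata sketch for checking how many supplement variables the wide layout will need: that number equals the largest number of observations any single ID has. It assumes the supplement code is stored in a numeric variable named Supplement; n_supp is a hypothetical helper variable used only for illustration. With the example rows above the answer is 4; on the full data set it should match the "up to 10" mentioned earlier.

```stata
* Example data from the post (Supplement holds the supplement code)
clear
input ID Gender Age Supplement
1 1 26 2
1 1 26 4
1 1 26 8
1 1 26 1
2 0 27 3
2 0 27 6
2 0 27 2
3 1 23 7
3 1 23 6
3 1 23 1
4 0 29 3
4 0 29 2
end

* Number of reported supplements (observations) for each ID
bysort ID: gen n_supp = _N

* The largest count is the number of Supplement variables a wide layout needs
summarize n_supp, meanonly
display "Supplement variables needed: " r(max)
```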

    I do hope this makes sense.

    Many thanks in advance!

    Elle


  • #2
    This is a very straightforward application of the -reshape- command. All you need is a variable that numbers the observations sequentially within each ID:

    Code:
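    * number the observations 1, 2, 3, ... within each ID
    by ID, sort: gen seq = _n
    * one observation per ID; Supplement becomes Supplement1, Supplement2, ...
    reshape wide Supplement, i(ID) j(seq)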
    and you will have precisely what you want.
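For completeness, here is a self-contained sketch that enters the example data from the question and runs the reshape. One hedged change from the two lines above: it numbers the observations after `sort ID, stable`, because a plain sort does not guarantee that tied observations (here, the rows within one ID) keep their original relative order, so Supplement1, Supplement2, ... could otherwise come out in a different order from the one in which the supplements were listed. The variable name Supplement is taken from the code above; seq is just an illustrative name.

```stata
* Example data from the question
clear
input ID Gender Age Supplement
1 1 26 2
1 1 26 4
1 1 26 8
1 1 26 1
2 0 27 3
2 0 27 6
2 0 27 2
3 1 23 7
3 1 23 6
3 1 23 1
4 0 29 3
4 0 29 2
end

* Keep each person's supplements in the order they were listed
sort ID, stable
* Number the observations 1, 2, 3, ... within each ID
by ID: gen seq = _n

* One observation per ID; Supplement becomes Supplement1, Supplement2, ...
reshape wide Supplement, i(ID) j(seq)

list, noobs
```

After the reshape, -list- should show one row per ID with Supplement1 through Supplement4, and missing values (.) where a person reported fewer than four supplements, matching the layout requested in the question.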

    That said, you will likely come to regret doing that. The way your data came to you is actually the optimal layout for most analyses in Stata. The wide layout that you are looking for is spreadsheet-think. It's easy for human eyes to read. And there are a few things in Stata that are facilitated in wide layout, such as certain types of graphs, and a few specific commands. But Stata's analytic commands are largely oriented towards working with data in long layout. So my advice is to simply skip over this question and move on to tackling your analyses with the data you were given. If you do this -reshape wide-, I'll wager that your next post here will be trying to figure out how to carry out some analysis, and the response will be that it can't be done in wide layout and the first step is to go back to long!
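To illustrate why the long layout is convenient, here is a sketch of two person-level questions that take one line each in long layout: how many supplements each person reported, and whether each person reported an iron supplement (code 2, per the question). The variable names n_supp, took_iron and first are hypothetical. In wide layout, the iron flag would instead require checking every one of Supplement1 through Supplement10.

```stata
* Example data from the question, in its original long layout
clear
input ID Gender Age Supplement
1 1 26 2
1 1 26 4
1 1 26 8
1 1 26 1
2 0 27 3
2 0 27 6
2 0 27 2
3 1 23 7
3 1 23 6
3 1 23 1
4 0 29 3
4 0 29 2
end

* How often each supplement code was reported across all observations
tabulate Supplement

* Number of supplements reported by each person
bysort ID: gen n_supp = _N

* took_iron = 1 if any of the person's observations has code 2 (iron), else 0
bysort ID: egen took_iron = max(Supplement == 2)

* Flag one observation per person, then summarize at the person level
bysort ID: gen byte first = (_n == 1)
tabulate took_iron Gender if first
```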

    Either way, do read the manual section on the -reshape- command. It is one of Stata's most useful and indispensable data management commands. Although it is more often used to convert wide data to long, if you are going to be using Stata regularly, you need to become comfortable with it in both directions. The -i() j()- conceptualization of the data model takes some getting used to. We all struggle with it at first. Then one day, it suddenly just sinks in and it is easy and obvious from that point forward.
    Last edited by Clyde Schechter; 02 May 2017, 20:07.
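A minimal sketch of -reshape- in both directions with the example data: long to wide, then back to long. The i() option names the variable that identifies each unit (here, each person), and j() names the variable that holds, or will hold, the sequence number that becomes the suffix in Supplement1, Supplement2, and so on. Going back to long creates one observation for every person and every suffix, so people who reported fewer than the maximum number of supplements get rows with a missing Supplement; the sketch drops those. Names other than ID and Supplement are illustrative.

```stata
* Example data from the question
clear
input ID Gender Age Supplement
1 1 26 2
1 1 26 4
1 1 26 8
1 1 26 1
2 0 27 3
2 0 27 6
2 0 27 2
3 1 23 7
3 1 23 6
3 1 23 1
4 0 29 3
4 0 29 2
end

* Sequence number within ID, keeping the reported order of supplements
sort ID, stable
by ID: gen seq = _n

* Long -> wide: one observation per ID, Supplement1 ... Supplement4
reshape wide Supplement, i(ID) j(seq)

* Wide -> long: one observation per ID and seq again
reshape long Supplement, i(ID) j(seq)

* Drop the padding rows created for people with fewer than 4 supplements
drop if missing(Supplement)
```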



    • #3
      Thank you so much; this has been very helpful!
