Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to sort the values of a variable within one observation while retaining the variable name?

    Hello,

    I am working with a data set that looks like this:
    ID Item1 Item2 Item3 Item4 Item5 Item6 Item7 Item8 Item9
    1 20.2 14.6 37.0 82.7 16.0 82.1 71.1 76.2 78.0
    2 12.1 6.7 35.0 76.7 5.0 90.2 93.3 76.7 75.0
    3 72.8 22.0 45.6 91.8 25.6 96.1 88.1 96.0 85.8

    Each row represents one school's response to a survey, with each column representing that school's response to each survey question.

    I need to generate new variables that show the survey question that received the highest score and the value of that survey question for each school. Furthermore, I need to generate new variables that show the survey question that received the second highest score and the value of that survey question for each school. To clarify, I'd like to be able to create the table below:
    ID Item1 Item2 Item3 Item4 Item5 Item6 Item7 Item8 Item9 Highest Item Highest Item Value 2nd Highest Item 2nd Highest Item Value Lowest Item Lowest Item Value 2nd Lowest Item 2nd Lowest Item Value
    1 20.2 14.6 37.0 82.7 16.0 82.1 71.1 76.2 78.0 Item4 82.7 Item6 82.1 Item2 14.6 Item5 16.0
    2 12.1 6.7 35.0 76.7 5.0 90.2 93.3 76.7 75.0 Item7 93.3 Item6 90.2 Item5 5.0 Item2 6.7
    3 72.8 22.0 45.6 91.8 25.6 96.1 88.1 96.0 85.8 Item6 96.1 Item8 96.0 Item2 22.0 Item5 25.6

    I searched through previous Statalist postings and on some other websites to try to find the best way to go about doing this. One source suggested using reshape then sort then reshape again. The resulting data set would have all of the items for each school organized from lowest score to highest score, but I would not have any way of identifying which item corresponds to each score.

    Thank you for any guidance you can provide!
    Best,
    Anne

  • #2
    Anne,

    Using two reshape commands is right, but you need a bit more than that:
    Code:
    reshape long Item, i(ID) j(item_no)
    
    bysort ID (Item): gen lowest=Item[1]
    bysort ID (Item): gen which_is_lowest="Item"+string(item_no[1])
    
    bysort ID (Item): gen second_lowest=Item[2]
    bysort ID (Item): gen which_is_2nd_lowest="Item"+string(item_no[2])
    
    bysort ID (Item): gen highest=Item[_N]
    bysort ID (Item): gen which_is_highest="Item"+string(item_no[_N])
    
    bysort ID (Item): gen second_highest=Item[_N-1]
    bysort ID (Item): gen which_is_2nd_highest="Item"+string(item_no[_N-1])
    
    reshape wide Item, i(ID) j(item_no)
    This can be shortened and generalized to:

    Code:
    reshape long Item, i(ID) j(item_no)
    
    forvalues x=1/9 {
      bysort ID (Item): gen rank`x'=Item[`x']
      bysort ID (Item): gen which`x'="Item"+string(item_no[`x'])
    }
    
    reshape wide Item, i(ID) j(item_no)
    rank1 is the lowest, rank2 is the second lowest, rank8 is the second highest, and rank9 is the highest. The rest can be dropped if desired.

    Regards,
    Joe

    Comment


    • #3
      search rowsort (SJ)

      Comment


      • #4
        To expand on my earlier remark (typed on a mobile while away from base):

        There is a review of working rowwise in

        SJ-9-1 pr0046 . . . . . . . . . . . . . . . . . . . Speaking Stata: Rowwise
        (help rowsort, rowranks if installed) . . . . . . . . . . . N. J. Cox
        Q1/09 SJ 9(1):137--157
        shows how to exploit functions, egen functions, and Mata
        for working rowwise; rowsort and rowranks are introduced

        which is accessible at http://www.stata-journal.com/sjpdf.h...iclenum=pr0046

        Comment


        • #5
          Thank you, Joe and Nick. Your posts were helpful!

          I have two additional questions:

          1. When I ran the code to tag the lowest and second lowest items, the tagging worked well. However, when I ran this code for the highest and next highest items:

          bysort ID (Item): gen highest=Item[_N] bysort ID (Item): gen which_is_highest="Item"+string(item_no[_N]) bysort ID (Item): gen second_highest=Item[_N-1] bysort ID (Item): gen which_is_2nd_highest="Item"+string(item_no[_N-1]) ...the "highest" variable ended up being missing for all observations. For all of my respondents, there are some survey questions (variables) with missing data. I know Stata typically considers missing data to be larger than any nonmissing data, but is there a command I could use in this case to get Stata to exclude the missing values from its designation of the highest item?

          2. I read up on the rowranks and rowsort commands in The Stata Journal article that you referenced. With rowranks, in the case of a tie, is there a way to get Stata to assign the rank to one variable, assigning a blank rank to all of the other variables with the same value?

          Basically, I would want the ranks to look like this:
          Observation Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 Var10
          Obs1 81 90 90 70 90 65 100 95 81 98
          Rank 3 4 2 1 7 5 6
          Thank you again for your help!
          Anne

          Comment


          • #6
            Anne,

            After I posted, I wondered if that issue would bite. I think you can solve it by dropping missing values after the first reshape:

            Code:
            reshape long Item, i(ID) j(item_no)
            
            drop if mi(Item)
            
            ....
            
            reshape wide Item, i(ID) j(item_no)
            This will eliminate missings from consideration as possible maximums, but when the second reshape happens, the missings will be restored. rowsort has it's own way of dealing with this issue. Regards, Joe

            Comment


            • #7
              rowranks (SJ) does not support the idiosyncratic ranking you want, which consists of ranking only distinct marks.

              Comment


              • #8
                Thank you!

                Comment

                Working...
                X