
  • Problem with reshape

    Hello,

    I'm having a problem reshaping the data. I'm using the following code:

    Code:
    reshape wide time_spent1, i(ind_id weight gender age education ups NIC2008_2d marital_status relationship sector state1 statecode dist1 distcode hh_size religion caste land usual_monthly_consumer_exp pse_cooking pse_lighting washing_clothes_type sweeping_floor_type structure_dwelling_type c18 month day_of_week type_of_the_day) j(act_1d)
    Stata responds with the error "too many variables specified". How should I go about fixing it?

  • #2
    You need to specify fewer variables. In a long layout dataset, the identifiers are usually just an individual identifier and some date or time.

    But to get more precise advice, please give us a data example.

    I'd say that 80-90% of the time that people ask on Statalist about reshape wide, it turns out that they don't have a clear strategy for how they will deal with the reshaped data, or it's not a good idea anyway. Most analyses in Stata are easier with long layout and some are impossible otherwise.



    • #3
      Hello, Nick. Here is an example dataset (I have included fewer variables because of the limit):


      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str35 ind_id float(weight gender age education ups) double time_spent1 byte act_1d
      "TUS10001106201913310301382332001001" 2296 1 67 5 11 480 1
      "TUS10001106201913310301382332001001" 2296 1 67 5 11  45 4
      "TUS10001106201913310301382332001001" 2296 1 67 5 11 165 8
      "TUS10001106201913310301382332001001" 2296 1 67 5 11 750 9
      "TUS10001106201913310301382332001002" 2296 2 61 2 51 300 1
      "TUS10001106201913310301382332001002" 2296 2 61 2 51 165 3
      "TUS10001106201913310301382332001002" 2296 2 61 2 51  60 4
      "TUS10001106201913310301382332001002" 2296 2 61 2 51 195 8
      "TUS10001106201913310301382332001002" 2296 2 61 2 51 720 9
      "TUS10001106201913310301382332001003" 2296 1 33 7 31 420 1
      "TUS10001106201913310301382332001003" 2296 1 33 7 31  15 4
      "TUS10001106201913310301382332001003" 2296 1 33 7 31 135 7
      "TUS10001106201913310301382332001003" 2296 1 33 7 31 105 8
      "TUS10001106201913310301382332001003" 2296 1 33 7 31 765 9
      "TUS10001106201913310301382332001004" 2296 2 26 5 92 180 2
      "TUS10001106201913310301382332001004" 2296 2 26 5 92 345 3
      "TUS10001106201913310301382332001004" 2296 2 26 5 92 120 4
      "TUS10001106201913310301382332001004" 2296 2 26 5 92  15 7
      "TUS10001106201913310301382332001004" 2296 2 26 5 92  45 8
      "TUS10001106201913310301382332001004" 2296 2 26 5 92 735 9
      end
      Also, I need most of the variables specified in the reshape command, so a solution would be really helpful.
      Last edited by Varsha Vaishnav; 22 Jan 2024, 10:59.



      • #4
        It's not clear what you want to do. In your example data, the variables weight, gender, age, education, and ups are all constant within ind_id, so these variables do not need to be mentioned in the -reshape- command if you use -i(ind_id)-. So I think what you want is:
        Code:
        reshape wide time_spent1, i(ind_id) j(act_1d)
        That said, I fully endorse what Nick said: the data set as you show it is already organized properly for the vast majority of data management and analysis in Stata. Unless you can specifically identify what you plan to do next that requires transforming it to wide layout, you will regret making the change as easy things will become difficult or impossible.
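
        A minimal sketch of how one might verify this before reshaping, assuming the full data set uses the same variable names as the example in #3. The list of variables checked here is only the subset shown in that example; extend it to the other person-level variables as needed. If any assert fails, that variable varies within ind_id and cannot simply be dropped from i().

        ```stata
        * Check that each person-level variable is constant within ind_id.
        * After sorting within ind_id by the variable, the first and last
        * values are equal only if every value in the group is the same.
        foreach v of varlist weight gender age education ups {
            bysort ind_id (`v'): assert `v'[1] == `v'[_N]
        }

        * Check that ind_id and act_1d together identify each observation,
        * which -reshape wide- requires.
        isid ind_id act_1d

        * Constant variables are carried along automatically.
        reshape wide time_spent1, i(ind_id) j(act_1d)
        ```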



        • #5
          Thanks, Clyde, for pointing it out. It helps. Also, the purpose of reshaping is to get time spent on different activities in separate columns.
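
          For reference, a sketch of what the wide layout looks like afterwards, assuming the reshape command from #4 has already been run. The new columns are named by appending each act_1d value to the stub, so time_spent11, time_spent14, and so on. Activities a person did not report come out as missing; treating those as zero minutes is an assumption that should be checked against the survey documentation. The variable name total_minutes is made up for illustration.

          ```stata
          * Assumes the data are already in wide layout and that no other
          * variables in the data set match the pattern time_spent1*.
          describe time_spent1*

          * Assumption: an unreported activity means zero minutes spent.
          foreach v of varlist time_spent1* {
              replace `v' = 0 if missing(`v')
          }

          * Total minutes across all activities, per person.
          egen total_minutes = rowtotal(time_spent1*)
          summarize total_minutes
          ```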



          • #6

            Code:
            egen wanted = total(time_spent1), by(ind_id act_1d)
            and

            Code:
            collapse (sum) time_spent1, by(ind_id act_1d)
             
            are simpler alternatives.

            As always, the rationale for a reshape depends on what you will do with the new shape. You have many variables already, and making many more often confuses as much as it clarifies.
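
            One point worth keeping in mind: collapse keeps only the by() variables and the statistics it is asked for, so the other person-level variables are dropped unless they are named. A minimal sketch, assuming those variables are constant within ind_id as in the example in #3, so that (first) simply carries them through:

            ```stata
            * -preserve- and -restore- keep the original data intact while
            * the collapsed version is inspected.
            preserve
            collapse (sum) time_spent1 (first) weight gender age education ups, ///
                by(ind_id act_1d)
            list in 1/5
            restore
            ```

            By contrast, the egen approach adds a new variable and leaves every existing variable and observation in place.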



            • #7
              Thank you, Nick, for the alternatives.
