Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Data in wide format

    Hello everyone.

    I need your help with a topic I am working on. I currently have a database that contains information about higher education in Ecuador. Specifically, this data contains the graduates in higher education. Some variables indicate the identifier of each person, the university where he/she studied, the career he/she studied and the graduation dates, among other variables about the person's characteristics or the university. I need to convert this data to a horizontal format for data processing purposes, where each observation is a single person. However, the original database has more than one observation for each person because there are cases of individuals with more than one university degree recorded. There are also cases where their postgraduate or doctorate degrees are recorded and are reflected as one more observation. My question is the following:
    Is there any way to convert to a horizontal format all this data without losing information on each individual?

    I have tried the "reshape wide" command, however it is only useful when I have a single observation per individual. Any suggestions would be really helpful and I would appreciate it. I am attaching an example of the database, as it is confidential information I am not allowed to sample the data. Regards

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(id_number num_observations) str8 career_name str12 university_name float(graduation_date degree_note) str7 level_education
    1 5 "Career_2" "University_0" 744 3 "level_1"
    1 5 "Career_9" "University_6" 751 3 "level_2"
    1 5 "Career_9" "University_1" 753 1 "level_1"
    1 5 "Career_8" "University_7" 728 3 "level_2"
    1 5 "Career_8" "University_5" 750 2 "level_2"
    2 5 "Career_7" "University_5" 755 1 "level_1"
    2 5 "Career_8" "University_9" 726 3 "level_2"
    2 5 "Career_6" "University_9" 730 1 "level_3"
    2 5 "Career_0" "University_7" 746 4 "level_2"
    2 5 "Career_0" "University_3" 751 5 "level_1"
    3 5 "Career_2" "University_0" 753 4 "level_2"
    3 5 "Career_7" "University_2" 739 3 "level_1"
    3 5 "Career_0" "University_0" 724 1 "level_2"
    3 5 "Career_2" "University_8" 727 4 "level_2"
    3 5 "Career_1" "University_2" 734 0 "level_2"
    4 3 "Career_9" "University_8" 730 1 "level_3"
    4 3 "Career_5" "University_4" 735 4 "level_2"
    4 3 "Career_6" "University_1" 732 3 "level_1"
    5 3 "Career_0" "University_7" 753 5 "level_1"
    5 3 "Career_9" "University_9" 730 0 "level_2"
    end

    Last edited by Steven Vaccato; 26 Mar 2024, 20:27.

  • #2
    I have tried the "reshape wide" command, however it is only useful when I have a single observation per individual.
    On the contrary, -reshape wide- is exactly what you need unless I severely misunderstand what you are trying to do. I think you just haven't used it correctly.

    Try this:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(id_number num_observations) str8 career_name str12 university_name float(graduation_date degree_note) str7 level_education
    1 5 "Career_2" "University_0" 744 3 "level_1"
    1 5 "Career_9" "University_6" 751 3 "level_2"
    1 5 "Career_9" "University_1" 753 1 "level_1"
    1 5 "Career_8" "University_7" 728 3 "level_2"
    1 5 "Career_8" "University_5" 750 2 "level_2"
    2 5 "Career_7" "University_5" 755 1 "level_1"
    2 5 "Career_8" "University_9" 726 3 "level_2"
    2 5 "Career_6" "University_9" 730 1 "level_3"
    2 5 "Career_0" "University_7" 746 4 "level_2"
    2 5 "Career_0" "University_3" 751 5 "level_1"
    3 5 "Career_2" "University_0" 753 4 "level_2"
    3 5 "Career_7" "University_2" 739 3 "level_1"
    3 5 "Career_0" "University_0" 724 1 "level_2"
    3 5 "Career_2" "University_8" 727 4 "level_2"
    3 5 "Career_1" "University_2" 734 0 "level_2"
    4 3 "Career_9" "University_8" 730 1 "level_3"
    4 3 "Career_5" "University_4" 735 4 "level_2"
    4 3 "Career_6" "University_1" 732 3 "level_1"
    5 3 "Career_0" "University_7" 753 5 "level_1"
    5 3 "Career_9" "University_9" 730 0 "level_2"
    end
    
    sort id_number, stable
    by id_number: replace num_observations = _n
    
    ds id_number num_observations, not
    local vbles `r(varlist)'
    reshape wide `vbles', i(id_number) j(num_observations)

    Comment

    Working...
    X