Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Problem with formatting data

    Hello, I have a previous data set in the long format as shown below




    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long student_id str6 variable byte value
     1 "age"    33
     1 "county"  2
     1 "male"    0
     1 "race"    1
     2 "age"    37
     2 "county"  2
     2 "male"    1
     2 "race"    1
     3 "age"    36
     3 "county"  2
     3 "male"    0
     3 "race"    1
     4 "age"    37
     4 "county"  2
     4 "male"    0
     4 "race"    1
     5 "age"    25
     5 "county"  2
     5 "male"    1
     5 "race"    1
     6 "age"    24
     6 "county"  2
     6 "male"    0
     6 "race"    1
     7 "age"    19
     7 "county"  2
     7 "male"    1
     7 "race"    1
     8 "age"    27
     8 "county"  2
     8 "male"    1
     8 "race"    1
     9 "age"    31
     9 "county"  2
     9 "male"    0
     9 "race"    1
    10 "age"    30
    10 "county"  2
    10 "male"    0
    10 "race"    1
    11 "age"    34
    11 "county"  2
    11 "male"    1
    11 "race"    1
    12 "age"    36
    12 "county"  2
    12 "male"    1
    12 "race"    1
    13 "age"    31
    13 "county"  2
    13 "male"    1
    13 "race"    1
    14 "age"    22
    14 "county"  2
    14 "male"    0
    14 "race"    1
    15 "age"    20
    15 "county"  2
    15 "male"    1
    15 "race"    1
    16 "age"    22
    16 "county"  2
    16 "male"    0
    16 "race"    1
    17 "age"    35
    17 "county"  2
    17 "male"    0
    17 "race"    1
    18 "age"    19
    18 "county"  2
    18 "male"    1
    18 "race"    1
    19 "age"    25
    19 "county"  2
    19 "male"    0
    19 "race"    1
    20 "age"    27
    20 "county"  2
    20 "male"    0
    20 "race"    1
    21 "age"    24
    21 "county"  2
    21 "male"    0
    21 "race"    1
    22 "age"    35
    22 "county"  2
    22 "male"    1
    22 "race"    1
    23 "age"    25
    23 "county"  2
    23 "male"    0
    23 "race"    1
    24 "age"    32
    24 "county"  2
    24 "male"    0
    24 "race"    1
    25 "age"    31
    25 "county"  2
    25 "male"    1
    25 "race"    1
    end
    I then used this code to reshape the data above and got this data format



    use "county 2 data.dta"
    bysort student_id : gen ID2 = _n
    reshape wide value, i(student_id ID2) j(variable) string
    drop seq

    However, I don't want to generate cells with missing values, does anyone know how to eliminate the empty cells in the resulting data set?

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long student_id float ID2 byte(valueage valuecounty valuemale valuerace)
     1 1  . 2 . .
     1 2  . . . 1
     1 3 33 . . .
     1 4  . . 0 .
     2 1 37 . . .
     2 2  . 2 . .
     2 3  . . 1 .
     2 4  . . . 1
     3 1  . 2 . .
     3 2  . . . 1
     3 3 36 . . .
     3 4  . . 0 .
     4 1  . . 0 .
     4 2 37 . . .
     4 3  . 2 . .
     4 4  . . . 1
     5 1  . . 1 .
     5 2  . 2 . .
     5 3  . . . 1
     5 4 25 . . .
     6 1  . 2 . .
     6 2 24 . . .
     6 3  . . 0 .
     6 4  . . . 1
     7 1 19 . . .
     7 2  . . 1 .
     7 3  . . . 1
     7 4  . 2 . .
     8 1  . 2 . .
     8 2  . . . 1
     8 3 27 . . .
     8 4  . . 1 .
     9 1 31 . . .
     9 2  . . . 1
     9 3  . . 0 .
     9 4  . 2 . .
    10 1  . 2 . .
    10 2  . . 0 .
    10 3 30 . . .
    10 4  . . . 1
    11 1 34 . . .
    11 2  . . 1 .
    11 3  . 2 . .
    11 4  . . . 1
    12 1  . . 1 .
    12 2  . 2 . .
    12 3 36 . . .
    12 4  . . . 1
    13 1  . . . 1
    13 2 31 . . .
    13 3  . 2 . .
    13 4  . . 1 .
    14 1 22 . . .
    14 2  . . 0 .
    14 3  . 2 . .
    14 4  . . . 1
    15 1  . . . 1
    15 2 20 . . .
    15 3  . . 1 .
    15 4  . 2 . .
    16 1  . 2 . .
    16 2  . . . 1
    16 3 22 . . .
    16 4  . . 0 .
    17 1  . 2 . .
    17 2  . . . 1
    17 3 35 . . .
    17 4  . . 0 .
    18 1 19 . . .
    18 2  . . 1 .
    18 3  . 2 . .
    18 4  . . . 1
    19 1  . . 0 .
    19 2  . . . 1
    19 3 25 . . .
    19 4  . 2 . .
    20 1  . 2 . .
    20 2  . . . 1
    20 3  . . 0 .
    20 4 27 . . .
    21 1  . 2 . .
    21 2 24 . . .
    21 3  . . 0 .
    21 4  . . . 1
    22 1 35 . . .
    22 2  . . . 1
    22 3  . . 1 .
    22 4  . 2 . .
    23 1  . . . 1
    23 2  . . 0 .
    23 3 25 . . .
    23 4  . 2 . .
    24 1 32 . . .
    24 2  . . . 1
    24 3  . 2 . .
    24 4  . . 0 .
    25 1  . . . 1
    25 2 31 . . .
    25 3  . . 1 .
    25 4  . 2 . .
    end
    Last edited by Luis Mijares Castaneda; 14 Jan 2024, 17:37.

  • #2
    I presume you want one row per student. If that is the case, this will get you there from the data in your reshaped dataframe.

    Code:
    drop ID2
    collapse (firstnm) valueage-valuerace, by(student_id)
    Last edited by Girish Venkataraman; 14 Jan 2024, 19:02. Reason: error in sentence construct

    Comment


    • #3
      I commend the term layout here (which I learned from Clyde Schechter) over format, which is overloaded in computing (file format, display format, format meaning storage type, format meaning data structure or layout).

      To the point, there is a more direct solution here that avoids the problem created by #1 and hence the need to solve the problem.

      Code:
      reshape wide value, i(student_id) j(variable) string
      rename (value*) (*)
      
      list
      
           +---------------------------------------+
           | studen~d   age   county   male   race |
           |---------------------------------------|
        1. |        1    33        2      0      1 |
        2. |        2    37        2      1      1 |
        3. |        3    36        2      0      1 |
        4. |        4    37        2      0      1 |
        5. |        5    25        2      1      1 |
           |---------------------------------------|
        6. |        6    24        2      0      1 |
        7. |        7    19        2      1      1 |
        8. |        8    27        2      1      1 |
        9. |        9    31        2      0      1 |
       10. |       10    30        2      0      1 |
           |---------------------------------------|
       11. |       11    34        2      1      1 |
       12. |       12    36        2      1      1 |
       13. |       13    31        2      1      1 |
       14. |       14    22        2      0      1 |
       15. |       15    20        2      1      1 |
           |---------------------------------------|
       16. |       16    22        2      0      1 |
       17. |       17    35        2      0      1 |
       18. |       18    19        2      1      1 |
       19. |       19    25        2      0      1 |
       20. |       20    27        2      0      1 |
           |---------------------------------------|
       21. |       21    24        2      0      1 |
       22. |       22    35        2      1      1 |
       23. |       23    25        2      0      1 |
       24. |       24    32        2      0      1 |
       25. |       25    31        2      1      1 |
           +---------------------------------------+

      Comment

      Working...
      X