
  • How to replace value of a variable with the value that appears max no of times?

    Hi Statalist,

    Apologies for the vagueness of the question. Hopefully the details will make it clearer. Please consider the following example data:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(id round) byte female
    1 1 1
    1 2 0
    1 3 1
    1 4 1
    1 5 1
    2 1 0
    2 2 0
    2 4 0
    2 5 0
    3 1 1
    3 2 0
    3 3 0
    3 5 .
    4 1 .
    4 2 .
    4 5 .
    end


    female is a binary variable denoting the sex of the individual. As you can see, its value is inconsistent across rounds for some individuals. The discrepancy exists in the raw data and can be attributed to recording errors at the time of enumeration. To correct it, I'm working on the premise that an enumerator is unlikely to make the same mistake in multiple rounds, so for a given individual the value of female that appears least often must be the erroneous one. I therefore want to replace female (or generate a new variable, say female1) with the value that appears the most times for each individual across all available rounds. I would like my data to look like this:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(id round) byte female float female1
    1 1 1 1
    1 2 0 1
    1 3 1 1
    1 4 1 1
    1 5 1 1
    2 1 0 0
    2 2 0 0
    2 4 0 0
    2 5 0 0
    3 1 1 0
    3 2 0 0
    3 3 0 0
    3 5 . 0
    4 1 . .
    4 2 . .
    4 5 . .
    end

    If some values are missing, only the nonmissing values are to be considered. If the values are missing in all rounds, the resulting value should be missing.

    Would appreciate any help on this!
    Thanks

  • #2
    Code:
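    * Share of 1s among the nonmissing values of female within each id
    bys id: egen mean = mean(female)
    * 1 if at least half of the nonmissing values are 1; missing if all are missing
    bys id: gen wanted = mean >= .5 if mean < .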
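
    Since mean >= .5 sends an exact tie to 1, it may be worth checking whether any individual has equally many 0s and 1s before relying on the result. A minimal sketch, assuming the variable names from #1; mean_female and tie are illustrative names, not from the thread:

    Code:
    * Share of 1s among the nonmissing values of female within each id
    bysort id: egen mean_female = mean(female)
    * Flag individuals with equally many 0s and 1s
    gen byte tie = mean_female == .5 if mean_female < .
    * Inspect the tied cases, if any, before choosing a tie-breaking rule
    list id round female if tie == 1, sepby(id)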



    • #3
      What if you have ties? My code favors female=0 in the case of a tie with either 1 or missing and female=1 in the case of a tie with missing.

      Code:
      clear
      input float(id round) byte female
      1 1 1
      1 2 0
      1 3 1
      1 4 1
      1 5 1
      2 1 0
      2 2 0
      2 4 0
      2 5 0
      3 1 1
      3 2 0
      3 3 0
      3 5 .
      4 1 .
      4 2 .
      4 5 .
      end
      
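      * Count how often each value of female (missing included) occurs within id
      bys id female: gen wanted=_N
      * Sort so that the most frequent value comes last within id
      gsort id wanted -female
      * Assign that value to every observation of the id
      by id: replace wanted= female[_N]

      Res.: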

      Code:
      . sort id round
      
      . l, sepby(id)
      
           +------------------------------+
           | id   round   female   wanted |
           |------------------------------|
        1. |  1       1        1        1 |
        2. |  1       2        0        1 |
        3. |  1       3        1        1 |
        4. |  1       4        1        1 |
        5. |  1       5        1        1 |
           |------------------------------|
        6. |  2       1        0        0 |
        7. |  2       2        0        0 |
        8. |  2       4        0        0 |
        9. |  2       5        0        0 |
           |------------------------------|
       10. |  3       1        1        0 |
       11. |  3       2        0        0 |
       12. |  3       3        0        0 |
       13. |  3       5        .        0 |
           |------------------------------|
       14. |  4       1        .        . |
       15. |  4       2        .        . |
       16. |  4       5        .        . |
           +------------------------------+
      
      .
      Last edited by Andrew Musau; 16 Jan 2022, 02:38.



      • #4
        For (0, 1) values a direct attack on the problem is a good idea. Let's also flag that the procedure has a name: you're looking for the mode, and more generally egen has a mode() function for that.
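
        A minimal sketch of the mode approach, assuming the variable names from #1. By default, egen's mode() function ignores missing values and returns missing when a group has more than one mode; the minmode option used here breaks ties in favor of the lowest value (0), which is a choice made for illustration rather than something specified in the thread.

        Code:
        * Most frequent nonmissing value of female within each id; ties go to 0
        bysort id: egen female1 = mode(female), minmode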



        • #5
          Thanks Oyvind, Andrew, and Nick. Your suggestions were extremely helpful!
