  • Creating a variable that holds the single distinct value of two other variables with the same values

    Hello everyone.

    I have a dataset in which two (or more) variables have the same values. I want to create a variable that holds just their single distinct value. I tried to find an easy way but couldn't, other than doing it manually in Excel. Could you please give me some advice on how to do it? A sample of the data is shown below.

    To explain, I want to create the variable "Major_1997", which simply holds the single value shared by "Major_1997_1" and "Major_1997_2", since both have the same value. I hope my explanation is clear. Thank you very much in advance for your help!
    ID Major_1997_1 Major_1997_2 Major_1997
    1 3 3 3

  • #2
    Hello JungHwan Kim. I see this is your first post. Welcome.

    Does this code produce the result you want?

    Code:
    * Start with an empty (missing) variable
    generate byte Major_1997 = .
    * Copy the value only where the two variables agree
    replace Major_1997 = Major_1997_1 if Major_1997_1 == Major_1997_2
    PS: Please see the advice in the FAQ about providing a small dataset (via dataex) to illustrate the problem, and about using code delimiters to show code and output in a more readable fashion.
    --
    Bruce Weaver
    Version: Stata/MP 18.5 (Windows)



    • #3
      Dear Bruce,

      Thank you very much for your reply. The code you provided works for the example dataset I posted! I have one more question, though, about some other cases in this dataset. For example, in some cases certain variables have a missing value (i.e., a dot) while the others have a value. In such cases, I want the code to fill in the value by automatically choosing a variable that has a value rather than one that is missing. Here is an example:
      ID Major_1997_1 Major_1997_2 Major_1997_3 Major_1997_4 Major_1997
      1 3 . 3 . 3
      In this case, is there a way to generate 'Major_1997' and set it to the unique value 3 without having to name a specific variable (e.g., Major_1997_3)? In other words, can Stata pick out the distinct value without my specifying a particular variable?


      Thank you very much for your help!


      Best regards,

      Jung Hwan Kim



      • #4
        The first thing that comes to mind for me is using egen with the rowsd() function to flag rows where all values are the same: they will have a row SD of 0. Because rowsd() ignores missing values, only the non-missing values in each row are compared.

        Code:
        * Create a small dataset to illustrate
        clear
        input byte(ID Major_1997_1 Major_1997_2 Major_1997_3 Major_1997_4)
        1     3     .  3  .
        2   2   3  3  3
        3   3   2  .  3
        4   2   2  .  2
        5   .   .  .  .
        end
        
        * The following code assumes variables
        * Major_1997_1 to Major_1997_4 are contiguous in
        * the data file.  If they are not, list all 4 variables
        * inside the parentheses.
        
        egen double sd1997 = rowsd(Major_1997_1 - Major_1997_4)
        egen Major_1997 = rowmin(Major_1997_1 - Major_1997_4) if sd1997==0
        list, clean noobs
        drop sd1997 // Assuming it is no longer needed
        Here is the output from the -list- command:
        Code:
        . list, clean noobs
        
            ID   Major_~1   Major_~2   Major_~3   Major_~4      sd1997   Maj~1997  
             1          3          .          3          .           0          3  
             2          2          3          3          3          .5          .  
             3          3          2          .          3   .57735027          .  
             4          2          2          .          2           0          2  
             5          .          .          .          .           .          .
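        A possible alternative, sketched here on the same small dataset and under the same contiguity assumption: a row whose non-missing values are all identical has equal row minimum and row maximum, so comparing rowmin() with rowmax() checks the same condition without relying on an SD of exactly 0. The helper names min1997 and max1997 are illustrative only.

```stata
* Sketch: rows whose non-missing values all agree have rowmin == rowmax
clear
input byte(ID Major_1997_1 Major_1997_2 Major_1997_3 Major_1997_4)
1 3 . 3 .
2 2 3 3 3
3 3 2 . 3
4 2 2 . 2
5 . . . .
end

* Row minimum and maximum, ignoring missing values
* (assumes Major_1997_1 to Major_1997_4 are contiguous)
egen byte min1997 = rowmin(Major_1997_1 - Major_1997_4)
egen byte max1997 = rowmax(Major_1997_1 - Major_1997_4)

* Keep the value only where every non-missing value is the same;
* an all-missing row has missing min and max, so the result stays missing
generate byte Major_1997 = min1997 if min1997 == max1997
drop min1997 max1997
list, clean noobs
```

        On this dataset it gives 3 for ID 1, 2 for ID 4, and missing for IDs 2, 3 and 5, matching the rowsd() output above. It also returns the value for a row that has only one non-missing value.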

        I hope this helps.
        --
        Bruce Weaver



        • #5
          I interpret what JungHwan wants differently from Bruce. Perhaps JungHwan wants to replace missing values with the nearest previous good (non-missing) value:
          Code:
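          * Assumes Major_1997 does not exist yet; it must exist to appear in the varlist
          generate byte Major_1997 = .
          generate int previous_good = .
          foreach m of varlist Major_1997_1 Major_1997_2 Major_1997_3 Major_1997_4 Major_1997 {
              * Fill a missing value with the last non-missing value seen so far
              replace `m' = previous_good if missing(`m')
              replace previous_good = `m'
          }
          drop previous_good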
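          Note that this carry-forward loop also replaces the missing values in Major_1997_1 to Major_1997_4 themselves, not only in Major_1997. A sketch that runs the loop on the small dataset from #4 (assuming Major_1997 does not already exist) makes this visible: Major_1997 ends up holding the last non-missing value in each row.

```stata
* Sketch: carry the last non-missing value forward across the variables
clear
input byte(ID Major_1997_1 Major_1997_2 Major_1997_3 Major_1997_4)
1 3 . 3 .
2 2 3 3 3
3 3 2 . 3
4 2 2 . 2
5 . . . .
end

* The target must exist before it can appear in the varlist
generate byte Major_1997 = .
generate int previous_good = .
foreach m of varlist Major_1997_1 Major_1997_2 Major_1997_3 Major_1997_4 Major_1997 {
    * Overwrites missing values in the original variables too
    replace `m' = previous_good if missing(`m')
    replace previous_good = `m'
}
drop previous_good
list, clean noobs
```

          For ID 1, for example, Major_1997_2 and Major_1997_4 are both changed from missing to 3, so it may be worth working on a copy of the data if the original variables are needed later.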



          • #6
            Thank you very much for your help! I got an idea from your posts and did the following, and it seems to work! :D

            Code:
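            gen Major_1997 = .
            * Later variables overwrite earlier ones, so the last valid
            * (non-negative, non-missing) value is kept
            foreach x of varlist Major_1997_* {
                replace Major_1997 = `x' if `x' >= 0 & !missing(`x')
            }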



            • #7
              Hello JungHwan Kim. Using the small dataset I created in #4, my method and yours give different results for a couple of cases. I just want to make sure you've spotted that and that you are getting the result you want.

              The following uses the dataset from #4 with one additional observation (ID 6), which I think illustrates a problem with your method.

              Code:
              * Create a small dataset to illustrate
              clear
              input byte(ID Major_1997_1 Major_1997_2 Major_1997_3 Major_1997_4)
              1     3     .  3  .
              2   2   3  3  3
              3   3   2  .  3
              4   2   2  .  2
              5   .   .  .  .
              6   3   3  3  2
              end
              
              * Bruce Weaver's method in #4
              egen double sd1997 = rowsd(Major_1997_1 - Major_1997_4)
              egen bw_1997 = rowmin(Major_1997_1 - Major_1997_4) if sd1997==0
              drop sd1997 // Assuming it is no longer needed
              
              * JungHwan Kim's method in #6
              gen jk_1997 = .
              foreach x of varlist Major_1997_* {
                  replace jk_1997 = `x' if `x' >= 0 & !missing(`x')
              }
              list, clean noobs
              Output from the -list- command:

              Code:
              . list, clean noobs
              
                  ID   Major_~1   Major_~2   Major_~3   Major_~4   bw_1997   jk_1997  
                   1          3          .          3          .         3         3  
                   2          2          3          3          3         .         3   <-- Methods disagree
                   3          3          2          .          3         .         3   <-- Methods disagree
                   4          2          2          .          2         2         2  
                   5          .          .          .          .         .         .  
                   6          3          3          3          2         .         2   <-- Methods disagree
              My method gives system missing for the three cases flagged above because each has two or more distinct values across the four variables. I thought you wanted to fill in a value only when all non-missing values are the same, and I doubt you want a value of 2 for that last observation. But perhaps I misunderstood what you want.

              Cheers,
              Bruce
              --
              Bruce Weaver



              • #8
                Hello Bruce Weaver. Thank you very much for your reply. I think I may have worded my question confusingly.
                I wanted to create a variable indicating each individual's major. In some cases the values change over time because individuals may have changed majors. I wanted the variable to indicate the final major each individual had during their education.
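                If the goal is the final major, one way to make that explicit is to loop over the suffixes 1 to 4 in order, so the result does not depend on the order in which the variables happen to sit in the dataset (a wildcard such as Major_1997_* expands in dataset order). This is a sketch that assumes a higher suffix means a later observation and that valid major codes are non-negative; it reuses the dataset from #7.

```stata
* Sketch: keep the last valid major across suffixes 1-4 (the "final major")
clear
input byte(ID Major_1997_1 Major_1997_2 Major_1997_3 Major_1997_4)
1 3 . 3 .
2 2 3 3 3
3 3 2 . 3
4 2 2 . 2
5 . . . .
6 3 3 3 2
end

generate byte Major_1997 = .
* Assumption: suffix 4 is the latest observation, so later suffixes overwrite earlier ones
forvalues j = 1/4 {
    replace Major_1997 = Major_1997_`j' if Major_1997_`j' >= 0 & !missing(Major_1997_`j')
}
list, clean noobs
```

                On that dataset it reproduces the jk_1997 column shown in #7, including the value 2 for ID 6, which is the intended result when the last recorded major is what matters.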


                Best,

                Jung Hwan Kim

                Comment


                • #9
                  Ah, okay. Thank you for clarifying.
                  --
                  Bruce Weaver
