  • Help with code to create a new variable

    Hi guys. Any help will be appreciated. I don't know any programming language, and I need something that I believe I cannot work out otherwise.
    The pertinent variables look like this:

    modedel id_sn
    2 61
    1 61
    3 61
    3 61
    1 61
    2 68
    3 68
    3 68
    4 68
    1 68
    1 59
    1 59
    1 59
    4 59
    2 61
    2 61
    3 61
    4 61
    2 61
    ....
    I need to assign a running serial number to a new variable (new_var) that runs 1, 2, 3, ... and increases every time the value in "id_sn" changes.
    The problem is that the same "id_sn" value can recur many times in the dataset, so I figure the dataset must stay in its current order. I need this later to calculate the proportion of a certain value (e.g. "2") in "modedel" for each "new_var" (probably by using 'collapse').

    The output should be:

    modedel id_sn new_var
    2 61 1
    1 61 1
    3 61 1
    3 61 1
    1 61 1
    2 68 2
    3 68 2
    3 68 2
    4 68 2
    1 68 2
    1 59 3
    1 59 3
    1 59 3
    4 59 3
    2 61 4
    2 61 4
    3 61 4
    4 61 4
    2 61 4

    Thank you!

  • #2
    sort id_sn
    egen new_var=group(id_sn)



    • #3
      #2 from Rasool Baloch isn't what was asked for. For example, the two spells of 61 will get mapped to the same new value, which wasn't wanted.

      David Knigin Thanks for the data example. Please use dataex, as here.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input byte(modedel id_sn)
      2 61
      1 61
      3 61
      3 61
      1 61
      2 68
      3 68
      3 68
      4 68
      1 68
      1 59
      1 59
      1 59
      4 59
      2 61
      2 61
      3 61
      4 61
      2 61
      end
      
      . gen wanted = sum(id_sn != id_sn[_n-1])
      
      
      . l, sepby(wanted)
      
           +--------------------------+
           | modedel   id_sn   wanted |
           |--------------------------|
        1. |       2      61        1 |
        2. |       1      61        1 |
        3. |       3      61        1 |
        4. |       3      61        1 |
        5. |       1      61        1 |
           |--------------------------|
        6. |       2      68        2 |
        7. |       3      68        2 |
        8. |       3      68        2 |
        9. |       4      68        2 |
       10. |       1      68        2 |
           |--------------------------|
       11. |       1      59        3 |
       12. |       1      59        3 |
       13. |       1      59        3 |
       14. |       4      59        3 |
           |--------------------------|
       15. |       2      61        4 |
       16. |       2      61        4 |
       17. |       3      61        4 |
       18. |       4      61        4 |
       19. |       2      61        4 |
           +--------------------------+
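      The one-liner works because id_sn[_n-1] is missing in the first observation, so the comparison is true there, and sum() returns a running sum. A minimal sketch of the same logic in two steps, assuming the example data and wanted from above are in memory (newspell and wanted2 are names made up for illustration):

      ```stata
      * Sketch only: newspell and wanted2 are illustrative names, not from the thread.
      * Flag the first observation of each spell; id_sn[0] is missing,
      * so observation 1 is always flagged.
      gen byte newspell = id_sn != id_sn[_n-1]

      * The running sum of the flags numbers the spells 1, 2, 3, ...
      gen wanted2 = sum(newspell)

      * Check that the two-step version agrees with the one-line version.
      assert wanted2 == wanted
      ```

      Note that nothing here sorts the data, which is why repeated spells of the same id_sn get different numbers.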



      • #4
        Thanks a million, Nick Cox! It is exactly what I need. So elegant!



        • #5
          Thanks for the thanks. If interested, you could skim the paper at https://www.stata-journal.com/articl...article=dm0029 which is an extension of that simple theme.



          • #6
            Another question, following the successful creation of "wanted". I need to calculate the percentage of one of the values, for example "4", within every "wanted" group. I am trying to use the collapse command, but I don't know how to calculate X (the count of the value of interest) / _N (the number of observations per "wanted" group).
            Thank you



            • #7
              What I ended up doing is this.

              Create a variable holding the spell length (following the article by Nick Cox):

              . by wanted, sort: gen wantedlength = _N

              Then I generated a variable holding the running fraction of observations in which modedel equals 4:

              . by wanted, sort: gen density = sum(modedel==4)/wantedlength

              Finally, I created a variable holding the maximum of 'density' within each group:

              . by wanted, sort: egen maxdensity = max(density)

              I think it worked, but if there is nicer code for this I would like to learn from you guys.

              Thanks



              • #8
                Code:
                bysort wanted : egen mean = mean(modedel == 4)
                is the fraction of observations with value 4 in each block.

                For a percent, use mean(100 * (modedel == 4)) instead.
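
                If one row per block is wanted, as with the collapse idea raised earlier in the thread, the same fraction can be had by collapsing an indicator. A minimal sketch, assuming the example data from #3; is4, frac4 and n are names made up for illustration:

                ```stata
                * Sketch only: is4, frac4 and n are illustrative names.
                clear
                input byte(modedel id_sn)
                2 61
                1 61
                3 61
                3 61
                1 61
                2 68
                3 68
                3 68
                4 68
                1 68
                1 59
                1 59
                1 59
                4 59
                2 61
                2 61
                3 61
                4 61
                2 61
                end

                * Number the spells in the current order, as in #3.
                gen wanted = sum(id_sn != id_sn[_n-1])

                * Indicator: 1 if modedel is 4, 0 otherwise.
                gen byte is4 = modedel == 4

                * One row per spell: the mean of the indicator is the fraction of 4s,
                * and the count is the spell length. id_sn is constant within wanted,
                * so adding it to by() just carries it along.
                collapse (mean) frac4 = is4 (count) n = is4, by(wanted id_sn)

                list, noobs
                ```

                Note that collapse replaces the data in memory, so wrap it in preserve and restore if the original observations are still needed. On the example data the fractions should come out as 0, 0.2, 0.25 and 0.2.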
