  • Create duplicate observations with specific order for each group

    Hi Statalists,

    I apologize if my question seems basic, but I have tried several approaches without success. I am trying to repeat a set of observations in a specific order and group them accordingly. Below is a simplified version of the data I am working with:

    Code:
    *ssc install dataex
    input str10 Type str10 Subtype int Year int id
    "A" "ABC" 2016 1
    "A" "ABC" 2017 2
    "B" "LOL" 2019 3
    "B" "MOM" 2020 4
    "C" "EDI" 2013 5
    "C" "KII" 2015 6
    Ideally, I would like to repeat this block of observations 3000 times, keeping the same order each time, and give each copy of the block its own group number. The desired output should look like this:
    Type Subtype Year id Group
    A ABC 2016 1 1
    A ABC 2017 2 1
    B LOL 2019 3 1
    B MOM 2020 4 1
    C EDI 2013 5 1
    C KII 2015 6 1
    A ABC 2016 1 2
    A ABC 2017 2 2
    B LOL 2019 3 2
    B MOM 2020 4 2
    C EDI 2013 5 2
    C KII 2015 6 2
    ... ... ... ... ...
    A ABC 2016 1 3000
    A ABC 2017 2 3000
    B LOL 2019 3 3000
    B MOM 2020 4 3000
    C EDI 2013 5 3000
    C KII 2015 6 3000
    I used the expand function to repeat the observations and created a grouping variable like this:

    Code:
    clear
    input str10 Type str10 Subtype int Year int id
    "A" "ABC" 2016 1
    "A" "ABC" 2017 2
    "B" "LOL" 2019 3
    "B" "MOM" 2020 4
    "C" "EDI" 2013 5
    "C" "KII" 2015 6
    end
    
    * 5 copies here; 3000 in the real problem
    gen repeat = 5
    expand repeat
    
    * number consecutive blocks of 6 observations
    gen group = ceil(_n / 6)
    However, the data do not end up in the order I want: instead of each group following the original order, a group ends up containing copies of the same observations.
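A quick way to see what is going on (a diagnostic sketch, not code from the thread): expand leaves the original observations where they are and appends the new copies after them. So ceil(_n / 6) puts the six originals into group 1 and then carves the appended copies into later groups without regard to which row each copy came from. The generate() option of expand flags the copies; the variable name copy is an arbitrary choice.

```stata
clear
input str10 Type str10 Subtype int Year int id
"A" "ABC" 2016 1
"A" "ABC" 2017 2
"B" "LOL" 2019 3
"B" "MOM" 2020 4
"C" "EDI" 2013 5
"C" "KII" 2015 6
end

* copy is 0 for the original observations, 1 for the copies expand adds
expand 5, generate(copy)

* the grouping from the question: consecutive blocks of 6 observations
gen group = ceil(_n / 6)

* show which block each observation lands in, and whether it is a copy
list id copy group, sepby(group)
```

Only the first block is made of the six originals in their original order; every later block is built from appended copies.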

    I would really appreciate any advice or suggestions from the community on how to achieve this.

    Thanks a ton!
    K

  • #2
    Thanks for the data example. It needs some surgery to run, but I think this is the essence of what you need, modulo a sort.

    Code:
    clear 
    
    input str10 Type str10 Subtype int Year int id
    "A" "ABC" 2016 1
    "A" "ABC" 2017 2
    "B" "LOL" 2019 3
    "B" "MOM" 2020 4
    "C" "EDI" 2013 5
    "C" "KII" 2015 6
    end 
    
    expand 5
    
    bysort Type Subtype Year id : gen group = _n 
    
    list
    
         +------------------------------------+
         | Type   Subtype   Year   id   group |
         |------------------------------------|
      1. |    A       ABC   2016    1       1 |
      2. |    A       ABC   2016    1       2 |
      3. |    A       ABC   2016    1       3 |
      4. |    A       ABC   2016    1       4 |
      5. |    A       ABC   2016    1       5 |
         |------------------------------------|
      6. |    A       ABC   2017    2       1 |
      7. |    A       ABC   2017    2       2 |
      8. |    A       ABC   2017    2       3 |
      9. |    A       ABC   2017    2       4 |
     10. |    A       ABC   2017    2       5 |
         |------------------------------------|
     11. |    B       LOL   2019    3       1 |
     12. |    B       LOL   2019    3       2 |
     13. |    B       LOL   2019    3       3 |
     14. |    B       LOL   2019    3       4 |
     15. |    B       LOL   2019    3       5 |
         |------------------------------------|
     16. |    B       MOM   2020    4       1 |
     17. |    B       MOM   2020    4       2 |
     18. |    B       MOM   2020    4       3 |
     19. |    B       MOM   2020    4       4 |
     20. |    B       MOM   2020    4       5 |
         |------------------------------------|
     21. |    C       EDI   2013    5       1 |
     22. |    C       EDI   2013    5       2 |
     23. |    C       EDI   2013    5       3 |
     24. |    C       EDI   2013    5       4 |
     25. |    C       EDI   2013    5       5 |
         |------------------------------------|
     26. |    C       KII   2015    6       1 |
     27. |    C       KII   2015    6       2 |
     28. |    C       KII   2015    6       3 |
     29. |    C       KII   2015    6       4 |
     30. |    C       KII   2015    6       5 |
         +------------------------------------+
    Detail: expand is a command, not a function. Whatever may be true elsewhere, in Stata, commands and functions are disjoint, so "function" is not another name for "command".
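Putting the pieces together for the 3000 copies asked for, here is a minimal sketch that finishes with the sort mentioned above. It assumes, as in the example, that Type Subtype Year id identify each original row and that sorting on id reproduces the original order (true here because id runs 1 to 6 in that order). The name Group is chosen to match the desired output.

```stata
clear
input str10 Type str10 Subtype int Year int id
"A" "ABC" 2016 1
"A" "ABC" 2017 2
"B" "LOL" 2019 3
"B" "MOM" 2020 4
"C" "EDI" 2013 5
"C" "KII" 2015 6
end

* 3000 copies of every observation in total (originals included)
expand 3000

* within each original row, number its copies 1, 2, ..., 3000
bysort Type Subtype Year id : gen Group = _n

* bring each group's rows together, in the original (id) order
sort Group id

list in 1/12, sepby(Group)
```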

    • #3
      Nick Cox, thank you for your detailed explanation and for clarifying the distinction between a command and a function. I have a quick follow-up question: I tried to sort the data back to its original order (using sort Type Subtype Year Group), but it doesn’t seem to have worked as expected.

      Do you have any additional suggestions on how I might resolve this issue?

      I would appreciate any further comments or guidance.

      Thank you again!
      K

      • #4
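        I think you should start your variable list with Group to get what you want. If Type Subtype Year do not uniquely identify the original rows, add further variables (such as id) to the list so that ties within each group are broken in the right order.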

        Code:
        sort Group Type Subtype Year

        Code:
        help sort
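        explains how sort works: observations are arranged by the first variable in the list, then by the second within ties on the first, and so on.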
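When no existing variables reproduce the original order, one option is to record that order before expanding and sort on it afterwards. A minimal sketch, assuming 3000 copies as in the question; the variable name obsno is a hypothetical choice and is dropped at the end.

```stata
clear
input str10 Type str10 Subtype int Year int id
"A" "ABC" 2016 1
"A" "ABC" 2017 2
"B" "LOL" 2019 3
"B" "MOM" 2020 4
"C" "EDI" 2013 5
"C" "KII" 2015 6
end

* remember each row's original position before expand adds copies
gen long obsno = _n

expand 3000

* number the copies of each original row 1, 2, ..., 3000
bysort obsno : gen Group = _n

* Group first, then the original position within each group
sort Group obsno
drop obsno
```

Sorting on obsno rather than on Type Subtype Year means the original order is restored even if those variables contain ties or are not in sorted order.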
