  • How to shuffle or randomly assign the values within each group?

    Hi everyone!
    This is my first time posting a question here.

    I am working on a historical population dataset built from triennial population register records. It contains records only for the years 1792, 1795, 1798, 1801, 1804, 1807, 1810, 1813, …, 1837.

    I used tsfill to expand the dataset so that every person has a record in each year. The years 1793, 1794, 1796, 1797, and so on were therefore generated by tsfill.

    My goal is to calculate age-specific marital fertility rates. At present I only have a variable, nextboys, indicating whether a woman will have a child within the next three years.

    For example, if first-married woman No. 1 has nextboys equal to 1 in 1792, she will give birth to a boy in 1792, 1793, or 1794. Here 1793 and 1794 are the two years generated by tsfill, and I filled the blanks with 0.

    The following is an example of a subset of the dataset.
    personid   year   age   married   nextboys   groupid
           1   1792    34         1          1         1
           1   1793    35         1          0         1
           1   1794    36         1          0         1
           2   1792    23         1          1         2
           2   1793    24         1          0         2
           2   1794    25         1          0         2
           3   1801    18         1          1         3
           3   1802    19         1          0         3
           3   1803    20         1          0         3
           3   1804    21         1          1         4
           3   1805    22         1          0         4
           3   1806    23         1          0         4
           4   1825    26         1          1         5
           4   1826    27         1          0         5
           4   1827    28         1          0         5
    As we can see, we don't know in which exact year a woman gives birth to a boy within each three-year period. Take group 4 as an example: woman No. 3 has two records in the original dataset, in 1801 and 1804, and I used tsfill to fill in the missing years. She has nextboys equal to 1 in 1804, so she will have this boy in 1804, 1805, or 1806, but we don't know in which year.

    I would like the table to look like this after reshuffling:
    personid   year   age   married   nextboys   groupid
           1   1792    34         1          1         1
           1   1793    35         1          0         1
           1   1794    36         1          0         1
           2   1792    23         1          0         2
           2   1793    24         1          0         2
           2   1794    25         1          1         2
           3   1801    18         1          0         3
           3   1802    19         1          1         3
           3   1803    20         1          0         3
           3   1804    21         1          0         4
           3   1805    22         1          1         4
           3   1806    23         1          0         4
           4   1825    26         1          0         5
           4   1826    27         1          0         5
           4   1827    28         1          1         5

    That is, the value of nextboys would be randomly assigned within each group. In this way I can obtain a dataset that simulates the real situation, in which each birth of a boy is recorded in one exact year.

    I would be very grateful for any suggestions or answers. Thank you very much!

  • #2
    Serendie:
    welcome to this forum.
    Do you mean something along the following lines?
    Code:
    . input personid year age married nextboys groupid
    
          personid       year        age    married   nextboys    groupid
      1. 1 1792 34 1 1 1
      2. 1 1793 35 1 0 1
      3. 1 1794 36 1 0 1
      4. 2 1792 23 1 1 2
      5. 2 1793 24 1 0 2
      6. 2 1794 25 1 0 2
      7. 3 1801 18 1 1 3
      8. 3 1802 19 1 0 3
      9. 3 1803 20 1 0 3
     10. 3 1804 21 1 1 4
     11. 3 1805 22 1 0 4
     12. 3 1806 23 1 0 4
     13. 4 1825 26 1 1 5
     14. 4 1826 27 1 0 5
     15. 4 1827 28 1 0 5
     16. end
    
    . bysort groupid: gen counter=runiform()
     
    . bysort groupid: egen nextboys_2=max(counter)
    
    . replace nextboys_2=1 if counter==nextboys_2
    
    . replace nextboys_2=0 if nextboys_2!=1
    
    . sort personid year
    
    . list
    
         +----------------------------------------------------------------------------+
         | personid   year   age   married   nextboys   groupid    counter   nextbo~2 |
         |----------------------------------------------------------------------------|
      1. |        1   1792    34         1          1         1   .0324792          0 |
      2. |        1   1793    35         1          0         1   .9874847          1 |
      3. |        1   1794    36         1          0         1    .894106          0 |
      4. |        2   1792    23         1          1         2   .9684734          1 |
      5. |        2   1793    24         1          0         2   .2392203          0 |
         |----------------------------------------------------------------------------|
      6. |        2   1794    25         1          0         2   .6927336          0 |
      7. |        3   1801    18         1          1         3   .4884359          0 |
      8. |        3   1802    19         1          0         3   .4376452          0 |
      9. |        3   1803    20         1          0         3   .5858005          1 |
     10. |        3   1804    21         1          1         4   .3787092          0 |
         |----------------------------------------------------------------------------|
     11. |        3   1805    22         1          0         4   .6880603          0 |
     12. |        3   1806    23         1          0         4   .9794578          1 |
     13. |        4   1825    26         1          1         5   .6701937          0 |
     14. |        4   1826    27         1          0         5   .5948808          0 |
     15. |        4   1827    28         1          0         5   .7970893          1 |
         +----------------------------------------------------------------------------+
    
    .
    Caveat emptor: the code does not consider missing values.
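    A related point: the code above places a 1 in every group, including groups in which -nextboys- is 0 throughout. The following is only a minimal sketch of one way to restrict the draw to groups that actually record a birth and to make it reproducible. The seed value, the variable names -u-, -umax-, -anybirth- and -nextboys_r-, and the last group (personid 9, groupid 9) are hypothetical and not part of the original dataset.

```stata
clear
input personid year age married nextboys groupid
1 1792 34 1 1 1
1 1793 35 1 0 1
1 1794 36 1 0 1
2 1792 23 1 1 2
2 1793 24 1 0 2
2 1794 25 1 0 2
9 1792 30 1 0 9
9 1793 31 1 0 9
9 1794 32 1 0 9
end

* Arbitrary seed, so that the same draw can be reproduced later.
set seed 12345

* One uniform draw per person-year, stored in double precision.
generate double u = runiform()

* Largest draw within each group, also in double precision.
bysort groupid: egen double umax = max(u)

* 1 if the group records a birth in the original data, 0 otherwise
* (missing if -nextboys- is missing in the whole group).
bysort groupid: egen byte anybirth = max(nextboys)

* Flag the person-year holding the largest draw, only in groups with a birth.
generate byte nextboys_r = (u == umax) & (anybirth == 1)

sort personid year
list personid year nextboys groupid nextboys_r, sepby(groupid)
```

    In this sketch a group whose -nextboys- is missing throughout gets 0 in every year, which may or may not be what you want.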

    As an aside, please use CODE delimiters (as per FAQ) to share excerpts/examples of your dataset. Thanks.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Thank you, Carlo! Brilliant! It is very useful, and I got the result I wanted.

      Could you please explain how the following three lines work?

      bysort groupid: egen nextboys_2=max(counter)
      replace nextboys_2=1 if counter==nextboys_2
      replace nextboys_2=0 if nextboys_2!=1

      Thank you very much!



      • #4
        Serendie:
        the first line of code tells Stata to store in -nextboys_2-, for every observation, the maximum value of -counter- within its -groupid-;
        the second line of code tells Stata to -replace- the existing value of -nextboys_2- with 1 whenever -counter- and -nextboys_2- are the same, that is, in the observation that holds the group maximum;
        the third line of code tells Stata to -replace- the existing value of -nextboys_2- with 0 whenever it differs from 1, that is, in all the remaining observations.
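        To watch the three steps at work, here is a small sketch on two groups. The -counter- values are rounded from groups 1 and 2 in the listing above rather than freshly drawn, and the variable -flag- is a hypothetical name used only for comparison.

```stata
clear
input groupid counter
1 .03
1 .98
1 .89
2 .96
2 .23
2 .69
end

* Step 1: every observation receives its group's largest -counter-.
bysort groupid: egen nextboys_2 = max(counter)
list, sepby(groupid)

* Step 2: the observation whose own -counter- is that maximum becomes 1.
replace nextboys_2 = 1 if counter == nextboys_2

* Step 3: every other observation still holds a value below 1 and becomes 0.
replace nextboys_2 = 0 if nextboys_2 != 1
list, sepby(groupid)

* A possible one-line alternative: sort by -counter- within group, flag the last row.
bysort groupid (counter): generate byte flag = (_n == _N)
list, sepby(groupid)
```

        The third step relies on -runiform()- never returning exactly 1, so the only observations equal to 1 are those set in the second step. Barring ties in -counter-, -flag- should match -nextboys_2-.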
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Dear Carlo,

          Thank you for the explanation!

          Best,
          Serendie
