How to shuffle or randomly assign the values within each group?

Serendie Wei

Join Date: Nov 2020
Posts: 7

08 Nov 2020, 04:54

Hi everyone!
This is my first time posting a question here.

I am working on a historical population dataset made up of triennial population register records. It only contains records for the years 1792, 1795, 1798, 1801, 1804, 1807, 1810, 1813, …, 1837.

I used the tsfill command to expand this dataset so that every person has a record in each year. Therefore 1793, 1794, 1796, 1797 and so on are years generated by tsfill.
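A minimal sketch of that filling step, assuming the panel is declared with xtset on personid and year. The rows are woman No. 3's register years from the example below plus a hypothetical 1807 record, and only nextboys is carried along for brevity.

```stata
* Sketch only: the 1807 row is hypothetical, added so tsfill has two gaps to fill.
clear
input personid year nextboys
3 1801 1
3 1804 1
3 1807 0
end
xtset personid year                          // declare the panel: unit personid, time year
tsfill                                       // add rows for the gap years 1802, 1803, 1805, 1806
replace nextboys = 0 if missing(nextboys)    // new rows have missing values in all other variables
list, sepby(personid)
```

Plain tsfill only fills gaps between a person's first and last record, so years after a person's last register entry are not created by this step.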

My goal is to calculate the age-specific marital fertility rate. For now I only have a variable indicating whether a woman will have a child in the next three years.

For example, if woman No. 1 (first-married) has nextboys equal to 1 in 1792, she will give birth to a boy in 1792, 1793 or 1794. Here 1793 and 1794 are the two years generated by tsfill, and I filled their nextboys values with 0.

The following is an example of a subset of the dataset.

personid	year	age	married	nextboys	groupid
1	1792	34	1	1	1
1	1793	35	1	0	1
1	1794	36	1	0	1
2	1792	23	1	1	2
2	1793	24	1	0	2
2	1794	25	1	0	2
3	1801	18	1	1	3
3	1802	19	1	0	3
3	1803	20	1	0	3
3	1804	21	1	1	4
3	1805	22	1	0	4
3	1806	23	1	0	4
4	1825	26	1	1	5
4	1826	27	1	0	5
4	1827	28	1	0	5

As we can see, we don't know in which exact year of the three-year period a woman gives birth to a boy. Take group 4 as an example: woman No. 3 has two records in this dataset, in 1801 and 1804, and I used tsfill to fill the blank years. Woman No. 3 has nextboys equal to 1 in 1804, so she will have this boy in 1804, 1805 or 1806, but we don't know in which year.

I would like the table to look like this after reshuffling:

personid	year	age	married	nextboys	groupid
1	1792	34	1	1	1
1	1793	35	1	0	1
1	1794	36	1	0	1
2	1792	23	1	0	2
2	1793	24	1	0	2
2	1794	25	1	1	2
3	1801	18	1	0	3
3	1802	19	1	1	3
3	1803	20	1	0	3
3	1804	21	1	0	4
3	1805	22	1	1	4
3	1806	23	1	0	4
4	1825	26	1	0	5
4	1826	27	1	0	5
4	1827	28	1	1	5

That is, the value of nextboys should be randomly assigned within each group. This would give me a dataset that simulates the real situation, in which each birth of a boy is recorded in one exact year.

I would be very grateful for any suggestions or answers to this question! Thank you very much!


Carlo Lazzaro

Join Date: Apr 2014
Posts: 17673

08 Nov 2020, 06:55

Serendie:
welcome to this forum.
Do you mean something along the following lines?

Code:

. input personid year age married nextboys groupid

      personid       year        age    married   nextboys    groupid
  1. 1 1792 34 1 1 1
  2. 1 1793 35 1 0 1
  3. 1 1794 36 1 0 1
  4. 2 1792 23 1 1 2
  5. 2 1793 24 1 0 2
  6. 2 1794 25 1 0 2
  7. 3 1801 18 1 1 3
  8. 3 1802 19 1 0 3
  9. 3 1803 20 1 0 3
 10. 3 1804 21 1 1 4
 11. 3 1805 22 1 0 4
 12. 3 1806 23 1 0 4
 13. 4 1825 26 1 1 5
 14. 4 1826 27 1 0 5
 15. 4 1827 28 1 0 5
 16. end

. bysort groupid: gen counter=runiform()
 
. bysort groupid: egen nextboys_2=max(counter)

. replace nextboys_2=1 if counter==nextboys_2

. replace nextboys_2=0 if nextboys_2!=1

. sort personid year

. list

     +----------------------------------------------------------------------------+
     | personid   year   age   married   nextboys   groupid    counter   nextbo~2 |
     |----------------------------------------------------------------------------|
  1. |        1   1792    34         1          1         1   .0324792          0 |
  2. |        1   1793    35         1          0         1   .9874847          1 |
  3. |        1   1794    36         1          0         1    .894106          0 |
  4. |        2   1792    23         1          1         2   .9684734          1 |
  5. |        2   1793    24         1          0         2   .2392203          0 |
     |----------------------------------------------------------------------------|
  6. |        2   1794    25         1          0         2   .6927336          0 |
  7. |        3   1801    18         1          1         3   .4884359          0 |
  8. |        3   1802    19         1          0         3   .4376452          0 |
  9. |        3   1803    20         1          0         3   .5858005          1 |
 10. |        3   1804    21         1          1         4   .3787092          0 |
     |----------------------------------------------------------------------------|
 11. |        3   1805    22         1          0         4   .6880603          0 |
 12. |        3   1806    23         1          0         4   .9794578          1 |
 13. |        4   1825    26         1          1         5   .6701937          0 |
 14. |        4   1826    27         1          0         5   .5948808          0 |
 15. |        4   1827    28         1          0         5   .7970893          1 |
     +----------------------------------------------------------------------------+

.

Caveat emptor: the code does not consider missing values.
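As written, the code above never looks at the original nextboys, so it places a 1 in every group, and its draws change from run to run because no seed is set. A possible variant is sketched below under these assumptions: nextboys is a 0/1 indicator, (personid, year) uniquely identifies a row, and the last three data rows (person 5, group 6) are made up to show a group without a birth. The seed value and the names u, has_birth and nextboys_r are hypothetical.

```stata
* Sketch: move each group's recorded birth to one randomly chosen year of that group.
clear
input personid year age married nextboys groupid
3 1804 21 1 1 4
3 1805 22 1 0 4
3 1806 23 1 0 4
4 1825 26 1 1 5
4 1826 27 1 0 5
4 1827 28 1 0 5
5 1810 30 1 0 6
5 1811 31 1 0 6
5 1812 32 1 0 6
end
set seed 20201108                                // hypothetical seed, makes the draws reproducible
sort personid year                               // fixed row order before drawing
generate double u = runiform()                   // one uniform draw per row
bysort groupid: egen has_birth = max(nextboys)   // egen's max() ignores missing values
* Sort each group by its draws and flag the row with the largest one.
bysort groupid (u): generate byte nextboys_r = (_n == _N) if has_birth == 1
replace nextboys_r = 0 if has_birth == 0         // groups without a birth stay all zero
sort personid year
list personid year nextboys groupid nextboys_r, sepby(groupid)
```

Groups whose nextboys values are all missing keep a missing nextboys_r, and flagging the last row after sorting on u yields exactly one 1 per group even in the unlikely event of tied draws.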

As an aside, please use CODE delimiters (as per FAQ) to share excerpts/examples of your dataset. Thanks.

Kind regards,
Carlo
(Stata 19.0)


Serendie Wei

Join Date: Nov 2020

Posts: 7
#3

08 Nov 2020, 07:11

Thank you Carlo! Brilliant! It is very, very useful! I got the result I wanted.

Could you please explain how these three lines work?

bysort groupid: egen nextboys_2=max(counter)
replace nextboys_2=1 if counter==nextboys_2
replace nextboys_2=0 if nextboys_2!=1

Thank you very much!
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17673
#4

08 Nov 2020, 07:21

Serendie:
the first line of code tells Stata to store in -nextboys_2-, for every observation, the maximum value of -counter- within its -groupid-;
the second line of code tells Stata to -replace- the existing value in -nextboys_2- with 1 whenever the values of -counter- and -nextboys_2- are the same, that is, in the observation that holds the group maximum;
the third line of code tells Stata to -replace- the existing value in -nextboys_2- with 0 whenever the value of -nextboys_2- differs from 1.
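To make the three steps concrete, here is a small trace on group 1 only, using the -counter- values shown in the listing in #2. It is a sketch for illustration; typing the draws in by hand instead of calling -runiform()- is an assumption made so the result can be followed.

```stata
* Trace of the three lines on group 1, with the draws copied from the listing.
clear
input groupid year counter
1 1792 .0324792
1 1793 .9874847
1 1794 .894106
end
bysort groupid: egen nextboys_2 = max(counter)   // every row of the group now holds .9874847
replace nextboys_2 = 1 if counter == nextboys_2  // only the 1793 row matches the maximum
replace nextboys_2 = 0 if nextboys_2 != 1        // the other two rows still hold .9874847, so they become 0
sort groupid year
list
```

The 1793 row ends with -nextboys_2- equal to 1 and the other two rows with 0, as in rows 1 to 3 of the listing.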

Kind regards,
Carlo
(Stata 19.0)
Serendie Wei

Join Date: Nov 2020

Posts: 7
#5

08 Nov 2020, 07:31

Dear Carlo,

Thank you for the explanation!

Best,
Serendie