Hi everyone!
This is my first time posting a question here.
I am working on a historical population dataset built from triennial population register records, so it only contains records for the years 1792, 1795, 1798, 1801, 1804, 1807, 1810, 1813…1837.
I used the tsfill command to expand this dataset so that every person has a record in every year. Therefore, 1793, 1794, 1796, 1797, and so on are years generated by tsfill.
My goal is to calculate age-specific marital fertility rates. At the moment I only have a variable, nextboys, indicating whether a woman will give birth to a boy in the next three years.
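For reference, the age-specific marital fertility rate for an age group a is conventionally defined as the births to married women in that age group divided by the woman-years they lived while married in that age group, usually expressed per 1,000. This is the standard demographic definition, summarized here as a sketch rather than taken from the post:

```latex
\mathrm{ASMFR}_a = \frac{B_a}{W_a} \times 1000
```

Here B_a is the number of births to married women in age group a during the observation period, and W_a is the number of woman-years lived in marriage by women in age group a. In annual data, each row with married = 1 adds one woman-year to W_a for that row's age group, so the year in which a birth is placed decides which age group it is counted in. Because nextboys records only boys, a B_a built from it would count sons only.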
For example, if first-married woman No. 1 has nextboys equal to 1 in 1792, she will give birth to a boy in 1792, 1793, or 1794. Here, 1793 and 1794 are years generated by tsfill, and I filled the blank nextboys values in those years with 0.
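To make this layout concrete, here is a minimal Python sketch that rebuilds the example table from the observed triennial records. It is not a reimplementation of Stata's tsfill. It assumes, as in the example, that each observed record opens its own three-year group, that age rises by one each year, that marital status is carried forward, and that nextboys is 0 in the generated years. The names Row, expand_triennial, and OBSERVED are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Row:
    personid: int
    year: int
    age: int
    married: int
    nextboys: int
    groupid: int


def expand_triennial(records, span=3):
    """Expand each observed register record into `span` annual rows.

    `records` is a list of (personid, year, age, married, nextboys) tuples,
    assumed sorted by personid and year. Each record becomes one group:
    the observed year keeps its nextboys value, the generated years get 0.
    """
    rows = []
    for groupid, (personid, year, age, married, nextboys) in enumerate(records, start=1):
        for k in range(span):
            rows.append(
                Row(
                    personid=personid,
                    year=year + k,
                    age=age + k,          # assumption: age rises by one per year
                    married=married,      # assumption: marital status carried forward
                    nextboys=nextboys if k == 0 else 0,
                    groupid=groupid,
                )
            )
    return rows


# The observed (triennial) records behind the example table.
OBSERVED = [
    (1, 1792, 34, 1, 1),
    (2, 1792, 23, 1, 1),
    (3, 1801, 18, 1, 1),
    (3, 1804, 21, 1, 1),
    (4, 1825, 26, 1, 1),
]

if __name__ == "__main__":
    for row in expand_triennial(OBSERVED):
        print(row)
```

Run on the five observed records, this produces 15 rows with groupid 1 to 5, matching the example table below.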
Below is a subset of the dataset. The problem is that we don't know in which exact year of the three-year period a woman gives birth to a boy. Take group 4 as an example: woman No. 3 has two records in the original register, in 1801 and 1804, and I used tsfill to fill in the missing years. She has nextboys equal to 1 in 1804, so she will have this boy in 1804, 1805, or 1806, but we don't know which year.
personid | year | age | married | nextboys | groupid |
1 | 1792 | 34 | 1 | 1 | 1 |
1 | 1793 | 35 | 1 | 0 | 1 |
1 | 1794 | 36 | 1 | 0 | 1 |
2 | 1792 | 23 | 1 | 1 | 2 |
2 | 1793 | 24 | 1 | 0 | 2 |
2 | 1794 | 25 | 1 | 0 | 2 |
3 | 1801 | 18 | 1 | 1 | 3 |
3 | 1802 | 19 | 1 | 0 | 3 |
3 | 1803 | 20 | 1 | 0 | 3 |
3 | 1804 | 21 | 1 | 1 | 4 |
3 | 1805 | 22 | 1 | 0 | 4 |
3 | 1806 | 23 | 1 | 0 | 4 |
4 | 1825 | 26 | 1 | 1 | 5 |
4 | 1826 | 27 | 1 | 0 | 5 |
4 | 1827 | 28 | 1 | 0 | 5 |
I want to reshuffle the dataset so that the table looks like this:
personid | year | age | married | nextboys | groupid |
1 | 1792 | 34 | 1 | 1 | 1 |
1 | 1793 | 35 | 1 | 0 | 1 |
1 | 1794 | 36 | 1 | 0 | 1 |
2 | 1792 | 23 | 1 | 0 | 2 |
2 | 1793 | 24 | 1 | 0 | 2 |
2 | 1794 | 25 | 1 | 1 | 2 |
3 | 1801 | 18 | 1 | 0 | 3 |
3 | 1802 | 19 | 1 | 1 | 3 |
3 | 1803 | 20 | 1 | 0 | 3 |
3 | 1804 | 21 | 1 | 0 | 4 |
3 | 1805 | 22 | 1 | 1 | 4 |
3 | 1806 | 23 | 1 | 0 | 4 |
4 | 1825 | 26 | 1 | 0 | 5 |
4 | 1826 | 27 | 1 | 0 | 5 |
4 | 1827 | 28 | 1 | 1 | 5 |
In other words, the value of nextboys would be randomly assigned to one of the years within each group. This would give me a dataset that simulates the real situation, in which each woman's birth of a boy is recorded in a single, exact year.
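A minimal sketch of that reshuffling step in Python, assuming each row is a dict with groupid and nextboys keys (the function name reshuffle_within_groups is hypothetical). It randomly permutes the nextboys values among the rows of each group, which is one way to implement "randomly assigned within each group": with one 1 and two 0s per group, the 1 lands in each of the three years with equal probability. That equal-probability choice is itself an assumption about how births fall within the window.

```python
import random
from collections import defaultdict


def reshuffle_within_groups(rows, seed=None):
    """Randomly permute nextboys among the rows of each groupid.

    `rows` is a list of dicts with at least "groupid" and "nextboys" keys.
    A new list is returned (the input is not modified). Each group keeps
    the same multiset of nextboys values; only which year holds them changes.
    """
    rng = random.Random(seed)  # pass a seed for a reproducible draw

    # Collect the row positions belonging to each group.
    positions = defaultdict(list)
    for i, row in enumerate(rows):
        positions[row["groupid"]].append(i)

    out = [dict(row) for row in rows]  # shallow copies, so input stays intact
    for idx in positions.values():
        values = [rows[i]["nextboys"] for i in idx]
        rng.shuffle(values)
        for i, value in zip(idx, values):
            out[i]["nextboys"] = value
    return out


if __name__ == "__main__":
    # Person 2's group from the example: the boy is recorded in 1792.
    group2 = [
        {"personid": 2, "year": 1792, "age": 23, "married": 1, "nextboys": 1, "groupid": 2},
        {"personid": 2, "year": 1793, "age": 24, "married": 1, "nextboys": 0, "groupid": 2},
        {"personid": 2, "year": 1794, "age": 25, "married": 1, "nextboys": 0, "groupid": 2},
    ]
    for row in reshuffle_within_groups(group2, seed=1):
        print(row["year"], row["nextboys"])
```

If nextboys can be larger than 1 in a group, this sketch moves the whole count to a single year rather than spreading it out. Because the result depends on the random draw, one option is to repeat it with several seeds and compare the resulting rates. The same logic can be expressed in Stata by drawing a random number with runiform() after set seed and reordering within groupid; that version is not shown here.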
I would be very grateful for any suggestions or answers! Thank you very much!