Hello, everyone
I am looking for help cleaning some panel data in Stata. The data come from the China Family Panel Studies (CFPS), and I am studying a question about parents and their children. Below is a simplified table taken from the original dataset as an example:
pid | fid | code_c1 | birthy_c1 | birthm_c1 | gender_c1 | code_c2 | birthy_c2 | birthm_c2 | gender_c2 | code_c3 | birthy_c3 | birthm_c3 | gender_c3 | code_c4 | birthy_c4 | birthm_c4 | gender_c4 | code_c5 | birthy_c5 | birthm_c5 | gender_c5 |
410568105 | 108998 | 205 | 1969 | 1 | 0 | 206 | 1971 | 5 | 0 | 207 | 1972 | 3 | 1 | 208 | 1966 | 7 | 1 | 102 | 1966 | 7 | 0 |
100435552 | 100435 | 102 | 1952 | 11 | 1 | 208 | 1954 | 12 | 0 | 206 | 1954 | 1 | . | 207 | 1957 | 9 | 0 | 205 | 1956 | . | 1 |
100879601 | 100879 | 101 | 1968 | 3 | 1 | 207 | . | 6 | 1 | 209 | . | . | 0 | 210 | 1971 | 4 | . | . | . | . | . |
Let me briefly explain this example. There are three respondents, each identified by pid (personal ID) and each belonging to a family identified by fid (family ID). The first two have five children and the last one has only four (in the original dataset, the number of children ranges from 0 to 10). Each child has a personal ID code (code_c#), a birth year (birthy_c#), a birth month (birthm_c#), and a gender (gender_c#, where 0=female and 1=male). A dot in a cell means the information is missing, because the respondent did not know, did not want to answer, or the question did not apply. As the table shows, values are often missing even where there should be information.
The biggest problem with the original data is that these children (child_c1, child_c2, child_c3, child_c4, child_c5) are not stored in any particular order. I would like them ordered by birth date, from earliest to latest (oldest child first), so that I can measure the gap between two consecutive births and study questions related to hidden gender preference.
Here is the layout I am aiming for, which I produced by editing the table above by hand. The children are ordered first by birth year and then by birth month, from earliest to latest, and each child's personal ID (code_c#) and gender move along with them:
pid | fid | code_c1 | birthy_c1 | birthm_c1 | gender_c1 | code_c2 | birthy_c2 | birthm_c2 | gender_c2 | code_c3 | birthy_c3 | birthm_c3 | gender_c3 | code_c4 | birthy_c4 | birthm_c4 | gender_c4 | code_c5 | birthy_c5 | birthm_c5 | gender_c5 |
410568105 | 108998 | 102 | 1966 | 7 | 0 | 208 | 1966 | 7 | 1 | 205 | 1969 | 1 | 0 | 206 | 1971 | 5 | 0 | 207 | 1972 | 3 | 1 |
100435552 | 100435 | 102 | 1952 | 11 | 1 | 206 | 1954 | 1 | . | 208 | 1954 | 12 | 0 | 205 | 1956 | . | 1 | 207 | 1957 | 9 | 0 |
100879601 | 100879 | 101 | 1968 | 3 | 1 | 210 | 1971 | 4 | . | 207 | . | 6 | 1 | 209 | . | . | 0 | . | . | . | . |
The first difficulty is that I am not sure whether Stata has a command that can swap the values of two cells. In addition, the commands I have found by searching online only let me move a column as a whole; they cannot reorder the values within each row separately. Finally, I am not sure how to make the rest of a child's information move together with the birth year and birth month.
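For anyone who wants to experiment, here is a minimal sketch of one possible route that avoids swapping cells altogether: reshape the data to one row per child, sort within each parent, renumber the children, and reshape back. This is only an illustration on the three example rows, not a tested solution for the full CFPS file. The variable name child, the long storage type for pid and fid, and the use of code_c to break ties between children born in the same month are all my own assumptions. It also relies on Stata sorting missing values after nonmissing ones, so children with an unknown birth year or month end up last.

```stata
* Sketch only: rebuild the three example rows shown above.
clear
input long pid long fid code_c1 birthy_c1 birthm_c1 gender_c1 code_c2 birthy_c2 birthm_c2 gender_c2 code_c3 birthy_c3 birthm_c3 gender_c3 code_c4 birthy_c4 birthm_c4 gender_c4 code_c5 birthy_c5 birthm_c5 gender_c5
410568105 108998 205 1969 1 0 206 1971 5 0 207 1972 3 1 208 1966 7 1 102 1966 7 0
100435552 100435 102 1952 11 1 208 1954 12 0 206 1954 1 . 207 1957 9 0 205 1956 . 1
100879601 100879 101 1968 3 1 207 . 6 1 209 . . 0 210 1971 4 . . . . .
end

* One row per parent-child pair; child (assumed name) holds the old slot number 1-5.
reshape long code_c birthy_c birthm_c gender_c, i(pid) j(child)

* Order children within each parent by birth year, then birth month.
* Missing values sort last; code_c is an assumed tie-breaker.
sort pid birthy_c birthm_c code_c

* Renumber the slots so that slot 1 is the earliest-born child.
by pid: replace child = _n

* Back to one row per parent, with all of a child's variables kept together.
reshape wide code_c birthy_c birthm_c gender_c, i(pid) j(child)

list pid code_c1 birthy_c1 birthm_c1 code_c2 birthy_c2 birthm_c2
```

Because every variable belonging to a child sits in the same row while the data are in long form, the code and gender should travel with the birth year and month automatically. The long form may also be the easier place to compute gaps between consecutive births, since each child is then its own row.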
A million thanks in advance if there is a way to solve this issue.