gathering missing information from one date module to another

Martin von Brandt

Join Date: Jan 2020

Posts: 21
#1

17 Feb 2022, 09:09

Hello everyone,

I have data with 8 waves. Each wave has numerous modules covering respondents' health, children, finances, etc. Some of the information, especially demographics such as gender, marital status, and employment status, is missing because respondents did not answer the questions again, since some of these variables are stable across the years (waves). In other cases there are two respondents from one household: in wave 1, say, respondent #1 answers a question and the answers of respondent #2 are filled in as missing values, and in wave 2 it is the other way around. I would like to carry information from wave 1 over to wave 2, and so on up to wave 8, so that the dataset has fewer missing values. I would also like to follow respondents from wave 1 through wave 8. However, I do not know how to link the waves or gather the information in this way.

I would appreciate it if someone could help me with this. Thanks in advance!
Øyvind Snilsberg

Join Date: Oct 2021

Posts: 591
#2

17 Feb 2022, 10:44

Please show an example of your data (one that illustrates the problem) using the -dataex- command.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#3

17 Feb 2022, 11:07

In your previous topic at

https://www.statalist.org/forums/for...ual-level-data

I suggested that the key to being able to "follow respondents from wave1 till wave 8" was to include a "cross-wave individual identifier" provided by the survey that is, for any individual, the same value across every wave of the survey in which they appear, even if they change households (for example, divorce) and is different from the identifier of every other individual.

In providing example data, as you did in your previous topic, please be sure to include whatever you are using for an individual identifier. Also, it might be useful for us to know if your survey data is from a publicly available source, such as the US Panel Study of Income Dynamics, with which some users might have specific experience relevant to your analytic aims.
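
As a rough sketch of why such an identifier matters, the lines below stack two toy wave files into one long file and check that the identifier plus the wave number uniquely identify each row. Everything here is an assumption for illustration: the identifier name pid, the variables wave and x, the tempfile name w1, and the toy values are hypothetical and do not come from any real survey.

```stata
* Toy data for the first wave; pid is a hypothetical cross-wave person identifier
clear
input str4 pid byte x
"P01" 6
"P02" .
"P03" 3
end
generate byte wave = 1
tempfile w1
save `w1'

* Toy data for the next wave; P03 does not appear in this wave
clear
input str4 pid byte x
"P01" .
"P02" 2
end
generate byte wave = 2

* Stack the waves: one row per person per wave
append using `w1'

* Stop with an error if pid and wave do not uniquely identify the rows
isid pid wave

sort pid wave
list, sepby(pid)
```

If isid fails on real data, the identifier is not behaving as a cross-wave individual identifier and anything built on it will be unreliable. Variables whose names differ from wave to wave would need to be renamed to a common name before appending, otherwise they end up as separate columns.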

Martin von Brandt

Join Date: Jan 2020
Posts: 21
#4

17 Feb 2022, 13:09

Thanks a lot for your answers. The data come from the SHARE dataset, which unfortunately is not publicly available; researchers can access the data only with permission. I will provide dataex output to give a concrete example. The first example below is the children module from wave 6, in which parents give information on their children. ch001_ is the number of children, and ch005_1 to ch005_20 give each child's gender. All identifiers are listed as well: mergeid is the individual identifier and hhid is the household identifier. If one person answers a question (like AT-001492-01), the answers of the other person from the same household (AT-001492-02) are coded as missing values. I cannot foresee whether that will be a problem at some point...

If you look at AT-001881-01 and AT-001881-02 in wave 6, you can see that they reported how many children they have and the children's gender. However, if you look at the second dataex example, which is from wave 7, both the number of children and the gender information are coded as missing.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str12 mergeid str11 hhid6 str12 mergeidp6 str15 coupleid6 byte(ch001_ ch005_1 ch005_2 ch005_3 ch005_4 ch005_5 ch005_6 ch005_7 ch005_8 ch005_9 ch005_10 ch005_11 ch005_12 ch005_13 ch005_14 ch005_15 ch005_16 ch005_17 ch005_18 ch005_19 ch005_20)
"AT-000674-01" "AT-000674-A" ""             ""                2 1 1 . . . . . . . . . . . . . . . . . .
"AT-001215-01" "AT-001215-A" ""             ""                0 . . . . . . . . . . . . . . . . . . . .
"AT-001492-01" "AT-001492-A" "AT-001492-02" "AT-001492-01-02" 6 2 2 2 1 2 1 . . . . . . . . . . . . . .
"AT-001492-02" "AT-001492-A" "AT-001492-01" "AT-001492-01-02" . . . . . . . . . . . . . . . . . . . . .
"AT-001881-01" "AT-001881-A" ""             ""                3 1 1 2 . . . . . . . . . . . . . . . . .
"AT-001881-02" "AT-001881-B" ""             ""                2 1 . . . 1 . . . . . . . . . . . . . . .
end
label values ch001_ dkrf
label values ch005_1 gender
label values ch005_2 gender
label values ch005_3 gender
label values ch005_4 gender
label values ch005_5 gender
label values ch005_6 gender
label values ch005_7 gender
label values ch005_8 gender
label values ch005_9 gender
label values ch005_10 gender
label values ch005_11 gender
label values ch005_12 gender
label values ch005_13 gender
label values ch005_14 gender
label values ch005_15 gender
label values ch005_16 gender
label values ch005_17 gender
label values ch005_18 gender
label values ch005_19 gender
label values ch005_20 gender
label def gender 1 "Male", modify
label def gender 2 "Female", modify

Wave 7 example:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str12 mergeid str11 hhid7 str12 mergeidp7 str15 coupleid7 byte(ch001_ ch005_1 ch005_2 ch005_3 ch005_4 ch005_5 ch005_6 ch005_7 ch005_8 ch005_9 ch005_10 ch005_11 ch005_12 ch005_13 ch005_14 ch005_15 ch005_16 ch005_17 ch005_18 ch005_19 ch005_20)
"AT-001215-01" "AT-001215-A" ""             ""                . . . . . . . . . . . . . . . . . . . . .
"AT-001492-01" "AT-001492-A" "AT-001492-02" "AT-001492-01-02" . . . . . . . . . . . . . . . . . . . . .
"AT-001492-02" "AT-001492-A" "AT-001492-01" "AT-001492-01-02" . . . . . . . . . . . . . . . . . . . . .
"AT-001881-01" "AT-001881-A" ""             ""                . . . . . . . . . . . . . . . . . . . . .
"AT-001881-02" "AT-001881-B" ""             ""                . . . . . . . . . . . . . . . . . . . . .
end
label values ch001_ dkrf
label values ch005_1 gender
label values ch005_2 gender
label values ch005_3 gender
label values ch005_4 gender
label values ch005_5 gender
label values ch005_6 gender
label values ch005_7 gender
label values ch005_8 gender
label values ch005_9 gender
label values ch005_10 gender
label values ch005_11 gender
label values ch005_12 gender
label values ch005_13 gender
label values ch005_14 gender
label values ch005_15 gender
label values ch005_16 gender
label values ch005_17 gender
label values ch005_18 gender
label values ch005_19 gender
label values ch005_20 gender
label def gender 1 "Male", modify
label def gender 2 "Female", modify

I need general code that I can run from wave 2 through wave 8 to pick up the missing information and fill in the missing values. I need to do this for each module I use, such as children, respondent demographics, healthcare status, etc. If you can give me an example of how to do this, I can adapt the code to different variables and modules.


William Lisowski

Join Date: Dec 2014

Posts: 10150
#5

17 Feb 2022, 17:48

Thank you for the example data, and for the reference to the "SHARE" survey. From that I was able to find that the survey is the Survey of Health, Ageing and Retirement in Europe with a very informative web site at http://www.share-project.org.

You have encountered the problem discussed in the Data Documentation FAQ 5.8 at http://www.share-project.org/data-do...tion/faqs.html:

"For longitudinal analyses on children users cannot rely on the order of the children in the CH module. It is necessary to match them on gender and year of birth - this will lead to correct merges in most cases. There are a couple of reasons behind this. First, respondents are supposed to report on their children in a defined order, but they may not necessarily do so. Second, partners may change and respondents always are supposed to report on both partners' children. Third, you can never exclude reporting errors."

So looking back to my post #3, this is bad news because SHARE provides no identifier that tracks children over time. And the suggestion of matching on gender and year of birth seems inconsistent with the lack of child gender data in wave 7.
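
Whatever the matching rule turns out to be, matching children on their characteristics would probably start from one row per child rather than one row per respondent. The lines below are only a sketch of that reshaping step, not of the matching itself. They use two respondents and the first three child slots from the wave 6 example in this thread; the names child and child_gender are my own hypothetical choices.

```stata
* Two respondents and the first three child slots from the wave 6 example
clear
input str12 mergeid byte(ch001_ ch005_1 ch005_2 ch005_3)
"AT-000674-01" 2 1 1 .
"AT-001881-01" 3 1 1 2
end

* One row per respondent-child slot; child takes the numeric suffix 1, 2, 3
reshape long ch005_, i(mergeid) j(child)
rename ch005_ child_gender

* Drop child slots that hold no information
drop if missing(child_gender)

list, sepby(mergeid)
```

A year-of-birth variable stored in the same wide layout could be reshaped in the same call by adding its stub to reshape long; its name is not shown in this thread, so it is left out here.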

This has moved from the realm of Stata syntax and the like into understanding SHARE data and how to use it to meet your needs. That is, you need to first figure out exactly what you would need to do by hand to match children across waves. It would be interesting to understand how this is to be done if gender is not collected in a given wave. Then once you know what you need to do, set about implementing it in Stata. Right now, asking for code is premature.
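
The respondent-level part of the question in post #1 is a simpler case, because there mergeid does identify the unit over time. For a characteristic that is assumed not to change, a carry-forward within person can be sketched as below. This is an illustration on invented toy data, not a recommendation for any particular SHARE variable: the identifiers, the variable names wave, gender and negwave, and all values are hypothetical, and the approach is wrong for anything that can change between waves.

```stata
* Invented toy data in long form: one row per person per wave
clear
input str4 mergeid byte(wave gender)
"X-01" 1 2
"X-01" 2 .
"X-01" 3 .
"X-02" 1 .
"X-02" 2 1
"X-02" 3 .
end

* Carry the last observed value forward within person, in wave order;
* replace works row by row, so a filled value cascades to later waves
bysort mergeid (wave): replace gender = gender[_n-1] if missing(gender)

* Fill waves before the first observed value by doing the same in reverse order
generate negwave = -wave
bysort mergeid (negwave): replace gender = gender[_n-1] if missing(gender)
drop negwave

sort mergeid wave
list, sepby(mergeid)
```

Before filling anything on real data it would be worth checking that the observed values never disagree within a person, since a disagreement means either a reporting error or a variable that is not actually stable.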

I'm afraid I've reached the end of my abilities here. Perhaps someone else familiar with SHARE will offer advice. Or perhaps if you contact SHARE User Support at the address on their website they will be able to give you more detailed guidance.
Martin von Brandt

Join Date: Jan 2020

Posts: 21
#6

18 Feb 2022, 11:36

I will try to figure something out. Thanks for your time.