  • Need help in handling missing data

    I have a longitudinal dataset in which paternal or maternal education is missing for some households, possibly because one of the parents died during the study years. I want to include both variables in my model as controls. If I use them as they are now, the model will drop every row with a missing value, so those households will be excluded from the regression. I was thinking of adding a dummy variable that takes the value 1 if the father's or mother's education is missing and including it in the model. Will that solve the problem? If not, please share your suggestions.

  • #2
    I was thinking of adding a dummy variable that takes the value 1 if the father's or mother's education is missing and including it in the model. Will that solve the problem?
    No, this is not a good approach. The problem is that the observations flagged by the new variable are an unknown mixture of people with different levels of education. Introducing it biases the estimates of the education variables' effects, and the new variable is essentially uninterpretable in its own right.

    Take a look at https://statisticalhorizons.com/wp-c...aterials-1.pdf for a discussion of different approaches to missing data and examples using Stata. It's not an easy read, so allocate sufficient time to it. I suspect that multiple imputation will be the best approach for this particular problem, assuming that your data set is sufficiently rich in other variables that are related to maternal and paternal education. Multiple imputation is technically somewhat tricky, so if you go this route, allocate plenty of time to learning the technique from the Stata user manuals (PDF files that come with your Stata installation).
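A small simulation can illustrate the point about the dummy-variable approach. This is only a sketch, written in Python rather than Stata, and everything in it is an assumption made for illustration and not taken from this thread: a single education variable `educ`, a correlated covariate `assets`, a linear outcome with both true coefficients equal to 1, and 40% of `educ` missing completely at random. Real missingness due to parental death is unlikely to be that benign.

```python
import random

def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

random.seed(12345)
n = 20000
rows = []
for _ in range(n):
    assets = random.gauss(0, 1)
    educ = 0.6 * assets + 0.8 * random.gauss(0, 1)      # correlated with assets
    y = 1.0 * educ + 1.0 * assets + random.gauss(0, 1)  # both true coefficients are 1
    missing = random.random() < 0.4                     # hypothetical: 40% missing at random
    rows.append((y, educ, assets, missing))

# Complete-case analysis: drop rows where educ is missing.
cc = [r for r in rows if not r[3]]
b_cc = ols([[1.0, e, a] for _, e, a, _ in cc], [r[0] for r in cc])

# Missing-indicator method: fill missing educ with 0 and add a dummy.
X_mi = [[1.0, 0.0 if m else e, 1.0 if m else 0.0, a] for _, e, a, m in rows]
b_mi = ols(X_mi, [r[0] for r in rows])

print("complete case     educ, assets:", round(b_cc[1], 3), round(b_cc[2], 3))
print("missing indicator educ, assets:", round(b_mi[1], 3), round(b_mi[3], 3))
```

Under these assumptions the complete-case estimates stay close to 1, while the missing-indicator fit pulls the `educ` coefficient below 1 and pushes the `assets` coefficient above 1. The reason is that the model forces one `assets` slope on both groups, even though `assets` has to stand in for the unobserved `educ` in the rows where it is missing.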

    • #3
      Yes, actually the problem is that for the household members whose paternal or maternal education is missing, in most cases the parent may have died, so that parent's other variables are not available either.

      • #4
        So, are you saying that you have no parental information at all if the parent is deceased? Perhaps multiple imputation might still be viable if there are child or household level variables that are correlated with (the unknown) parental education. Or, if not, you might need to do separate analyses for the HH where the parents have died and for the HH where the parents are still alive, and then, perhaps, combine the results where they are similar. Or maybe you need to omit parental variables from your analyses altogether.

        I don't feel comfortable, however, making these recommendations or any others. This is a difficult situation and I think it calls for a review of all of the data by somebody who also has a good understanding of the research questions and the likely real-world data-generating processes here. I hope that somebody who has greater expertise in this area of study will join the thread and give some advice. To facilitate that, it would be very helpful if you described what data are available to you, how many time periods, children and households are in your data set, and what your research goals are, and gave some quantitative information about the amount of missing data.

        Alternatively, you might be better served by consulting with your advisor or supervisor in your school or workplace (as the case may be) so that you can have a more efficient, interactive discussion with somebody who can sit down and spend some serious time working on this with you.

        • #5
          Thanks for the instant response, sir. Here is a brief overview of my data set. I have a household (HH) survey dataset with hh ids and mem ids identifying households and members; the members are children aged 6-17. I have three time periods: one is pre-treatment and two are post-treatment. My treatment is a negative income shock. My outcome variable is child dropout, and the other independent variables are education exp of hh, number of dependents in hh, total number of members in hh, assets owned by the hh, and paternal and maternal education. Even if the parent is deceased, information is available on these variables, but I don't know how strongly they are correlated with the parents' education.
          You can provide your suggestions, and I am open to them.

          • #6
            Well, I cannot speak authoritatively on these matters because I am not an expert in education. I don't know the location in which your study is being carried out, but unless it is the United States, I also can't pretend to know much about socioeconomic patterns in education that prevail there. If this study is in the United States, I would expect there to be rather strong associations among hh assets, number of dependents, (and perhaps educational exp of hh, whatever that may mean distinct from paternal and maternal education) and paternal and maternal education. So at least in this context, I would be comfortable with a multiple imputation approach. Whether that applies to the locus of your study, I can only guess.
