
  • Question on Multiple Imputation

    Hello,

    I have a question about multiple imputation in Stata. The aim of my research is to explain why people choose different educational pathways. Option A is to do an apprenticeship (Vocational Education and Training); option B is to continue school in order to obtain the university entrance qualification. I want to know to what extent the expected benefits and costs of both options explain why people choose option A or B.

    The problem is that only those respondents (60 % of the sample) who said that they wanted to apply for an apprenticeship were asked about the expected benefits and costs of an apprenticeship. So for 40 % of the sample (respondents who said that they did not want to apply for an apprenticeship), no information is available on the expected benefits and costs of an apprenticeship. Does it make sense, and is it possible, to impute the missing data (expected benefits and costs of an apprenticeship) for this group?

    Thank you very much for your help!
    Last edited by Robin Busse; 16 Mar 2020, 04:35.

  • #2
    Hi Robin,

    Multiple imputation usually relies on the assumption that the data are missing at random (MAR) or missing completely at random (MCAR). The situation you describe sounds more like not missing at random (NMAR). Have you considered a Heckman selection model? (I admit I haven't thought too hard about this suggestion.)

    Best
    Nora



    • #3
      Robin:
      welcome to this forum.
      You should specify the reason why 40% of your sample reported missing values on those items. Put differently, is their missingness ignorable or not? (see https://onlinelibrary.wiley.com/doi/.../9781119013563 page 119).
      Kind regards,
      Carlo
      (StataNow 18.5)



      • #4
        Dear Nora and Carlo,
        thank you very much for your help! I read a bit into the literature and would be very thankful if you could help me clarify some things.

        First, I would like to specify the reason why 40 % of my sample reported missing values on those items. The reason is a filter in the questionnaire. In the first question, the respondents were asked whether they intended to apply for an apprenticeship (Var1). If they answered yes (Var1=1), they were asked about the expected benefits (Var2) and costs (Var3) of the apprenticeship they intended to apply for. If they answered the first question (Var1, 'Do you intend to apply for an apprenticeship') with no (Var1=0), they were not asked about the expected benefits (Var2) and costs (Var3) of the apprenticeship. So the reason why 40 % of my sample have no values on Var2 and Var3 is a filter in the questionnaire.

        Am I right to say that the missingness of Var2 and Var3 is not ignorable?

        Kind regards,
        Robin



        • #5
          Robin:
          my guess is that you have something similar to a hurdle problem (see the -churdle- entry in the Stata .pdf manual).
          Kind regards,
          Carlo
          (StataNow 18.5)



          • #6
            Thank you Carlo!

            I have to correct myself, and I'm sorry for the confusion. I just found out something that might change the picture. Data on Var2 and Var3 are available even for respondents who answered that they did not plan to apply for an apprenticeship. The following cross-table shows that for 661 respondents (214+447) who reported that they did not intend to apply for an apprenticeship, information on the expected benefit (Var2) is available. My guess is that, because data collection took place via PAPI and the instruction to skip the questions on Var2 and Var3 was not clear, most of the respondents without an intention to apply for an apprenticeship answered the questions on Var2 and Var3 anyway.

            I'm working with imputed data (MICE in Stata), and now I'm unsure whether I can assume MAR for Var1 and Var2 and include them in my chained-equations model.


            Unbenannt.JPG

            Again, I'm very sorry for the confusion!

            Best wishes from Germany,
            Robin
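
The check described in post #6 amounts to cross-tabulating the filter question against whether the follow-up item was answered. Below is a minimal Python sketch of that idea. The records are made-up illustrative values, not the survey data, and the names `records` and `missingness_table` are hypothetical. In Stata itself, a two-way tabulation that includes missing values would serve the same purpose.

```python
from collections import Counter

# Hypothetical records: (var1, var2). None marks an unanswered item.
# var1 = 1: intends to apply for an apprenticeship; var1 = 0: does not.
records = [
    (1, 3), (1, 4), (1, None), (0, None),
    (0, 2), (0, None), (0, 5), (1, 1),
]


def missingness_table(records):
    """Count observed vs. missing var2 answers within each var1 group."""
    counts = Counter()
    for var1, var2 in records:
        status = "missing" if var2 is None else "observed"
        counts[(var1, status)] += 1
    return counts


table = missingness_table(records)

# One row per var1 group: observed count, then missing count.
for var1 in (0, 1):
    print(var1, table[(var1, "observed")], table[(var1, "missing")])
```

If the questionnaire filter had worked as intended, the "observed" cell for var1 = 0 would be empty. A non-empty cell there is what post #6 reports.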



            • #7
              Robin:
              whether the missing-data mechanism is MAR depends on whether your missing values fit the following definition (quoted from -help mi_glossary-):
              missing at random. Missing data are said to be missing at random (MAR) if the probability that data are missing does not depend on unobserved data but may depend on observed data. Under MAR, the missing-data values do not contain any additional information given observed data about the missing-data mechanism. Thus the process that causes missing data can be ignored.
              Kind regards,
              Carlo
              (StataNow 18.5)
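
The definition quoted in post #7 can be illustrated with a small simulation. This is a sketch with invented numbers, not an analysis of the survey. It assumes, as in post #6, that some respondents with var1 = 0 still answered var2. All names, probabilities, and distributions are hypothetical. Under the MAR rule, missingness of var2 depends only on the observed var1. Under the NMAR rule, it depends on the unobserved var2 value itself.

```python
import random
import statistics

random.seed(0)

# Hypothetical population: var1 is the observed intention (0/1),
# var2 is the expected benefit, higher on average when var1 = 1.
rows = []
for _ in range(20000):
    var1 = 1 if random.random() < 0.6 else 0
    var2 = random.gauss(5.0 + 2.0 * var1, 1.0)
    rows.append((var1, var2))


def observed_mar(var1, var2):
    """MAR: the chance of a missing var2 depends only on observed var1."""
    p_missing = 0.1 if var1 == 1 else 0.7
    return random.random() >= p_missing


def observed_nmar(var1, var2):
    """NMAR: the chance of a missing var2 depends on var2 itself."""
    p_missing = 0.7 if var2 < 6.0 else 0.1
    return random.random() >= p_missing


def stratum_means(rows, keep):
    """Per var1 group: (mean of all var2, mean of the observed var2)."""
    out = {}
    for group in (0, 1):
        full = [v2 for v1, v2 in rows if v1 == group]
        seen = [v2 for v1, v2 in rows if v1 == group and keep(v1, v2)]
        out[group] = (statistics.mean(full), statistics.mean(seen))
    return out


mar = stratum_means(rows, observed_mar)
nmar = stratum_means(rows, observed_nmar)

for label, result in (("MAR", mar), ("NMAR", nmar)):
    for group in (0, 1):
        full_mean, seen_mean = result[group]
        print(label, group, round(full_mean, 2), round(seen_mean, 2))
```

Under the MAR rule, the observed mean within each var1 group stays close to the full mean, so conditioning on var1 recovers the missing information. Under the NMAR rule, the observed means are shifted even within groups, and no observed variable in this sketch can correct for that.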

