  • Effect of missing data

    Hi all,

    I want to look at the effect of a policy on the change in working hours for a certain age group. I have LFS data split into quarters, and I have managed to create pre- and post-policy versions of the variable I am looking at. However, I have more observations post-policy than pre-policy, so I have 100 or so missing data points. I have included a picture to give you an idea of what it looks like. I was wondering whether this would affect the values Stata gives for descriptive statistics, e.g. mean, standard deviation, etc. Would it be advisable to just ignore the 100 extra post-policy data points and have equal-sized groups?
    [Attached image: Annotation 2020-03-17 105913.png]

  • #2
    Luke:
    Stata applies listwise deletion to missing observations: hence, they are excluded from the statistics.
    Ignoring missing values altogether might bias your analyses, especially if missingness is not ignorable.
    You should first diagnose the mechanism that underlies the missing data (see the -mi- entries in the Stata .pdf manual).
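    As a rough illustration of what "excluded from statistics" means for the descriptive statistics Luke asks about, here is a minimal Python sketch (not Stata; the hours values are made up and `summarize` is a hypothetical helper). It mimics what -summarize- does for a single variable: missing values are dropped, and N, the mean, and the sample standard deviation are computed from the non-missing values only. Unequal group sizes therefore just mean a different N per group. Whether the remaining values are representative is the separate question of the missingness mechanism.

```python
import statistics

# Hypothetical toy data: None marks a missing value (Stata would show ".")
pre_hours = [38.0, 40.0, None, 35.0, None, 42.0]
post_hours = [36.0, 39.0, 41.0, 37.0, 40.0, 38.0]

def summarize(values):
    """Rough analogue of -summarize- for one variable: use only non-missing values."""
    observed = [v for v in values if v is not None]
    return {
        "n": len(observed),
        "mean": statistics.mean(observed),
        # sample SD with an n - 1 denominator, the same convention Stata reports
        "sd": statistics.stdev(observed),
    }

print(summarize(pre_hours))   # n = 4: the two missing values are simply left out
print(summarize(post_hours))  # n = 6
```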
    Kind regards,
    Carlo
    (StataNow 18.5)



    • #3
      To add to Carlo's helpful comments, you will increase your chances of a useful answer by following the FAQ on asking questions: provide Stata code in code delimiters, readable Stata output, and sample data using -dataex-. Note that "ignorable" generally translates into missing completely at random, or missing in ways unrelated to the variables.

      The problem is that non-random missing data can strongly change your descriptive statistics or any other use of the data. Just think what would happen if smaller values of the variable were more likely to be missing than larger ones: you would obviously estimate the mean too high. However, if the issue is that you have 50 usable observations before the policy and 50 usable observations right after it, but some missing data later on, then I suspect you could work with the 50 before and the 50 after and ignore the later period with missing data.
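      The point about smaller values being more likely to go missing can be checked with a small simulation. The Python sketch below is only an illustration under made-up assumptions: "true" weekly hours are drawn from a normal distribution (mean 37, SD 5), and the missingness probabilities (30% for MCAR; 60% below the cutoff and 10% above it for MNAR) are arbitrary. Dropping values completely at random leaves the mean essentially unchanged, while dropping low values more often pushes the mean of the observed data upward.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical "true" weekly hours for 10,000 people (normal, mean 37, SD 5)
true_hours = [random.gauss(37, 5) for _ in range(10_000)]

def observe_mcar(values, p_missing=0.3):
    """Missing completely at random: every value has the same chance of being dropped."""
    return [v for v in values if random.random() >= p_missing]

def observe_mnar(values, cutoff=37, p_low=0.6, p_high=0.1):
    """Missing not at random: values below `cutoff` are dropped far more often."""
    kept = []
    for v in values:
        p = p_low if v < cutoff else p_high
        if random.random() >= p:
            kept.append(v)
    return kept

full_mean = statistics.mean(true_hours)
mcar_mean = statistics.mean(observe_mcar(true_hours))
mnar_mean = statistics.mean(observe_mnar(true_hours))
print(f"all data: {full_mean:.2f}  MCAR: {mcar_mean:.2f}  MNAR: {mnar_mean:.2f}")
```

      With these settings the MNAR mean comes out roughly 1.5 hours above the full-data mean. That is the "estimate the mean too high" effect, and it does not shrink as the sample grows.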

