  • Singleton cases - diff-in-diff model

    I am running a difference-in-differences model in Stata 12 using the command diff. The majority of my data are singleton cases (entities observed only once). The other entities are observed for 2 years. When I remove the singleton cases, the coefficients change substantially. I would greatly appreciate your advice on whether I should include or remove singletons before running my model.

  • #2
    This really should be in the General forum, not the forum for learning to use the forum software. But I'll answer it here.

    There are a number of reasons you may be observing a large shift in your coefficients when you delete the singleton cases. Since they are numerous, the sample that remains when you remove them is small and may be underpowered, so your estimates may have poor precision. But even more important, it is possible, and depending on the context of your research, even likely, that there is something different about the cases that only have a single observation. The fact that they weren't around, or didn't want to contribute data, or were difficult to reach on one of the data collection occasions may well be in some way connected to the very phenomena you are trying to study with your model. So it may well be that by removing them, you are uncovering the fact that there is a real difference between them and the others. Just why and how this might happen would be a question of the subject matter you are studying, not a statistical question.

    So you need to look into that. One thing I would do is study separately the distributions of all of your model variables in the singletons and in the fully-followed cases. You may well find important differences. Graphical exploration may prove very helpful in this connection. Then you need to decide, based on the science of your study, whether such differences mean that the two populations are so different that a single model for both is inappropriate. If so, then you should do separate models. (Or a combined model that includes an indicator for singleton status and interaction terms between that indicator and the key predictors.) If, however, your exploration shows that the two groups are basically the same, then it is probably just a power and precision issue, and you would be better off running your model with inclusion of all the data.
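
    As a rough illustration of the exploration and the combined model described above, here is a minimal sketch. All variable names (id, y, x1, treated, post) are hypothetical placeholders rather than anything from the original poster's data, and it uses regress with factor-variable interactions in place of the user-written diff command, so treat it as one possible way to set this up, not the required one.

```stata
* Hypothetical variables: id (entity), y (outcome), x1 (a covariate),
* treated (0/1 treatment group), post (0/1 post-period indicator).

* Flag entities that contribute only one observation
bysort id: generate byte singleton = (_N == 1)
tabulate singleton

* Compare the distributions of the model variables in the two groups
tabstat y x1 treated post, by(singleton) statistics(n mean sd p25 p50 p75)
graph box y, over(singleton)

* DiD on all observations, then on the fully-followed entities only
regress y i.treated##i.post x1, vce(cluster id)
regress y i.treated##i.post x1 if !singleton, vce(cluster id)

* Combined model: allow every DiD term to differ by singleton status
regress y i.treated##i.post##i.singleton x1, vce(cluster id)

* Joint test of the terms involving singleton status
testparm i.treated#i.singleton i.post#i.singleton i.treated#i.post#i.singleton
```

    The interaction terms with singleton can only be estimated if singletons appear in both periods and in both treatment groups; otherwise Stata will drop some of them as collinear. That would itself be informative about how the singletons differ from the other cases.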

    If you need to continue this thread beyond this discussion, I suggest that you take it to the General Forum and include a link to this thread in your post. Very few people will get to see it in this Forum.


    • #3
      I cannot thank you enough for your detailed explanation, and I apologize for posting my question to the wrong forum. This is my first time posting a question to statalist.org. I shall post future questions to the General forum. Thank you again for your guidance.
