  • Testing for sample bias (unbalanced panel data set)

    Dear users,

    I am using Stata 15.0. I have an unbalanced panel data set with 10,402 observations containing multiple continuous variables. The data set is the result of merging several different data sets and dropping observations with missing values.

    Because of this, my thesis coach has asked me to perform a test showing whether my data set contains a sample bias. The goal is to show whether my research is limited by an inherent bias, not to find a solution to the bias.

    I have seen several suggested approaches. One of them is to perform an ANOVA test to see whether variables in my current, merged data set differ significantly from those in the initial data set. However, I have also read that I could perform a paired t-test.

    Does anyone have experience with such a procedure? I am very curious about your opinions.

    Also, if you recommend a certain procedure, how do I proceed? By merging my merged data set with the initial data sets again and testing the variables before and after the merge?

    I am looking forward to your reaction. Thank you in advance.

  • #2
    I would say none of the above.

    I cannot imagine how a paired t-test would work here, so I won't say anything more about it. An unpaired t-test is no different from a one-way ANOVA in your situation, however.

    A comparison of your selected sample with the original data, or with the excluded data, is the way to go here. For the purpose of assessing bias and how it might affect subsequent analyses, however, statistical tests are the wrong approach--they answer the wrong question. A very small difference in the distribution of one variable might fail to be statistically significant, yet if that variable is very strongly associated with other things you will analyze, that small difference might materially distort your analyses. Similarly, a large and highly statistically significant difference in the distribution of a variable might nevertheless be ignorable if that variable is more or less independent of anything else that figures in your analyses. In any case, the p-value of a t-test, an ANOVA, or a contingency table is not informative for this purpose.

    Probably the simplest way for you to do this is to -append- the original data set to the selected sample, being sure to specify the -generate()- option so you know which observations came from where. Then you can use -summarize- or -tabulate-, depending on the type of variable, to compare the distributions in the two data sets. The judgment as to whether a difference is large enough to be worrisome depends on more difficult considerations such as the salience of those variables to the planned analysis.
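    A minimal sketch of what that might look like, assuming the original data are in original.dta, the selected sample is in sample.dta, and the variable names (income, industry) are hypothetical placeholders for your own:

    ```stata
    * Stack the selected sample and the original data set; the
    * generate() option of -append- creates a variable marking the
    * source (0 = data in memory, 1 = the appended file).
    use sample, clear
    append using original, generate(source)
    label define src 0 "selected sample" 1 "original data"
    label values source src

    * Compare distributions across the two sources.
    bysort source: summarize income          // continuous variable
    tabulate industry source, column         // categorical variable
    ```

    Whether any differences you see are worrisome is then a substantive judgment about the planned analysis, not a matter of p-values.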

    Sometimes, in the end, the only good assessment of bias is to run the analysis separately on each data set and see if the results look materially different.



    • #3
      Dear Clyde,

      Thank you very much for your helpful reaction. It really added to my understanding of available procedures.

      Quoting #2: "Sometimes, in the end, the only good assessment of bias is to run the analysis separately on each data set and see if the results look materially different."
      The problem is that I dropped the observations for which I could not find the independent variables, so unfortunately I am not able to run the analysis separately on them.

      My MSc thesis coach has really been pressing me to conduct a test showing whether the retained sample and the dropped observations differ. Is there any test, even a suboptimal one, that you could recommend in that case? Something that adds to the interpretation of possible selection bias, however inconclusive?

      Again, thank you very much for the help.



      • #4
        Sometimes folks do t-tests to show that the dropped and retained observations do not differ significantly on the variables that are non-missing in both groups.
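        A minimal sketch of that comparison, assuming you work from the data set as it stood before dropping, and that x1 and x2 are the independent variables whose missingness caused observations to be dropped (all variable names here are hypothetical):

        ```stata
        * Flag the observations that were dropped because of
        * missing independent variables.
        generate dropped = missing(x1) | missing(x2)

        * Compare means of a variable observed in both groups;
        * -ttest- with by() runs a two-group (unpaired) t-test.
        ttest firm_size, by(dropped)
        ```

        As noted in #2, though, a non-significant t-test is weak evidence against bias; it is best reported as descriptive context rather than proof.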



        • #5
          Thank you for the reaction. In my thesis I have now remarked that these tests provide no conclusive statistical evidence on whether or not my sample contains a bias, but that they do add to the interpretation of whether it might. Would you agree with this argumentation?
