  • T-test or chi-square test

    I am working with a sample that includes only participants with complete information (i.e., a complete-case analysis). I want to compare variables in my sample with the same variables in the dropout sample to see whether the two samples are similar or different. Specifically, I want to compare social class and gender (categorical variables) between the complete-case sample and the dropout sample. What test should I use for this, please? Would it be a t-test even though the variables are categorical?

  • #2
    For dichotomous variables it makes little difference which of these two tests you choose: they will give approximately the same results. If a categorical variable has more than two levels, however, it would be wrong to use a t-test, because you would be treating the numbers assigned to the different levels as if they had numeric meaning. If you have a mixture of dichotomous and polytomous variables, using the chi-square test would lead to a simpler presentation of findings, because the same test would be used for all of the categorical variables.
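To illustrate why the two tests nearly coincide for a dichotomous variable, here is a minimal Python sketch (standard library only). The function names and the counts are invented for illustration and are not from this thread. It computes the Pearson chi-square statistic (without continuity correction) and the equal-variance two-sample t statistic for the same 2x2 table. The two are tied by the identity t^2 = chi2 * (N - 2) / (N - chi2), so t^2 and chi2 are close whenever chi2 is small relative to N.

```python
from math import sqrt


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def pooled_t_binary(a, b, c, d):
    """Equal-variance two-sample t statistic for a 0/1 variable.

    Row 1 is group 1 (a ones, b zeros); row 2 is group 2 (c ones, d zeros).
    """
    n1, n2 = a + b, c + d
    p1, p2 = a / n1, c / n2
    # For a 0/1 variable, the sum of squared deviations within a group is n*p*(1-p).
    within_ss = n1 * p1 * (1 - p1) + n2 * p2 * (1 - p2)
    pooled_var = within_ss / (n1 + n2 - 2)
    return (p1 - p2) / sqrt(pooled_var * (1 / n1 + 1 / n2))


# Hypothetical counts: complete cases have 120 women and 180 men,
# dropouts have 60 women and 40 men.
counts = (120, 180, 60, 40)
n_total = sum(counts)
chi2 = pearson_chi2_2x2(*counts)
t = pooled_t_binary(*counts)

print(f"chi-square = {chi2:.3f}, t^2 = {t * t:.3f}")
print(f"chi2 * (N - 2) / (N - chi2) = {chi2 * (n_total - 2) / (n_total - chi2):.3f}")
```

With more than two unordered levels no such identity exists, which is the reason given above for not using a t-test in that case.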

    That said, I would urge you not to think too much about this kind of statistical testing, as it is an answer to the wrong question. The issue you are facing is that some data are missing, and you have to be concerned that the missingness of the data has arisen in such a way that it may distort the results of your subsequent analyses. No statistical hypothesis tests will tell you anything useful about this issue. Let me elaborate.

    First, the null hypothesis that the data are missing completely at random is a straw man. Pretty much the only way this could happen is if the data forms caught on fire but did not burn completely and you could salvage some data from the questionnaires but not other data. Or something equally far-fetched. Other than that, missingness of data almost always has a non-random component.

    Second, statistical hypothesis tests are used to infer whether a difference exists at the population level from what is observed at the sample level. But the issue for you is not whether at some abstract population level people with complete data resemble people with incomplete data. You need to know whether the actual differences in your sample matter to your proposed analyses. Hypothesis tests provide no information at all about that. (That is why I say that they answer the wrong question.)

    So focus, instead, on how much data is missing for each variable, and on whether the differences between the complete and incomplete cases, on the variables you have for both, are large enough to substantially distort your main analyses. And, most important: how will you deal with the missingness? You might want to look at https://statisticalhorizons.com/wp-c...aterials-1.pdf for a good explanation of approaches to this problem.
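As a rough sketch of that kind of descriptive check, the Python below (standard library only) tabulates the fraction missing for each variable and the raw difference in a proportion between complete cases and dropouts. The records, variable names, and helper functions are all hypothetical, invented for illustration. Whether a gap of a given size matters is a judgment about your own analysis, not something the code decides.

```python
def fraction_missing(rows, variable):
    """Share of rows in which `variable` is missing (None)."""
    return sum(row[variable] is None for row in rows) / len(rows)


def is_complete(row):
    """A complete case has no missing values at all."""
    return all(value is not None for value in row.values())


def proportion(rows, variable, level):
    """Proportion of rows at `level`, among rows where `variable` is observed."""
    observed = [row[variable] for row in rows if row[variable] is not None]
    return sum(value == level for value in observed) / len(observed)


# Invented example data: None marks a missing value.
records = [
    {"gender": "F", "social_class": "manual", "outcome": 3.1},
    {"gender": "M", "social_class": "manual", "outcome": None},
    {"gender": "F", "social_class": "non-manual", "outcome": 2.4},
    {"gender": "F", "social_class": None, "outcome": 1.9},
    {"gender": "F", "social_class": "non-manual", "outcome": 2.8},
    {"gender": "M", "social_class": "manual", "outcome": None},
    {"gender": "M", "social_class": "non-manual", "outcome": 3.3},
    {"gender": "F", "social_class": "manual", "outcome": 2.0},
]

missing_by_variable = {name: fraction_missing(records, name) for name in records[0]}

complete = [row for row in records if is_complete(row)]
dropouts = [row for row in records if not is_complete(row)]

# Raw gap in the proportion of women between complete cases and dropouts.
gap_female = proportion(complete, "gender", "F") - proportion(dropouts, "gender", "F")

for name, share in missing_by_variable.items():
    print(f"{name}: {share:.1%} missing")
print(f"Proportion female, complete minus dropouts: {gap_female:+.3f}")
```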


    • #3
      I generally agree very much with Clyde Schechter, but there is one place where he is, maybe, a little too strong: if the multiple categories are ordinal, you might want to see Heeren, T. and D'Agostino, R. (1987), "Robustness of the two independent samples t-test when applied to ordinal scaled data", Statistics in Medicine, 6: 79-90. However, even this result does not affect Clyde's main points.


      • #4
        Rich Goldstein is right. I was thinking about variables that are purely categorical with no ordinal properties, like, say, race or religious affiliation. For categorical variables that are ordered, the t-test can be appropriate.


        • #5
          Thanks for the comments, I really appreciate them.
