  • Maximum number of values for a categorical variable.

    What is the maximum number of values for a categorical variable for a chi-square test? For example, I have a categorical variable for location. There are 68 values. When I run the chi-square test, it says too many values. I can condense the values, but I'm trying to figure out how much I need to condense them by. Thank you!
    Last edited by Jessica Berrett; 19 Mar 2023, 08:23.

  • #2
    You don't show the actual code that produced this message, but the only context in which I have ever seen it is when using the -tab- command for a two-way cross tab. The issue is just that the -tab- command has a limit on the number of values it will work with. Moreover, the limit is not the same for the row (first) and column (second) variables. If your other variable has sufficiently few levels, running the -tab- command with the order of the two variables reversed will get you the result you want. It is not a limitation of the chi-square test itself.
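
    As a sketch of the reversal, using the variable names that appear later in the thread (#4) and assuming the limits quoted in #3 (300 rows by 20 columns in Stata/BE), the 68-value city variable can go first so that it defines the rows. This also assumes the Q29 variable has no more than 20 distinct values.

    ```stata
    * City (68 values) as the row variable, Q29 as the column variable.
    * Pearson's chi-square is the same whichever variable defines the rows.
    tabulate Q50ACity Q29Wouldyourorgacceptadona, chi2
    ```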

    If both of the variables have too many levels for -tab-, then you can still get a chi-square from -mlogit-. But in this case, I would have to ask you why you are even doing this. While, in principle, the chi-square test for independence has no limit on the number of levels, if you have very large numbers of levels of both variables, your cross-tab will be sparse unless you are working with a very large data set. And in that case the use of a chi-square test is questionable.
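
    A minimal sketch of the -mlogit- route, again borrowing the variable names from #4. It assumes both variables are numeric with nonnegative integer codes (a string variable would need -encode- first, and city_id is a made-up name for the encoded copy). The statistic in the output header is a likelihood-ratio chi-square rather than the Pearson statistic that -tab, chi2- reports, and with a sparse table the model may have trouble converging.

    ```stata
    * Only needed if Q50ACity is a string variable (city_id is a hypothetical name):
    * encode Q50ACity, generate(city_id)

    * The "LR chi2" line in the header tests whether Q29 is independent of city.
    mlogit Q29Wouldyourorgacceptadona i.Q50ACity
    ```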
    Last edited by Clyde Schechter; 19 Mar 2023, 09:40.


    • #3
      Assuming that you used the -tabulate- command, the following are the limits (from "h limits"):
      Code:
                                                 Stata/BE     Stata/SE or MP
        tabulate
             # of rows in one-way table             3,000             12,000
             # of rows & cols in two-way table     300x20           1,200x80
      Note that the first column above is for Stata/BE, while the second is for Stata/MP or Stata/SE. If you are using BE, then you have a problem; if you are using MP or SE, then you need to supply more information. The FAQ has very good advice on how to ask questions and what information to supply to help people respond more quickly and more accurately.
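
      To see which column of that table applies and how many distinct values each variable has, something like the following should work. The variable names are taken from #4; r(r) is the number of rows that a one-way -tabulate- leaves behind in its stored results.

      ```stata
      * Shows the Stata edition (BE, SE or MP) and version
      about

      * The full table of limits quoted above
      help limits

      * Number of distinct values of each variable
      quietly tabulate Q50ACity
      display "distinct cities: " r(r)

      quietly tabulate Q29Wouldyourorgacceptadona
      display "distinct Q29 answers: " r(r)
      ```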


      • #4
        I am using Stata/BE version 17.

        The code I am using is: tabulate Q29Wouldyourorgacceptadona Q50ACity, chi2

        I also tried: tab Q29Wouldyourorgacceptadona Q50ACity, chi2

        The error code I receive is "too many values."


        • #5
          I will add a little, more or less in the spirit of Clyde Schechter's reply, or so I hope.

          A chi-square test can answer a very focused question if it is for, say, a 2 x 2 table. With much larger tables, what is likely to be as important as the overall result, or more so, is the pattern of residuals. I don't think I've ever found much use in such a test with more than about 30 degrees of freedom, but stories to the contrary would be interesting.
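
          One way to look at the pattern rather than only the overall test is to use built-in -tabulate- options. The sketch below uses the variable names from #4, with cities as rows so that the table fits within the Stata/BE limits quoted in #3. Each cell's contribution to Pearson's chi-square is the squared Pearson residual, so large contributions flag the cells that drive the result, though not the direction of the departure.

          ```stata
          * expected: expected frequencies under independence
          * cchi2:    each cell's contribution to Pearson's chi-square
          tabulate Q50ACity Q29Wouldyourorgacceptadona, chi2 expected cchi2
          ```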

          More positively -- no data example yet -- it seems likely that Q29 has relatively simple answers but as Q50 has 68 (distinct) values, it might be useful to order the cities somehow and look at the extremes in that ordering especially.
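
          A sketch of one possible ordering, by the share of a particular answer within each city. It assumes Q29 is numeric with 1 meaning "yes", which the thread does not confirm; accept, p_accept, n_city and pickone are made-up names.

          ```stata
          * Assumption: Q29 == 1 codes "yes"; change this to match the real coding.
          generate byte accept = (Q29Wouldyourorgacceptadona == 1) if !missing(Q29Wouldyourorgacceptadona)

          * Share of "yes" answers and number of responses within each city
          egen double p_accept = mean(accept), by(Q50ACity)
          egen n_city = count(accept), by(Q50ACity)

          * One line per city, sorted so that the extremes sit at the top and bottom
          egen byte pickone = tag(Q50ACity)
          sort p_accept Q50ACity
          list Q50ACity p_accept n_city if pickone, noobs
          ```

          Cities with very few responses will tend to land at the extremes, so n_city is worth reading alongside the share.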
