  • How do I compare differences in demographic variables on a categorical variable?

    Hello,

    I am analyzing survey data that contains the following variables:
    - Gender (Male, Female)
    - Race (White, Black, Asian, Other, Decline to answer)
    - Ethnicity (Hispanic, Not Hispanic, Decline to answer)
    - Consent Status (Dropped during consent screen, yes, no)

    I have already recoded the string variables into byte and relabeled.

    I want to see whether consent status (yes, no) differs by the demographic variables (gender, race, ethnicity). Are certain demographic groups more likely to consent (statistically significant at p < 0.05)? I'm able to run a ttest for gender, but for race and ethnicity, should I be running a chi2 test? Any guidance is really appreciated, thank you.

  • #2
    You can do each of them with -tab- and its -chi2- option:

    Code:
    foreach v of varlist gender race ethnicity {
        tab `v' consent_status, chi2
    }
    If it turns out that some of the cells in some of the tables are very small, you can redo this with Fisher's exact test (the -exact- option of -tab-) instead.
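
    A minimal sketch of how one might check for small cells before deciding, assuming the variable names used in the loop above. -expected- and -exact- are standard options of -tabulate-; the "below about 5" threshold is a common rule of thumb, not something stated in this thread.

    ```stata
    * Show expected frequencies under independence alongside the observed counts.
    * A common rule of thumb treats expected counts below about 5 as "small".
    foreach v of varlist gender race ethnicity {
        tab `v' consent_status, chi2 expected
    }

    * Fisher's exact test for a single table. Shown for gender only, because
    * -exact- can take a long time on larger tables with many observations.
    tab gender consent_status, exact
    ```

    If every expected count is comfortably large, the chi-square result from the original loop can be used as it stands.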



    • #3
      Asked over on Reddit.

      Please do tell us when you have cross-posted to other communities. As our FAQ explains, you are asked to do so as a courtesy to others.



      • #4
        Originally posted by Clyde Schechter (#2)
        Thank you! I ran chi2 since I have decent sample sizes for most of the variables. I also reduced consent status to just yes or no (I removed "Dropped during consent screen"). It looks like consent (yes or no) does not differ by race. Am I interpreting the output below correctly?

        [Attached image: Picture1.png]



        • #5
          Originally posted by Leonardo Guizzetti (#3)
          Apologies. I confirm that I have cross-posted to Reddit.



          • #6
            On the contrary, you have an association there. Let's use tabchii from tab_chi on SSC to get what you could get:

            Code:
            . tabchii 11693 10492 1407 624 310 \ 1256 831 138 43 21 , pearson
            
                      observed frequency
                      expected frequency
                      Pearson residual
            
            -----------------------------------------------------------------
                      |                          col                         
                  row |         1          2          3          4          5
            ----------+------------------------------------------------------
                    1 |     11693      10492       1407        624        310
                      | 11843.639  10356.438   1413.115    610.063    302.745
                      |    -1.384      1.332     -0.163      0.564      0.417
                      | 
                    2 |      1256        831        138         43         21
                      |  1105.361    966.562    131.885     56.937     28.255
                      |     4.531     -4.360      0.532     -1.847     -1.365
            -----------------------------------------------------------------
            
                     Pearson chi2(4) =  49.3087   Pr = 0.000
            likelihood-ratio chi2(4) =  49.8865   Pr = 0.000
            
            . ret li
            
            scalars:
                              r(N) =  26815
                              r(r) =  2
                              r(c) =  5
                           r(chi2) =  49.30869804531064
                              r(p) =  5.03400060587e-10
                        r(chi2_lr) =  49.88649232352966
                           r(p_lr) =  3.81338055457e-10
            The displayed Pr = 0.000 tells you only that the P-value is below 0.0005, but FWIW the returned r(p) is about 5e-10, which is less than 1e-9 (1 in a billion).

            More interesting are the Pearson residuals, each being (observed MINUS expected) / root of expected, so that the Pearson chi-square is the sum of the squared Pearson residuals. Others say YES more often and White people say YES less often than the null predicts, and so on.
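
            The arithmetic can be checked by hand. A small sketch using only numbers printed in the table above (the residuals are rounded to three decimals, so the sum is approximate); run it from a do-file so that the /// line continuation is accepted:

            ```stata
            * Pearson residual for row 2, column 1:
            * (observed - expected) / sqrt(expected); should be about 4.53
            display (1256 - 1105.361) / sqrt(1105.361)

            * Sum of the ten squared (rounded) residuals;
            * should be close to the reported Pearson chi2 of 49.3087
            display (-1.384)^2 + 1.332^2 + (-0.163)^2 + 0.564^2 + 0.417^2 ///
                + 4.531^2 + (-4.360)^2 + 0.532^2 + (-1.847)^2 + (-1.365)^2
            ```

            The two row-2 residuals near 4.5 in absolute value contribute roughly 39.5 of the 49.3, which is why those cells deserve the closest look.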



            • #7
              Originally posted by Nick Cox (#6)
              Thank you. In my previous post, how were you able to determine there was an association? And what's the rationale for the follow-up analysis you've done? I just want to make sure I understand correctly. Thank you again.



              • #8
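                Your chi-square result was clear-cut: a P-value that far below your 0.05 threshold is strong evidence against the null hypothesis of no association. Looking at residuals is discussed in many treatments of categorical data analysis.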
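
                For anyone wanting to see that decision made explicitly, here is a short sketch using the official -tabi- command with the cell counts quoted earlier in the thread. It relies on -tabulate- leaving the P-value behind in r(p) when -chi2- is specified; the 0.05 threshold is the one named in the original question.

                ```stata
                * Re-enter the 2 x 5 table of counts and request the Pearson chi-square
                tabi 11693 10492 1407 624 310 \ 1256 831 138 43 21, chi2

                * The displayed "Pr = 0.000" is rounded; r(p) holds the unrounded value
                display r(p)

                * Shows 1 if the P-value is below the 0.05 threshold, 0 otherwise
                display r(p) < 0.05
                ```

                A result of 1 on the last line means that the hypothesis of no association is rejected at the 5% level.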
