How do I compare differences in demographic variables on a categorical variable?

Serena Tee

Join Date: Dec 2023

Posts: 4
#1

10 Dec 2023, 12:27

Hello,

I am analyzing survey data that contains the following variables:
- Gender (Male, Female)
- Race (White, Black, Asian, Other, Decline to answer)
- Ethnicity (Hispanic, Not Hispanic, Decline to answer)
- Consent Status (Dropped during consent screen, yes, no)

I have already recoded the string variables into byte and relabeled.

I want to see whether consent status (yes, no) differs by the demographic variables (gender, race, ethnicity). Are certain demographic groups more likely to consent (statistically significant at the 0.05 level)? I'm able to run a ttest for gender, but for race and ethnicity, should I be running a chi2 test? Any guidance is really appreciated, thank you.
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#2

10 Dec 2023, 13:01

You can do each of them with -tab- and the -chi2- option:

Code:

foreach v of varlist gender race ethnicity {
    tab `v' consent_status, chi2
}

If some of the cells in some of the tables turn out to have very small counts, you can redo this with Fisher's exact test (the -exact- option) instead.
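As a minimal sketch of that fallback, assuming the variables are named race and consent_status as in the loop above (the actual names in the poster's dataset may differ), the -expected- option shows the expected counts and -exact- requests Fisher's exact test. The exact test can be slow or memory-hungry on large tables, so it is mainly worth trying when the expected counts really are small.

```stata
* Assumed variable names: race, consent_status (taken from the loop above)
* Step 1: show expected frequencies alongside the chi-square test
tabulate race consent_status, chi2 expected

* Step 2: if some expected counts are small, request Fisher's exact test
tabulate race consent_status, exact
```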
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2389
#3

10 Dec 2023, 13:54

Asked over on Reddit.

Please do let us know when you have cross-posted to other communities. As our FAQ explains, you are asked to do so as a courtesy to others.
Serena Tee

Join Date: Dec 2023

Posts: 4
#4

10 Dec 2023, 13:58

Thank you! I ran chi2 since I have decent sample sizes for most of the variables. I also cleaned up consent to just show yes or no (I removed dropped at consent screen). It looks like there are no differences in consent yes or no based on race. Am I interpreting the output below correctly?
Serena Tee

Join Date: Dec 2023

Posts: 4
#5

10 Dec 2023, 14:08

Apologies, confirmed I have cross posted to Reddit.

Nick Cox

Join Date: Mar 2014
Posts: 35433

10 Dec 2023, 14:16

On the contrary, you have an association there. Let's use tabchii from tab_chi on SSC to get what you could get:

Code:

. tabchii 11693 10492 1407 624 310 \ 1256 831 138 43 21 , pearson

          observed frequency
          expected frequency
          Pearson residual

-----------------------------------------------------------------
          |                          col                         
      row |         1          2          3          4          5
----------+------------------------------------------------------
        1 |     11693      10492       1407        624        310
          | 11843.639  10356.438   1413.115    610.063    302.745
          |    -1.384      1.332     -0.163      0.564      0.417
          | 
        2 |      1256        831        138         43         21
          |  1105.361    966.562    131.885     56.937     28.255
          |     4.531     -4.360      0.532     -1.847     -1.365
-----------------------------------------------------------------

         Pearson chi2(4) =  49.3087   Pr = 0.000
likelihood-ratio chi2(4) =  49.8865   Pr = 0.000

. ret li

scalars:
                  r(N) =  26815
                  r(r) =  2
                  r(c) =  5
               r(chi2) =  49.30869804531064
                  r(p) =  5.03400060587e-10
            r(chi2_lr) =  49.88649232352966
               r(p_lr) =  3.81338055457e-10

The displayed Pr = 0.000 means that the P-value is < 0.0005, but FWIW the returned r(p) is less than 1e-9 (1 in a billion).

More interesting are the Pearson residuals, each (observed - expected) / root of expected, so that Pearson chi-square = sum of squared Pearson residuals. Others say YES more often and White people say YES less often than the null predicts, and so on.
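To make the residual arithmetic concrete, here is a small sketch that uses only built-in Stata commands and the counts shown in the table above. The choice of the row 2, column 1 cell is just for illustration; its result should agree, up to rounding, with the 4.531 reported in the table. The last line recomputes the P-value from the reported chi-square and its 4 degrees of freedom.

```stata
* Rebuild the 2 x 5 table from the counts in this thread, with expected counts
tabi 11693 10492 1407 624 310 \ 1256 831 138 43 21, chi2 expected

* Pearson residual for row 2, column 1: (observed - expected)/sqrt(expected)
display (1256 - 1105.361)/sqrt(1105.361)

* Upper-tail P-value for chi-square = 49.3087 on 4 degrees of freedom
display chi2tail(4, 49.3087)
```

Squaring and summing the ten residuals shown in the table gives roughly 49.3, matching the Pearson chi-square up to rounding of the displayed residuals.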

Serena Tee

Join Date: Dec 2023
Posts: 4

10 Dec 2023, 15:14

Thank you. For my previous post, how were you able to determine that there was an association? And what's the rationale for the follow-up analysis you've done? I just want to make sure I understand correctly. Thank you again.

Nick Cox

Join Date: Mar 2014

Posts: 35433
#8

10 Dec 2023, 15:40

Your chi-square result was clear-cut. Looking at residuals is discussed in many treatments of categorical data analysis.