dtable with Proportions and their 95% Confidence Intervals for categorical variables by(group variable)

robert seru

Join Date: Aug 2017
Posts: 36

22 Feb 2025, 04:38

Greetings,
I would like to generate a dtable that displays, for each categorical variable, the proportion in each category and its 95% confidence interval:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input long marital_status2 byte age_first_sex long gender
3 17 1
2 17 1
1 16 1
2 16 2
2 18 2
2 19 1
2 14 2
2 18 2
2 20 2
2 17 1
end
label values marital_status2 marital_status2
label def marital_status2 1 "Never Married", modify
label def marital_status2 2 "Married/Living with partner", modify
label def marital_status2 3 "Divorced/Widow", modify
label values gender gender_list
label def gender_list 1 "male", modify
label def gender_list 2 "female", modify
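
For reference, a plain dtable call on this example might look like the sketch below. It assumes the data above are in memory and Stata 18 or later. By default it reports counts and percentages for the factor variable and mean (SD) for the continuous one, with no confidence intervals, which is the gap this question is about.

```stata
* Sketch only: assumes the -dataex- example above is in memory (dtable needs Stata 18+)
* i. marks marital_status2 as categorical; by(gender) gives one column per gender
dtable i.marital_status2 age_first_sex, by(gender)
```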

Many Thanks
Robert


Rich Goldstein

Join Date: Mar 2014

Posts: 4464
#2

22 Feb 2025, 06:17

I assume that you mean you want a 95% CI of the proportions for each value of your categorical variable; while simple enough for a binary variable (e.g., gender), this is not straightforward for your marital status variable which has 3 categories for which the proportions must sum to 1 (at least, I assume they must); first you need to tell us how you want to form the CIs; a good starting place might be chapter 9 ("Methods for triads of proportions") in Newcombe, RG (2013), Confidence intervals for proportions and related measures of effect size, CRC Press; note that even for the binary variable, I'm not sure this can be done via dtable though it can certainly be done via collect and table
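
As a rough illustration of the "simple enough" binary case, the sketch below computes a Wilson 95% CI for the proportion female. It assumes the -dataex- example from #1 is in memory; the indicator name `female` is hypothetical, and the Wilson interval is only one of several available choices.

```stata
* Sketch only: assumes the -dataex- example from #1 is in memory
* Build a 0/1 indicator, because -ci proportions- expects a binary variable
generate byte female = (gender == 2) if !missing(gender)
ci proportions female, wilson

* The same interval from summary counts alone: 5 of the 10 example rows are female
cii proportions 10 5, wilson
```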
robert seru

Join Date: Aug 2017

Posts: 36
#3

23 Feb 2025, 21:46

Thank you Rich Goldstein for the reply.
Can you please assist me with how to do it via collect and table?
Maarten Buis

Join Date: Mar 2014

Posts: 3456
#4

24 Feb 2025, 01:47

What Rich said is that there are multiple ways (all wrong in some way) of creating such confidence intervals. So you first need to choose which confidence interval you want. Only after that choice has been made can we talk about implementing it.

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------

robert seru

Join Date: Aug 2017
Posts: 36
#5

24 Feb 2025, 06:00

Maarten Buis, my sincere apologies.
I have yet to get access to the book Rich recommended.
Let me try to clarify with a more detailed example, because I may have misunderstood what Rich replied:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float period long maritalstatus2 int age_group long gender
0 2 3 1
0 1 6 1
0 0 1 1
0 1 5 2
0 1 3 2
end
label values period lbl_period
label def lbl_period 0 "Baseline", modify
label values maritalstatus2 lbl_marital
label def lbl_marital 0 "Never Married", modify
label def lbl_marital 1 "Married/Living with partner", modify
label def lbl_marital 2 "Divorced/Widow", modify
label values age_group age_group1
label def age_group1 1 "15-19", modify
label def age_group1 3 "25-29", modify
label def age_group1 5 "35-39", modify
label def age_group1 6 "40-54", modify
label values gender gender_list
label def gender_list 1 "Male", modify
label def gender_list 2 "Female", modify

I wanted to know whether it is now possible with the collect command to produce something in this format, with the proportion and its 95% CI in parentheses for each value of each categorical variable:

                Baseline             Midline              Endline              Total
N               2,596 (30.8%, xx%)   2,943 (34.9%, xx%)   2,902 (34.4%, xx%)   8,441 (100.0%, xx%)
Marital Status
Never Married   546 (21.0%, xx%)     618 (21.0%, xx%)     570 (19.6%, xx%)     1,734 (20.5%, xx%)
Married/Living  1,854 (71.4%, xx%)   2,060 (70.0%, xx%)   2,103 (72.5%, xx%)   6,017 (71.3%, xx%)
Divorced/Widow  196 (7.6%, xx%)      265 (9.0%, xx%)      229 (7.9%, xx%)      690 (8.2%, xx%)
Age Group
15-19           410 (15.8%, xx%)     459 (15.6%, xx%)     394 (13.6%, xx%)     1,263 (15.0%, xx%)
20-24           510 (19.6%, xx%)     594 (20.2%, xx%)     562 (19.4%, xx%)     1,666 (19.7%, xx%)
25-29           516 (19.9%, xx%)     523 (17.8%, xx%)     556 (19.2%, xx%)     1,595 (18.9%, xx%)
Sex
Male            1,250 (48.2%, xx%)   1,416 (48.1%, xx%)   1,403 (48.3%, xx%)   4,069 (48.2%, xx%)
Female          1,346 (51.8%, xx%)   1,527 (51.9%, xx%)   1,499 (51.7%, xx%)   4,372 (51.8%, xx%)

Thanks
Robert


Maarten Buis

Join Date: Mar 2014

Posts: 3456
#6

24 Feb 2025, 07:05

We tend to think of "the confidence interval", as if there is just one confidence interval. So then you get a question like yours: hey can I show "the confidence interval" for this set of mutually exclusive proportions. The problem with that is that that assumes that there is one definition for confidence intervals for a set of mutually exclusive proportions. Unfortunately that is not the case. So what you thought was just a display problem or a table problem, is actually a deep statistical problem. The thing you want to display is not defined yet by your question, and it is impossible to display undefined things. So you first need to figure out what confidence interval you want before you can actually display it.
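
To make the "more than one definition" point concrete, here is a small sketch using one cell of the mock table in #5 (Divorced/Widow at Baseline, 196 of 2,596). Each command treats that single category as a binomial proportion in isolation, which is itself an assumption: none of them accounts for the constraint that the three marital-status proportions sum to 1. The three methods shown are just examples of the available choices.

```stata
* Sketch only: three different 95% CIs for the same proportion, 196 of 2,596
* Immediate syntax is: cii proportions #obs #successes, method
cii proportions 2596 196, wald      // normal approximation
cii proportions 2596 196, wilson    // Wilson score interval
cii proportions 2596 196, exact     // Clopper-Pearson (Stata's default)
```

The limits differ slightly across methods even in this single-category simplification, so the method has to be chosen before the table can be built.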

Rich Goldstein

Join Date: Mar 2014

Posts: 4464
#7

24 Feb 2025, 07:23

I agree with Maarten Buis and note that this also applies to binary categorical variables such as "sex" in #5 above (I had assumed above that you would want to show just one of the sexes and the problem does not arise there); I am not sure whether the following will be of help (as you haven't said why you want these), but you might want to look at Agresti, A, et al. (2008), "Simultaneous confidence intervals for comparing binomial parameters," Biometrics, 64: 1270-1275
robert seru

Join Date: Aug 2017

Posts: 36
#8

25 Feb 2025, 08:47

Thanks a lot. I now understand.
I thought I could compare the prevalence for each characteristic across the 3 periods and see whether there's any difference (checking to see if CIs overlap) rather than use the chi-square p-value.
I think I can employ some regression models instead with my baseline as the reference.
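
A minimal sketch of what such a model might look like, using the marital-status counts from the mock table in #5 as frequency weights. The variable name `marital` and the codes 1 and 2 for Midline and Endline are assumptions made for illustration; only 0 "Baseline" appears in the example data.

```stata
* Sketch only: counts copied from the mock table in #5
* Assumed coding: period 0 = Baseline, 1 = Midline, 2 = Endline
clear
input byte period byte marital int n
0 0  546
0 1 1854
0 2  196
1 0  618
1 1 2060
1 2  265
2 0  570
2 1 2103
2 2  229
end
label define lbl_period 0 "Baseline" 1 "Midline" 2 "Endline"
label values period lbl_period

* Multinomial logit with Baseline as the reference period
* and "Never Married" (0) as the base outcome
mlogit marital ib0.period [fweight = n], baseoutcome(0) rrr

* Joint test that the marital-status distribution is the same in all periods
test 1.period 2.period
```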
Thanks again
Maarten Buis

Join Date: Mar 2014

Posts: 3456
#9

25 Feb 2025, 10:20

Originally posted by robert seru:

I thought I could compare the prevalence for each characteristic across the 3 periods and see whether there's any difference (checking to see if CIs overlap)

So on top of the problem that the confidence intervals are not well defined in your case, you have the problem that checking if confidence intervals overlap is not the same as a test for equality. See for instance this article by Andrew Gelman and Hal Stern in the American Statistician: https://doi.org/10.1198/000313006X152649

Rich Goldstein

Join Date: Mar 2014

Posts: 4464
#10

25 Feb 2025, 12:01

to expand on this a bit, note that while it is true that non-overlapping CI's do mean a p-value smaller than 1-CI_level (e.g., below .05 if it is a 95% CI), it is NOT true that overlapping CI's mean a p-value higher than 1-CI_level; here are two cites where the first is aimed at non-statisticians and the second includes theory on how much overlap can be expected:

Wolfe, R and Hanley, J (2002), "If we’re so different, why do we keep overlapping? When 1 plus 1 doesn’t make 2", Canadian Medical Assoc Journal, 166(1): 65-66

Schenker, N and Gentleman, JF (2001), "On Judging the Significance of Differences by Examining the Overlap Between Confidence Intervals", The American Statistician, 55(3): 182-186, DOI: 10.1198/000313001317097960
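
A small made-up example of the second point (the counts are hypothetical, chosen only for illustration): two groups of 200 with 88 and 112 successes have overlapping 95% Wald intervals, yet the two-sample test of proportions rejects equality at the 5% level.

```stata
* Sketch only: hypothetical counts, 88/200 versus 112/200
cii proportions 200 88, wald     // roughly 0.371 to 0.509
cii proportions 200 112, wald    // roughly 0.491 to 0.629, overlapping the first

* Two-sample test of equal proportions from counts: z = -2.4, p roughly 0.016
prtesti 200 88 200 112, count
```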