
  • dtable with Proportions and their 95% Confidence Intervals for categorical variables by(group variable)

    Greetings,
    I would like to generate a dtable that displays, for each categorical variable, the proportions and their 95% confidence intervals:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long marital_status2 byte age_first_sex long gender
    3 17 1
    2 17 1
    1 16 1
    2 16 2
    2 18 2
    2 19 1
    2 14 2
    2 18 2
    2 20 2
    2 17 1
    end
    label values marital_status2 marital_status2
    label def marital_status2 1 "Never Married", modify
    label def marital_status2 2 "Married/Living with partner", modify
    label def marital_status2 3 "Divorced/Widow", modify
    label values gender gender_list
    label def gender_list 1 "male", modify
    label def gender_list 2 "female", modify
    Many Thanks
    Robert

  • #2
    I assume you mean that you want a 95% CI for the proportion of each value of your categorical variable. That is simple enough for a binary variable (e.g., gender), but it is not straightforward for your marital status variable. That variable has 3 categories whose proportions must sum to 1 (at least, I assume they must). So first you need to tell us how you want to form the CIs. A good starting place might be chapter 9 ("Methods for triads of proportions") in Newcombe, RG (2013), Confidence Intervals for Proportions and Related Measures of Effect Size, CRC Press. Note that even for the binary variable, I'm not sure this can be done via dtable, though it can certainly be done via collect and table.
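    Even a single binary proportion has more than one standard 95% interval. The two below are common textbook choices, shown only as an illustration. They are not the only options, and the thread does not say which one is wanted. Here $x$ is the number of "successes" out of $n$, $\hat p = x/n$, and $z = 1.96$:

```latex
\text{Wald:}\quad \hat p \pm z\sqrt{\frac{\hat p(1-\hat p)}{n}}

\text{Wilson:}\quad \frac{\hat p + \dfrac{z^2}{2n}}{1 + \dfrac{z^2}{n}}
  \pm \frac{z}{1 + \dfrac{z^2}{n}}\sqrt{\frac{\hat p(1-\hat p)}{n} + \frac{z^2}{4n^2}}
```

    The Wald interval is symmetric around $\hat p$ and can extend below 0 or above 1 when $\hat p$ is near the boundary. The Wilson interval is centred on a weighted average of $\hat p$ and 1/2 and always stays within [0, 1]. In small samples the two can give noticeably different limits for the same data.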



    • #3
      Thank you Rich Goldstein for the reply.
      Can you please assist me with how to do it via collect and table?



      • #4
        What Rich said is that there are multiple ways (all wrong in some way) of creating such confidence intervals. So you first need to choose which confidence interval you want. Only after that choice has been made can we talk about implementing it.
        ---------------------------------
        Maarten L. Buis
        University of Konstanz
        Department of history and sociology
        box 40
        78457 Konstanz
        Germany
        http://www.maartenbuis.nl
        ---------------------------------



        • #5
          Maarten Buis, my sincere apologies.
          I have yet to get access to the book Rich recommended.
          Let me try to clarify with a more detailed example so that I better understand Rich's reply, because I may have misunderstood:


          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input float period long maritalstatus2 int age_group long gender
          0 2 3 1
          0 1 6 1
          0 0 1 1
          0 1 5 2
          0 1 3 2
          end
          label values period lbl_period
          label def lbl_period 0 "Baseline", modify
          label values maritalstatus2 lbl_marital
          label def lbl_marital 0 "Never Married", modify
          label def lbl_marital 1 "Married/Living with partner", modify
          label def lbl_marital 2 "Divorced/Widow", modify
          label values age_group age_group1
          label def age_group1 1 "15-19", modify
          label def age_group1 3 "25-29", modify
          label def age_group1 5 "35-39", modify
          label def age_group1 6 "40-54", modify
          label values gender gender_list
          label def gender_list 1 "Male", modify
          label def gender_list 2 "Female", modify
          I wanted to know whether it is now possible, with the collect command, to produce something in the format below. For each value of the categorical variables, it would show the count followed, in parentheses, by the proportion and its 95% CI:
                            Baseline            Midline             Endline             Total
          N                 2,596 (30.8%, xx%)  2,943 (34.9%, xx%)  2,902 (34.4%, xx%)  8,441 (100.0%, xx%)
          Marital Status
            Never Married   546 (21.0%, xx%)    618 (21.0%, xx%)    570 (19.6%, xx%)    1,734 (20.5%, xx%)
            Married/Living  1,854 (71.4%, xx%)  2,060 (70.0%, xx%)  2,103 (72.5%, xx%)  6,017 (71.3%, xx%)
            Divorced/Widow  196 (7.6%, xx%)     265 (9.0%, xx%)     229 (7.9%, xx%)     690 (8.2%, xx%)
          Age Group
            15-19           410 (15.8%, xx%)    459 (15.6%, xx%)    394 (13.6%, xx%)    1,263 (15.0%, xx%)
            20-24           510 (19.6%, xx%)    594 (20.2%, xx%)    562 (19.4%, xx%)    1,666 (19.7%, xx%)
            25-29           516 (19.9%, xx%)    523 (17.8%, xx%)    556 (19.2%, xx%)    1,595 (18.9%, xx%)
          Sex
            Male            1,250 (48.2%, xx%)  1,416 (48.1%, xx%)  1,403 (48.3%, xx%)  4,069 (48.2%, xx%)
            Female          1,346 (51.8%, xx%)  1,527 (51.9%, xx%)  1,499 (51.7%, xx%)  4,372 (51.8%, xx%)

          Thanks
          Robert



          • #6
            We tend to think of "the confidence interval", as if there were just one. That leads to a question like yours: can I show "the confidence interval" for this set of mutually exclusive proportions? The problem is that this assumes there is a single definition of a confidence interval for a set of mutually exclusive proportions. Unfortunately, there is not. So what you thought was just a display or table problem is actually a deep statistical problem. Your question does not yet define the thing you want to display, and it is impossible to display something undefined. You first need to decide which confidence interval you want before you can display it.
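            Consider a variable with $K$ mutually exclusive categories ($K = 3$ for marital status). Below are two of the possible definitions, both using the simple Wald form purely for illustration. The first gives each category its own 95% interval. The second uses a Bonferroni adjustment, so that all $K$ intervals jointly cover their true proportions with probability (approximately) at least $1-\alpha$. Neither is offered as the right choice. The references cited in this thread (Newcombe 2013; Agresti et al. 2008) discuss better-behaved alternatives.

```latex
% One interval per category, each at level 1 - \alpha
\hat p_k \pm z_{1-\alpha/2}\sqrt{\frac{\hat p_k(1-\hat p_k)}{n}}, \qquad k = 1,\dots,K

% Simultaneous (Bonferroni) intervals: each at level 1 - \alpha/K
\hat p_k \pm z_{1-\alpha/(2K)}\sqrt{\frac{\hat p_k(1-\hat p_k)}{n}}, \qquad k = 1,\dots,K
```

            With $\alpha = 0.05$ and $K = 3$, the multiplier grows from $z_{0.975} \approx 1.96$ to $z_{1-0.05/6} \approx 2.39$. So the "same" 95% label yields intervals of different widths depending on which definition is chosen.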
            Maarten L. Buis



            • #7
              I agree with Maarten Buis, and note that this also applies to binary categorical variables such as "sex" in #5 above. (Earlier I had assumed that you would want to show just one of the sexes, in which case the problem does not arise.) I am not sure whether the following will help, as you haven't said why you want these intervals. Still, you might want to look at Agresti, A, et al. (2008), "Simultaneous confidence intervals for comparing binomial parameters," Biometrics, 64: 1270-1275.



              • #8
                Thanks a lot. I now understand.
                I thought I could compare the prevalence for each characteristic across the 3 periods and see whether there's any difference (checking to see if CIs overlap) rather than use the chi-square p-value.
                I think I can employ some regression models instead with my baseline as the reference.
                Thanks again
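                Here is a minimal sketch of such a model. It assumes a binary outcome $Y$ (for example, Never Married vs. any other status) and treats period as categorical, with Baseline as the reference. A 3-category outcome such as marital status would need a multinomial (or ordinal) model instead:

```latex
\operatorname{logit}\Pr(Y_i = 1) = \beta_0 + \beta_1\,\mathrm{Midline}_i + \beta_2\,\mathrm{Endline}_i
```

                Here $\mathrm{Midline}_i$ and $\mathrm{Endline}_i$ are 0/1 indicators, and $\beta_1$ and $\beta_2$ are log odds ratios relative to Baseline. "No change across periods" is the joint hypothesis $\beta_1 = \beta_2 = 0$ (2 degrees of freedom). In Stata this corresponds roughly to -logit outcome i.period- followed by -testparm i.period-. Here outcome is a hypothetical 0/1 variable name, and the lowest level of period (Baseline = 0) is the default base.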



                • #9
                  Originally posted by robert seru:
                  I thought I could compare the prevalence for each characteristic across the 3 periods and see whether there's any difference (checking to see if CIs overlap)
                  So on top of the problem that the confidence intervals are not well defined in your case, you have the problem that checking if confidence intervals overlap is not the same as a test for equality. See for instance this article by Andrew Gelman and Hal Stern in the American Statistician: https://doi.org/10.1198/000313006X152649
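                  A direct test of equality compares the two prevalences themselves rather than their intervals. Below is one common large-sample form, shown as a minimal sketch; a pooled-variance version is also used. It assumes two independent samples (e.g., Baseline and Endline), with $\hat p_j = x_j/n_j$:

```latex
z = \frac{\hat p_1 - \hat p_2}{\sqrt{\dfrac{\hat p_1(1-\hat p_1)}{n_1} + \dfrac{\hat p_2(1-\hat p_2)}{n_2}}}
```

                  The difference is judged significant at the 5% level when $|z| > 1.96$. The denominator combines the two standard errors in quadrature, which is why eyeballing two separate intervals does not give the same answer.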
                  Maarten L. Buis



                  • #10
                    To expand on this a bit: it is true that non-overlapping CIs imply a p-value smaller than 1 - CI level (e.g., below .05 for a 95% CI). It is NOT true, however, that overlapping CIs imply a p-value higher than 1 - CI level. Here are two citations; the first is aimed at non-statisticians, and the second includes theory on how much overlap can be expected:

                    Wolfe, R and Hanley, J (2002), "If we’re so different, why do we keep overlapping? When 1 plus 1 doesn’t make 2", Canadian Medical Association Journal, 166(1): 65-66

                    Schenker, N and Gentleman, JF (2001), "On Judging the Significance of Differences by Examining the Overlap Between Confidence Intervals", The American Statistician, 55(3): 182-186, DOI: 10.1198/000313001317097960
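                    The asymmetry described above can be seen in the simplest case. Assume two independent estimates $\hat\theta_1, \hat\theta_2$ with equal standard errors $s$ and normal-theory 95% intervals $\hat\theta_j \pm 1.96\,s$, and let $d = |\hat\theta_1 - \hat\theta_2|$:

```latex
\text{intervals overlap} \iff d < 2 \times 1.96\,s = 3.92\,s

\text{difference significant at } 5\% \iff \frac{d}{\sqrt{s^2 + s^2}} > 1.96 \iff d > 1.96\sqrt{2}\,s \approx 2.77\,s
```

                    Any difference with $2.77\,s < d < 3.92\,s$ therefore gives overlapping intervals and yet $p < .05$. In general, non-overlap means $d > 1.96\,(s_1 + s_2) \ge 1.96\sqrt{s_1^2 + s_2^2}$. So non-overlap implies significance, but overlap does not imply non-significance.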



