  • How to count "at least one"

    Hello,

    I apologise for what I'm sure is a trivial question, but I've searched the forums for an answer without success.

    I have data from a cluster randomised trial with 3 arms and 10 clusters in each arm. Each cluster contains 20 households, and for each household there are data on how many windows it has and whether each window has a blind (a binary outcome, coded 0 or 1). Data are missing for 5 of the 600 households. Each observation in the dataset is one window, identified by its household number (1-20), cluster number (1-30), arm number (1-3) and outcome (0 or 1), giving 3162 observations in total, but there is no unique household identifier.

    I'm trying to work out the proportion of households in each arm in which at least one of the windows has a blind.

    As far as I can tell, I have to do two things:

    (1) Generate a unique identifier for each household, which will usually have multiple entries because most houses have multiple windows.

    The code I've come up with (which seems to work) is:
    Code:
    egen household_id = group(cluster_number household_number)
    (2) Write some code which effectively says "if any of the entries under a particular household_id has a value of 1 for the outcome, count that household; if not, don't", and break the counts down by the three arms.

    But I'm really stuck on how to do part (2).


    Any assistance would be greatly appreciated.

    Jacob

  • #2
    See https://www.stata.com/support/faqs/d...ble-recording/ Something like this should help.

    Code:
    egen wanted = max(window), by(household_id) 
    
    egen tag = tag(household_id) 
    
    tab cluster_number wanted if tag
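A minimal sketch of the same idea broken down by arm, as the question asks, run on a tiny invented dataset rather than the trial data. It assumes the variable names used in this thread (`arm`, `cluster_number`, `household_number`, and `window` as the 0/1 blind outcome); `anyblind` is a hypothetical name. With real data, skip the `clear` and `input` lines, and note that `tag` will already exist if the code above has been run.

```stata
* Invented toy data, one row per window (NOT the trial data)
clear
input byte(arm cluster_number household_number window)
1 1 1 0
1 1 1 1
1 1 2 0
1 1 2 0
2 2 1 1
2 2 1 1
2 2 2 0
2 2 3 1
end

* unique household identifier, as in the original question
egen household_id = group(cluster_number household_number)
* 1 if at least one window in the household has a blind, else 0
egen anyblind = max(window), by(household_id)
* flag exactly one row per household so each household is counted once
egen tag = tag(household_id)
* counts and row percentages of households with at least one blind, by arm
tab arm anyblind if tag, row
```

The row percentages in the `anyblind == 1` column are the proportions of households with at least one blind in each arm; the Total row gives the overall figure.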



    • #3
      Dear Nick,

      Thanks for your reply, which sorted everything out nicely. I'd looked at that webpage but hadn't worked out the tag part of the code.

      Best wishes,

      Jacob



      • #4
        Hello,

        Following on from yesterday's question, I'm now trying to work out the number of blinds per 100 households, by arm and overall, and the 95% confidence intervals for these means.

        I've been through these forums over and over and just can't work out what to type (or even the name of the procedure I'm looking for). I'm new to STATA, so please forgive my ignorance.

        Best wishes,
        Jacob



        • #5
          Please study FAQ Advice #12 on how best to help us -- by giving a data example. No need for repeated apologies -- we were all beginners once -- but a real need for concrete examples.

          https://www.statalist.org/forums/help#version

          While you're there, swing by:

          https://www.statalist.org/forums/help#spelling

          If I understand you correctly, your first need is a dataset with one observation per household, which, with the variables in #2, you would get with

          Code:
          keep if tag
          This invented dataset shows some technique -- and some personal choices (e.g. I like the option jeffreys in ci proportions).

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input float(window household_id cluster)
          1  1 1
          1  2 1
          0  3 1
          0  4 2
          0  5 2
          1  6 2
          0  7 3
          0  8 3
          0  9 3
          1 10 4
          1 11 4
          1 12 4
          end
          
          statsby, by(cluster) total clear : ci proportions window, jeffreys 
          
          su cluster, meanonly 
          local totalid = r(max) + 1 
          replace cluster = `totalid' if cluster == . 
          label def cluster `totalid' "Total"
          label val cluster cluster 
           
          twoway scatter proportion cluster, mc(blue) ///
              || rcap lb ub cluster, lc(blue) ///
              ytitle(Mean proportions and 95% confidence intervals) scheme(s1color) ///
              title(Households with windows) legend(off) ///
              yla(0 "0" 1 "1" 0.2(0.2)0.8, ang(h) format(%02.1f)) xla(, valuelabel)
          You can copy and paste this code into a do-file editor window and run it all at once.



          • #6
            Dear Nick,

            Thank you for the reply. Points noted about Stata vs STATA, repeated apologies and giving data examples, particularly the last, because what you've kindly shown me how to do is not quite what I'm trying to do. I'll now try to explain it better.

            I can't use
            Code:
            keep if tag
            because what I'm trying to calculate is the mean number of blinds per 100 households (the index I'm interested in). That's a mean, not a proportion, and it can exceed 100. Because of the way the data are arranged (one row per window, each coded 0 or 1), if I drop all the untagged observations the index can be at most 100.

            What I've done so far is this:
            Code:
            . egen household_id = group(cluster_number household_number)
            
            . egen tag = tag(household_id) 
            
            . tab window arm if window==1
            
                       |               arm
                window |         1          2          3 |     Total
            -----------+---------------------------------+----------
                     1 |       276        210        181 |       667 
            -----------+---------------------------------+----------
                 Total |       276        210        181 |       667 
            
            . tab arm if tag
            
                    arm |      Freq.     Percent        Cum.
            ------------+-----------------------------------
                      1 |        199       33.45       33.45
                      2 |        198       33.28       66.72
                      3 |        198       33.28      100.00
            ------------+-----------------------------------
                  Total |        595      100.00
            What I want to do now is divide the number of blinds by the number of households in each arm (and overall) and multiply by 100, so (276/199)*100 for arm 1, and so on. I then want a 95% confidence interval for the index in each arm, and the index in a form that lets me run a negative binomial regression along the lines of
            Code:
            nbreg newindex i.arm
            afterwards. I hope this is a lot clearer and thanks for giving me tips on how to use the forum correctly.
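The arithmetic above can be checked directly in Stata. This sketch copies the blind and household counts from the two tables in this post; the scalar names are hypothetical. It gives roughly 138.7, 106.1, 91.4 and 112.1 blinds per 100 households for arms 1, 2, 3 and overall. Since 100 times blinds/households is 100 times the mean number of blinds per household, a confidence interval for that mean, multiplied by 100, serves as an interval for the index.

```stata
* blinds per 100 households = 100 * blinds / households,
* using the counts from the two tables above (arms 1-3, then all arms)
scalar idx1   = 100 * 276 / 199
scalar idx2   = 100 * 210 / 198
scalar idx3   = 100 * 181 / 198
scalar idxall = 100 * 667 / 595
display %8.2f scalar(idx1) %8.2f scalar(idx2) %8.2f scalar(idx3) %8.2f scalar(idxall)
```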

            Best wishes,
            Jacob



            • #7

              You did say that: my fault. Other way round, still no data example! My latest guess is

              Code:
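              * number of windows with a blind in each household
              egen total = total(window), by(household_id)
              * tag one row per household (omit if -tag- already exists from earlier)
              egen tag = tag(household_id)
              * one row per household; -total- is that household's blind count
              keep if tag
              * mean blinds per household with 95% CI, per cluster and overall
              statsby, by(cluster) total clear : ci mean total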
              and then it's the same code as before. Given other things to do, I didn't get as far as your last request.
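A sketch of how that last step might be adapted to what #6 asks for: by arm rather than by cluster, with the mean and its limits rescaled from blinds per household to blinds per 100 households. The toy data are invented, `ci means` is the Stata 14+ spelling, and the variable names are those used in this thread; treat it as an illustration under those assumptions, not a finished analysis.

```stata
* Invented toy data, one row per window (NOT the trial data)
clear
input byte(arm cluster_number household_number window)
1 1 1 1
1 1 1 1
1 1 1 0
1 1 2 1
1 2 1 0
1 2 1 0
2 3 1 1
2 3 2 0
2 3 2 0
2 4 1 1
2 4 1 1
2 4 1 1
2 4 1 0
end

egen household_id = group(cluster_number household_number)
* blinds per household: sum of the 0/1 outcome over that household's windows
egen total = total(window), by(household_id)
* keep one row per household; -total- can exceed 1, so the index is not capped at 100
egen tag = tag(household_id)
keep if tag

* mean blinds per household with 95% CI, by arm, plus an overall row (arm missing)
statsby, by(arm) total clear : ci means total

* rescale from blinds per household to blinds per 100 households
foreach v of varlist mean lb ub {
    replace `v' = 100 * `v'
}
list arm N mean lb ub, noobs
```

The graph code from #5 should then run with `arm` in place of `cluster`, `mean` in place of `proportion`, and `yla()` values suited to an index on the scale of 100. For the negative binomial regression, the household-level data just before the `statsby` line are the natural input: the count itself can be the outcome, as in `nbreg total i.arm, irr`, whose incidence-rate ratios compare blinds per household between arms and so are unaffected by the factor of 100. The `ci` intervals treat households as independent; with a cluster randomised design, an option such as `vce(cluster cluster_number)` on `nbreg` may be worth considering.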
