Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Incorrect(?) median with egen median

    Hello,

    I have a multicentre dataset, spaning 14 (non-consecutive) years, of patients who had surgery with device A vs device B. Some years I don't have data for some of the centres (they didn't participate in the study). One of my variables indicates whether they were inserted device A (variable canti). I have generated a new variable that expresses the percentage of device A implemented per centre and year (variable cantiper), ie, it has a constant value each centre and year. Since the centres participating changed annually, the number of observations for cantiperc varies by year.

    I would like to calculate the annual median of the percentages of each centre. For this, I'm using the function egen (see below). The issue is that the median values I get often differ from the median values I obtain when calculated manually. When I have an even number of centres, the median I get varies: some times it's the average of the two mid (sorted) values, some times it's one of the two mid values, and some times it's the value above or below the two mid values. When I have an odd number of centres, the median I get varies too: some times it is the actual mid value, but some times it is the value above or below the mid value.

    Below, I am copying an example of my dataset (including only relevant variables) a description of the variables, and the commands/functions I use. In this example, the median for the year 2004 (in the group: ageg=1 and revis=0) is 33.3%. However, I'd want it to be the average of the two mid observations (there was an even number of observations that year), which are 33.3% and 0, ie the median would be 16.7%.

    Can you advice please? I hope I have phrased the question correctly, but please let me know if not or if you need further info.

    Many thanks!

    Rocio


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(id year revis ageg centranon canti)
    1948 2007 0 2  9 1
    1979 2013 0 2 40 0
    2063 2004 0 2 11 0
    2063 2009 1 2 11 1
    2064 2004 0 1  2 1
    2064 2006 1 1  2 1
    2072 2004 0 1  2 1
    2117 2004 0 1 10 0
    2117 2004 1 1 10 1
    2140 2004 0 2 21 0
    2140 2004 1 2 21 0
    2141 2004 0 2 16 0
    2142 2004 0 2 13 0
    2143 2004 0 2 13 0
    2147 2004 0 2  5 0
    2148 2004 0 2 16 0
    2149 2004 0 1  2 1
    2149 2011 1 2 21 1
    2154 2004 0 1  6 0
    2157 2004 0 2  7 0
    2164 2004 0 2  5 0
    2167 2004 0 2 11 0
    2169 2004 0 2  5 0
    2171 2004 0 2  5 0
    2172 2004 0 2  1 0
    2173 2004 0 1 38 0
    2174 2004 0 2 27 0
    2175 2004 0 2 35 0
    2176 2004 0 2  1 0
    2177 2004 0 2  1 0
    2178 2004 0 2 35 0
    2179 2004 0 2 35 0
    2179 2011 1 2 35 1
    2180 2004 0 2 24 0
    2184 2004 0 2 23 0
    2185 2004 0 2 24 0
    2186 2004 0 2 24 0
    2187 2004 0 2 23 0
    2188 2004 0 2 23 0
    2189 2004 0 2  5 1
    2190 2004 0 2  1 0
    2192 2004 0 2  1 0
    2194 2004 0 2 28 0
    2201 2004 0 1  5 0
    2202 2004 0 1  5 1
    2203 2004 0 2 23 0
    2207 2004 0 2 40 0
    2208 2004 0 2 10 0
    2209 2004 0 2 35 0
    2210 2004 0 2 17 0
    2211 2004 0 2 23 0
    2212 2004 0 2 40 0
    2218 2004 0 2 27 0
    2219 2004 0 1  1 0
    2230 2004 0 2 30 1
    2239 2004 0 1 30 1
    2242 2004 0 2 35 0
    2242 2006 1 2 35 0
    2244 2004 0 1 30 1
    2246 2004 0 2 30 1
    2247 2004 0 2 30 1
    2248 2004 0 2 30 1
    2249 2004 0 2 30 1
    2250 2004 0 2 16 0
    2251 2004 0 2 16 0
    2252 2004 0 2 16 0
    2253 2004 0 2 21 1
    2258 2004 0 2 21 0
    2264 2004 0 1  5 0
    2266 2004 0 2 11 0
    2267 2004 0 1  1 0
    2277 2004 0 2 24 0
    2279 2004 0 2  1 0
    2282 2004 0 2  1 0
    2284 2004 0 1 38 0
    2284 2013 1 1 38 1
    2286 2004 0 2  5 0
    2307 2004 0 2 11 0
    2309 2004 0 2 11 0
    2309 2004 1 2 11 0
    2310 2004 0 2 11 0
    2313 2004 0 2 11 0
    2321 2004 0 1  2 1
    2322 2004 0 2 23 0
    2323 2004 0 1 38 0
    2324 2004 0 2 11 0
    2325 2004 0 2 24 0
    2325 2004 1 2 24 0
    2326 2004 0 2 24 0
    2327 2004 0 2 24 0
    2329 2004 0 2  1 0
    2330 2004 0 2  1 0
    2340 2004 0 2  7 1
    2340 2004 1 2  7 1
    2346 2004 0 2  9 0
    2350 2004 0 1 14 0
    2356 2004 0 2 14 0
    2360 2004 0 1  2 0
    2361 2004 0 2 11 0
    2366 2004 0 1 10 1
    end
    label values revis revis
    label def revis 0 "Creation", modify
    label def revis 1 "Revision", modify
    label values ageg ageg
    label def ageg 1 "Children <17", modify
    label def ageg 2 "Adults >=17", modify
    label values centranon centranon
    label def centranon 1 "Centre-1", modify
    label def centranon 2 "Centre-2", modify
    label def centranon 5 "Centre-5", modify
    label def centranon 6 "Centre-6", modify
    label def centranon 7 "Centre-7", modify
    label def centranon 9 "Centre-9", modify
    label def centranon 10 "Centre-10", modify
    label def centranon 11 "Centre-11", modify
    label def centranon 13 "Centre-13", modify
    label def centranon 14 "Centre-14", modify
    label def centranon 16 "Centre-16", modify
    label def centranon 17 "Centre-17", modify
    label def centranon 21 "Centre-21", modify
    label def centranon 23 "Centre-23", modify
    label def centranon 24 "Centre-24", modify
    label def centranon 27 "Centre-27", modify
    label def centranon 28 "Centre-28", modify
    label def centranon 30 "Centre-30", modify
    label def centranon 35 "Centre-35", modify
    label def centranon 38 "Centre-38", modify
    label def centranon 40 "Centre-40", modify
    label values canti yesnola
    label def yesnola 0 "No", modify
    label def yesnola 1 "Yes", modify
    label var id "Patient id merged dataset"
    label var year "Procedure year"
    label var revis "Procedure type"
    label var ageg "Age group"
    label var centranon "Centre"
    label var canti "Device A yesno"
    Code:
    by centranon year revis ageg, sort: egen cantiperc = mean(100 * canti)
    la var cantiperc "Percentage of device A per centre, year, age, group and procedure type"
    sort year revis ageg
    egen median = median(cantiperc), by(year ageg revis)
    la var median "Median of the percentages of device A in each centre per year, age group and procedure type"
    bysort revis year: tab centranon canti if ageg==1, row
    tab year median if ageg==1 & revis==0
    Last edited by Rocio Fernandez Mendez; 17 Mar 2022, 10:51.

  • #2
    From your sample data and code, I then ran the following.
    Code:
    . sort year ageg revis cantiperc
    
    . list year ageg revis cantiperc median if ageg==1, clean noobs nolabel
    
        year   ageg   revis   cantip~c     median  
        2004      1       0          0   33.33333  
        2004      1       0          0   33.33333  
        2004      1       0          0   33.33333  
        2004      1       0          0   33.33333  
        2004      1       0          0   33.33333  
        2004      1       0          0   33.33333  
        2004      1       0          0   33.33333  
        2004      1       0   33.33333   33.33333  
        2004      1       0   33.33333   33.33333  
        2004      1       0   33.33333   33.33333  
        2004      1       0         50   33.33333  
        2004      1       0         50   33.33333  
        2004      1       0         80   33.33333  
        2004      1       0         80   33.33333  
        2004      1       0         80   33.33333  
        2004      1       0         80   33.33333  
        2004      1       0         80   33.33333  
        2004      1       0        100   33.33333  
        2004      1       0        100   33.33333  
        2004      1       1        100        100  
        2006      1       1        100        100  
        2013      1       1        100        100
    So of these 22 observations, three of them were for revis==1 and should be ignored.

    On reflection I think perhaps you want to count each center only once in computing the median?

    Last edited by William Lisowski; 17 Mar 2022, 12:40.

    Comment


    • #3
      Here is an approach that reduces the number of observations to one per centre and calculates the median.
      Code:
      . collapse (mean) cantiperc=canti, by(centranon year revis ageg)
      
      . replace cantiperc = cantiperc*100
      (16 real changes made)
      
      . la var cantiperc "Percentage of device A per centre, year, age, group and procedure type"
      
      . egen median = median(cantiperc), by(year ageg revis)
      
      . sort year revis ageg cantiperc centranon
      
      . list if year==2004 & ageg==1 & revis==0, clean noobs nolabel abbreviate(20)
      
          year   revis   ageg   centranon   cantiperc     median  
          2004       0      1           1           0   16.66667  
          2004       0      1           6           0   16.66667  
          2004       0      1          14           0   16.66667  
          2004       0      1          38           0   16.66667  
          2004       0      1           5    33.33334   16.66667  
          2004       0      1          10          50   16.66667  
          2004       0      1           2          80   16.66667  
          2004       0      1          30         100   16.66667

      Comment


      • #4
        Hi William, it worked! Then I merged the collapsed dataset with the main dataset. Thank you for your help!

        Comment

        Working...
        X