Hello,
I have a multicentre dataset, spaning 14 (non-consecutive) years, of patients who had surgery with device A vs device B. Some years I don't have data for some of the centres (they didn't participate in the study). One of my variables indicates whether they were inserted device A (variable canti). I have generated a new variable that expresses the percentage of device A implemented per centre and year (variable cantiper), ie, it has a constant value each centre and year. Since the centres participating changed annually, the number of observations for cantiperc varies by year.
I would like to calculate the annual median of the percentages of each centre. For this, I'm using the function egen (see below). The issue is that the median values I get often differ from the median values I obtain when calculated manually. When I have an even number of centres, the median I get varies: some times it's the average of the two mid (sorted) values, some times it's one of the two mid values, and some times it's the value above or below the two mid values. When I have an odd number of centres, the median I get varies too: some times it is the actual mid value, but some times it is the value above or below the mid value.
Below, I am copying an example of my dataset (including only relevant variables) a description of the variables, and the commands/functions I use. In this example, the median for the year 2004 (in the group: ageg=1 and revis=0) is 33.3%. However, I'd want it to be the average of the two mid observations (there was an even number of observations that year), which are 33.3% and 0, ie the median would be 16.7%.
Can you advice please? I hope I have phrased the question correctly, but please let me know if not or if you need further info.
Many thanks!
Rocio
I have a multicentre dataset, spaning 14 (non-consecutive) years, of patients who had surgery with device A vs device B. Some years I don't have data for some of the centres (they didn't participate in the study). One of my variables indicates whether they were inserted device A (variable canti). I have generated a new variable that expresses the percentage of device A implemented per centre and year (variable cantiper), ie, it has a constant value each centre and year. Since the centres participating changed annually, the number of observations for cantiperc varies by year.
I would like to calculate the annual median of the percentages of each centre. For this, I'm using the function egen (see below). The issue is that the median values I get often differ from the median values I obtain when calculated manually. When I have an even number of centres, the median I get varies: some times it's the average of the two mid (sorted) values, some times it's one of the two mid values, and some times it's the value above or below the two mid values. When I have an odd number of centres, the median I get varies too: some times it is the actual mid value, but some times it is the value above or below the mid value.
Below, I am copying an example of my dataset (including only relevant variables) a description of the variables, and the commands/functions I use. In this example, the median for the year 2004 (in the group: ageg=1 and revis=0) is 33.3%. However, I'd want it to be the average of the two mid observations (there was an even number of observations that year), which are 33.3% and 0, ie the median would be 16.7%.
Can you advice please? I hope I have phrased the question correctly, but please let me know if not or if you need further info.
Many thanks!
Rocio
Code:
* Example generated by -dataex-. For more info, type help dataex clear input float(id year revis ageg centranon canti) 1948 2007 0 2 9 1 1979 2013 0 2 40 0 2063 2004 0 2 11 0 2063 2009 1 2 11 1 2064 2004 0 1 2 1 2064 2006 1 1 2 1 2072 2004 0 1 2 1 2117 2004 0 1 10 0 2117 2004 1 1 10 1 2140 2004 0 2 21 0 2140 2004 1 2 21 0 2141 2004 0 2 16 0 2142 2004 0 2 13 0 2143 2004 0 2 13 0 2147 2004 0 2 5 0 2148 2004 0 2 16 0 2149 2004 0 1 2 1 2149 2011 1 2 21 1 2154 2004 0 1 6 0 2157 2004 0 2 7 0 2164 2004 0 2 5 0 2167 2004 0 2 11 0 2169 2004 0 2 5 0 2171 2004 0 2 5 0 2172 2004 0 2 1 0 2173 2004 0 1 38 0 2174 2004 0 2 27 0 2175 2004 0 2 35 0 2176 2004 0 2 1 0 2177 2004 0 2 1 0 2178 2004 0 2 35 0 2179 2004 0 2 35 0 2179 2011 1 2 35 1 2180 2004 0 2 24 0 2184 2004 0 2 23 0 2185 2004 0 2 24 0 2186 2004 0 2 24 0 2187 2004 0 2 23 0 2188 2004 0 2 23 0 2189 2004 0 2 5 1 2190 2004 0 2 1 0 2192 2004 0 2 1 0 2194 2004 0 2 28 0 2201 2004 0 1 5 0 2202 2004 0 1 5 1 2203 2004 0 2 23 0 2207 2004 0 2 40 0 2208 2004 0 2 10 0 2209 2004 0 2 35 0 2210 2004 0 2 17 0 2211 2004 0 2 23 0 2212 2004 0 2 40 0 2218 2004 0 2 27 0 2219 2004 0 1 1 0 2230 2004 0 2 30 1 2239 2004 0 1 30 1 2242 2004 0 2 35 0 2242 2006 1 2 35 0 2244 2004 0 1 30 1 2246 2004 0 2 30 1 2247 2004 0 2 30 1 2248 2004 0 2 30 1 2249 2004 0 2 30 1 2250 2004 0 2 16 0 2251 2004 0 2 16 0 2252 2004 0 2 16 0 2253 2004 0 2 21 1 2258 2004 0 2 21 0 2264 2004 0 1 5 0 2266 2004 0 2 11 0 2267 2004 0 1 1 0 2277 2004 0 2 24 0 2279 2004 0 2 1 0 2282 2004 0 2 1 0 2284 2004 0 1 38 0 2284 2013 1 1 38 1 2286 2004 0 2 5 0 2307 2004 0 2 11 0 2309 2004 0 2 11 0 2309 2004 1 2 11 0 2310 2004 0 2 11 0 2313 2004 0 2 11 0 2321 2004 0 1 2 1 2322 2004 0 2 23 0 2323 2004 0 1 38 0 2324 2004 0 2 11 0 2325 2004 0 2 24 0 2325 2004 1 2 24 0 2326 2004 0 2 24 0 2327 2004 0 2 24 0 2329 2004 0 2 1 0 2330 2004 0 2 1 0 2340 2004 0 2 7 1 2340 2004 1 2 7 1 2346 2004 0 2 9 0 2350 2004 0 1 14 0 2356 2004 0 2 14 0 2360 2004 0 1 2 0 2361 2004 0 2 11 0 2366 2004 0 1 10 1 end label values revis revis label def revis 0 "Creation", modify label def revis 1 "Revision", modify label values ageg ageg label def ageg 1 "Children <17", modify label def ageg 2 "Adults >=17", modify label values centranon centranon label def centranon 1 "Centre-1", modify label def centranon 2 "Centre-2", modify label def centranon 5 "Centre-5", modify label def centranon 6 "Centre-6", modify label def centranon 7 "Centre-7", modify label def centranon 9 "Centre-9", modify label def centranon 10 "Centre-10", modify label def centranon 11 "Centre-11", modify label def centranon 13 "Centre-13", modify label def centranon 14 "Centre-14", modify label def centranon 16 "Centre-16", modify label def centranon 17 "Centre-17", modify label def centranon 21 "Centre-21", modify label def centranon 23 "Centre-23", modify label def centranon 24 "Centre-24", modify label def centranon 27 "Centre-27", modify label def centranon 28 "Centre-28", modify label def centranon 30 "Centre-30", modify label def centranon 35 "Centre-35", modify label def centranon 38 "Centre-38", modify label def centranon 40 "Centre-40", modify label values canti yesnola label def yesnola 0 "No", modify label def yesnola 1 "Yes", modify label var id "Patient id merged dataset" label var year "Procedure year" label var revis "Procedure type" label var ageg "Age group" label var centranon "Centre" label var canti "Device A yesno"
Code:
by centranon year revis ageg, sort: egen cantiperc = mean(100 * canti) la var cantiperc "Percentage of device A per centre, year, age, group and procedure type" sort year revis ageg egen median = median(cantiperc), by(year ageg revis) la var median "Median of the percentages of device A in each centre per year, age group and procedure type" bysort revis year: tab centranon canti if ageg==1, row tab year median if ageg==1 & revis==0
Comment