two way within-subject Anova for Centile results on categorical data

Mitchell Berman

Join Date: Apr 2014

Posts: 28
#1

two way within-subject Anova for Centile results on categorical data

26 Mar 2016, 14:13

Statalisters-

I'd appreciate advice on the following data.

We are studying drug dosages used in two types of operations, using data taken from 30 different hospitals.
Operation 1 generally requires more drug than operation 2, and the distribution of drug used is very skewed. Thus we chose to use median, 80th percentile, and other centile results to describe the distribution for drug used at each hospital for each operation.

Hospitals have different cultures; some use lots of drug, some less drug, but in general, operation 1 requires more drug than operation 2, though the "low-drug" operation at one hospital may be higher than the high-drug operation at another hospital. It's all relative to local culture.

Initially I performed a paired t-test at each centile, which compared, for example, the 80th percentile drug dose for operation 1 against the same percentile drug dose for operation 2, which basically looked at the difference at each hospital, calculated the mean and SD of the difference across all hospitals and showed that it was not 0. All of the centile comparisons were highly significant, but we had 6 centiles, thus 6 paired t-tests and I needed to reduce the SDs and p values for multiple comparisons. Bonferroni seems overly conservative since the centile results are related.

I thought that my paired t-test approach was theoretically the same as a one-level within-subject Anova, limiting the anova to one centile at a time. Wonder if the different centiles are a form of Repeated Measure.
Thus, how can I use a 2-way within-subject anova (my guess at the structure) to comment about the overall difference between Operation 1 and Operation 2 (step 1), then perform a post-hoc analysis to look at individual centiles. I don't care about the differences between hospitals for this Operation1 vs Operation2 analysis.

I have performed kruskal Wallis comparing Operation 1 to Operation 2 over all the primary data not separated by hospital before generating centile data, but this seems to miss the extra information contained by knowing the difference at each hospital. (There are other reasons, too, that I need to perform the analysis on a per-hospital basis)

Demonstration data is below-- can't include the real data. I specifically made the 90th percentile data not different Operation1 vs Operation2, to see if I can detect that.
Sorry about the length of the question.

Thanks much.
Mitchell Berman
Columbia University

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input byte inst float operation byte centiles int dose 1 1 50 30 1 2 50 28 1 1 80 45 1 2 80 46 1 1 90 55 1 2 90 55 1 1 95 76 1 2 95 70 2 1 50 60 2 2 50 40 2 1 80 90 2 2 80 91 2 1 90 120 2 2 90 120 2 1 95 160 2 2 95 155 3 1 50 15 3 2 50 15 3 1 80 20 3 2 80 21 3 1 90 30 3 2 90 36 3 1 95 38 3 2 95 36 4 1 50 50 4 2 50 45 4 1 80 60 4 2 80 55 4 1 90 63 4 2 90 67 4 1 95 85 4 2 95 84 end
Tags: None
Joseph Coveney

Join Date: Apr 2014

Posts: 4374
#2

26 Mar 2016, 19:15

You might want to take a look at a couple of user-written commands that can do quantile regression with panel data. There was an announcement on the list last week about them.

If you're just interested in determining whether there is a difference in distributions between operations irrespective of hospital, then you can do a nonparametric test that stratifies on hospital by using another user-written command, emh, that can also be downloaded from SSC.

If it's just skew that bothers you when contemplating a parametric model, then you can try meglm with a log link with Gaussian or gamma distribution family.

I'm not sure why you need null hypothesis significance testing in order to comment on the overall difference between operations. Couldn't you just plot the hospital medians (or representative quantiles) for the two operative procedures and comment on what is evident?
Comment
Mitchell Berman

Join Date: Apr 2014

Posts: 28
#3

27 Mar 2016, 08:33

Thank you for the comments. The journal that this is being readied for prefers statistical measures, even for relatively obvious results. And applying Bonferroni 21 times removes significance from things that are obviously real.
I'm trying to understand how to use a two-way anova on this.
What would the syntax of that command be? Is is a within-subjects design?

In the mean time, I will explore the emh command.
Comment
Joseph Coveney

Join Date: Apr 2014

Posts: 4374
#4

27 Mar 2016, 20:16

With emh, you would do something like

Code:

emh dose operation, strata(inst) anova transformation(modridit)

Not to advocate it, but I'm guessing that you're looking for something like

Code:

mixed dose i.operation##i.centile || inst: , reml nolrtest dfmethod(anova) nolog

or perhaps

Code:

xtreg dose i.operation##i.centile, i(inst) fe
Comment
Mitchell Berman

Join Date: Apr 2014

Posts: 28
#5

27 Mar 2016, 20:34

Joseph--

Thanks so much for this. Going to try these over the next few days.
Comment
Mitchell Berman

Join Date: Apr 2014

Posts: 28
#6

29 Mar 2016, 08:13

Joseph- I'm having problems with matsize in emh (Stata SE 13.1)

The data are as I described above.
I have 17 institutions (the strata)
3 types of operations (the repeated measure)--
doses-- there are a large number of doses per institutions per repeated measure (type of operation)-- 300k doses for 1 of the 3 types of operations for one institution

I can perform kwallis on all the institutions at once (matsize 400), but I've cut back to just three institutions and emh still says matsize too small (now with matsize=1600).
Am I doing something wrong?
I don't think the problem is that you are expecting only one value per repeated measurement, per subject (institution). emh works on test data I created that has different size groups.
I have thousands of doses for each institutions for each of the two types of operations.

I realize you are the author of this module-- thanks so much for helping me through this.
Going to try to run this in SPSS-- see if a similar issue occurs.

PS-- I downloaded some test data from Laerd Statistics, the online support site for SPSS. Ran Friedmans test in SPSS. Converted the data from SPSS wide format to Stata long format.
Ran the same data in emh as you had described-- emh provided exactly the same results as Friedman test in SPSS. So I think I know the appropriate data structure and command for emh.

Last edited by Mitchell Berman; 29 Mar 2016, 08:52.
Comment
Mitchell Berman

Join Date: Apr 2014

Posts: 28
#7

29 Mar 2016, 11:04

Joseph--

The data in my original dataset is of the abbreviated form below.
Each institution has thousands of datapoints (doses) for each of the two operations.

emh works on the demo data I've provided below, fails on the real data, asking for a larger matsize each time it is attempted.
Is there some other program that will work for non-parametric data that is stratified?
trying somersd right now-- but I imagine there is a built in program.

Thank you again.

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input float(institution operation dose) 1 1 80 1 2 70 1 1 82 1 2 80 1 2 74 1 2 72 2 2 200 2 2 210 2 1 410 2 1 400 3 1 30 3 2 12 3 2 13 3 2 12 3 2 14 3 2 12 3 1 33 3 1 32 3 1 30 end
Comment
Joseph Coveney

Join Date: Apr 2014

Posts: 4374
#8

29 Mar 2016, 13:20

The help file for emh does say that it relies on tabulate and that there can be situations where the matrix size is not adequate. In your case, if there are many patients receiving the same dose, then you can gain efficiencies by using frequency weights. Try

Code:

contract institution operation dose, frequency(count) set matsize 11000 emh dose operation [fweight=count], strata(institution) anova transformation(modridit)

and see whether that helps.
Comment
Joseph Coveney

Join Date: Apr 2014

Posts: 4374
#9

29 Mar 2016, 21:14

That's

Code:

contract institution operation dose, freq(count)
Comment
Mitchell Berman

Join Date: Apr 2014

Posts: 28
#10

31 Mar 2016, 13:29

Joesph-- When I run this now, it runs a bit longer, but stops with the error "too many values"

Do you think this is a a limitation of Stata? I noticed you referenced SAS in the help file-- do you think this is a limitation intrinsic to Extended Mantel-Haenszel calculations, or might this work in SAS?
Any other non-parametric, stratified tests I might consider? Tried the van Elteren test in Stata (after limiting the grouping variable to 2 levels), but it also errors, saying there is only "1 group found" when there are definitely 2
Comment
Joseph Coveney

Join Date: Apr 2014

Posts: 4374
#11

01 Apr 2016, 05:16

The error message sounds like it's coming from official Stata's tabulate (one-way) command, which emh calls internally in order to tabulate unique values of the response variable (dose level in your case). tabulate has a limit of 12,000 rows for a one-way table.

You mentioned having over 300,000 doses (patients, operations) for one of the types of operation, but it seems odd that you'd have more than 12,000 unique levels (unique values) for a dose of a drug, even for cumulative dose over the perioperative period. I don't know about your dataset and so I cannot comment further than that.

The limit is not inherent in the method, and so it might very well work in SAS. If I were to write emh today, I would probably write it in Mata, and there wouldn't be the limit imposed by Stata matrices and by tabulate, although the later is written in C and so it has a speed advantage over Mata.

For the van Elteren test, you will need to code the grouping variable as 0 and 1. If the two groups are coded, for example, as 1 and 2, then the command will give the error message that you recieved. So, for your three types of operation:

Code:

generate byte operation12 = operation == 2 if inlist(operation, 1, 2) vanelteren dose, by(operation12) strata(institution) generate byte operation13 = operation == 3 if inlist(operation, 1, 3) vanelteren dose, by(operation13) strata(institution) generate byte operation23 = operation == 3 if inlist(operation, 2, 3) vanelteren dose, by(operation23) strata(institution)

Last edited by Joseph Coveney; 01 Apr 2016, 05:22.
Comment
Mitchell Berman

Join Date: Apr 2014

Posts: 28
#12

01 Apr 2016, 07:17

Joseph-- your are correct about the assumption that there should be far fewer possible dose values for a drug, that is given (for example) between settings 1 and 400 mcg/kg/h. Done as a frequency there should be about 400 individual dose settings. Problem is that some institutions use pounds instead of kg, and others record their values as cc/h. When these doses are translated to the std units of mcg/kg/h, I get fractional doses between the integers. Given your insights I'm going to round my raw data to integer values, then convert to frequencies again.
Will also try SAS, perhaps, but I think you've identified the main issue. I'll follow up with a reply. Thank you so much.
Comment
Mitchell Berman

Join Date: Apr 2014

Posts: 28
#13

01 Apr 2016, 08:14

Your insight was the key. Rounding the data prior to converting to frequencies allowed emh to complete within a few seconds.

I've seen posts by you in the Stata list discussing emh suggesting the use of both rank and modridit and I ran both. Is it correct that rank and modridit are identical for emh if no strata are identified, but that modridit is the required transformation for stratified data?
I have similar stratified data with groups spanning three and four levels. I'm not interested in differences between strata. Would you suggest emh for post-hoc testing between specific pairs of the grouping variable, once I've initially identified non-uniformity over the multiple groups?
Comment
Joseph Coveney

Join Date: Apr 2014

Posts: 4374
#14

01 Apr 2016, 19:04

Without strata, the two transformations will give the test statistic of the Kruskal-Wallis test. With strata (and two groups), the standardized midrank transformation ("modified ridit") will give the test statistic of the van Elteren test.
Comment
Mitchell Berman

Join Date: Apr 2014

Posts: 28
#15

02 Apr 2016, 09:07

Thanks so much for working me through this process. Lots of good information.
Comment

Announcement

two way within-subject Anova for Centile results on categorical data

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment