Which statistics should I use when analyzing the effect of interventions on health outcomes?

Insung Kang

Join Date: Jan 2020

Posts: 17
#1


25 Jan 2022, 18:23

Dear Stata users,

I would like to ask which statistical methods I should use. I have continuous health outcome data covering 2 years: 1 year before and 1 year after the interventions (three types of interventions to improve air quality). I don't have a control group. I used t-tests to see how air quality improved.
From what I've learned so far, I'm considering 1) an intention-to-treat analysis or 2) a mixed-effects model using "xtreg" after restructuring the data as a panel dataset, to assess the effectiveness of the interventions on health outcomes. Or can I use a "DID" analysis without a control group, just comparing the three intervention groups with each other?

Could you please give me any recommendations?
Many thanks.

Last edited by Insung Kang; 25 Jan 2022, 18:32.
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#2

25 Jan 2022, 18:32

So far this is pretty nebulous because I can't see your data (use dataex as suggested by the FAQ). When you say "interventions", let's talk about what specifically this means. Which interventions, done by whom, and to whom or to what? Do we mean a public health policy? A cohort study or a quasi-experiment? On a related point, what's our unit of analysis? The individual level? A city? These have implications for the designs that are sensible or even possible here.

My personal take is to use synthetic controls or differences-in-differences. Not just because these are en vogue at the moment, but because they're powerful approaches to answering questions in this arena when executed correctly.

Oh, and hey Insung. Welcome to Statalist.

EDIT: You say there's no control group. How could you have panel data then? How could you impute a counterfactual using a fixed effects estimator if you have only one treated unit and no comparison group?

The very most you could do is a single group interrupted time series, because with only one unit (no control group), fixed effects estimators aren't possible.

Second edit: having only one year of pre data and one year of post data really means that regression won't do you much good here. You're pretty much limited to t-tests; you need multiple units and multiple time periods of data to call it panel data.

Last edited by Jared Greathouse; 25 Jan 2022, 18:41.

Insung Kang

Join Date: Jan 2020
Posts: 17
#3

25 Jan 2022, 19:17

Hi Jared, thank you so much for your quick response. I've put an example of my dataex.

The study was a >2-year, pseudo-randomized, crossover intervention study. Unfortunately, we didn't have a control group: the intervention was the installation of one of three types of ventilation systems in each home in the middle of the study.

The data I have include an annual asthma control test (ACT) score (derived from monthly ACT scores), air quality data, housing characteristics, demographics, etc. for each participant/home. I have examined the association between air quality exposure and the level of asthma control using logistic regression. What I'd like to do is assess the impacts of the three interventions on 1) air quality improvements and 2) ACT score improvements, comparing the pre/post periods, as well as 3) the differences in effect among the three types of ventilation interventions.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str3 Participant str4 Home byte vent double(pre_ACT post_ACT) str6 Sex byte age str25 Race double(Y1_HCHO_IN Y2_HCHO_IN Y1_CO_IN Y1_CO_OU)
"P1"  "H111" 1 16.294117647058822 15.117647058823529 "Female" 64 "Hispanic or Latino"            7                 12 1.7050215559886053                  .
"P2"  "H138" 3 21.384615384615383 20.428571428571427 "Female" 54 "White"                        64                 32   4.35585146535928  2.320230976340442
"P3"  "H138" 3 13.923076923076923               17.4 "Male"   51 "White"                        64                 32   4.35585146535928  2.320230976340442
"P4"  "H167" 2 22.466666666666665              23.25 "Female" 37 "White"                      49.5               8.25  .0737434614509893                  .
"P5"  "H168" 2 20.857142857142858 21.555555555555557 "Female" 39 "White"                     26.25                  5 1.6537668532306131  .7981941802581125
"P6"  "H172" 2 23.307692307692307  24.11764705882353 "Female" 66 "White"                     10.75                  7  .0947636700648748 .08875925635490278
"P7"  "H177" 1                 21  22.23076923076923 "Female" 32 "White"                     88.75               27.5   4.02345437292436                  .
"P8"  "H179" 3                 19 19.529411764705884 "Female" 43 "White"                      17.5                  5  7.579476855932265 .18605262616455376
"P9"  "H181" 1 14.666666666666666                 19 "Female" 54 "Black or African American" 41.75                  5  6.878485446557014   2.63350499928314
"P10" "H182" 2 11.153846153846153 17.333333333333332 "Female" 46 "Black or African American" 15.25 11.333333333333334   .436388764215358                  .
end

Re: panel data, I was trying to make the dataset like this:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte id str4 Home byte(vent intervention index) double ACT
 1 "H111" 1 0 1 16.294117647058822
 1 "H111" 1 1 2 15.117647058823529
 2 "H138" 3 0 1 21.384615384615383
 2 "H138" 3 3 2 20.428571428571427
 3 "H138" 3 0 1 13.923076923076923
 3 "H138" 3 3 2               17.4
 4 "H167" 2 0 1 22.466666666666665
 4 "H167" 2 2 2              23.25
 5 "H168" 2 0 1 20.857142857142858
 5 "H168" 2 2 2 21.555555555555557
 6 "H172" 2 0 1 23.307692307692307
 6 "H172" 2 2 2  24.11764705882353
 7 "H177" 1 0 1                 21
 7 "H177" 1 1 2  22.23076923076923
 8 "H179" 3 0 1                 19
 8 "H179" 3 3 2 19.529411764705884
 9 "H181" 1 0 1 14.666666666666666
 9 "H181" 1 1 2                 19
10 "H182" 2 0 1 11.153846153846153
10 "H182" 2 2 2 17.333333333333332
end

Last edited by Insung Kang; 25 Jan 2022, 19:46.


Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#4

25 Jan 2022, 20:29

Well, Jared Greathouse has already pointed out the various aspects of your study design which limit the kind of analyses you can do and inferences you can make. I won't belabor those points. Rather I'll show you a few things you can do with this data.

First let me say how I understand the data you have shown--if I have misunderstood it, some of my advice may need revision. It appears that you have observations (before reshaping) that are uniquely identified by either Participant or Home. What is unclear is whether there can be more than one Participant in the same Home. That doesn't happen in your example data--but if it does happen in the real data set, then we have a new problem: there is a (partial) nesting of participants within homes, so that a simple two-level model does not really reflect the sampling design. I'm going to be optimistic and assume that doesn't actually happen. Next, it looks like you have some pre-post data on ACT, and also on something called HCHO_IN (I'll guess that's a measurement of some air pollutant, not that it matters for present purposes what it is). And there is this variable vent, which takes on 3 values, and which I assume represents which intervention the person/house received. Everything else seems to be a constant attribute of the person (age, sex, race) or perhaps of the house (Y1_CO_IN and Y1_CO_OU).
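Whether participants are nested within homes can be checked directly in the wide data before choosing a model. The following is only a minimal sketch: the five rows are made up for illustration, and n_in_home and home_tag are hypothetical helper names, not variables from the study.

```stata
* Illustrative rows only; in practice, run this on the real wide data set
clear
input str3 Participant str4 Home
"P1" "H111"
"P2" "H138"
"P3" "H138"
"P4" "H167"
"P5" "H168"
end

* Confirm that Participant alone uniquely identifies observations
isid Participant

* Count participants per home (n_in_home is a hypothetical helper variable)
bysort Home: generate n_in_home = _N

* Flag one observation per home so each home is counted once
egen byte home_tag = tag(Home)
tabulate n_in_home if home_tag

* List the homes that contain more than one participant
list Home Participant if n_in_home > 1, sepby(Home)
```

If the tabulation shows any home with more than one participant, a model with participants nested in homes is probably the safer choice.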

Now the lack of a control group makes it impossible to identify any absolute effect of any intervention. But you can compare the effects of the three interventions to each other. Be careful, of course, in interpreting this. Even if you find that, say, 3 has the best effect on ACT, it may still be the case that all three interventions are worse than doing nothing!! All you can say with this design is how each of the interventions compares to the others.

The great strength of the design is that you have within-person data under different conditions. This reduces outcome variability considerably and makes your effect estimates more precise than if different people were tested at different times. So you want the analysis to reflect that within-person design. A paired t-test does that when there is only one intervention. But that does not have the ability to compare different interventions. So you need to use a fixed-effects regression, which is more general than a paired t-test. (If you were to apply fixed-effects regression in a situation where there is only one intervention with pre-post measurements, it would produce the same results as the paired t-test.)
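The equivalence between the paired t-test and fixed-effects regression in the one-intervention case can be checked with a small sketch. The five pairs of scores below are rounded, illustrative values only, and id, ACT1, ACT2, t_paired, and t_fe are hypothetical names chosen for this example.

```stata
* Illustrative data only: five rounded pre (ACT1) and post (ACT2) scores
clear
input byte id double(ACT1 ACT2)
1 16.3 15.1
2 21.4 20.4
3 13.9 17.4
4 22.5 23.3
5 20.9 21.6
end

* Paired t-test of post against pre
ttest ACT2 == ACT1
scalar t_paired = r(t)

* Same comparison as a fixed-effects regression on the long layout
reshape long ACT, i(id) j(year)
xtset id year
xtreg ACT i.year, fe
scalar t_fe = _b[2.year] / _se[2.year]

* The two t statistics should agree up to rounding error
display "paired t = " t_paired "   fixed-effects t = " t_fe
```

The coefficient on 2.year is the mean within-person change, which is the same quantity the paired t-test evaluates.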

Now, you can do the crude (unadjusted) comparison, or you can do an adjusted comparison. In your example data, the only adjustment you can make is for HCHO_IN, because it is the only covariate measured in both periods. You cannot adjust for age or sex or race because these do not vary within person. (You can do analyses that check to see if the differences in effects between the interventions differ by sex, age, or race--but that is an additional level of complication I don't want to go into here, and isn't appropriate in a small data set anyway.)

The data requires some preparation, including a reshape to long layout, to prepare it for analysis.

Code:

rename (*_ACT) ACT#, addnumber(1)
rename (Y*_HCHO_IN) HCHO_IN*
reshape long ACT HCHO_IN, i(Participant Home) j(year)
label define year 1 "Pre" 2 "Post"
label values year year

egen unit = group(Participant Home), label
xtset unit year

encode Sex, gen(sex)
encode Race, gen(race)

// CRUDE CONTRAST OF ACT ACROSS UNITS
xtreg ACT i.year##i.vent, fe
margins vent, dydx(year) noestimcheck

// CONTRAST OF ACT ACROSS UNITS, ADJUSTED FOR HCHO_IN
xtreg ACT i.year##i.vent c.HCHO_IN, fe
margins vent, dydx(year) noestimcheck

Focus your attention on the output of the -margins- command. It contains a table with three rows, one for each value of variable vent. In the column headed dy/dx you will find the marginal effect of that vent on ACT (change from pre to post). The subsequent columns give the standard error, z statistic, p-value comparing the marginal effect to zero, and a 95% confidence interval for the marginal effect. Now, remember that it is really only meaningful to compare these to each other, so the z and p-value really are meaningless. They compare these effects to an absolute zero effect. But we don't know what an absolute zero effect means--it may not, probably does not, represent what you would see with no intervention. So ignore p and z here. Just look at the dydx values and their confidence intervals.
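Since only the comparisons among interventions are interpretable, one way to look at them directly is through the year-by-vent interaction terms of the fixed-effects model. This is a sketch under assumptions rather than a definitive analysis: the values are rounded from the ten-row example in this thread, id and year are stand-in names for the unit and period identifiers, and the sharing of a home by two participants is ignored.

```stata
* Illustrative long-layout data (rounded ACT scores, 10 units, pre = 1, post = 2)
clear
input byte(id vent year) double ACT
 1 1 1 16.29
 1 1 2 15.12
 2 3 1 21.38
 2 3 2 20.43
 3 3 1 13.92
 3 3 2 17.40
 4 2 1 22.47
 4 2 2 23.25
 5 2 1 20.86
 5 2 2 21.56
 6 2 1 23.31
 6 2 2 24.12
 7 1 1 21.00
 7 1 2 22.23
 8 3 1 19.00
 8 3 2 19.53
 9 1 1 14.67
 9 1 2 19.00
10 2 1 11.15
10 2 2 17.33
end

xtset id year
xtreg ACT i.year##i.vent, fe

* Joint test that the pre-to-post change is the same for all three interventions
testparm i.year#i.vent

* Difference in pre-to-post change: intervention 3 minus intervention 2
lincom 2.year#3.vent - 2.year#2.vent
```

Each interaction coefficient is the pre-to-post change for that intervention minus the change for intervention 1, so these contrasts stay within what a design without a control group can support.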

I did not adjust for Y1_CO_OU because it has a lot of missing values in the example data and you would lose a lot of data. This may or may not be a problem in the real data set.

Insung Kang

Join Date: Jan 2020
Posts: 17
#5

26 Jan 2022, 11:31

Thank you so much for your concerns and optimistic feedback, Dr. Schechter!

Unfortunately, each home had one or more asthmatic participants. So the statistical power is reduced when I look at the participant level, since two participants in the same home, exposed to the same air quality, showed different asthma scores. So, if I understood correctly, what I can do with the data is either a t-test comparing air quality and health outcomes among the three interventions (e.g., ventilation types), or a fixed-effects regression with adjustment for potential confounders? One more question: do you have your own criteria for deciding which confounding variables to consider? You mentioned I cannot adjust for sex, race in population-level analysis, but can I adjust for sex and race when I analyze home-level analysis for t-test comparing air quality between Pre and Post period?

Lastly, I'm a PhD student in architectural engineering at IIT, trying to learn biostatistics, epidemiology, etc. and integrate them into built-environment research, and ultimately to do interdisciplinary research with experts in chemistry and medicine. Could you please recommend any references or books for me? Every time I analyzed the data, I had to look into other relevant references and try to learn from their methods.

Please excuse my long questions. I was so excited to get help with these details. I have also posted a longer dataex example to help you understand the data.

Best, Insung

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str3 Participant str4 Home str8 Ventilation double(pre_ACT post_ACT) str6 Sex byte age str25 Race byte crowd str3(ERvisit1 ERvisit2) double(Y1_NO2_IN Y1_NO2_OU Y2_NO2_IN Y2_NO2_OU)
"P1"  "H111" "Exhaust"  16.294117647058822 15.117647058823529 "Female" 64 "Hispanic or Latino"        0 "No"  "Yes" 58.292660375545566 86.47792081870321  59.15929968049114   82.696578782664
"P2"  "H138" "Balanced" 21.384615384615383 20.428571428571427 "Female" 54 "White"                     1 "No"  "Yes"  62.76490172807987 67.13153276908189 63.038986394450205 91.52022847635617
"P3"  "H138" "Balanced" 13.923076923076923               17.4 "Male"   51 "White"                     1 "Yes" "Yes"  62.76490172807987 67.13153276908189 63.038986394450205 91.52022847635617
"P4"  "H167" "CFIS"     22.466666666666665              23.25 "Female" 37 "White"                     0 "No"  "No"   56.33193661785609 92.08425675704547  63.27032227341164 85.44694323116376
"P5"  "H168" "CFIS"     20.857142857142858 21.555555555555557 "Female" 39 "White"                     0 "No"  "No"    66.1585704109377 79.89171590310734  57.13051973184713 76.85037873983948
"P6"  "H172" "CFIS"     23.307692307692307  24.11764705882353 "Female" 66 "White"                     0 "No"  "No"   57.68708598542611 91.99584429816122 57.236005188120224 87.90032517261284
"P7"  "H177" "Exhaust"                  21  22.23076923076923 "Female" 32 "White"                     0 "No"  "No"  61.899636465072604 90.18064563966318  60.68360524908931 77.70602166665135
"P8"  "H179" "Balanced"                 19 19.529411764705884 "Female" 43 "White"                     0 "No"  "No"   60.53874811766886 75.98246518472754  63.55254164483663 83.07689077876871
"P9"  "H181" "Exhaust"  14.666666666666666                 19 "Female" 54 "Black or African American" 0 "No"  "No"   63.30054657137316 71.62888721333134 59.039874414657966 88.41395510902606
"P10" "H182" "CFIS"     11.153846153846153 17.333333333333332 "Female" 46 "Black or African American" 1 "No"  "No"   61.58640446216849 68.49206065777369  51.43792454544989 80.49884267697904
"P11" "H182" "CFIS"     14.666666666666666 22.952380952380953 "Female" 25 "Black or African American" 1 "No"  ""     61.58640446216849 68.49206065777369  51.43792454544989 80.49884267697904
"P12" "H187" "Balanced"               21.5 21.210526315789473 "Female" 40 "Black or African American" 0 "Yes" "Yes" 51.518544973948785    89.76108953258 54.768295587682665 81.70162123806217
"P13" "H193" "CFIS"     20.692307692307693 22.058823529411765 "Male"   42 "Hispanic or Latino"        0 "No"  "No"  50.357783289201876  76.2007901892469  70.68138079860094 82.21666462791181
"P14" "H196" "CFIS"     23.642857142857142 24.705882352941178 "Male"   41 "Hispanic or Latino"        0 "No"  "No"  58.260161671038105 76.88250242111509  62.27559967087857  76.9933869890499
"P15" "H197" "Balanced"              15.75               20.5 "Female" 45 "Black or African American" 0 "Yes" "No"  62.838209840768165 86.29918457385837   53.1536962580685 84.99598718136862
"P16" "H207" "Exhaust"                23.5 24.444444444444443 "Female" 50 "White"                     0 "No"  "No"   54.84059779036874     80.1097142153  57.47775108340378 83.16414726129807
"P17" "H207" "Exhaust"                  23 24.666666666666668 "Male"   18 "White"                     0 "No"  ""     54.84059779036874     80.1097142153  57.47775108340378 83.16414726129807
"P18" "H208" "Balanced" 23.666666666666668  24.46153846153846 "Female" 43 "Black or African American" 0 "No"  "No"   62.02773449694277 71.50054139786468  53.51241752751034 83.43032390835602
"P19" "H213" "CFIS"     19.470588235294116 19.307692307692307 "Female" 39 "Hispanic or Latino"        1 "No"  "No"   58.59340280008861  89.2848722923692   55.9058420869943 82.38285961397175
"P20" "H214" "Balanced" 14.266666666666668 16.666666666666668 "Female" 49 "White"                     1 "Yes" "No"   56.51526616909314 85.65354233481656  59.47893759021493  82.8089398539521
"P21" "H217" "Exhaust"                18.8 20.555555555555557 "Female" 37 "White"                     1 "No"  "No"   71.91513763347969 82.18966562064037   63.7320198003429 93.13903384379326
"P22" "H218" "Exhaust"  22.466666666666665 19.833333333333332 "Female" 37 "White"                     1 "No"  "Yes"  59.04828438480723 75.29070433185623 57.676655048614386 87.70918086125141
"P23" "H218" "Exhaust"  19.933333333333334  21.61111111111111 "Male"   43 "White"                     1 "No"  ""     59.04828438480723 75.29070433185623 57.676655048614386 87.70918086125141
"P24" "H220" "CFIS"      23.31578947368421                 24 "Female" 42 "Hispanic or Latino"        1 ""    ""     72.34145468609995 82.78534127905537  68.82807516238483  78.1223266366932
"P25" "H220" "CFIS"      16.36842105263158 14.846153846153847 "Female" 67 "Hispanic or Latino"        1 ""    ""     72.34145468609995 82.78534127905537  68.82807516238483  78.1223266366932
"P26" "H220" "CFIS"      18.88888888888889  20.76923076923077 "Male"   45 "Hispanic or Latino"        1 "No"  "Yes"  72.34145468609995 82.78534127905537  68.82807516238483  78.1223266366932
"P27" "H220" "CFIS"      17.05263157894737              15.75 "Female" 23 "Hispanic or Latino"        1 ""    ""     72.34145468609995 82.78534127905537  68.82807516238483  78.1223266366932
"P28" "H221" "CFIS"     13.411764705882353             15.625 "Female" 51 "Black or African American" 1 "No"  "No"   69.49648147342145 76.52813138254353  56.78363968780835 80.69165433812701
"P29" "H221" "CFIS"     14.647058823529411 13.142857142857142 "Female" 26 "Black or African American" 1 "Yes" ""     69.49648147342145 76.52813138254353  56.78363968780835 80.69165433812701
"P30" "H224" "Balanced"            23.8125  23.72222222222222 "Male"   50 "White"                     0 ""    ""     54.32676255644743 79.85502539058447  57.80199794817154 87.20293236032451
"P31" "H224" "Balanced" 22.533333333333335  22.38888888888889 "Female" 50 "White"                     0 "No"  "No"   54.32676255644743 79.85502539058447  57.80199794817154 87.20293236032451
"P32" "H225" "CFIS"                   23.4                 25 "Female" 59 "Black or African American" 0 "No"  "No"  59.671843011945995 70.43095775679919  54.42301346554674 83.91837887352293
"P33" "H228" "Exhaust"                20.8 22.647058823529413 "Male"   62 "Black or African American" 0 "No"  "No"   53.90672805748317 78.13316341594992  57.77286739629417 71.27682623668976
"P34" "H231" "CFIS"      7.705882352941177  7.416666666666667 "Female" 55 "Black or African American" 1 "Yes" "No"   57.08181459315873 75.95309584535812  55.45328829787712 83.14743107004271
"P35" "H233" "Balanced" 21.357142857142858            19.0625 "Female" 25 "Black or African American" 0 "No"  "No"   70.39749178334814 64.81249221580596  56.39282547180242 80.67409582224768
"P36" "H242" "Exhaust"  22.307692307692307 21.772727272727273 "Female" 79 "Black or African American" 0 "No"  "No"  57.849404924919945 81.80452687717525   63.1695784612863 87.78293392345113
"P37" "H243" "CFIS"      19.63157894736842             21.875 "Male"   45 "White"                     0 "No"  "No"   61.67672413560004 72.62120943114994 59.612177646446185 85.11817423073818
"P38" "H244" "Exhaust"  16.923076923076923 19.333333333333332 "Male"   52 "Black or African American" 1 "No"  "No"   73.19553777782221 66.90638977927227  69.03843901409901 79.13770649583024
"P39" "H245" "Exhaust"  23.357142857142858  22.77777777777778 "Male"   41 "White"                     1 "No"  "No"   70.06353471946116 99.97444138995834  63.37721971586361 82.70938045797855
"P40" "H247" "Balanced" 18.714285714285715 19.941176470588236 "Female" 78 "Black or African American" 0 "Yes" "No"   56.56362064361496 68.76335961787731  55.87518356065456  77.8039760897257
end


Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#6

26 Jan 2022, 12:31

You mentioned I cannot adjust for sex, race in population-level analysis, but can I adjust for sex and race when I analyze home-level analysis for t-test comparing air quality between Pre and Post period?

No. With a t-test, you cannot adjust for anything. A t-test is limited to an unadjusted comparison, and it is also limited to a comparison of two conditions. It is not adequate for your purposes. The best it can do is an overall comparison of pre vs post, not distinguishing the different ventilation interventions, and not adjusting for anything.

Unfortunately, each home had one or more asthmatic participants.

In that case, the fixed effects analysis that I demonstrated in #4 is not applicable. You will need to go to a more complicated mixed-effects model with three levels. This is no longer a strictly within-person comparison, but the good news is that you can adjust for other variables in this kind of model.

The data set you show this time is a bit different in nature from the one before and needs a bit more preparation for analysis.

Code:

isid Participant

rename pre_ACT ACT1
rename post_ACT ACT2
rename Y1_* *1
rename Y2_* *2

// Convert string identifiers and categorical variables to labeled numerics
foreach v of varlist Participant Home Ventilation Sex Race {
    encode `v', gen(_`v')
    drop `v'
    rename _`v' `v'
}

// Code the ER visit variables as 0 = No, 1 = Yes
label define yesno 0 "No" 1 "Yes"
foreach v of varlist ERvisit* {
    encode `v', gen(_`v') label(yesno)
    drop `v'
    rename _`v' `v'
}

reshape long ACT ERvisit NO2_IN NO2_OU, i(Participant) j(pre_post)
label define pre_post 1 "Pre" 2 "Post"
label values pre_post pre_post

// Three-level model: levels are listed from highest to lowest,
// so Home comes first and Participant is nested within Home
mixed ACT i.pre_post##i.Ventilation c.(age crowd) i.(Sex Race) || Home: || Participant:
margins Ventilation, dydx(pre_post)

Again, the analysis can provide only comparisons among the three Ventilation interventions as there is no control group. The -margins- output is interpreted the same way as in #4. The only difference is that now the results are adjusted for age, crowd, Sex, and Race.

I don't know what the roles of the NO2 and ER visit variables are here. It strikes me that they are probably surrogate outcome variables. But it is also conceivable that you would want to use the NO2 variables as covariates, as a simplistic (and not very useful) way to demonstrate that the differences among the ventilation interventions are mediated by the NO2 variables. I don't know if that makes sense scientifically or not, but it might. I would imagine that the ER visit variables, however, are actually consequences of asthma control, so adjusting for them would not make sense.

Could you please recommend any references or books for me?

I have taught introductory statistics classes many times, but not recently. I have used many different textbooks for that. I think that different people have different learning needs. And people have different levels of comfort with mathematical presentation. So I am reluctant to recommend specific books for individuals I don't really know. My best advice is to go to the website of a bookseller and search for books on introductory statistics or introductory epidemiology. Browse the tables of contents and take a look at some content pages to get a feel for what you might find most helpful. I will add that there are some books on these topics available in the StataCorp online bookstore--these would have the advantage of also incorporating "how to do it in Stata" into the presentation, which would be helpful if you are likely to continue using Stata going forward.
Insung Kang

Join Date: Jan 2020

Posts: 17
#7

26 Jan 2022, 13:11

Thank you so much for taking the time to answer my question! I got your point! Yes, I had assumed that NO2 and ER visits were potentially important outcome variables, not confounders.

Stata bookstore sounds perfect to me! I really appreciate it!

Have a great rest of your day!
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#8

26 Jan 2022, 13:43

I wasn't taught this way, but I presume the Gentle Introduction to Stata is a good series to start with, as well as a few others I could recommend. If you're looking for something more mathematical though, Using Stata for Principles of Econometrics is another good one; I wasn't taught with it, but I occasionally use it as a reference. Another good one is Workflow of Data Analysis Using Stata by Long, which really helped me become a better and more organized researcher.
Insung Kang

Join Date: Jan 2020

Posts: 17
#9

26 Jan 2022, 14:01

Thank you Jared! I'll definitely check out the Workflow of Data Analysis Using Stata. Have a great rest of your day!