stata package for case cross over studies

Vishal Sharma

Join Date: Sep 2018

Posts: 60
#1


19 Jan 2019, 17:32

is there a way to organize/manipulate data for use in a case crossover study?
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

19 Jan 2019, 18:28

The output of search crossover suggests that the pkcross command is intended to analyze crossover experiments. And the output of help pkcross tells us

pkcross analyzes data from a crossover design experiment. When analyzing pharmaceutical trial data, if the treatment, carryover, and sequence variables are known, the omnibus test for separability of the treatment and carryover effects is calculated.

Since I'm not familiar with pharmacological analyses and with Stata's pharmacokinetic (biopharmaceutical) commands (see the output of help pk) I don't know if this is what you had in mind.
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#3

20 Jan 2019, 02:57

William gave excellent advice.

You didn't give details about the data. Sharing data, or a toy example, is a great starting point for eliciting insightful replies.

That being said, depending on what you want to estimate, the -mixed- command may do the trick.

You may wish to read this thread.

Hopefully that helps.

Best regards,

Marcos
Mike Lacy

Join Date: Apr 2014

Posts: 2416
#4

20 Jan 2019, 08:14

A case-crossover design, presuming that term is used in its strict sense, is a different beast than a conventional crossover design. It involves a sample of individuals who have experienced an acute event (e.g., heart attack, auto accident), with a within-subject comparison of covariate data at the time of that event versus at some "control" period(s) prior in time to the acute event.
Here's one recent methodological reference, which I have not myself read:
Carracedo-Martínez, E., Taracido, M., Tobias, A., Saez, M. and Figueiras, A., 2010. Case-crossover analysis of air pollution health effects: a systematic review of methodology and application. Environmental Health Perspectives, 118(8), pp.1173-1182.

My understanding is that one method of analysis is to use conditional logistic regression, with grouping on the individual. However, my vague memory is that there are some issues with a simple -clogit- analysis. Anyway, I feel confident that Stata can handle analysis of this design, but I think the particulars will depend on what the control period data is like, so I'd encourage Vishal to describe the study data in question.

Vishal Sharma

Join Date: Sep 2018

Posts: 60
#5

04 Feb 2019, 17:59

hi all. i will share more details about my question now that i have a better idea of the data and study type. i will explain first, then use an example.

we're doing a case crossover study where cases serve as their own controls. exposure frequency is counted in the "case window" and compared with exposure frequencies in the "control window" (which usually precedes the case window). this is then analyzed using conditional logistic regression.

in our data, the study is a population of people on drug A. i will have the data arranged by id, then have a binary variable that represents each day of the year (1-365) that equals 1 if the subject is also on drug B, and equals 0 if the subject is not on drug B. i will also have a hospitalization date. so this study is being done to see if people who are on drug A and B have higher rates of hospitalization.

the data will initially look like this:


id	x1	x2	x3	x4	x5..........	x365	hospitalization date (or day)
1	0	0	1	1	1	0	270
2
3
n

x1 x2 x3 x4 .....x365 are binary variables representing every day in the study year (1-365)

subject 1- took drug B on days 3, 4, 5 ( so was concurrently taking drugs A and B on these days since the study population is all users of drug A) and was hospitalized on day 270. every hospitalization will have its own line of data. if person 1 was hospitalized 3 times , then they would have 3 lines. i would like to code the exposure variables using the following case and control windows:

case window - the 7 days preceding the hospitalization
control windows - the 7 days preceding the case window, and a second window ending 30 days before hospitalization

so in the above example... for person 1, the case window would be days 263-270, and the control windows will be days 256-262 and 233-240.

i need to be able to code exposure to drug B for these 3 time periods.

if subject 1 had exposure to drug B during the case window (because one of x263 - x270 was equal to 1), then the case window exposure variable would equal 1, if no exposure to drug B during the case window, then exposure variable equals 0. the same would be true for the 2 control windows.

the data would look like this then:

id	x1	x2	x3	x4	x5...	x365	hospitalization day	case window (day263-270)	control window1(day256-262)	control window2(day233-240)
1	0	0	1	1	1	0	270	1	0	0
2
3
n

in the above data, person 1 had exposure to drug B in the case window and no exposure during the two control windows

ideally, i would like to reshape the data as follows in order for stata to perform clogit on it:

stratum	observation #	case	exposure
1	1	1	1
1	2	0	1
1	3	0	1
2	1	1	0
2	2	0	1
2	3	0	0
3	1	1	1
3	2	0	0
3	3	0	1

in each stratum, observation 1 represents the case window and observations 2-3 represent the control windows... exposure = 1 if there was concurrent use of drug A and B and exposure = 0 if not

any feedback or starting points would be very much appreciated.. mostly on how to code for the binary variables for the case windows and the 2 control windows. (ie the middle table above) would be most helpful

thanks!!!
vishal


Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#6

04 Feb 2019, 18:41

I can get you part of the way there. I find your explanation confusing. Is stratum in the final table just a synonym for id in the starting data? Any reason to change the name? Also, why do you want the three observations per stratum in that particular order? None of the regression commands will in any way care about that. What is the variable case? It appears to arise from nowhere, and I cannot tell what it is supposed to be, nor how it relates to the original data.

Anyway, here's a start.

Code:

// CREATE RANDOM DEMONSTRATION DATA
clear*
set obs 50
set seed 1234
gen id = _n
forvalues i = 1/365 {
    gen x`i' = runiform() < 0.25
}
gen hospitalization_day = runiformint(1, 365)

// MANAGE THE DATA TO SET IT UP FOR A CASE CROSSOVER ANALYSIS
reshape long x, i(id) j(day)
by id, sort: egen exposedcase = max(cond(inrange(hospitalization_day-day, 0, 7), x, .))
by id: egen exposedcontrol1 = max(cond(inrange(hospitalization_day-day, 8, 14), x, .))
by id: egen exposedcontrol2 = max(cond(inrange(hospitalization_day-day, 30, 37), x, .))
drop x day
duplicates drop
reshape long exposed, i(id) j(_window) string

// MAKE THE WINDOW VARIABLE NUMERIC SO IT CAN BE USED IN ANALYSIS
encode _window, gen(window)
drop _window

To my mind, this is all you need to set the data up for a case-crossover analysis. You seem to have something more complicated in mind, and as I don't understand it, I will leave it to you to take it from here.
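For readers wondering how this layout relates to the stratum/case/exposure table in #5, the remaining step can be sketched as follows. This is only a minimal sketch under an assumption that the thread does not confirm: that "case" is simply a flag marking the case-window row within each id. The three ids and their exposure values are made up to match the table in #5; they are not real data.

```stata
// Toy window-level data: one row per id, exposure flags already computed.
// The values are invented to match the stratum/case/exposure table in #5.
clear
input id exposedcase exposedcontrol1 exposedcontrol2
1 1 1 1
2 0 1 0
3 1 0 1
end

// One row per id and window; _window holds "case", "control1", "control2"
reshape long exposed, i(id) j(_window) string

// ASSUMPTION: "case" just flags the case-window row within each id
gen byte case = (_window == "case")

sort id _window
list id _window case exposed, sepby(id)

// Conditional logistic regression, conditioning on the individual
clogit case exposed, group(id)
```

With real data, the same three commands after the toy input (the -reshape-, the -gen-, and the -clogit-) would be applied to the window-level exposure variables built above, before the string window variable is dropped.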

I'll just take the opportunity to note that this is very simple to do in long layout, and very difficult to do in wide. Once the first -reshape- is done, it's all easy.

May I just point out that the choice of windows here is open to criticism for not being robust to secular or seasonal* trends in exposure to drug B. Perhaps there are no such trends that are relevant to drug B, but I'd look into that carefully before just assuming it.

*I am using seasonal here in the generic sense, so it might refer to days of the week, or calendar months, or any other kind of periodicity.
Mike Lacy

Join Date: Apr 2014

Posts: 2416
#7

04 Feb 2019, 19:38

My thinking is similar to Clyde's, including uncertainties about Vishal's meaning, but I would have thought we'd want a long data set with an observation for each subject for every day s/he is at risk up until the day of hospitalization. I haven't read about case-crossover designs for many years, so I'm definitely assenting to Clyde's advice here, but a brief explanation would be interesting, at least to me, and perhaps to others.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#8

04 Feb 2019, 22:51

Well, I am hardly an expert on case-crossover studies myself, but as it happens I have recently had some involvement with one, so I have refreshed my understanding of them a bit. The main issue with case-crossover designs is the same as with any other case-control design: the way in which controls are selected may introduce differences that are unrelated to the outcome. Now, in a case-crossover design "control" here refers to the time periods used for control observations--the patients are their own controls so that all time-invariant attributes, observed or not, are automatically matched.

The problem that can arise is that many exposures fluctuate systematically over time. If, for example, Mr. Sharma's Drug B is one that is commonly used in conjunction with the flu, then exposure to Drug B will be more frequent during the winter months. So for situations like this, it is usually a good idea to have at least one of the control time period(s) be in the same calendar month as the case (outcome event) time period. Or, if Drug B is a relatively new drug whose use is simply growing over time, there is an inherent bias in using only control periods that precede the outcome event: a bidirectional study that includes a control period after the event might be better. (This can be problematic in other ways, however, for example, if the event itself causes the patient to subsequently avoid drug B!)

Some types of exposures exhibit weekly cycles. I can't think of any drugs that do that, but these studies are often used to study, for example, the health effects of traffic-related air pollution, and in most locations that pollution is markedly lower on weekends and holidays than weekdays, and there is typically some systematic variation across the 5 workdays as well. So, again, to avoid bias, it becomes important that the control period match the case period on day(s) of the week.

To some extent the proposal in #5 achieves these goals. For example, the case and control1 windows are both 7 consecutive days, so days of week are matched. And in most, but not all cases, they will also fall in the same calendar month. But control period 2 does not appear well matched on either of those counts. And there are no post-event control periods (which could be good or bad, depending).
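One way to make the calendar-month and day-of-week matching concrete is to take as control days every other day in the event's calendar month that falls on the same weekday as the event. This is often called a time-stratified referent scheme; it is named here only as an illustration and is not something proposed in the thread. A minimal sketch, assuming single-day windows and hypothetical event dates (the variable names and the year 2018 are invented for the example):

```stata
// Hypothetical example: five subjects with made-up event dates in 2018
clear
set seed 2019
set obs 5
gen id = _n
gen event_date = mdy(1, 1, 2018) + runiformint(0, 364)
format event_date %td

// First and last day of the calendar month containing the event
gen month_start = dofm(mofd(event_date))
gen month_end   = dofm(mofd(event_date) + 1) - 1

// One row per day of that month
gen ndays = month_end - month_start + 1
expand ndays
bysort id: gen ref_date = month_start + _n - 1
format ref_date %td

// Keep the event day plus the other days on the same weekday
keep if dow(ref_date) == dow(event_date)
gen byte case = (ref_date == event_date)

list id event_date ref_date case, sepby(id)
```

Each id ends up with four or five rows, exactly one of which is the event day. Note that this scheme uses control days after the event as well as before it, so the caveat above about the event changing later exposure applies.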

There is no easy formula to choosing the control periods that will resolve this problem. You have to think through what is known about the periodicity, at various frequencies, of the exposure pattern and do the best you can to match. Not knowing what drug B is and not knowing what kind of hospitalization event is under study here, I can't even begin to give concrete advice here, and I did not mean to be overly critical of the choices made. I may have used overly strong language, but my intent was merely to nudge Mr. Sharma to look carefully at these control windows from this perspective.
Bruce Weaver

Join Date: May 2014

Posts: 1133
#9

05 Feb 2019, 08:52

This reminds me of some of the examples in Singer & Willett's Applied Longitudinal Data Analysis. In Chapter 5, for example, the data in Table 5.6 show a dichotomous indicator variable for unemployment (unemp) that can change back and forth from time to time within the records for one ID. In Vishal's final table in #5, that's what the exposure variable does. It seems to me, therefore, that he could use -melogit- with occasions of observation clustered within patient ID. And the potentially problematic variables Clyde discusses in #8 (e.g., work day, month or season, etc.) could be included as time-dependent covariates. Including them will probably not solve all of the problems Clyde describes, but it should help a bit, I think.

--
Bruce Weaver
Version: Stata/MP 19.5 (Windows)
Mike Lacy

Join Date: Apr 2014

Posts: 2416
#10

05 Feb 2019, 10:22

I'm still interested here in why we would not use a long file with all the daily observations on each subject? To make it simple, let's suppose we just define exposure as "DrugB on the prior day," regardless of windows. And, let's assume that the days after hospitalization are informative with respect to control-time exposure. What I'm thinking of is something as simple as this:

Code:

// simulate data
clear
set obs 50
set seed 447 // 48234
gen id = _n
forvalues i = 1/365 {
    gen DrugB`i' = runiform() < 0.25
}
gen hospitalization_day = runiformint(1, 365)

// long format, with "person-days" as the unit of analysis
reshape long DrugB, i(id) j(day)
gen byte case_day = (day == hospitalization_day)
bysort id (day): gen byte exposed = DrugB[_n-1] // simplified exposure

// Analysis
clogit case_day exposed, group(id)
// melogit case_day exposed || id:
// cc case_day exposed, by(id)

That is: "Within person, how is DrugB exposure on the prior day related to any given day being the day on which hospitalization occurred."

Re the difficulties about biases consequent on covariates that are related to time:

Allison, P.D. and Christakis, N.A., 2006. Fixed-effects methods for the analysis of nonrepeated events. Sociological methodology, 36(1), pp.155-172.

They advocate something called the "case-time-control method" for estimation.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#11

05 Feb 2019, 10:58

You very well might do it that way (although I think you would use -xtlogit, fe-, not -melogit- because you want a pure within-person estimation.) It all depends on what you think about the latency between exposure and outcome.

The design your code embodies is fine if you are pretty confident that the outcome is triggered by the exposure in precisely 1 day. But often we don't know much about the latency. Or we might actually know that the latency varies among people between 1 and 5 days. In that case, your design would underestimate the association, whereas choosing longer "windows" would give a better estimate.

There are a zillion variations on the case-crossover design, and which variation is best for a particular problem really depends on an in-depth knowledge of the epidemiology of both the exposure and the response, and the mechanisms that are believed to connect them.
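To illustrate the two points in this post, here is a minimal sketch that widens the exposure definition in the person-day layout from #10 to "any DrugB use in the previous 5 days" and then fits the model with both -clogit- and -xtlogit, fe-, which maximize the same conditional likelihood. The 5-day look-back and the variable name exposed5 are assumptions chosen for the example, and the data are simulated with no true effect, so only the mechanics are meaningful.

```stata
// Simulated person-day data, as in #10 (random exposure, random event day)
clear
set seed 447
set obs 50
gen id = _n
forvalues i = 1/365 {
    gen byte DrugB`i' = runiform() < 0.25
}
gen hospitalization_day = runiformint(1, 365)
reshape long DrugB, i(id) j(day)
gen byte case_day = (day == hospitalization_day)

// Exposure = any DrugB use in the previous 5 days (the 5 is illustrative);
// the first few days of the year only see a shorter look-back
bysort id (day): gen byte exposed5 = 0
forvalues k = 1/5 {
    by id: replace exposed5 = 1 if DrugB[_n-`k'] == 1
}

// Conditional (fixed-effects) logit, fitted two equivalent ways
clogit case_day exposed5, group(id)
scalar b_clogit = _b[exposed5]

xtset id
xtlogit case_day exposed5, fe
scalar b_xtlogit = _b[exposed5]

display "clogit: " b_clogit "   xtlogit, fe: " b_xtlogit
```

The two coefficients should agree up to numerical tolerance. Changing the upper limit of the loop is all it takes to try a different latency window.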
Mike Lacy

Join Date: Apr 2014

Posts: 2416
#12

05 Feb 2019, 11:16

Yes, agreed; just chose a one day window for simplicity of defining the exposure.
Vishal Sharma

Join Date: Sep 2018

Posts: 60
#13

05 Feb 2019, 11:22

thanks all,

Clyde, the code you provided was very helpful! thanks!
