Declare data to be time series

Roberto Vidri

Join Date: Mar 2019

Posts: 36
#1

10 Nov 2020, 23:01

Dear STATA experts,

I'm working on an interrupted time series analysis.

I have individual level data of over 30k subjects. My variables of interest are year of diagnosis (YEAR_OF_DIAGNOSIS), the treatment variable which creates 2 groups ("expand" 1/0), and the outcome of interest (uninsured 1/0).
YEAR_OF_DIAGNOSIS is in "double" format.

If I sort the data by year of diagnosis and expand and attempt to declare data as a time series, I get an error: "repeated time values within panel."
I assume this is because this is individual level data.
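A minimal sketch of why -tsset- complains here, using simulated data rather than the actual file. The seed, sample size, and 10% uninsured rate are assumptions for illustration only; the variable names follow the post.

```stata
* Simulated stand-in for the individual-level data (assumed values)
clear
set seed 12345
set obs 1000
generate byte expand = runiform() < 0.5
generate int YEAR_OF_DIAGNOSIS = 2010 + floor(8 * runiform())
generate byte uninsured = runiform() < 0.10

* Many subjects share each (expand, year) pair, so the pair does not
* uniquely identify observations, which -tsset- requires
duplicates report expand YEAR_OF_DIAGNOSIS

* -capture noisily- shows the error but lets the sketch continue
capture noisily tsset expand YEAR_OF_DIAGNOSIS
display "tsset return code: " _rc
```

If -duplicates report- shows surplus observations for the panel and time variables, -tsset- will refuse the declaration until there is at most one row per group and year.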

Code:

sort expand YEAR_OF_DIAGNOSIS
tsset expand YEAR_OF_DIAGNOSIS

I was able to get "around" this error (I think) by collapsing all the data.

Code:

sort expand YEAR_OF_DIAGNOSIS
collapse uninsured, by(YEAR_OF_DIAGNOSIS expand)
list
tsset expand YEAR_OF_DIAGNOSIS

This appears to work, and allows me to proceed with the ITS.

However, I'm not sure if this is the correct way of doing this. Is there a better way, by actually using the individual level data?

I would appreciate any help.

Last edited by Roberto Vidri; 10 Nov 2020, 23:06.
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#2

11 Nov 2020, 11:22

The questions you pose cannot be answered without a full explanation of the context and meaning of the variables, the design of the data collection plan that created this data set, and a clear statement of the research question you are trying to answer.
Roberto Vidri

Join Date: Mar 2019

Posts: 36
#3

11 Nov 2020, 12:11

Dr. Schechter - Thank you for your help; I always look forward to your replies in this forum.

I'll start with the data. I'm using the NCDB database (you may be familiar with it). This is a retrospective cohort analysis. Individual-level data for every subject is available (only one entry per subject). Subjects have unique identifiers and there are no duplicates. After inclusion/exclusion criteria, I have >30k subjects with complete data for my analysis.

Research question: To determine the effects of the "Medicaid Expansion" on certain surgical procedures for a selected cancer. The database contains a variable for this that categorizes states into "no expansion", "early expansion (2010-2013)", "expansion on 01/2014" or "late expansion (>01/2014)". For analysis purposes I created a new dichotomous variable "expand". All the states that "expanded" are a 1, the "no expansion" are 0. This was based on other published papers, but I understand it is not perfect.

The first part of the analysis looks into the actual effect of the Medicaid expansion on the number of "uninsured" subjects (outcome). The data contains a categorical variable that assigns values based on the "payor" or insurance type. This was dichotomized into 1: "uninsured", 0: "Has insurance" (Private, Medicaid, etc.).

The variable within the set "YEAR_OF_DIAGNOSIS" has a value for every subject, based on our criteria, between 2010-2017. This is in "double" format.

The code I utilized to set up the analysis is below. I'm using the "itsa" package for ITS with multiple groups defined as: controls (expand=0) and treated (expand=1).

Code:

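preserve
sort expand YEAR_OF_DIAGNOSIS
* mean of the 0/1 outcome = proportion uninsured per group-year
collapse uninsured, by(YEAR_OF_DIAGNOSIS expand)
list
* expand is the panel variable, YEAR_OF_DIAGNOSIS the time variable
tsset expand YEAR_OF_DIAGNOSIS
itsa uninsured, trperiod(2014) treatid(1) lag(1) figure replace posttrend
* test for autocorrelation after -itsa-
actest, lags(6)
restore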

This actually provides an output that looks "correct".

Unfortunately, I have very little/no experience with ITS and the "tsset" command, and the statisticians at my institution do not use STATA. Collapsing the data was the only workaround I could come up with to make it work. It gives me the mean of "uninsured" by year and "treatment". However, I'm afraid this may not be the right way of doing this.

I hope this clarifies some of your questions. Once again, thank you for your help.

Last edited by Roberto Vidri; 11 Nov 2020, 12:14.
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#4

11 Nov 2020, 13:19

Very clear explanation, thanks. It looks to me like everything you did is appropriate for this data and this research question. The -collapse- of the data set actually creates time series for the expanded and unexpanded groups that you can analyze using an interrupted time-series analysis, and it correctly calculates the outcome variable you need: the proportion of uninsured in each group.
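To illustrate the point about -collapse-, here is a minimal sketch on simulated data. The seed, sample size, and 10% uninsured rate are assumptions, not the NCDB data. It shows that the collapsed mean of a 0/1 variable is a proportion and that the result can be declared with -tsset-. The cell count n is an extra added for illustration and was not part of the original code.

```stata
* Simulated stand-in for the individual-level data (assumed values)
clear
set seed 12345
set obs 1000
generate byte expand = runiform() < 0.5
generate int YEAR_OF_DIAGNOSIS = 2010 + floor(8 * runiform())
generate byte uninsured = runiform() < 0.10

* The mean of a 0/1 variable is the proportion of 1s, so each
* collapsed row holds the share uninsured in one group-year;
* n keeps the number of subjects behind each proportion
collapse (mean) uninsured (count) n = uninsured, by(expand YEAR_OF_DIAGNOSIS)

* One row per group-year: the pair now uniquely identifies rows
isid expand YEAR_OF_DIAGNOSIS
tsset expand YEAR_OF_DIAGNOSIS, yearly
list, sepby(expand)
```

Keeping n alongside the proportion makes it easy to spot group-years that rest on very few subjects.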
Roberto Vidri

Join Date: Mar 2019

Posts: 36
#5

11 Nov 2020, 13:29

Once again, thank you! This is reassuring!
Farhad Kabir

Join Date: Nov 2020

Posts: 1
#6

12 Nov 2020, 03:41

Thank you so much Clyde Schechter.
Roberto Vidri

Join Date: Mar 2019

Posts: 36
#7

06 Jan 2021, 15:03

Dr. Schechter, I hope you're doing well!

Regarding the above discussion - is an interrupted time series analysis (linear regression) ideal for a binary outcome (1 = uninsured, 0 = insured)? Should I use a logistic regression? Is there a package that uses logistic instead of linear regression?
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#8

06 Jan 2021, 17:38

Well, you might be able to use the linear regression model anyway. Linear probability models are perfectly valid. They are a bit dicey to interpret when the probabilities of the outcome being estimated are very close to zero or one--it is possible to get estimates and confidence intervals outside the 0 to 1 range. But if the probabilities in the various parts of the time series stay comfortably away from 0 and 1, a linear model is OK and interpretable, and if you are comfortable using -itsa-, go right ahead.
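One rough way to check whether a linear model keeps its estimates inside the 0 to 1 range is to look at the fitted values directly. The sketch below uses simulated data (seed, sample size, and rates are assumptions) and a plain -regress- linear probability model with a group-specific linear trend as a stand-in. It is not the -itsa- specification.

```stata
* Simulated stand-in for the individual-level data (assumed values)
clear
set seed 12345
set obs 5000
generate byte expand = runiform() < 0.5
generate int YEAR_OF_DIAGNOSIS = 2010 + floor(8 * runiform())
generate byte uninsured = runiform() < 0.10

* Linear probability model: separate linear time trend per group
regress uninsured i.expand##c.YEAR_OF_DIAGNOSIS, vce(robust)

* Fitted probabilities; any outside [0, 1] signal trouble
predict double phat, xb
summarize phat
count if phat < 0 | phat > 1
```

A count of zero, with the minimum and maximum of phat well inside the unit interval, suggests the linear model is not straining against the bounds for the observed years.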

If you feel that -itsa- will not properly serve your needs, then you can use -xtlogit- instead. You will have to do some setup work for your analysis. But basically what you are doing here, since you have a control group, is a difference-in-differences analysis, and the crux of it is an interaction between a pre-post indicator and a treatment/control indicator.
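The interaction described above can be sketched as follows. This is only an illustration on simulated data: the seed, sample size, and effect sizes are assumptions, and post is a hypothetical indicator for diagnosis in 2014 or later, which ignores the early and late expansion dates mentioned earlier in the thread. It uses plain -logit- rather than -xtlogit-, because the simulated data have no panel identifier to -xtset-.

```stata
* Simulated stand-in for the individual-level data (assumed values)
clear
set seed 12345
set obs 5000
generate byte expand = runiform() < 0.5
generate int YEAR_OF_DIAGNOSIS = 2010 + floor(8 * runiform())

* Hypothetical pre-post indicator: 1 from 2014 onward
generate byte post = YEAR_OF_DIAGNOSIS >= 2014

* Assumed effect: uninsurance falls only in the expansion group after 2014
generate byte uninsured = runiform() < cond(expand & post, 0.05, 0.10)

* The expand#post interaction is the difference-in-differences term
logit uninsured i.expand##i.post, or

* Predicted probability of being uninsured in each of the four cells
margins expand#post
```

The -margins- output puts the four cells back on the probability scale, which is usually easier to report than the odds ratio on the interaction term.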
Roberto Vidri

Join Date: Mar 2019

Posts: 36
#9

07 Jan 2021, 09:15

Thank you!