  • Declare data to be time series

    Dear Stata experts,


    I'm working on an interrupted time series (ITS) analysis.

    I have individual level data of over 30k subjects. My variables of interest are year of diagnosis (YEAR_OF_DIAGNOSIS), the treatment variable which creates 2 groups ("expand" 1/0), and the outcome of interest (uninsured 1/0).
    YEAR_OF_DIAGNOSIS is in "double" format.

    If I sort the data by year of diagnosis and expand and attempt to declare data as a time series, I get an error: "repeated time values within panel."
    I assume this is because this is individual level data.

    Code:
    sort expand YEAR_OF_DIAGNOSIS
    
    tsset expand YEAR_OF_DIAGNOSIS
    I was able to get "around" this error (I think) by collapsing all the data.

    Code:
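    sort expand YEAR_OF_DIAGNOSIS
    
    * Mean of the 0/1 outcome within each year-by-group cell = proportion uninsured
    collapse (mean) uninsured, by(YEAR_OF_DIAGNOSIS expand)
    list
    
    * One observation per group per year remains, so the declaration now succeeds
    tsset expand YEAR_OF_DIAGNOSIS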
    This appears to work, and allows me to proceed with the ITS.

    However, I'm not sure if this is the correct way of doing this. Is there a better way, by actually using the individual level data?

    I would appreciate any help.
    Last edited by Roberto Vidri; 10 Nov 2020, 23:06.

  • #2
    The questions you pose cannot be answered without a full explanation of the context and meaning of the variables, the design of the data collection plan that created this data set, and a clear statement of the research question you are trying to answer.



    • #3
      Dr. Schechter - Thank you for your help; I always look forward to your replies in this forum.

      I'll start with the data. I'm using the NCDB database (you may be familiar with it). This is a retrospective cohort analysis. Individual-level data are available for every subject (only one entry per subject). Subjects have unique identifiers and there are no duplicates. After applying inclusion/exclusion criteria, I have >30k subjects with complete data for my analysis.

      Research question: To determine the effects of the "Medicaid Expansion" on certain surgical procedures for a selected cancer. The database contains a variable that categorizes states into "no expansion", "early expansion (2010-2013)", "expansion on 01/2014" or "late expansion (>01/2014)". For analysis purposes I created a new dichotomous variable, "expand". All the states that expanded are coded 1, and the "no expansion" states are coded 0. This was based on other published papers, but I understand it is not perfect.

      The first part of the analysis looks into the actual effect of the Medicaid expansion on the proportion of "uninsured" subjects (outcome). The data contain a categorical variable that assigns values based on the "payor" or insurance type. This was dichotomized into 1: "uninsured", 0: "Has insurance" (Private, Medicaid, etc.).

      The variable "YEAR_OF_DIAGNOSIS" has a value for every subject, based on our criteria, between 2010 and 2017. It is stored in "double" format.

      The code I utilized to set up the analysis is below. I'm using the "itsa" package for ITS with multiple groups defined as: controls (expand=0) and treated (expand=1).

      Code:
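      * Work on a copy so the individual-level data can be restored afterwards
      preserve
      
      sort expand YEAR_OF_DIAGNOSIS
      
      * Mean of the 0/1 outcome per year and group = proportion uninsured
      collapse (mean) uninsured, by(YEAR_OF_DIAGNOSIS expand)
      list
      
      * Declare the two-group panel: expand is the panel variable, year is the time variable
      tsset expand YEAR_OF_DIAGNOSIS
      
      * Multiple-group ITS: expand==1 is the treated group, intervention starts in 2014
      itsa uninsured, trperiod(2014) treatid(1) lag(1) figure replace posttrend
      
      * Test the residuals for autocorrelation
      actest, lags(6)
      
      restore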
      This actually provides an output that looks "correct".
      [Attached images: Screen Shot 2020-11-11 at 14.08.19.png, Graph.jpg, Screen Shot 2020-11-11 at 14.04.54.png]


      Unfortunately, I have little to no experience with ITS and the "tsset" command, and the statisticians at my institution do not use Stata. Collapsing the data was the only workaround I could come up with. It gives me the mean of "uninsured" by year and "treatment". However, I'm afraid this may not be the right way of doing this.

      I hope this answers your questions. Once again, thank you for your help.




      Last edited by Roberto Vidri; 11 Nov 2020, 12:14.



      • #4
        Very clear explanation, thanks. It looks to me like everything you did is appropriate for this data and this research question. The -collapse- of the data set actually creates time series for the expanded and unexpanded groups that you can analyze using an interrupted time-series analysis, and it correctly calculates the outcome variable you need: the proportion of uninsured in each group.
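
        As a minimal sketch of why the -collapse- yields proportions (variable names are taken from the thread; the count variable n is a hypothetical addition for illustration): the mean of a 0/1 variable within a cell is the share of 1s in that cell, and keeping the cell size alongside it shows how many subjects stand behind each point of the series.

        ```stata
        * Sketch only: n is a hypothetical name, not a variable from the original data
        preserve

        * (mean) of a 0/1 variable = proportion uninsured per group-year cell
        * (count) records how many subjects contribute to each cell
        collapse (mean) uninsured (count) n = uninsured, by(expand YEAR_OF_DIAGNOSIS)

        * Inspect the two resulting series, one block per group
        list, sepby(expand)

        * With one row per group per year, the panel declaration succeeds
        tsset expand YEAR_OF_DIAGNOSIS, yearly

        restore
        ```

        Small values of n in any cell would suggest that the corresponding proportion is noisy.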



        • #5
          Once again, thank you! This is reassuring!



          • #6
            Thank you so much Clyde Schechter.



            • #7
              Dr. Schechter, I hope you're doing well!

              Regarding the above discussion - is an interrupted time series analysis (linear regression) ideal for a binary outcome (1 = uninsured, 0 = insured)? Should I use logistic regression instead? Is there a package that uses logistic rather than linear regression?



              • #8
                Well, you might be able to use the linear regression model anyway. Linear probability models are perfectly valid. They are a bit dicey to interpret when the probabilities of the outcome being estimated are very close to zero or one--it is possible to get estimates and confidence intervals outside the 0 to 1 range. But if the probabilities in the various parts of the time series stay comfortably away from 0 and 1, a linear model is OK and interpretable, and if you are comfortable using -itsa-, go right ahead.
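
                One rough way to check whether the probabilities stay comfortably away from 0 and 1 is to look at the range of the group-year proportions. This is only a sketch using the variable names from earlier in the thread, not a formal diagnostic.

                ```stata
                * Sketch only: inspect the range of the collapsed proportions
                preserve

                collapse (mean) uninsured, by(expand YEAR_OF_DIAGNOSIS)

                * Min and max show how close any group-year proportion gets to 0 or 1
                summarize uninsured

                restore
                ```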

                If you feel that -itsa- will not properly serve your needs, then you can use -xtlogit- instead. You will have to do some setup work for your analysis. But basically what you are doing here, since you have a control group, is a difference-in-differences analysis, and the crux of it is an interaction between a pre-post indicator and a treatment/control indicator.
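
                A minimal sketch of that interaction on the individual-level data follows. Several assumptions are mine rather than from the thread: post is a hypothetical indicator built with 2014 as the cutoff (borrowed from the trperiod(2014) used earlier), and plain -logit- stands in for -xtlogit- because each subject appears only once. It also leaves out the time trends that -itsa- models, so it illustrates the idea rather than replacing that analysis.

                ```stata
                * Sketch only: post is a hypothetical pre/post indicator, with 2014 assumed as the cutoff
                generate byte post = (YEAR_OF_DIAGNOSIS >= 2014) if !missing(YEAR_OF_DIAGNOSIS)

                * Difference-in-differences on the logit scale:
                * the expand#post interaction term is the quantity of interest
                logit uninsured i.expand##i.post

                * Predicted probability of being uninsured in each of the four cells
                margins expand#post
                ```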



                • #9
                  Thank you!

