  • Time Series Analysis

    I have the following dataset, which I also recently posted about in another post: the dataset contains counts (Nr_attend) of the number of attendances at Cardiology outpatient clinics for 11 different health board areas (HB_n) in Scotland from March 2014 until June 2018. Health board areas mark different regions in Scotland. I need to prepare the data for time series analysis and produce appropriate graphs to show the development of number of attendances over time for: i) two health board areas of my choice (HB_n) ii) Scotland overall.

    Thus far, to respond to the first question, I denoted my data as time series using the 'mydate' variable as my time variable (it only contains months and years). The number of attendances (panel data) variable contains pretty diverse information, with minimum number of attendances being 0 and maximum being 12,358. I ran:
    Code:
    twoway (tsline Nr_attend)
    to visualize the development of number of attendances across time overall and obtained the uploaded figure, which looks nothing short of weird.

    Am I doing something wrong so far? Do I need to derive a total of Nr_attend first, before I proceed to solving i) and ii)?

    Many thanks!
    [Figure: CardAttOverTime.jpg]

    Last edited by Angelina Kancheva; 19 Jun 2023, 15:00.

  • #2
    For each monthly date you have several values and they are being superimposed. As you say, the counts vary from 0 to several thousand. The odd verticals are connections between different values within each year and the odd angled segments are connections between years.

    You won't get a plot by boards unless you ask for it.

    Compare this sequence

    Code:
    . webuse grunfeld, clear
    
    . tsset
    
    Panel variable: company (strongly balanced)
     Time variable: year, 1935 to 1954
             Delta: 1 year
    
    . tsline invest
    
    . line invest year, by(company) ysc(log)
    except that log scale won't march with your values of zero.


    • #3
      Thank you.

      The problem with my date variable is that, when I try to specify it as a time variable for time series analysis, I get the error that there are 'repeated time values in sample'.

      It is in %tm format right now.

      Should I specify another variable, which uniquely identifies separate observations in my data, as my panel variable, so as to be able to run a time series analysis?


      • #4
        Just to further clarify (apologies for the incomplete information), my data is panel. I have many observations collected at the same time point, then at a later one, etc., so the structure of the data is panel/longitudinal.

        Can I still run a time series analysis in that case? Would specifying a panel variable in addition to my time variable circumvent the above issue, or do I need another approach (that is not time series) altogether?


        • #5
          You don't show your code but I guess you attempted to tsset in terms of a monthly date variable alone -- and that is doomed to failure because you have panel data and need to specify a panel identifier too. See

          Code:
          help tsset

          But your posts are contradictory as the tsline in #1 would not have worked (in the sense of being legal and producing a result, useless though it is) if a tsset command had failed before.

          The display format is separate from declaring a time series variable. If you have declared a time series display format, then tsset uses it, but it is not needed. I can declare a time series variable without specifying a date display format, as this silly example shows:

          Code:
          . sysuse auto, clear
          (1978 automobile data)
          
          . gen t = _n
          
          . tsset t
          
          Time variable: t, 1 to 74
                  Delta: 1 unit
          So, the problem appears to be not telling Stata about a panel identifier that you should already have and the solution is to fix that.

          Most posts here need, or at least benefit from, showing CDE:

          Code you tried.

          Data (example)

          Error: what went wrong.

          Your posts so far typically report the last but not the other two. Please do study https://www.statalist.org/forums/help#stata for guidance on asking a good question.


          • #6
            Thank you very much.

            To produce the graph above, I had done:

            Code:
            format %tm mydate
            tsset Nr_attend mydate
            Here mydate is my time variable and Nr_attend is the variable whose development I would like to track over time.

            To this I get:

            Code:
            Panel variable: Nr_attend (unbalanced)
             Time variable: mydate, 2014m3 to 2018m6
                     Delta: 1 month
            So, I then ran:

            Code:
            tsfill
            However, looking at the code this morning, I doubted whether I could run a time series analysis on these data in the first place. I specified number of attendances as my panel variable because this is the variable that uniquely identifies each observation in my dataset.

            Many thanks again.
            Last edited by Angelina Kancheva; 20 Jun 2023, 02:30.


            • #7
              That explains one puzzle, but Nr_attend is not a suitable panel identifier -- it's an outcome -- and so while tsset accepted it as legal (because in practice it has integer values and evidently there are no repetitions), the only specification of panel identifier that makes biostatistical or epidemiological or other substantive sense would be a numeric identifier for say hospital board (or clinic).

              tsfill just created nonsense extra observations, unfortunately.

              Time series analysis is an entire field. I don't know what analysis makes sense here, but many things are possible if you only specify identifier and time variable appropriately.
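
              As a minimal sketch of what an appropriate identifier and time variable might look like, assuming the variable names mentioned in #1 (HB_n, mydate, Nr_attend): the toy data below are made up purely so the commands run, and the name board is a hypothetical new variable, not something in the real dataset.

              ```stata
              * Toy data only: 2 hypothetical boards, monthly from 2014m3 to 2018m6 (52 months)
              clear
              set seed 2023
              set obs 104
              gen HB_n = cond(_n <= 52, "Board A", "Board B")
              bysort HB_n: gen mydate = tm(2014m3) + _n - 1
              format mydate %tm
              gen Nr_attend = rpoisson(cond(HB_n == "Board A", 300, 900))

              * Numeric panel identifier built from the board variable
              * (egen group() accepts string or numeric input)
              egen board = group(HB_n), label

              * Declare the panel identifier first, then the time variable
              tsset board mydate

              * One graph panel per board
              line Nr_attend mydate, by(board)
              ```

              With real data, the toy-data lines would be dropped and something like an if inlist(board, 1, 2) qualifier on the line command could restrict the plot to two boards of choice.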

              Still no data example!


              • #8
                Thank you Nick.

                Is my understanding correct that, if a dataset does not contain a variable that uniquely identifies each observation, time series analysis might not be an appropriate choice of method?

                The variables in my dataset all contain repeated observations. I have a date variable, number of attendances to cardiology clinics in Scotland, Scottish districts corresponding to each date and number of attendances, and a variable denoting number of interventions delivered at each site. I am working on an assignment, which specifies that I need to run a time series analysis on these data, so the choice was not mine, to put it this way.

                I have to track the development of number of attendances over time, for all hospital sites and separately for two of my choice.

                I am yet to check for autocorrelation and partial autocorrelation, as I was still unsure if I can even run a time series analysis in the first place - given the issue I was running into with the 'mydate' variable, which, indeed, does not uniquely identify cases.

                Many thanks again and apologies for the incomplete info, I am new to STATA and still finding my feet with it.

                [Figure: data.png]


                • #9
                  I don't think the discussion here is now about Stata [please note spelling]: it's about what you think time series analysis includes.

                  Usage may vary, but I am happy myself to include panel time series as time series.

                  I would look for trend and seasonality first, but it's your project.
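
                  One possible first look at trend and seasonality for Scotland overall could be sketched as follows. This is only an illustration on made-up toy data, assuming the variable names mydate and Nr_attend from #1 and a hypothetical numeric board identifier; it also assumes that summing over boards within each month is the intended definition of the Scotland total.

                  ```stata
                  * Toy data only (hypothetical values): 2 boards x 52 months
                  clear
                  set seed 2023
                  set obs 104
                  gen board = ceil(_n / 52)
                  bysort board: gen mydate = tm(2014m3) + _n - 1
                  format mydate %tm
                  gen Nr_attend = rpoisson(300 * board)

                  * Scotland overall: sum attendances over boards within each month
                  collapse (sum) Nr_attend, by(mydate)
                  tsset mydate

                  * Trend: plot the single national series
                  tsline Nr_attend, name(trend, replace)

                  * Seasonality: distribution by calendar month, then autocorrelations
                  gen month = month(dofm(mydate))
                  graph box Nr_attend, over(month) name(bymonth, replace)
                  ac Nr_attend, name(acf, replace)
                  ```

                  Note that collapse replaces the data in memory, so with real data it may be safer to wrap this in preserve and restore.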
