  • Linear regression with year as predictor variable

    Hi all,

    I am new to Stata and somewhat of a statistics novice. I am working on a project in which I am replicating the methodology of a study that analyzed trends in the number of emergency department (ED) visits over time. I am using the same dataset, in which each observation is a unique patient visit record (i.e., 1 row = 1 visit). In the original study, linear regression was used to assess the statistical significance of trends in the number of ED visits over the study period (1990-2009). For my project, I need to do the same, but for the years 2010-2017.

    I have been at a loss as to how to do this in Stata, as typically with linear regression you would declare both an independent and a dependent variable (e.g., height and weight). I know that my independent variable would be Year, but how would I declare my dependent variable? The dependent variable would be the number of cases per year, which is a frequency and not a variable that I already have in the dataset.

    The only way I could think to do this was to run a frequency on year, and then create a new dataset with year as the independent variable and count as the dependent variable:

    svy: tab Year
    Year Count
    2010 5269
    2011 5902
    2012 6793
    2013 7212
    2014 7094
    2015 5070
    2016 9186
    2017 10586

    Then, with the new dataset:


    [Attached image: Untitled.png]


    Is this totally the wrong way to do this? Is there a way to do this from within my original dataset? I feel like it should be simple, but I have researched extensively and can't figure it out.

  • #2
    Your way of doing it is fine. Here's another approach, which should get you the same results and might be simpler, as you don't say how you went about creating a new data set from the output of -svy:tab-.

    Since you used -svy: tab- instead of simple tab, I'm assuming you had a sample that was not a simple random sample. You didn't show the -svyset- command, but I'll assume that there was a -pweight- involved in it, and that that variable is called wt.

    Code:
    gen long obs_no = _n
    collapse (count) visit_count = obs_no [pweight = wt], by(Year)
    regress visit_count Year
    That said, your model assumes linear year-on-year growth. And that may well be appropriate, although in many contexts one expects more of a constant growth rate. In that case, instead of using linear regression, a Poisson regression would be more suitable:
    Code:
    poisson visit_count Year, irr
    where the IRR for Year in the output is the year-on-year growth factor: an IRR of 1.05, for example, would mean that visits grow by about 5% per year.
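    For readers without the visit-level data, here is a minimal sketch of both models using the yearly counts posted in #1. The counts are typed in by hand, and it is an assumption that the -collapse- step above would produce the same numbers; the variable name visit_count simply follows the code above.

    ```stata
    * Yearly counts copied from post #1 (assumed to match what -collapse- returns)
    clear
    input Year visit_count
    2010  5269
    2011  5902
    2012  6793
    2013  7212
    2014  7094
    2015  5070
    2016  9186
    2017 10586
    end

    * Linear trend: the coefficient on Year is the change in visits per year
    regress visit_count Year

    * Constant growth rate: the coefficient on Year is the log of the yearly growth factor
    poisson visit_count Year

    * Convert the log-scale coefficient into a percent change per year
    display "Estimated growth per year (%): " %5.2f 100*(exp(_b[Year])-1)
    ```

    The Poisson output posted in #3 for these counts shows an IRR of 1.084593, so the last line should print a value close to 8.46.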



    • #3
      I would add a small but, I think, worthwhile twist to Clyde Schechter's excellent advice. Fit in terms of some convenient origin, say 2015. Then the intercept is sensibly a fitted value for 2015, not a fitted value for year 0.
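      To spell out why shifting the origin changes only the intercept, here is a short derivation. The symbols $\mu_t$ (expected count in year $t$), $\beta_0$, $\beta_0^{*}$, and $\beta_1$ are notation introduced here for illustration, not taken from the thread.

      ```latex
      \begin{aligned}
      \log \mu_t &= \beta_0 + \beta_1\, t && \text{(origin at year 0)} \\
      \log \mu_t &= \beta_0^{*} + \beta_1\,(t - 2015) && \text{(origin at 2015)} \\
      \beta_0^{*} &= \beta_0 + 2015\,\beta_1, \qquad \exp(\beta_0^{*}) = \mu_{2015}
      \end{aligned}
      ```

      The slope, and therefore the IRR for the year variable, is the same in both parameterizations; only the exponentiated constant changes, from an extrapolation to year 0 to the fitted count for 2015.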

      The code below further gives the essence of plotting such a fit.

      More if you want it at https://www.stata-journal.com/articl...article=st0394



      [CODE]
      . poisson Count Year, irr

      Iteration 0: log likelihood = -783.44705
      Iteration 1: log likelihood = -783.44705

      Poisson regression                                      Number of obs =       8
                                                              LR chi2(1)    = 1956.23
                                                              Prob > chi2   =  0.0000
      Log likelihood = -783.44705                             Pseudo R2     =  0.5553

      ------------------------------------------------------------------------------
             Count |        IRR   Std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
              Year |   1.084593   .0020019    43.99   0.000     1.080676    1.088524
             _cons |   6.87e-68   2.55e-67   -41.60   0.000     4.70e-71    1.00e-64
      ------------------------------------------------------------------------------
      Note: _cons estimates baseline incidence rate.

      . gen Year2 = Year - 2015

      . poisson Count Year2, irr

      Iteration 0: log likelihood = -783.44705
      Iteration 1: log likelihood = -783.44705

      Poisson regression                                      Number of obs =       8
                                                              LR chi2(1)    = 1956.23
                                                              Prob > chi2   =  0.0000
      Log likelihood = -783.44705                             Pseudo R2     =  0.5553

      ------------------------------------------------------------------------------
             Count |        IRR   Std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
             Year2 |   1.084593   .0020019    43.99   0.000     1.080676    1.088524
             _cons |   7925.863   36.71566  1938.07   0.000     7854.227    7998.152
      ------------------------------------------------------------------------------
      Note: _cons estimates baseline incidence rate.

      . predict COUNT
      (option n assumed; predicted number of events)

      . line Count Year || mspline COUNT Year

      . line Count Year || mspline COUNT Year, bands(20)

      [/CODE]
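
      As a quick check on the interpretation of the constant, the sketch below compares the exponentiated intercept with the prediction for the 2015 row. It assumes the variables Count and Year from the listing above are in memory; the variable name fitted is hypothetical.

      ```stata
      * Assumes Count and Year are in memory, as in the listing above
      capture drop Year2
      gen Year2 = Year - 2015
      poisson Count Year2, irr

      * -irr- only changes how results are displayed; _b[] stays on the log scale
      display "Fitted count at the origin (2015): " exp(_b[_cons])

      * The same value should appear as the predicted count for the 2015 row
      predict fitted, n
      list Year Count fitted if Year == 2015
      ```

      Both numbers should agree with the _cons entry in the second table above (7925.863), which is what makes 2015 a more readable origin than year 0.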
