Linear regression with year as predictor variable

Mark Chandler

Join Date: Jul 2020

Posts: 3
#1

08 Aug 2021, 14:41

Hi all,

I am new to Stata and somewhat of a statistics novice. I am working on a project in which I am replicating the methodology of a study that analyzed trends in the number of emergency department (ED) visits over time. I am using the same dataset, in which each observation is a unique patient visit record (i.e., 1 row = 1 visit). In the original study, linear regression was used to assess the statistical significance of trends in the number of ED visits over the study period (1990-2009). For my project, I need to do the same, but for the years 2010-2017.

I have been at a loss as to how to do this in Stata, as typically with linear regression you would declare both an independent and a dependent variable (e.g., height and weight). I know that my independent variable would be Year, but how would I declare my dependent variable? The dependent variable would be the number of cases per year, which is a frequency rather than a variable that exists in my dataset.

The only way I could think to do this was to run a frequency on year, and then create a new dataset with year as the independent variable and count as the dependent variable:

svy: tab Year

Year    Count
2010     5269
2011     5902
2012     6793
2013     7212
2014     7094
2015     5070
2016     9186
2017    10586

Then, with the new dataset:

Is this totally the wrong way to do this? Is there a way to do this from within my original dataset? I feel like it should be simple, but I have researched extensively and can't figure it out.

Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#2

08 Aug 2021, 15:04

Your way of doing it is fine. Here's another approach, which should get you the same results and might be simpler, as you don't say how you went about creating a new data set from the output of -svy:tab-.

Since you used -svy: tab- instead of simple tab, I'm assuming you had a sample that was not a simple random sample. You didn't show the -svyset- command, but I'll assume that there was a -pweight- involved in it, and that that variable is called wt.

Code:

gen long obs_no = _n
collapse (count) visit_count = obs_no [pweight = wt], by(Year)
regress visit_count Year
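When the yearly totals are already in hand, as in the table in post #1, the same linear trend can be fitted without going back to the visit-level data. A minimal sketch, assuming the eight totals are typed in with -input- and the variables are named Year and Count as in that table (no survey weights are involved at this stage, since the totals are taken as given):

```stata
* Enter the eight yearly totals from post #1 by hand
clear
input int Year long Count
2010  5269
2011  5902
2012  6793
2013  7212
2014  7094
2015  5070
2016  9186
2017 10586
end

* Linear trend: the coefficient on Year is the average change
* in the number of visits per calendar year
regress Count Year
```

The p-value reported for Year in the -regress- output is then the test of a linear trend across the eight years.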

That said, your model assumes linear year-on-year growth, that is, the same absolute increase in visits every year. That may well be appropriate, although in many contexts one expects something closer to a constant growth rate. In that case, instead of using linear regression, a Poisson regression would be more suitable:

Code:

poisson visit_count Year, irr

where the IRR for Year in the output will represent the year-on-year growth factor (1 plus the annual growth rate).
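The reading of the IRR as a growth factor follows from the form of the Poisson model. A short derivation, in which $\mu_t$ (the expected count in year $t$) and $\beta_0$, $\beta_1$ are notation introduced here for illustration rather than taken from the posts above:

```latex
\begin{align*}
\log \mu_t &= \beta_0 + \beta_1 \,\mathrm{Year}_t \\
\frac{\mu_{t+1}}{\mu_t}
  &= \frac{\exp\{\beta_0 + \beta_1(\mathrm{Year}_t + 1)\}}
          {\exp\{\beta_0 + \beta_1 \,\mathrm{Year}_t\}}
   = e^{\beta_1} = \mathrm{IRR} \\
\text{annual growth rate} &= \mathrm{IRR} - 1
\end{align*}
```

An IRR of 1.05, for instance, would mean that expected visits grow by about 5% per year, whereas the linear model assumes a constant absolute increase.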
Nick Cox

Join Date: Mar 2014

Posts: 35432
#3

08 Aug 2021, 15:25

I would add a small but, I think, worthwhile twist to Clyde Schechter's excellent advice. Fit in terms of some convenient origin, say 2015. Then the intercept is sensibly a fitted value for 2015, not a fitted value for year 0.

The code below further gives the essence of plotting such a fit.

More if you want it at https://www.stata-journal.com/articl...article=st0394

[CODE]
. poisson Count Year, irr

Iteration 0: log likelihood = -783.44705
Iteration 1: log likelihood = -783.44705

Poisson regression                                      Number of obs =      8
                                                        LR chi2(1)    = 1956.23
                                                        Prob > chi2   = 0.0000
Log likelihood = -783.44705                             Pseudo R2     = 0.5553

------------------------------------------------------------------------------
       Count |        IRR   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        Year |   1.084593   .0020019    43.99   0.000     1.080676    1.088524
       _cons |   6.87e-68   2.55e-67   -41.60   0.000     4.70e-71    1.00e-64
------------------------------------------------------------------------------
Note: _cons estimates baseline incidence rate.

. gen Year2 = Year - 2015

. poisson Count Year2, irr

Iteration 0: log likelihood = -783.44705
Iteration 1: log likelihood = -783.44705

Poisson regression                                      Number of obs =      8
                                                        LR chi2(1)    = 1956.23
                                                        Prob > chi2   = 0.0000
Log likelihood = -783.44705                             Pseudo R2     = 0.5553

------------------------------------------------------------------------------
       Count |        IRR   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       Year2 |   1.084593   .0020019    43.99   0.000     1.080676    1.088524
       _cons |   7925.863   36.71566  1938.07   0.000     7854.227    7998.152
------------------------------------------------------------------------------
Note: _cons estimates baseline incidence rate.

. predict COUNT
(option n assumed; predicted number of events)

. line Count Year || mspline COUNT Year

. line Count Year || mspline COUNT Year, bands(20)

[/CODE]
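Why recentring changes only the intercept can be seen in a couple of lines. This is a sketch in notation assumed here: $\beta_0$ and $\beta_1$ are the log-scale coefficients of the fit on Year, and $\beta_0^{*}$ is the intercept of the fit on Year2.

```latex
\begin{align*}
\log \mu &= \beta_0 + \beta_1 \,\mathrm{Year} \\
         &= (\beta_0 + 2015\,\beta_1) + \beta_1 (\mathrm{Year} - 2015) \\
         &= \beta_0^{*} + \beta_1 \,\mathrm{Year2} \\
e^{\beta_0^{*}} &= e^{\beta_0} \left(e^{\beta_1}\right)^{2015}
\end{align*}
```

The slope, and hence the IRR of 1.084593, is identical in the two fits, while the exponentiated intercept moves from the fitted count at year 0 (6.87e-68) to the fitted count at 2015 (7925.863).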