how to choose the offset in my count regression?

Gianfranco Di Gennaro

Join Date: Nov 2020
Posts: 134

05 Dec 2024, 12:50

Dear all,

I have a dataset in which each row represents a hospitalization in pediatric psychiatry.
For each patient (id), it records the sex, the year, whether the day of hospitalization was a school day, the month, the date, and the day of the week (Monday, Tuesday, and so on).
I would like to understand whether incidence increases over the years, whether it differs between school and non-school days, and whether the two factors interact.

The data is organized like this:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input int year byte sex int admission_date float(idd season school month day)
2014 0 19731   1 3 1  1 3
2014 1 19730   2 3 1  1 2
2014 1 19737   3 3 1  1 2
2014 0 19738   4 3 1  1 3
2014 1 19740   5 3 1  1 5
2014 0 19744   6 3 1  1 2
2014 0 19744   7 3 1  1 2
2014 0 19746   8 3 1  1 4
2014 0 19756   9 3 0  2 0
2014 0 19757  10 3 1  2 1
2014 1 19759  11 3 1  2 3
2014 0 19764  12 3 1  2 1
2014 0 19768  13 3 1  2 5
2014 1 19770  14 3 0  2 0
2014 1 19773  15 3 1  2 3
2014 1 19775  16 3 1  2 5
2014 1 19779  17 3 1  2 2
2014 1 19781  18 3 1  2 4
2014 0 19781  19 3 1  2 4
2014 0 19782  20 3 1  2 5
2014 0 19784  21 3 0  3 0
2014 0 19785  22 3 0  3 1
2014 1 19786  23 3 0  3 2
2014 1 19786  24 3 0  3 2
2014 0 19791  25 3 0  3 0
2014 1 19791  26 3 0  3 0
2014 1 19792  27 3 1  3 1
2014 1 19794  28 3 1  3 3
2014 0 19800  29 3 1  3 2
2014 1 19801  30 3 1  3 3
2014 1 19803  31 0 1  3 5
2014 0 19805  32 0 0  3 0
2014 1 19806  33 0 1  3 1
2014 0 19809  34 0 1  3 4
2014 1 19813  35 0 1  3 1
2014 0 19814  36 0 1  4 2
2014 0 19815  37 0 1  4 3
2014 1 19817  38 0 1  4 5
2014 1 19817  39 0 1  4 5
2014 1 19821  40 0 0  4 2
2014 1 19823  41 0 0  4 4
2014 0 19823  42 0 0  4 4
2014 1 19824  43 0 0  4 5
2014 1 19824  44 0 0  4 5
2014 1 19832  45 0 0  4 6
2014 1 19841  46 0 1  4 1
2014 1 19842  47 0 1  4 2
2014 0 19848  48 0 1  5 1
2014 0 19851  49 0 1  5 4
2014 0 19851  50 0 1  5 4
2014 1 19852  51 0 1  5 5
2014 1 19854  52 0 0  5 0
2014 0 19855  53 0 1  5 1
2014 0 19856  54 0 1  5 2
2014 1 19857  55 0 1  5 3
2014 1 19857  56 0 1  5 3
2014 1 19858  57 0 1  5 4
2014 1 19859  58 0 1  5 5
2014 1 19859  59 0 1  5 5
2014 1 19866  60 0 1  5 5
2014 1 19868  61 0 0  5 0
2014 1 19877  62 0 1  6 2
2014 1 19880  63 0 1  6 5
2014 1 19884  64 0 1  6 2
2014 0 19889  65 0 0  6 0
2014 1 19914  66 1 0  7 4
2014 0 19918  67 1 0  7 1
2014 1 19920  68 1 0  7 3
2014 1 19926  69 1 0  7 2
2014 1 19928  70 1 0  7 4
2014 1 19932  71 1 0  7 1
2014 1 19942  72 1 0  8 4
2014 1 19948  73 1 0  8 3
2014 0 19954  74 1 0  8 2
2014 1 19957  75 1 0  8 5
2014 1 19964  76 1 0  8 5
2014 1 19965  77 1 0  8 6
2014 0 19974  78 1 1  9 1
2014 1 19975  79 1 1  9 2
2014 0 19975  80 1 1  9 2
2014 1 19975  81 1 1  9 2
2014 0 19976  82 1 1  9 3
2014 0 19985  83 1 1  9 5
2014 0 19987  84 1 0  9 0
2014 1 19988  85 1 1  9 1
2014 1 19989  86 2 1  9 2
2014 0 19991  87 2 1  9 4
2014 1 19991  88 2 1  9 4
2014 1 19994  89 2 0  9 0
2014 1 19995  90 2 1  9 1
2014 1 19995  91 2 1  9 1
2014 1 19996  92 2 1  9 2
2014 1 19997  93 2 1 10 3
2014 1 19997  94 2 1 10 3
2014 1 19998  95 2 1 10 4
2014 1 19999  96 2 1 10 5
2014 0 19999  97 2 1 10 5
2014 0 20005  98 2 1 10 4
2014 1 20007  99 2 0 10 6
2014 0 20009 100 2 1 10 1
end
format %td admission_date
label values season stal
label def stal 0 "Primavera", modify
label def stal 1 "Estate", modify
label def stal 2 "Autunno", modify
label def stal 3 "Inverno", modify
label values school scuolal
label def scuolal 0 "Non scolastico", modify
label def scuolal 1 "Scolastico", modify
label values month mesel
label def mesel 1 "Gennaio", modify
label def mesel 2 "Febbraio", modify
label def mesel 3 "Marzo", modify
label def mesel 4 "Aprile", modify
label def mesel 5 "Maggio", modify
label def mesel 6 "Giugno", modify
label def mesel 7 "Luglio", modify
label def mesel 8 "Agosto", modify
label def mesel 9 "Settembre", modify
label def mesel 10 "Ottobre", modify

My idea is to create a dataset in which the number of hospitalizations is stratified by year, month and school/non-school day, as suggested to me in a previous post.

Code:

collapse (count) admissions = idd , by(month school year)
poisson admissions i.month i.school i.year
poisson admissions i.month i.school##i.year

My concern is that there are fewer non-school days than school days, whereas the months have comparable exposure (about 30 days each).

How can I fix this? How should I set the offset?

Clyde Schechter

Join Date: Apr 2014

Posts: 29794
#2

05 Dec 2024, 13:15

You have chosen to aggregate the data by month. (By the way, you really should be doing this with a single month_year variable + a month variable to capture the monthly-seasonal variation.) Your unit of time is then the month.

Code:

gen mdate = mofd(admission_date)
format mdate %tm
collapse (first) month year (count) admissions = idd, by(mdate school)

So the exposure variable should be calculated differently for the school == 1 and school == 0 observations. For school == 1, it should be the number of school-days in the month. For school == 0 it should be the number of non-school-days in the month. As I have no idea what the school calendar in your study location is like, I'm in no position to actually calculate those exposures, but presumably you have that information and can get it into your data set either with some gen/replace commands or by -merge-ing with a data set having that information.

When you do that, your poisson regressions will give you incidence rate ratios that refer to incidence of admissions per day at risk. If that produces numbers that are inconveniently small, you could scale up to incidence of admissions per 100 days at risk by dividing the exposure variable by 100, or something like that.
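
A minimal sketch of how the exposure could be attached and used, assuming the collapsed data have one row per mdate and school value. The file school_calendar.dta and its variable daysatrisk are hypothetical stand-ins for whatever source holds the local school calendar; the model is the interaction model from #1.

Code:

* school_calendar.dta (hypothetical): one row per mdate-school pair, where
* daysatrisk = school days in the month if school == 1,
*              non-school days in the month if school == 0
merge 1:1 mdate school using school_calendar, keep(master match) nogenerate
assert !missing(daysatrisk)

* incidence rate ratios for admissions per day at risk
poisson admissions i.month i.school##i.year, exposure(daysatrisk) irr

* optional: admissions per 100 days at risk
gen double daysatrisk100 = daysatrisk/100
poisson admissions i.month i.school##i.year, exposure(daysatrisk100) irr

Rescaling the exposure only shifts the constant (the baseline rate); the incidence rate ratios are unchanged.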

Added: It just dawns on me that to get this right, you have to augment the -collapse-d data set so that it includes observations for school-month combinations that experience no admissions. So after the -collapse-, you need:

Code:

tsset school mdate
tsfill, full
replace year = year(dofm(mdate)) if missing(year)
replace month = month(dofm(mdate)) if missing(month)
replace admissions = 0 if missing(admissions)

Last edited by Clyde Schechter; 05 Dec 2024, 13:23.
Gianfranco Di Gennaro

Join Date: Nov 2020

Posts: 134
#3

05 Dec 2024, 13:46

Thank you, Clyde Schechter, as always.
In the reference population, school days are about 20 per month (Monday to Friday).
Does this mean that I need to create a variable called, for simplicity, "daysatrisk" and assign the value 20 for "school" and 10 for "non school"?
And then:

Code:

poisson admissions i.month i.school i.year, offset(daysatrisk)

Is this correct?

I have one last question, about combinations of factors where the count is zero (e.g., in the data I posted, there are no admissions for non-school, month=1, year=2014).
Could the absence of a cell with count=0 bias the estimates? If I remember correctly, you discussed this in a post some time ago.
In another post, it was suggested that I use the "contract" command, which seems to work perfectly in this regard.
Clyde Schechter

Join Date: Apr 2014

Posts: 29794
#4

05 Dec 2024, 14:42

Yes, you have the gist of it. Now, where I live, there are certain months where there are no school days, such as July, and other months that have extended breaks (like a week off) that would reduce the schoolday and increase the non-schoolday value of daysatrisk. There may be adjustments like that which you need to make as well.
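
One detail of the command in #3 deserves care. In -poisson-, -exposure(varname)- enters ln(varname) with its coefficient constrained to 1, whereas -offset(varname)- enters varname as is. A variable holding days at risk on its natural scale therefore belongs in -exposure()-, or must be logged before it goes into -offset()-. A minimal sketch, using the rough 20/10 values from #3 and a hypothetical helper name ln_daysatrisk:

Code:

* rough exposure from #3: 20 school days, 10 non-school days per month
gen daysatrisk = cond(school == 1, 20, 10)

* hypothetical helper: the same exposure on the log scale
gen double ln_daysatrisk = ln(daysatrisk)

* these two commands should produce identical estimates
poisson admissions i.month i.school i.year, exposure(daysatrisk) irr
poisson admissions i.month i.school i.year, offset(ln_daysatrisk) irr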

The omission of a cell with count = 0 definitely creates bias in the estimates. Suppose we just had four months to consider (let's ignore school vs nonschool days and other such complications), and the number of admissions in those months were 0, 3, 1, and 0. Then the correct incidence rate would be (0+3+1+0)/4 = 1 admission per month. But if the months with 0 observations are excluded, you calculate (3+1)/2 = 2 admissions per month, a 100% overestimate.
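
The four-month arithmetic can be checked directly. This is only a toy sketch: the data are the four illustrative counts from the paragraph above, not the real admissions.

Code:

clear
input byte month int admissions
1 0
2 3
3 1
4 0
end

* all four months kept: mean is 1 admission per month
summarize admissions
display r(mean)

* zero-count months dropped: mean is 2, the 100% overestimate
summarize admissions if admissions > 0
display r(mean)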

For counting up the number of admissions here, -collapse- and -contract- work equally well. But -contract- cannot give you any statistics other than counts or percents. -collapse- is more versatile, though you may or may not need the additional versatility.
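
A minimal sketch of the -contract- route, assuming the admission-level data with admission_date and school as posted in #1. The -zero- option of -contract- adds the combinations with frequency zero, but only for values of each variable that occur somewhere in the data, so a month with no admissions at all would still be absent and would need separate handling.

Code:

gen mdate = mofd(admission_date)
format mdate %tm

* one row per mdate-school combination, including combinations with no admissions
contract mdate school, freq(admissions) zero

* rebuild the calendar variables from mdate
gen year = year(dofm(mdate))
gen month = month(dofm(mdate))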

Gianfranco Di Gennaro

Join Date: Nov 2020
Posts: 134

05 Dec 2024, 17:53

Thanks Clyde.
As for the winter and summer vacation periods, for example July and August, in my opinion, the only thing to do is to eliminate the whole months from the dataset.

Instead I'm having problems with the code.

If I wanted to stratify by more variables (sex and first hospitalization vs. relapse, in addition to school and month) and also calculate the average age, I would do:

Code:

gen mdate = mofd(admission_date)
format mdate %tm
collapse (first) month year (count) admissions = idd (mean) age , by(mdate school sex first_admission)

What's the difference with

Code:

collapse (count) admissions = idd (mean) age, by(mdate school sex first_admission month year)

?

Moreover, to expand the collapse with empty cells, tsset would be impossible to use because I would have multiple panelvars.
The only thing to do would be a merge of "collapse" and "contract", correct?

One last question. In the count regression how would you model time?
Would you use mdate as a covariate?
That is:

Code:

poisson admissions school sex first_admission age....mdate, offset...

Or year and month crossed? Something like:

Code:

meglm admissions i.sex i.year i.month i.first_admission i.school c.age  || year: || month:, family(poisson) link(log) irr

Over such a long period, the effect of mdate is almost certainly non-linear, so I would need a spline (or some other non-linear specification) to capture seasonality. Is this correct?

Thank you very much for your time.

P.S. initial data would be like this:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input int year double age byte sex int admission_date byte first_admission float(school idd month)
2014  13.95890410958904 0 19731 1 1   1  1
2014 15.246575342465754 1 19730 0 1   2  1
2014 17.295890410958904 1 19737 1 1   3  1
2014 17.706849315068492 0 19738 0 1   4  1
2014 16.301369863013697 1 19740 0 1   5  1
2014  14.92876712328767 0 19744 0 1   6  1
2014 12.057534246575342 0 19744 1 1   7  1
2014 15.764383561643836 0 19746 0 1   8  1
2014 17.756164383561643 0 19756 1 0   9  2
2014 15.663013698630136 0 19757 1 1  10  2
2014 14.635616438356164 1 19759 0 1  11  2
2014 12.898630136986302 0 19764 1 1  12  2
2014 16.454794520547946 0 19768 0 1  13  2
2014 18.934246575342467 1 19770 0 0  14  2
2014 17.405479452054795 1 19773 1 1  15  2
2014 17.720547945205478 1 19775 0 1  16  2
2014  14.56986301369863 1 19779 0 1  17  2
2014 16.438356164383563 1 19781 1 1  18  2
2014 15.728767123287671 0 19781 1 1  19  2
2014 16.389041095890413 0 19782 1 1  20  2
2014 14.715068493150685 0 19784 0 0  21  3
2014 16.915068493150685 0 19785 1 0  22  3
2014  17.70958904109589 1 19786 0 0  23  3
2014 15.027397260273972 1 19786 1 0  24  3
2014  18.92054794520548 0 19791 0 0  25  3
2014  17.90958904109589 1 19791 0 0  26  3
2014 15.764383561643836 1 19792 0 1  27  3
2014  15.04931506849315 1 19794 1 1  28  3
2014 12.210958904109589 0 19800 1 1  29  3
2014 15.827397260273973 1 19801 1 1  30  3
2014 17.476712328767125 1 19803 1 1  31  3
2014  16.45205479452055 0 19805 1 0  32  3
2014 12.841095890410958 1 19806 0 1  33  3
2014 14.783561643835617 0 19809 1 1  34  3
2014 17.504109589041096 1 19813 1 1  35  3
2014 16.994520547945207 0 19814 1 1  36  4
2014 14.668493150684931 0 19815 0 1  37  4
2014 15.871232876712329 1 19817 1 1  38  4
2014 16.156164383561645 1 19817 1 1  39  4
2014 16.567123287671233 1 19821 0 0  40  4
2014 12.887671232876713 1 19823 1 0  41  4
2014 12.273972602739725 0 19823 1 0  42  4
2014 16.575342465753426 1 19824 1 0  43  4
2014 13.487671232876712 1 19824 0 0  44  4
2014 12.912328767123288 1 19832 1 0  45  4
2014 12.936986301369863 1 19841 1 1  46  4
2014 17.583561643835615 1 19842 1 1  47  4
2014 16.328767123287673 0 19848 0 1  48  5
2014 18.375342465753423 0 19851 0 1  49  5
2014  10.73972602739726 0 19851 0 1  50  5
2014 16.252054794520546 1 19852 1 1  51  5
2014 14.849315068493151 1 19854 0 0  52  5
2014 15.336986301369864 0 19855 1 1  53  5
2014 14.575342465753424 0 19856 0 1  54  5
2014 13.843835616438357 1 19857 0 1  55  5
2014 14.353424657534246 1 19857 0 1  56  5
2014 16.268493150684932 1 19858 1 1  57  5
2014 16.145205479452056 1 19859 0 1  58  5
2014 14.794520547945206 1 19859 0 1  59  5
2014  17.09041095890411 1 19866 0 1  60  5
2014 15.391780821917807 1 19868 1 0  61  5
2014 13.035616438356165 1 19877 1 1  62  6
2014 15.202739726027398 1 19880 0 1  63  6
2014 13.054794520547945 1 19884 1 1  64  6
2014 17.964383561643835 0 19889 0 0  65  6
2014 13.136986301369863 1 19914 1 0  66  7
2014 15.942465753424658 0 19918 1 0  67  7
2014 14.783561643835617 1 19920 0 0  68  7
2014 14.972602739726028 1 19926 1 0  69  7
2014 16.027397260273972 1 19928 0 0  70  7
2014 17.756164383561643 1 19932 0 0  71  7
2014 14.558904109589042 1 19942 0 0  72  8
2014 15.032876712328767 1 19948 1 0  73  8
2014 15.608219178082193 0 19954 1 0  74  8
2014 15.495890410958904 1 19957 1 0  75  8
2014  18.67945205479452 1 19964 1 0  76  8
2014 17.638356164383563 1 19965 0 0  77  8
2014 11.076712328767123 0 19974 1 1  78  9
2014 15.832876712328767 1 19975 0 1  79  9
2014 13.997260273972604 0 19975 1 1  80  9
2014 15.106849315068493 1 19975 1 1  81  9
2014  16.56986301369863 0 19976 0 1  82  9
2014 14.536986301369863 0 19985 0 1  83  9
2014 14.934246575342465 0 19987 1 0  84  9
2014 17.994520547945207 1 19988 1 1  85  9
2014 13.342465753424657 1 19989 1 1  86  9
2014 15.923287671232877 0 19991 0 1  87  9
2014 13.257534246575343 1 19991 1 1  88  9
2014 15.156164383561643 1 19994 0 0  89  9
2014  16.90958904109589 1 19995 0 1  90  9
2014  13.35890410958904 1 19995 1 1  91  9
2014 18.005479452054793 1 19996 1 1  92  9
2014 13.364383561643836 1 19997 1 1  93 10
2014 17.934246575342467 1 19997 1 1  94 10
2014 15.608219178082193 1 19998 1 1  95 10
2014 16.534246575342465 1 19999 0 1  96 10
2014 16.326027397260273 0 19999 1 1  97 10
2014  16.34246575342466 0 20005 1 1  98 10
2014 13.391780821917807 1 20007 1 0  99 10
2014 14.457534246575342 0 20009 0 1 100 10
end
format %td admission_date
label values school scuolal
label def scuolal 0 "Non scolastico", modify
label def scuolal 1 "Scolastico", modify
label values month mesel
label def mesel 1 "Gennaio", modify
label def mesel 2 "Febbraio", modify
label def mesel 3 "Marzo", modify
label def mesel 4 "Aprile", modify
label def mesel 5 "Maggio", modify
label def mesel 6 "Giugno", modify
label def mesel 7 "Luglio", modify
label def mesel 8 "Agosto", modify
label def mesel 9 "Settembre", modify
label def mesel 10 "Ottobre", modify

Clyde Schechter

Join Date: Apr 2014

Posts: 29794
#6

05 Dec 2024, 18:38

Code:

gen mdate = mofd(admission_date)
format mdate %tm
collapse (first) month year (count) admissions = idd (mean) age , by(mdate school sex first_admission)

What's the difference with
Code:

collapse (count) admissions = idd (mean) age, by(mdate school sex first_admission month year)

?

No difference. They produce the same thing. I think about them differently, and prefer the first, because it shows that I really do want to create separate observations per mdate, whereas the associated month and year "come along for the ride," but there is no actual difference in what they do.

Moreover, to expand the collapse with empty cells, tsset would be impossible to use because I would have multiple panelvars.
The only thing to do would be a merge of "collapse" and "contract", correct?

No, you can still do -tsset-. You just need to create a different panel variable. And you have to do some additional filling in of variables. It would look like this:

Code:

gen mdate = mofd(admission_date)
format mdate %tm
collapse (first) month year (count) admissions = idd (mean) age , by(mdate school sex first_admission)
egen panel = group(school sex first_admission)
tsset panel mdate
tsfill, full
foreach v of varlist school sex first_admission {
    by panel (`v'), sort: replace `v' = `v'[1]
}
replace year = year(dofm(mdate)) if missing(year)
replace month = month(dofm(mdate)) if missing(month)
replace admissions = 0 if missing(admissions)

There is one serious problem, however. That's the variable age. It makes no sense in this context. The variable age is, at this point, the mean of the original variable age. But that original variable age is the age of the individual person admitted. You can't use that as a predictor of admissions! It's downstream of the outcome variable! What you might use, if you can get it, is the mean age of the entire cohort of students, admitted or otherwise, for each sex in each month. That's not a consequence of the admission outcome.

One last question. In the count regression how would you model time?
Would you use mdate as a covariate?
That is:

Code:

poisson admissions school sex first_admission age....mdate, offset...

Or year and month crossed? Something like:
Code:

meglm admissions i.sex i.year i.month i.first_admission i.school c.age || year: || month:, family(poisson) link(log) irr

Over such a long period, the effect of mdate is almost certainly non-linear, so I would need a spline (or some other non-linear specification) to capture seasonality. Is this correct?

I definitely would not do a model with year and month as random effects while also including them as fixed effects in the same -meglm- command. If it even converged at all, it would be uninterpretable. I would probably start by including i.mdate in the -poisson- regression. Then I would examine the coefficients, perhaps graphing them against mdate itself, to see what the effect of time looks like: is there a trend? Is there seasonal cycling? Is there seasonal cycling superimposed on a trend? Is there nothing apparently regular? Then, depending on what I find, I would try to represent that as simply as possible. A spline might turn out to be the simplest, particularly if there is no apparent regularity or some apparently curvilinear trend that isn't reasonably well captured with a quadratic term or something like that. But a combination of c.mdate and some periodic function of mdate might do the trick if you have a trend with superimposed cycling.
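
A minimal sketch of the two steps just described, assuming the filled-in data from the code above plus the exposure variable daysatrisk named in #3. The single annual harmonic and the names sin12 and cos12 are illustrative assumptions; whether this form fits depends on what the i.mdate coefficients actually show.

Code:

* step 1: unrestricted time effects, one indicator per month of the series
poisson admissions i.mdate i.school i.sex i.first_admission, exposure(daysatrisk)

* step 2 (only if step 1 suggests a trend with annual cycling):
* linear trend in mdate plus one 12-month sine/cosine pair
gen double sin12 = sin(2*_pi*month/12)
gen double cos12 = cos(2*_pi*month/12)
poisson admissions c.mdate sin12 cos12 i.school i.sex i.first_admission, exposure(daysatrisk) irr

The sine and cosine terms are entered together so that the data determine both the size and the timing of the seasonal peak.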
Gianfranco Di Gennaro

Join Date: Nov 2020

Posts: 134
#7

06 Dec 2024, 04:42

Dear Clyde Schechter, thank you very much.
I understand the reasoning about age.
Unfortunately, I only have the individual ages of hospitalized patients, not of non-hospitalized patients.
At this point, I guess I should limit myself to evaluating age only in descriptive statistics.

Gianfranco Di Gennaro

Join Date: Nov 2020
Posts: 134

06 Dec 2024, 09:38

Dear Clyde Schechter,

I have one last question for you.
This has always been somewhat unclear to me when working with count data.

I created my exposure variable ("daysatrisk"). It has a value of 20 when school == 1 and a value of 10 when school == 0.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(sex first_admission) float(school mdate month) int year long admissions double age float(panel daysatrisk)
1 1 1 654  7 2014 0                  . 8 20
1 0 1 657 10 2014 1 16.534246575342465 7 20
0 1 0 655  8 2014 1 15.608219178082193 2 10
1 1 1 652  5 2014 2  16.26027397260274 8 20
1 0 1 656  9 2014 2  16.37123287671233 7 20
0 0 0 650  3 2014 2 16.817808219178083 1 10
0 0 1 648  1 2014 3 16.133333333333333 5 20
1 1 1 651  4 2014 4 15.636986301369863 8 20
0 0 0 659 12 2014 0                  . 1 10
0 1 1 655  8 2014 0                  . 6 20
0 0 0 654  7 2014 0                  . 1 10
1 0 1 652  5 2014 5 15.245479452054791 7 20
1 1 1 658 11 2014 4  17.20342465753425 8 20
1 1 0 656  9 2014 0                  . 4 10
0 1 1 651  4 2014 1 16.994520547945207 6 20
0 0 1 656  9 2014 3 15.676712328767124 5 20
0 0 1 652  5 2014 4 15.004794520547945 5 20
1 0 0 648  1 2014 0                  . 3 10
0 0 0 655  8 2014 0                  . 1 10
1 1 0 655  8 2014 3 16.402739726027395 4 10
end
format %tm mdate
label values school scuolal
label def scuolal 0 "Non scolastico", modify
label def scuolal 1 "Scolastico", modify

I ran one of my first Poisson regressions (I hope it is correctly specified; I retained only school months and removed December 2020 for Covid and so on):

Code:

. poisson admissions i.year i.school i.first_admission i.sex if (month==2 | month==3 | month==4 | month==5 |
>  month==10 | month==11) & year!=2020, irr exposure( daysatrisk) vce(robust)

Iteration 0:   log pseudolikelihood = -541.06908  
Iteration 1:   log pseudolikelihood = -541.06204  
Iteration 2:   log pseudolikelihood = -541.06204  

Poisson regression                                      Number of obs =    378
                                                        Wald chi2(10) = 117.57
                                                        Prob > chi2   = 0.0000
Log pseudolikelihood = -541.06204                       Pseudo R2     = 0.1116

-----------------------------------------------------------------------------------
                  |               Robust
       admissions |        IRR   std. err.      z    P>|z|     [95% conf. interval]
------------------+----------------------------------------------------------------
             year |
            2015  |   1.012821   .1776852     0.07   0.942     .7181267    1.428446
            2016  |   .9514145   .1580693    -0.30   0.764     .6869892    1.317618
            2017  |   .8333333   .1341706    -1.13   0.257     .6078148    1.142526
            2018  |   .7692308   .1109034    -1.82   0.069      .579876    1.020418
            2019  |          1   .1452289    -0.00   1.000     .7522825    1.329288
            2021  |   1.192308   .2030508     1.03   0.302     .8539416    1.664748
            2022  |   1.346154   .1911857     2.09   0.036      1.01907     1.77822
                  |
           school |
      Scolastico  |   2.022925   .2079575     6.85   0.000     1.653773    2.474478
1.first_admission |   1.116643   .0896441     1.37   0.169     .9540696     1.30692
            1.sex |   1.888372   .1620041     7.41   0.000      1.59611    2.234151
            _cons |   .0421413   .0070757   -18.86   0.000     .0303241    .0585637
   ln(daysatrisk) |          1  (exposure)
-----------------------------------------------------------------------------------
Note: _cons estimates baseline incidence rate.

The problem is with interpreting the margins.
For instance, if I wanted to know the predicted number of events per year, in my logic, I should run:

Code:

. margins year, predict(n)

Predictive margins                                         Number of obs = 378
Model VCE: Robust

Expression: Predicted number of events, predict(n)

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        year |
       2014  |   1.619622   .1783441     9.08   0.000     1.270074     1.96917
       2015  |   1.640387   .2240687     7.32   0.000      1.20122    2.079553
       2016  |   1.540932   .1915084     8.05   0.000     1.165582    1.916282
       2017  |   1.349685   .1585434     8.51   0.000     1.038946    1.660425
       2018  |   1.245863   .1159733    10.74   0.000      1.01856    1.473167
       2019  |   1.619622   .1533886    10.56   0.000     1.318986    1.920258
       2021  |   1.931088   .2508541     7.70   0.000     1.439423    2.422753
       2022  |   2.180261   .1955942    11.15   0.000     1.796903    2.563618
------------------------------------------------------------------------------

However, the values I get are all slightly above 1, and I'm not sure how to interpret them. Are they the number of events per daysatrisk? I don't think so. What do they actually represent? My aim is to present the number of events per month, or something else that is directly interpretable.

Thank you very much!

Clyde Schechter

Join Date: Apr 2014

Posts: 29794
#9

06 Dec 2024, 10:49

You are using the -predict()- option of -margins- incorrectly. -predict(n)- gives you the average predicted number of admissions per month in each year, and that average is taken over both the school-day and the non-school-day observations, which have different numbers of days at risk. In other words, if there are on average 5 admissions per month on school days and 3 admissions per month on non-school days, you will get their average, 4, out of this -margins- command, not the total 8. As your regression also subdivides the data by sex and first admission, you are going to get averages of even smaller numbers.

I don't see a clear way to get what you are looking for out of -margins- with this regression. I think to get the predicted total number of admissions per month, you need to redo the regression on a different data set with just one observation per month. And if you want to treat all the months as being the same number of days (counterfactual, but close) just don't bother with an -exposure()- option in the regression.
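
A minimal sketch of that one-observation-per-month approach, assuming it starts again from the admission-level data posted in #5 (one row per hospitalization, with idd and admission_date). Any restriction to particular months or years would be applied the same way as in the earlier regression.

Code:

gen mdate = mofd(admission_date)
format mdate %tm

* one row per calendar month: total admissions in that month
collapse (count) admissions = idd, by(mdate)

* add months with no admissions (fills gaps between the first and last month)
tsset mdate
tsfill
replace admissions = 0 if missing(admissions)
gen year = year(dofm(mdate))

* no -exposure()-: every month is treated as the same length
poisson admissions i.year, vce(robust)
margins year, predict(n)

With one row per month and no exposure, the margins are in units of admissions per month.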

Last edited by Clyde Schechter; 06 Dec 2024, 11:00.
Gianfranco Di Gennaro

Join Date: Nov 2020

Posts: 134
#10

06 Dec 2024, 16:53

Dear Clyde Schechter, thank you.
I will try to do as you suggest.
Thanks again.
Gianfranco