Identifying the Distribution of Data

Vincenzo Coviello

Join Date: Jun 2020

Posts: 1
#1


11 Jun 2020, 00:47

Hi All,

The COVID-19 epidemic has raised the need to estimate the distribution of the serial interval, i.e. the time between symptom onset in the primary patient (the infector) and symptom onset in the patient infected by that person (the infectee).

We frequently read that it follows a gamma distribution.
In Puglia, a region of Italy, we have data in which the infector and the infectee can be identified in several cases. We would therefore like to estimate the appropriate distribution of the serial interval in our region.
In practice, we have the symptom onset date of the infector and the symptom onset date of the infectee(s).
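
To make the setup concrete, here is a minimal sketch of how the serial interval could be derived from the two onset dates. It is written in Python only for illustration, and the dates and the name `serial_intervals` are made up, not taken from our data.

```python
from datetime import date

# Hypothetical (infector onset, infectee onset) pairs; NOT real data.
pairs = [
    (date(2020, 3, 1), date(2020, 3, 6)),
    (date(2020, 3, 2), date(2020, 3, 5)),
    (date(2020, 3, 4), date(2020, 3, 11)),
]

def serial_intervals(pairs):
    """Return infectee onset minus infector onset, in days, for each pair."""
    return [(infectee - infector).days for infector, infectee in pairs]

intervals = serial_intervals(pairs)
print(intervals)
```

An infector with several infectees would simply contribute one pair per infectee.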

Can someone suggest how to identify the distribution from our data?

Thanks.
Enzo
Mattia Coppo

Join Date: Aug 2019

Posts: 33
#2

12 Jun 2020, 09:59

Hi Vincenzo,

Welcome to Statalist. You'll increase your chances of a useful answer by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output, and sample data using dataex.

Anyway, why don't you just calculate the difference between the two dates and then plot it using kdensity?
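
As a rough illustration of what such a kernel density estimate does, here is a sketch in Python (not Stata). It assumes a Gaussian kernel and a normal-reference bandwidth, whereas Stata's kdensity uses an Epanechnikov kernel by default, so the curves will not match exactly. The `intervals` values and the function name `gaussian_kde` are hypothetical.

```python
import math
import statistics

# Hypothetical serial intervals in days; NOT real data.
intervals = [2, 3, 3, 4, 5, 5, 6, 7, 8, 10]

def gaussian_kde(data, bandwidth=None):
    """Return a function that estimates the density of `data` at a point x."""
    n = len(data)
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumption, not kdensity's default).
        bandwidth = 1.06 * statistics.stdev(data) * n ** (-1 / 5)
    norm = n * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        # Sum of Gaussian bumps centred on each observation.
        return sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data) / norm

    return density

density = gaussian_kde(intervals)
for day in range(0, 13, 2):
    print(day, round(density(day), 4))
```

Plotting `density` over a grid of days gives the smoothed shape to compare by eye with candidate distributions.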

Mattia

Notwithstanding that I am unable to see your data, I imagine that you can plot a kernel of the difference (in days) between
Enzo Coviello

Join Date: Apr 2014

Posts: 4
#3

15 Jun 2020, 05:34

Thanks Mattia,

for this first tip about using a kernel density plot.

The Anderson-Darling test is used to test whether the data in a variable come from a particular distribution, such as the normal, uniform, lognormal, logistic, exponential, Weibull, or gamma.
For the serial intervals of COVID-19 cases, I believe the gamma and lognormal distributions should be among the best candidates.
This test seems to have only limited implementation in Stata, but the resources and hints from the Stata community are sometimes surprising.
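
To show what comparing the two candidates could look like, here is a sketch in Python (not Stata) that fits a gamma and a lognormal by maximum likelihood and compares their log-likelihoods. It assumes strictly positive intervals, so zero or negative serial intervals would need separate handling. The data and the function names are hypothetical, and the gamma fit uses a simple ternary search on the profile log-likelihood rather than any particular package's routine.

```python
import math

# Hypothetical serial intervals in days (all > 0); NOT real data.
intervals = [2, 3, 3, 4, 5, 5, 6, 7, 8, 10]

def fit_lognormal(x):
    """ML estimates (mu, sigma) of a lognormal and the maximised log-likelihood."""
    logs = [math.log(v) for v in x]
    n = len(x)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((l - mu) ** 2 for l in logs) / n)
    ll = -sum(logs) - n * math.log(sigma) - 0.5 * n * math.log(2 * math.pi) - 0.5 * n
    return mu, sigma, ll

def gamma_loglik(x, shape, scale):
    """Log-likelihood of a two-parameter gamma (shape, scale)."""
    n = len(x)
    return ((shape - 1) * sum(math.log(v) for v in x)
            - sum(x) / scale
            - n * shape * math.log(scale)
            - n * math.lgamma(shape))

def fit_gamma(x, lo=1e-3, hi=1e3, iters=200):
    """ML fit: for a given shape the ML scale is mean/shape, so search over shape only."""
    mean = sum(x) / len(x)

    def profile(k):
        return gamma_loglik(x, k, mean / k)

    # Ternary search; the profile log-likelihood is concave in the shape.
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if profile(m1) < profile(m2):
            lo = m1
        else:
            hi = m2
    shape = (lo + hi) / 2
    return shape, mean / shape, profile(shape)

mu, sigma, ll_lognormal = fit_lognormal(intervals)
shape, scale, ll_gamma = fit_gamma(intervals)
print("lognormal:", mu, sigma, ll_lognormal)
print("gamma:    ", shape, scale, ll_gamma)
```

Both models have two parameters, so the larger log-likelihood also gives the smaller AIC. With few pairs the difference may well be too small to be decisive.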

It would also be useful to have a graph for checking whether the selected distribution fits our data.
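
As an idea of what such a check might involve, here is a sketch in Python (not Stata) that computes the Anderson-Darling statistic and the coordinates of a Q-Q plot against a fitted lognormal. The data, the function names, and the plotting positions (i - 0.5)/n are assumptions for illustration. No p-value is computed, because the critical values depend on the distribution and on whether its parameters were estimated from the same data.

```python
import math
from statistics import NormalDist

# Hypothetical serial intervals in days (all > 0); NOT real data.
intervals = [2, 3, 3, 4, 5, 5, 6, 7, 8, 10]

def fit_lognormal(x):
    """ML estimates of the mean and SD of log(x)."""
    logs = [math.log(v) for v in x]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((l - mu) ** 2 for l in logs) / len(logs))
    return mu, sigma

def anderson_darling(x, cdf):
    """Anderson-Darling A^2 statistic of sample x against the given CDF."""
    xs = sorted(x)
    n = len(xs)
    total = 0.0
    for i, v in enumerate(xs, start=1):
        # xs[n - i] is the (n + 1 - i)-th order statistic in 1-based terms.
        total += (2 * i - 1) * (math.log(cdf(v)) + math.log(1 - cdf(xs[n - i])))
    return -n - total / n

def qq_points(x, quantile):
    """Pairs (theoretical quantile, observed value) for a Q-Q plot."""
    xs = sorted(x)
    n = len(xs)
    return [(quantile((i - 0.5) / n), v) for i, v in enumerate(xs, start=1)]

mu, sigma = fit_lognormal(intervals)
log_dist = NormalDist(mu, sigma)

a2 = anderson_darling(intervals, lambda v: log_dist.cdf(math.log(v)))
points = qq_points(intervals, lambda p: math.exp(log_dist.inv_cdf(p)))

print("A^2 =", a2)
for theoretical, observed in points:
    print(round(theoretical, 2), observed)
```

If the lognormal fits well, the printed pairs should lie close to the 45-degree line when plotted. The same two functions would work for a gamma, given its CDF and quantile function.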

Has any other Stata user faced the same task?

Best wishes
Enzo