Hi Statalist,
This is my first time posting, so apologies for any difficulties.
I am having a lot of difficulty getting my multilevel (MLM) models to converge on a large dataset. The data cover 9 years, with multiple observations of the same person within each year; those within-year observations are billing codes. After I collapse the data by NPI and year, I have 926,037 observations, one per person per year.
I am trying to run some base models, but many of them take a very long time to run and usually do not converge. The models have converged only twice, both times after I used the sample command to reduce the data to a 20% sample. The code and results below took about 4.5 hours to run. When I run the same model on the full dataset, I have convergence issues. I have already simplified my outcome from a 4-category response to a binary response. When I use gsem to fit a multilevel multinomial model, it does not converge even on the 20% sample.
I am starting with year, my time variable, as the only predictor, and I am treating it as continuous. Time is nested within NPI number.
Are there any suggestions for fitting longitudinal multilevel models to large datasets? I am using Stata/MP 18. Thank you for your help, and please let me know if any additional details would be helpful!
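To make the collapse step concrete, here is a minimal sketch on toy data. It is not the code that was actually run: the toy values, the use of (max) to summarize the outcome within NPI-year, and the n_codes variable are all assumptions for illustration.

```stata
* Toy data (hypothetical): several billing-code rows per NPI-year
clear
input double NPI int year byte rucabinary
1001 2015 0
1001 2015 0
1001 2016 1
1002 2015 1
1002 2015 1
1002 2015 1
end

* Reduce to one row per NPI-year.
* (max) is an assumed summary rule; n_codes counts the billing rows collapsed.
collapse (max) rucabinary (count) n_codes = rucabinary, by(NPI year)

* Confirm that NPI and year now uniquely identify each row
isid NPI year
count
```

After this step isid exits silently if each NPI-year pair appears once, which is the structure the models below expect.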
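The nesting described above can be checked with a short sketch like the following. It uses toy data, and the variable year_c (year measured from the first year in the data) is hypothetical and not part of the model shown in this post; it is included only because, if year is stored as a calendar year, the intercept otherwise refers to year 0.

```stata
* Toy data (hypothetical): one row per NPI-year
clear
input double NPI int year byte rucabinary
1001 2015 0
1001 2016 1
1001 2017 1
1002 2015 1
1002 2017 1
end

* Declare repeated yearly observations within NPI and summarize the pattern
xtset NPI year
xtdescribe

* Hypothetical recentering: year_c = 0 in the first year of data
summarize year, meanonly
generate year_c = year - r(min)
```

xtdescribe reports how many years each NPI contributes, which is one way to see how many providers have only a single observation.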
Example code and result after 77 iterations:
preserve
set seed 154
sample 20
melogit rucabinary c.year || NPI: , cov(un)
restore
Mixed-effects logistic regression               Number of obs     =    181,514
Group variable: NPI                             Number of groups  =    116,980

                                                Obs per group:
                                                              min =          1
                                                              avg =        1.6
                                                              max =          7

Integration method: ghermite                    Integration pts.  =          7

                                                Wald chi2(1)      =       8.52
Log likelihood = -20844.722                     Prob > chi2       =     0.0035
------------------------------------------------------------------------------
  rucabinary | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        year |   .0786536   .0269403     2.92   0.004     .0258516    .1314556
       _cons |  -102.2373   2.220698   -46.04   0.000    -106.5898   -97.88486
-------------+----------------------------------------------------------------
NPI          |
   var(_cons)|   15045.04   669.3704                      13788.67    16415.88
------------------------------------------------------------------------------
LR test vs. logistic model: chibar2(01) = 21072.33    Prob >= chibar2 = 0.0000
Note: The above coefficient values are the result of non-adaptive quadrature
      because the adaptive parameters could not be computed.