Multi-Level model - how to define it in STATA?

Sofia Oliveira

Join Date: Jun 2018

Posts: 8
#16

06 Jul 2018, 05:46

Hi everyone!

I would like to check two remaining questions about this project with you. I am using Fixed Effects (FE) and Random Effects (RE) to estimate the hospital-specific effect (hospital effect_j). For FE I am using least-squares dummy variable (LSDV) analysis. For RE I am planning to use the -glm- and -mixed- commands, because I want to use both maximum likelihood and least-squares optimization: -mixed- only uses maximum likelihood, so I will use -glm- for least-squares optimization. I am not using -xtset- or -xtreg- because I only have 2 waves (baseline and 6 months), as mentioned before, and I have a separate variable for each timeframe: one for baseline and one for 6 months. The data are not in long format because my model requires both variables in the regression:
quality_of_life_6months_ij = controls_ij + quality_of_life_0months_ij + hospital effect_j + error term_ij

1) I am wondering if this approach seems ok to you? My objective is to compare results from all these techniques.

2) Can I use -glm- if I have a 2-level model? With -mixed- I can specify 2 levels, but with -glm- I cannot. Can you please help?

Thank you
Oliveira
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17676
#17

07 Jul 2018, 05:06

Sofia:
some comments about your query:
1) -fe- specification is always consistent, but it is less efficient if the -re- specification fits your data better. Hence, I would select which specification to use via -hausman- or the user-written programme -xtoverid-, if non-default standard errors are imposed. For both tests the null is that -re- is the way to go, even though their outcome is the offspring of two different approaches (see -help hausman- and -help xtoverid-, if the latter is installed);
2) having two waves of data only does not imply that you cannot use -xtreg- with -fe- or -re- specification;
3) I've never seen -glm- in panel data regression. Hence, I cannot advise on that;
4) going -mixed- implies a -re- specification (at least in one part of the model), as -mixed- is a different (and more flexible) way of typing -xtreg, mle re-;
5) I would check whether your regression model suffers from endogeneity, as some unobserved predictors embedded in the residuals can be correlated with both quality of life assessments (that is, with the regressand and one of the regressors).
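
To make points 1) and 2) concrete, here is a minimal sketch rather than a recipe for the actual data. It assumes the public nlswork example dataset as a stand-in, keeps two survey years to mimic a two-wave panel, and uses illustrative regressors (tenure, hours) and estimate names (fe_est, re_est) that do not come from this thread.

```stata
* Stand-in data (assumption): the nlswork example panel, not the hospital data
webuse nlswork, clear

* Keep two waves only, to mimic a baseline/follow-up design
keep if inlist(year, 87, 88)
xtset idcode year

* Fixed-effects fit, stored for the test
xtreg ln_wage tenure hours, fe
estimates store fe_est

* Random-effects fit, stored for the test
xtreg ln_wage tenure hours, re
estimates store re_est

* Null hypothesis: the -re- estimator is consistent
hausman fe_est re_est
```

With only two waves both estimators still run; the -fe- fit then relies entirely on within-unit changes between the two waves.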

Kind regards,
Carlo
(Stata 19.0)
Comment
Sofia Oliveira

Join Date: Jun 2018

Posts: 8
#18

08 Jul 2018, 07:42

Hi Carlo

Thanks again for all your help on this. As I go through the project I learn and understand more. Sorry if I keep asking, but your help has been very useful to me. Thanks for your patience and willingness to help.

All you say makes sense. Given my dataset, I am actually going to treat the number of drugs as temporal and might use it as the panel time variable. I have pre- and post-treatment quality of life, each patient can get more than one drug throughout their treatment, and each patient is clustered within a hospital. It is a complicated dataset. I have more observations than patients because each patient can have more than 1 drug. My dataset is in long format based on the number of drugs: a patient who has taken, for example, 2 drugs has 2 observations/rows in the dataset, one for each drug, and the patient_ID is repeated. As mentioned before, I want to use both Fixed Effects and Random Effects to estimate the hospital-specific effect. For RE I will do the following in Stata:

mixed quality_of_life_6months $controls quality_of_life_0months i.drug_number || hospital: || patient_ID:

This way, Stata distinguishes the hospitals and uniquely identifies the patients within them.

For FE I thought about doing dummy variable analysis, something like this:

reg quality_of_life_6months $controls quality_of_life_0months i.drug_number i.Hospital, cluster(Hospital)
OR
xtset Hospital
xtreg quality_of_life_6months $controls quality_of_life_0months i.drug_number, fe cluster(Hospital)

But, my problem is the following: as I have repeated observations for the same patient (due to the fact that they take more than 1 drug) this FE regression model does not take that into account and it treats all observations as different patients which is not true. I cannot do xtset Hospital drug_number because Stata tells me drug_number is repeated for each Hospital and it is correct. How could I run a model using FE and taking that into account?
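
As a side check on the two FE routes above, the sketch below compares the dummy-variable regression with -xtreg, fe- on stand-in data. It assumes the productivity example dataset (the one used in the toy example later in this thread), with state playing the role of Hospital; the regressors and scalar names are illustrative only.

```stata
* Stand-in data (assumption): state plays the role of Hospital
use http://www.stata-press.com/data/r15/productivity, clear

* Route 1: least-squares dummy variables, one indicator per group
regress gsp private emp i.state
scalar b_lsdv = _b[private]

* Route 2: declare only the group identifier (no time variable),
* then fit the within estimator; repeated rows per group are allowed
xtset state
xtreg gsp private emp, fe
scalar b_fe = _b[private]

* The two slope estimates should agree up to numerical precision
display "LSDV: " b_lsdv "   xtreg, fe: " b_fe
```

On the repeated-patient concern, one point that may help: when patients are nested within hospitals, standard errors clustered on the hospital identifier allow for arbitrary correlation among all rows of the same hospital, which includes repeated rows of the same patient.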

Thank you very much again!
Oliveira
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17676
#19

08 Jul 2018, 08:11

Sofia:
OLS rarely outperforms -xtreg- when it comes to panel data regression.
Leaving aside for a while the cumbersome structure of your dataset, another issue that is not clear to me is whether your analysis considers patients or hospitals as -panelid-.
I think that this point should be made clearer, especially if you plan to submit a manuscript about your research to a target journal in your research field. My opinion is that you should focus on patients nested within hospitals via -mixed-.
Besides, while you're right to cluster standard errors on -panelid- under OLS, as you do not have independent observations due to the panel structure of your data, robust/cluster standard errors under -xtreg- accommodate heteroskedasticity and/or autocorrelation. You should check whether that is the case with your data.
Panel models, whatever their -fe- or -re- specification, are conceived to host more than one wave of data for panel units. Probably, Stata complains that you have repeated time values for the same panel unit, as patients (who now seem to be the -panelid-) can take more than one drug during the same hospitalization event. Again, the entire issue boils down to the -panelid- definition: patients or hospitals?
Eventually, you do not seem to address the endogeneity issue that can well be lurking in residuals.
As all these things are relevant for your research project to be successful, I would spend some time in discussing about them with your supervisor/teacher/professor.

Kind regards,
Carlo
(Stata 19.0)
Comment
Sofia Oliveira

Join Date: Jun 2018

Posts: 8
#20

08 Jul 2018, 08:27

Thank you again for your quick and useful reply.

As panelid I am considering hospitals because I am trying to estimate the hospital-specific effect, not the individual-specific effect.

"Eventually, you do not seem to address the endogeneity issue that can well be lurking in residuals." - what do you mean? I am controlling for many patient characteristics (quality of life at baseline, age, gender, severity of disease, duration of disease, marital status, job status, educational level, comorbidities, among others) and hospital characteristics (volume, private/public), and my objective is to explain the variation in quality of life that cannot be explained by all these characteristics and that might be related to the performance of each hospital. Do you see problems with this?

Unfortunately my supervisor does not have experience with this type of analysis, and that is why I am posting all these questions here while trying to figure things out on my own. I am sorry to insist.

Thanks again
Oliveira
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17676
#21

08 Jul 2018, 10:21

Sofia:
thanks for providing further details.
No problem to keep on discussing about this thread, but if you are interested in investigating the portion of variance explained by -i.hospital- why not focusing on -mixed- only?
Besides, as far as endogeneity is concerned, two comments:
- if you're confident that you have considered all the predictors needed to give a fair and true view of the data generating process, no endogeneity issue will probably creep up. However, it is worth highlighting that in their valuable https://www.stata.com/bookstore/heal...s-using-stata/, the authors state that endogeneity can be due to unobserved health status (page 201). If you use the -fe- specification instead of -mixed- (as you considered in your previous posts), unobserved heterogeneity in time-invariant predictors will be wiped out; unfortunately, this magic won't hold for heterogeneity related to time-varying predictors;
- you say that you have actually controlled (or better, adjusted) for many patients' characteristics at baseline; if I were you, I would ask myself again if the -panelid- is -patient- or -hospital- and I would probably answer that I cannot have hospitals' quality of life as regressand or regressor and, in all likelihood, I would go -mixed-.

Kind regards,
Carlo
(Stata 19.0)
Comment
Sofia Oliveira

Join Date: Jun 2018

Posts: 8
#22

09 Jul 2018, 05:32

Hi Carlo

Thank you very much for all your help again!

When you ask "but if you are interested in investigating the portion of variance explained by -i.hospital- why not focusing on -mixed- only?": precisely because of what you mentioned, namely that with RE I might have unobserved heterogeneity in time-invariant factors that is not accounted for in the model (and so biased results). I am aware that RE has 2 strong assumptions: 1) the hospital effect_j has mean=0 and constant variance and 2) there is no correlation between the regressors and the hospital-level unobserved time-invariant factors, which might not be the case. I am still going to run the Hausman test, but I am aware that RE is more efficient but might not be consistent, whereas FE is consistent but not efficient. I would like to have this discussion in the methods section of my paper, and that's why I asked how I could have a FE model that takes into account the fact that I have repeated observations for the same patient (due to the drug_number). At the moment, I am considering -hospital- as the -panelid-, because if I do -xtset patient_ID number_drug-, Stata will know the correct number of patients in my dataset but the effect I get is then the individual effect, and I want the hospital-specific effect. Do you know what I mean? It is true, as you mentioned, that 'I cannot have hospitals' quality of life', but I guess with a FE model I cannot have more than 1 level, so what Stata is doing here is calculating the mean of all patients' quality of life for each hospital and then running a regression to estimate the hospital effect, which is what I want, isn't it?
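
The last question can be checked on stand-in data. Regressing group means on group means corresponds to the between estimator (-xtreg, be-), whereas -xtreg, fe- works with deviations from the group means, so the two generally give different slopes. The sketch below assumes the productivity example dataset, with state standing in for Hospital; the regressors and scalar names are illustrative.

```stata
* Stand-in data (assumption): state plays the role of Hospital
use http://www.stata-press.com/data/r15/productivity, clear
xtset state

* Between estimator: uses only the group means
xtreg gsp private emp, be
scalar b_between = _b[private]

* Within (fixed-effects) estimator: uses deviations from group means
xtreg gsp private emp, fe
scalar b_within = _b[private]

* By hand: collapse to one row of means per group, then plain OLS
preserve
collapse (mean) gsp private emp, by(state)
regress gsp private emp
scalar b_means = _b[private]
restore

* b_means should reproduce b_between, not b_within
display "between: " b_between "  OLS on means: " b_means "  within: " b_within
```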

Thank you very much again!
Oliveira
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17676
#23

09 Jul 2018, 12:12

Sofia:
trying not to repeat myself, I would focus on one new item of your last post, namely the second feature of the -re- specification: the additional orthogonality condition that both components of the composite error should not be correlated with the vector of regressors. There's a way to relax this assumption as far as the panel-wise error term is concerned, that is, switching to the Mundlak correction (-search mundlak-).
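
A hand-rolled version of the Mundlak idea can be sketched as follows; this is not the user-written command that -search mundlak- points to. It assumes the productivity example dataset as a stand-in, with state as the panel unit, and the mean_* variable names are made up for illustration.

```stata
* Stand-in data (assumption): state plays the role of the panel unit
use http://www.stata-press.com/data/r15/productivity, clear

* Group means of the within-varying regressors (the Mundlak terms)
foreach v of varlist private emp unemp {
    bysort state: egen double mean_`v' = mean(`v')
}
xtset state

* Random-effects model augmented with the group means
xtreg gsp private emp unemp mean_private mean_emp mean_unemp, re
scalar b_mundlak = _b[private]

* Joint test of the Mundlak terms: rejection casts doubt on the
* assumption that the group effect is uncorrelated with the regressors
test mean_private mean_emp mean_unemp

* For comparison: the fixed-effects estimate of the same slope
xtreg gsp private emp unemp, fe
scalar b_fe = _b[private]
display "Mundlak RE: " b_mundlak "   FE: " b_fe
```

In a balanced panel such as this stand-in, the slope on a within-varying regressor in the augmented model should coincide with the -fe- estimate, which is the sense in which the correction relaxes the orthogonality condition.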

Kind regards,
Carlo
(Stata 19.0)
Comment
Sofia Oliveira

Join Date: Jun 2018

Posts: 8
#24

22 Jul 2018, 05:01

Hi Carlo

I want to thank you again for all your help and patience on this. It has helped me to move along in my project. Thanks a lot!!

Just an FYI - I have a 3-level model: drugs clustered within patients, who are themselves clustered within hospitals. I am creating 3 levels because of how my dataset is designed. Patients appear more than once because they get more than one drug over time, and each drug is associated with the outcome of interest (quality of life) at baseline and follow-up. At the same time, I will also have a 2-level model (only patients clustered within hospitals) in which I am not going to account for that, and so will treat each observation as a different patient, which is a simplification. I will then compare the results.

I have a new question, any help is appreciated!
From -mixed- I get the total number of observations used in the model (500), the number of hospitals (15) and the number of patients (430), and I also get the min, max and average observations per group: for the hospital group (min: 1 patient; max: 380 patients; average: 30 patients) and for the patient group (min: 1 drug; average: 1.4 drugs; max: 4 drugs). I would like to know if it is possible to find out exactly how many patients there are in each centre and how many drugs each patient has. I will need these numbers to do a standardisation, and that is why I ask.

Thank you very much again !

Regards
Oliveira
Comment

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17676

#25

22 Jul 2018, 05:13

Sofia:
the simplest approach that springs to my mind is reported in the following toy example:

Code:

. use http://www.stata-press.com/data/r15/productivity
. mixed gsp private emp hwy water other unemp || region: || state:

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0:   log likelihood =  1430.5017 
Iteration 1:   log likelihood =  1430.5017 

Computing standard errors:

Mixed-effects ML regression                     Number of obs     =        816

-------------------------------------------------------------
                |     No. of       Observations per Group
 Group Variable |     Groups    Minimum    Average    Maximum
----------------+--------------------------------------------
         region |          9         51       90.7        136
          state |         48         17       17.0         17
-------------------------------------------------------------

                                                Wald chi2(6)      =   18829.06
Log likelihood =  1430.5017                     Prob > chi2       =     0.0000

------------------------------------------------------------------------------
         gsp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     private |   .2671484   .0212591    12.57   0.000     .2254814    .3088154
         emp |    .754072   .0261868    28.80   0.000     .7027468    .8053973
         hwy |   .0709767    .023041     3.08   0.002     .0258172    .1161363
       water |   .0761187   .0139248     5.47   0.000     .0488266    .1034109
       other |  -.0999955   .0169366    -5.90   0.000    -.1331906   -.0668004
       unemp |  -.0058983   .0009031    -6.53   0.000    -.0076684   -.0041282
       _cons |   2.128823   .1543854    13.79   0.000     1.826233    2.431413
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
region: Identity             |
                  var(_cons) |   .0014506   .0012995      .0002506    .0083957
-----------------------------+------------------------------------------------
state: Identity              |
                  var(_cons) |   .0062757   .0014871      .0039442    .0099855
-----------------------------+------------------------------------------------
               var(Residual) |   .0013461   .0000689      .0012176    .0014882
------------------------------------------------------------------------------
LR test vs. linear model: chi2(2) = 1154.73               Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.

. bysort region: tabstat region, stat(count)

-----------------------------------------------------------------------------------------------------------------------
-> region = 1

    variable |         N
-------------+----------
      region |       102
------------------------

-----------------------------------------------------------------------------------------------------------------------
-> region = 2

    variable |         N
-------------+----------
      region |        51
------------------------

-----------------------------------------------------------------------------------------------------------------------
-> region = 3

    variable |         N
-------------+----------
      region |        85
------------------------

-----------------------------------------------------------------------------------------------------------------------
-> region = 4

    variable |         N
-------------+----------
      region |       119
------------------------

-----------------------------------------------------------------------------------------------------------------------
-> region = 5

    variable |         N
-------------+----------
      region |       136
------------------------

-----------------------------------------------------------------------------------------------------------------------
-> region = 6

    variable |         N
-------------+----------
      region |        68
------------------------

-----------------------------------------------------------------------------------------------------------------------
-> region = 7

    variable |         N
-------------+----------
      region |        68
------------------------

-----------------------------------------------------------------------------------------------------------------------
-> region = 8

    variable |         N
-------------+----------
      region |       136
------------------------

-----------------------------------------------------------------------------------------------------------------------
-> region = 9

    variable |         N
-------------+----------
      region |        51
------------------------

Kind regards,
Carlo
(Stata 19.0)

Comment

Sofia Oliveira

Join Date: Jun 2018

Posts: 8
#26

22 Jul 2018, 11:49

Thank you Carlo again! I needed to add -if e(sample)- and then it works exactly how I wanted. Thank you very much again!
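
For anyone following along, the -if e(sample)- variant can be sketched on the same toy dataset as in the example above. The helper variable names (insample, one_per_state, n_states) are made up for illustration, and the second half, which counts distinct lower-level units per group, is an extension that was not posted in this thread.

```stata
* Same toy dataset and model as in the example above
use http://www.stata-press.com/data/r15/productivity, clear
quietly mixed gsp private emp hwy water other unemp || region: || state:

* Flag the rows actually used by -mixed-
generate byte insample = e(sample)

* Rows per top-level group, estimation sample only
tabulate region if insample

* Tag one row per lower-level unit, then count the tags per top-level group
egen byte one_per_state = tag(region state) if insample
bysort region: egen n_states = total(one_per_state)
tabstat n_states if one_per_state, by(region) statistics(mean) nototal
```

In the hospital data, region would correspond to the hospital identifier and state to patient_ID.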

Regards
Oliveira
Comment
Sofia Oliveira

Join Date: Jun 2018

Posts: 8
#27

07 Aug 2018, 09:05

Hi Carlo

I hope you're well. I was told I should have a 2-level model with patients clustered in centres only, as you initially suggested. I am trying to calculate the centre effect with Fixed Effects (FE) and Random Effects (RE). I cannot use -xtset- and -xtreg- because of the way my dataset is organised: I have repeated patient IDs within each centre because each patient has more than one drug/observation. For FE I am going to use dummy variable analysis: reg outcome_6months outcome_baseline $controls i.drug i.Hospital, cluster(Hospital). For RE I will be using: mixed outcome_6months outcome_baseline $controls i.drug || Hospital: My problem arises when I run the Hausman test like this:

Code:

quietly reg outcome_6months outcome_baseline $controls i.drug i.Hospital
estimates store FE
quietly mixed outcome_6months outcome_baseline $controls i.drug || Hospital:
estimates store RE
hausman FE RE

I get the following error:
no coefficients in common; specify equations(matchlist)
for problems with different equation names.

Could you please advise? Thank you so much in advance for all your help!
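
One possible reading of this message, offered as an assumption since the thread does not resolve it: -regress- stores its coefficients without an equation name, while -mixed- stores the fixed part under an equation named after the dependent variable, so -hausman- cannot match the two coefficient vectors. The sketch below inspects the equation names on stand-in data (the productivity example dataset, with state in place of Hospital) and then runs the test on an -xtreg, fe- / -xtreg, re- pair, which share the same naming.

```stata
* Stand-in data (assumption): state plays the role of Hospital
use http://www.stata-press.com/data/r15/productivity, clear

* Equation names attached to the coefficient vector of each command
quietly regress gsp private emp i.state
matrix b_ols = e(b)
local eq_ols : coleq b_ols

quietly mixed gsp private emp || state:
matrix b_mix = e(b)
local eq_mix : coleq b_mix

display "regress equations: `eq_ols'"
display "mixed equations:   `eq_mix'"

* An alternative pair of estimators whose coefficient names line up
xtset state
quietly xtreg gsp private emp, fe
estimates store FE
quietly xtreg gsp private emp, re
estimates store RE
hausman FE RE
```

Declaring only the group identifier in -xtset- does not require a time variable, so repeated rows per hospital are not an obstacle to this route.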

Regards
Oliveira
Comment
