  • Three-level models with FE

    Hello StataList,

    This is my first time posting, but I've enjoyed the benefit of reading this list for many years. I'm writing in hopes of getting advice on the best way to operationalize a 3-level regression model where the highest level has to be a fixed (versus random) effect.

    The levels are: months nested within patients nested within geographic areas. Patient is a random effect, but geographic area needs to be fixed. My outcome is cost, so it has lots of zeros and heavy right tails.

    Ideally there would be an autoregressive or Toeplitz correlation structure on the months nested within patient, and some other correlation structure (probably exchangeable) for patients within geographic area.

    I was hoping to use xtgee, but that does not accommodate a fixed effect (FE) for the highest level.

    So, two questions:

    1. If I instead operationalized an xtgee with patient as the highest level (2-level model) and included a dummy variable for geographic region, would you consider that inference to be the same as one from a 3-level model with a FE for geographic region? I want my inference to be within geographic area, rather than adjusting for geographic area.

    2. Is there a 3-level model command available for continuous outcomes that will let me use a FE for the highest level? I see xtmelogit and xtmepoisson but nothing "xtme" for continuous data.

    My thanks for any feedback you can provide.

    Risha

  • #2
    1. If I instead operationalized an xtgee with patient as the highest level (2-level model) and included a dummy variable for geographic region, would you consider that inference to be the same as one from a 3-level model with a FE for geographic region? I want my inference to be within geographic area, rather than adjusting for geographic area.
    Yes, you can use -xtgee- with patient as the second level, and incorporate indicators ("dummies") for region as bottom-level fixed effects. In order to get within region inferences about the effects of a predictor variable x, you can decompose the effects of x into between-region and within-region components as follows:

    Code:
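    * region-level mean of x (the between-region component)
    by region, sort: egen x_mean = mean(x)
    * deviation of x from its region mean (the within-region component)
    gen x_dev = x - x_mean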
    and then incorporate both x_mean and x_dev (but not x) into the model. The coefficient of x_dev is your estimate of the within-region effect of x. The coefficient of x_mean is the estimate of the between-region effect of x.
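
    A minimal sketch of how the two components might then enter a model, using simulated data. The variable names (region, patient, month, x, y), the sample sizes, and the Gaussian family are illustrative assumptions, not details from the original question. The sketch leaves out region indicators, since x_mean is constant within region and would be collinear with a full set of them.

    ```stata
    * Simulated data (assumed layout): 20 regions, 10 patients each, 6 months each
    clear
    set seed 12345
    set obs 20
    gen region = _n
    gen double u_region = rnormal()
    expand 10
    bysort region: gen patient = (region - 1) * 10 + _n
    expand 6
    bysort patient: gen month = _n
    gen double x = u_region + rnormal()
    gen double y = 1 + 0.5 * x + rnormal()

    * Between/within decomposition of x
    by region, sort: egen double x_mean = mean(x)
    gen double x_dev = x - x_mean

    * Sanity check: deviations sum to (numerically) zero within each region
    by region, sort: egen double dev_check = total(x_dev)
    assert abs(dev_check) < 1e-8

    * Patient is the panel (cluster) variable; x is replaced by its two components
    xtset patient month
    xtgee y x_mean x_dev, family(gaussian) link(identity) corr(exchangeable)
    ```

    In the output, the coefficient on x_dev would be read as the within-region effect and the coefficient on x_mean as the between-region effect.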

    2. Is there a 3-level model command available for continuous outcomes that will let me use a FE for the highest level? I see xtmelogit and xtmepoisson but nothing "xtme" for continuous data.
    Not that I am aware of. And -xtmelogit- and -xtmepoisson- (which, by the way, in current Stata are called -meqrlogit- and -meqrpoisson-) do not provide a fixed effect at any level other than the bottom. I'm not sure what leads you to think otherwise.


    • #3
      Thanks very much, Clyde. In the case of using a geographic area dummy as a FE, and just using the dummy (not the x_mean and x_dev that you describe above), how would you interpret the coefficient on X1? Would it be any different from your interpretation of the coefficient on X1 if you had a model where geographic area was specified as the FE clustering variable?

      Re: mixed effects models -- I've heard that mixed effect models can accommodate FE at the highest level from colleagues at Stanford. However, the examples I've seen online and in textbooks have all had RE as the highest level.


      • #4
        I can't interpret a description in words of a model. Please show the commands you have in mind. If you have already run them, please show the output as well.


        • #5
          I haven't run the model yet, as we're still building the analytic dataset.

          Here's the code I'm thinking of for the model in a 2-level framework with a dummy for the geographic area (simplified to focus on my question):

          xtset patientID

          xtgee cost Medicare month month*Medicare covariates i.geographic_area month*i.geographic_area, family (gamma) link (log) corr (ar1)



          And the model in a 3-level framework:

          meglm cost Medicare month month*Medicare covariates || geographic_area: || patientID: , family (gamma) link (log)

          My ultimate goal is to predict costs from these models so if the xtgee will be easier for predicting I would prefer to go that route. My desired interpretation is how moving from Medicare = 0 to Medicare = 1 influences cost within a geographic area. I don't need to know the specific impact in each geographic area, but want to know the average effect.

          Thanks for your feedback!


          • #6
            There are several differences between these models. Of them, ease of predicting is probably the least important: it will be quite simple with both models. But some of the other differences are important and you need to consider them carefully:

            1. If you have only a small number of geographic regions, it is not a good idea to make it into a random effect. Your estimate of variation at that level is, in effect, based on a sample whose size is the number of regions. If you wouldn't use a sample that small in isolation, you shouldn't use region as a random effect.

            2. The -meglm- model assumes that patients are nested within geographic areas. This seems plausible enough, but is it true? If this is a long term national data set, some patients may move from one region to another during the period of observation. If that's sufficiently infrequent, you can ignore it, but otherwise it's a real issue.

            3. -xtgee- will give you population-averaged effect estimates, whereas -meglm- gives subject-specific estimates, conditional on the random effects. Since your model is non-linear, these are different. You need to be clear which one you want.

            Finally, do remember to use factor-variable notation fully when coding whichever of these models you find appropriate. By doing that, you will be able to get your estimate of the marginal effect of Medicare with a one-line -margins- command after you estimate the model.
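
            As a rough illustration of that last point, here is a sketch on simulated data. The data-generating values, the five-area layout, and the exchangeable correlation are assumptions chosen only so the example runs on its own; the variable names follow the commands in #5.

            ```stata
            * Simulated panel (all values are illustrative assumptions)
            clear
            set seed 2018
            set obs 300
            gen patientID = _n
            gen byte Medicare = runiform() < 0.5
            gen geographic_area = ceil(5 * runiform())
            expand 12
            bysort patientID: gen month = _n
            gen double mu = exp(6 + 0.3 * Medicare + 0.01 * month)
            gen double cost = rgamma(2, mu / 2)   // gamma outcome with mean mu

            xtset patientID month

            * i. marks categorical variables, c. continuous ones, ## adds main effects plus interaction
            xtgee cost i.Medicare##c.month i.geographic_area, ///
                family(gamma) link(log) corr(exchangeable) vce(robust)

            * Average marginal effect of Medicare (0 -> 1) on the predicted-cost scale
            margins, dydx(Medicare)
            ```

            Because Medicare enters as i.Medicare, -margins- knows it is a 0/1 factor and reports the discrete change in predicted cost, averaged over the sample, rather than a derivative.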


            • #7
              I have ~275 geographic areas; number of clusters is not a problem. However, as noted above, the pressing need is to make geographic area a FE, not a RE.

              Patients are most certainly nested within geographic areas. No real issues about moving from one area to another in the time frame of interest (we have confirmed this in our dataset).

              Understood about the different interpretations. I prefer the -xtgee- approach with the population-averaged interpretation; I'm (grudgingly) specifying the -meglm- only because I need to operationalize a 3-level model with logged costs (and there is no option for a -megee-).

              My concern is whether the "i.geographic_area" in the top model is doing the same as what a "xtset geographic_area" would do -- can you let me know your thoughts about that?
              Last edited by Risha Gidwani-Marszowski; 08 Aug 2018, 21:34.


              • #8
                I think that including i.geographic_area in your -xtgee- command is the closest you can come to having a 3-level model estimated by generalized estimating equations. It is still a 2-level model, but the inclusion of i.geographic_area will accomplish what you need.

                By the way, in order to use -corr(ar1)- your -xtset- command will have to specify a time variable. So I guess you want -xtset patientID month-. (Also note that by specifying -corr(ar1)- you will lose any observations where the preceding month's observation contains a missing value in any model variables, or if there is no observation for the preceding month for that patientID. Depending on the completeness of your data set, this might or might not be a serious problem.)
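
                A small sketch of that setup on simulated data, with a quick count of observations that lack a preceding month. The data, the gap variable name, and the single-predictor model are assumptions made only to keep the example self-contained.

                ```stata
                * Simulated balanced panel: 100 patients, 12 consecutive months each
                clear
                set seed 99
                set obs 100
                gen patientID = _n
                gen byte Medicare = mod(patientID, 2)
                expand 12
                bysort patientID: gen month = _n
                gen double cost = rgamma(2, 500)

                * Declare both the panel and the time variable so corr(ar1) has an ordering
                xtset patientID month

                * Flag observations whose preceding month is absent for that patient
                bysort patientID (month): gen byte gap = (_n > 1) & (month - month[_n-1] != 1)
                count if gap

                xtgee cost i.Medicare, family(gamma) link(log) corr(ar1) vce(robust)
                ```

                On real data, a nonzero count from -count if gap- would indicate how many observations are affected by missing preceding months; -xtdescribe- can also be used to inspect the participation patterns.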


                • #9
                  Perfect -- that's exactly what I was hoping for.

                  I don't have much missingness -- maybe 3-4% of observations have any missing covariate. For a different outcome on the same-ish dataset, I did multiple imputation, but it took a really long time to run with logistic regression. That was with a 2-level model, so I'm afraid that incorporating MI into my 3-level model will be a bit of a nightmare.

                  Thanks very much for your feedback, Clyde.


                  • #10
                    Hi Clyde,
                    I have a similar issue to Risha's question, and was reading your answer on the forum. I have a panel model of universities nested within states, and am using gee to control for clustering within universities and including fixed effects for year. I also want to include fixed effects for state, as my key IV is at the state level. Based on your earlier replies, I know that I should include x_mean and x_dev instead of x to estimate the within- and between-state effects for x, but should I also include the dummies for state in this model as well? In other words, which of these models is correct?

                    xtset university year
                    xtpoisson y x_mean x_dev i.year i.state, pa vce(robust) exp(population)

                    OR

                    xtset university year
                    xtpoisson y x_mean x_dev i.year, pa vce(robust) exp(population)

                    I have a lot of between-state variation, but very little within-state variation, so x_mean is my main focus.

                    Thanks so much for your help!