
  • Generating predicted probabilities of random effects after melogit

    Hello,

    My question is about estimating coefficients for random effects in mixed-effects models. I am building a three-level mixed-effects model to describe hospital-level rates of testing patients with neuro-imaging. My model includes ~45,000 patients clustered within 486 hospitals, which are in turn clustered within 73 geographic regions (hospital referral regions, or HRRs). My goal is to estimate adjusted diagnostic testing rates for each hospital (the second level in the model).

    My goal is a graph similar to Figure 2 of the following manuscript, which is a dot plot depicting estimated hospital-level rates of a binary outcome with 95% CIs: https://www.ncbi.nlm.nih.gov/pubmed/29404575
    Below is the mixed-effects model I have built to adjust for patient-level factors such as age, gender, and co-morbidities, and to account for clustering within hospitals and regions. The outcome is binary (whether or not a patient received a neuro-imaging test), so I used the melogit command:

    Code:
     
    melogit neuro_imaging i.u_icu i.u_ccu i.obs age i.gender i.pay1 i.aweekend i.anticoag i.neuro_symptom i.head_trauma cm_aids cm_alcohol cm_anemdef cm_arth cm_bldloss cm_chf cm_chrnlung cm_coag cm_depress cm_dm cm_dmcx cm_drug cm_htn_c cm_hypothy cm_liver cm_lymph cm_lytes cm_mets cm_neuro cm_obese cm_para cm_perivasc cm_psych cm_pulmcirc cm_renlfail cm_tumor cm_ulcer cm_valve cm_wghtloss || hrrcode: || hospital:, or intpoints(10)
    
    Fitting fixed-effects model:
    
    Iteration 0:   log likelihood = -27240.211  
    Iteration 1:   log likelihood = -27197.166  
    Iteration 2:   log likelihood = -27197.154  
    Iteration 3:   log likelihood = -27197.154  
    
    Refining starting values:
    
    Grid node 0:   log likelihood = -22450.666
    
    Fitting full model:
    
    Iteration 0:   log likelihood = -22450.666  
    Iteration 1:   log likelihood = -22277.979  
    Iteration 2:   log likelihood = -22271.953  
    Iteration 3:   log likelihood = -22271.818  
    Iteration 4:   log likelihood = -22271.818  
    
    Mixed-effects logistic regression               Number of obs     =     45,513
    
    -------------------------------------------------------------
                    |     No. of       Observations per Group
     Group Variable |     Groups    Minimum    Average    Maximum
    ----------------+--------------------------------------------
            hrrcode |         73         11      623.5      7,301
           hospital |        486         10       93.6      1,077
    -------------------------------------------------------------
    
    Integration method: mvaghermite                 Integration pts.  =         10
    
                                                    Wald chi2(43)     =    2017.57
    Log likelihood = -22271.818                     Prob > chi2       =     0.0000
    -------------------------------------------------------------------------------------
          neuro_imaging | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    --------------------+----------------------------------------------------------------
                        |
                1.u_icu |   .9617766   .0441584    -0.85   0.396     .8790077    1.052339
                1.u_ccu |   .9170839   .0490852    -1.62   0.106     .8257529    1.018516
                  1.obs |   1.097361   .0564116     1.81   0.071     .9921839    1.213687
                    age |   1.006762   .0010272     6.60   0.000      1.00475    1.008777
                        |
                 gender |
                Female  |   1.086055   .0273125     3.28   0.001     1.033822    1.140928
                        |
                   pay1 |
              Medicaid  |   1.020001   .0497538     0.41   0.685     .9270017     1.12233
     Private Insurance  |    1.07706   .0400239     2.00   0.046     1.001403    1.158433
              Self-Pay  |   1.092733   .0777682     1.25   0.213     .9504636    1.256299
             No-Charge  |   1.134688   .1900918     0.75   0.451     .8171027    1.575711
                 Other  |   .9632547   .0924938    -0.39   0.697     .7980074    1.162721
                        |
             1.aweekend |   1.093833   .0296999     3.30   0.001     1.037144    1.153621
             1.anticoag |   .8961464   .0279134    -3.52   0.000     .8430737    .9525601
        1.neuro_symptom |   2.231223   .0697267    25.68   0.000     2.098662    2.372157
          1.head_trauma |   3.775735   .1881069    26.67   0.000      3.42448    4.163018
                cm_aids |   1.497218   .3726232     1.62   0.105     .9192644    2.438538
             cm_alcohol |   1.257131   .0785104     3.66   0.000     1.112298    1.420822
             cm_anemdef |   .9110613   .0304405    -2.79   0.005     .8533106    .9727205
                cm_arth |   1.097065   .0797171     1.27   0.202     .9514388    1.264981
             cm_bldloss |   .7145878   .1033806    -2.32   0.020     .5381593    .9488561
                 cm_chf |   .8469047   .0545417    -2.58   0.010     .7464764    .9608443
            cm_chrnlung |   1.062038   .0346219     1.85   0.065     .9963029    1.132111
                cm_coag |   1.048007    .069548     0.71   0.480     .9201882    1.193581
             cm_depress |   1.211311   .0469752     4.94   0.000     1.122653     1.30697
                  cm_dm |   1.074146   .0317421     2.42   0.016       1.0137    1.138196
                cm_dmcx |   1.073205   .0643979     1.18   0.239     .9541265    1.207144
                cm_drug |   1.138923   .0853021     1.74   0.082     .9834267    1.319007
               cm_htn_c |   .9917578   .0275891    -0.30   0.766     .9391318    1.047333
             cm_hypothy |   1.009805   .0353911     0.28   0.781     .9427688    1.081609
               cm_liver |    1.03391   .0926915     0.37   0.710     .8673042    1.232521
               cm_lymph |   1.113212   .1431993     0.83   0.404     .8651322    1.432429
               cm_lytes |    .952935   .0257518    -1.78   0.074     .9037757    1.004768
                cm_mets |   1.251564   .1293983     2.17   0.030     1.021993    1.532704
               cm_neuro |    1.42887   .0578913     8.81   0.000     1.319794    1.546962
               cm_obese |   .9985896   .0467823    -0.03   0.976     .9109816    1.094623
                cm_para |    1.33541   .1415068     2.73   0.006     1.084968    1.643661
            cm_perivasc |   .9368681   .0484128    -1.26   0.207     .8466278    1.036727
               cm_psych |   1.402534   .0862985     5.50   0.000     1.243193    1.582297
            cm_pulmcirc |   .9512912    .105854    -0.45   0.654     .7648863    1.183124
            cm_renlfail |   .9259629   .0340846    -2.09   0.037     .8615112    .9952364
               cm_tumor |   .8685913   .0666102    -1.84   0.066     .7473755    1.009467
               cm_ulcer |   1.655414   1.290669     0.65   0.518     .3591321    7.630609
               cm_valve |   .9076863   .0683901    -1.29   0.199     .7830717    1.052132
            cm_wghtloss |   1.148169   .0918705     1.73   0.084      .981515    1.343119
                  _cons |   .5648739   .0973378    -3.31   0.001     .4029715     .791824
    --------------------+----------------------------------------------------------------
    hrrcode             |
              var(_cons)|   1.091473    .302129                      .6344438    1.877728
    --------------------+----------------------------------------------------------------
    hrrcode>hospital    |
              var(_cons)|   1.713218   .1434292                      1.453954    2.018713
    -------------------------------------------------------------------------------------
    LR test vs. logistic model: chi2(2) = 9850.67             Prob > chi2 = 0.0000
    


    Next, I use post-estimation commands to obtain empirical Bayes means of the random effects:
    Code:
     predict hrr_effects hosp_effects, reffects
    Finally, I generate an adjusted testing rate accounting for the hospital and region random effects:
    Code:
    gen adj_neurotesting_rate = exp(_b[_cons]+hosp_effects+hrr_effects)/(1+exp(_b[_cons]+hosp_effects+hrr_effects))
    Questions
    1. As graphed below, when I compare the adjusted and unadjusted hospital testing rates (unadjusted rates calculated by dividing the # of patients who received a test by the total # of patients seen at the hospital), I notice that all adjusted rates are higher than the crude rates. It seems incorrect that adjustment would only result in higher predicted probabilities for every hospital, but I am unsure where I may have erred in my process, and was hoping the forum might be able to advise me on where I went wrong.

    2. Do I need to group mean center my variables to obtain accurate estimates of hospital-level rates?
    3. I can use -predict- with the reses() option to calculate standard errors of the empirical Bayes estimates. How do I transform these estimates into 95% confidence intervals to match the predicted probabilities?
    4. If I were to add hospital characteristics to the model as random coefficients, how would I account for this when calculating predicted probabilities of the hospital random effects?
    E.g., a dichotomous marker for profit status, as shown in the example code below:

    melogit neuro_imaging [omitting variables for brevity] || hrrcode: || hospital profit_status:,

    Thank you.
    Tim
    Attached Files: [graph comparing the adjusted and crude hospital testing rates, referenced above]

  • #2
    1. As graphed below, when I compare the adjusted and unadjusted hospital testing rates (unadjusted rates calculated by dividing the # of patients who received a test by the total # of patients seen at the hospital), I notice that all adjusted rates are higher than the crude rates. It seems incorrect that adjustment would only result in higher predicted probabilities for every hospital, but I am unsure where I may have erred in my process, and was hoping the forum might be able to advise me on where I went wrong.
    Well, your formula for adjustment implicitly sets all of the covariates to 0. That is, you are adjusting to a "standard hospital" in which every one of the predictors in your model is zero. There's nothing inherently wrong with that, but it probably accounts for what you are noticing. Also, note that the absolute levels of adjusted rates are not really meaningful anyhow--they are an artifact of what distribution of predictors you adjust to; adjusted rates are intended for comparisons that eliminate unwanted effects.

    2. Do I need to group mean center my variables to obtain accurate estimates of hospital-level rates?
    I don't understand what you're getting at here. What hospital level rates are you talking about? Adjusted rates are never "accurate," by design. If you mean the crude rates, centering variables will have no effect anyway.

    3. I can use -predict- with the reses() option to calculate standard errors of the empirical Bayes estimates. How do I transform these estimates into 95% confidence intervals to match the predicted probabilities?
    Just run the command, specifying variable names to receive them, and you don't need to do any other calculations--you'll have your standard errors. You can add and subtract 1.96 times these standard errors inside the expression you transform to the probability scale to come out with confidence limits for the adjusted rates.
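
    For concreteness, a minimal sketch of that calculation (the variable names are assumed; for simplicity it uses only the hospital-level standard error, and properly combining the uncertainty of both random effects is more involved):
    Code:
    predict hrr_effects hosp_effects, reffects reses(hrr_se hosp_se)
    gen adj_rate = invlogit(_b[_cons] + hrr_effects + hosp_effects)
    gen adj_lb   = invlogit(_b[_cons] + hrr_effects + hosp_effects - 1.96*hosp_se)
    gen adj_ub   = invlogit(_b[_cons] + hrr_effects + hosp_effects + 1.96*hosp_se)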

    4. If I were to add hospital characteristics to the model as random coefficients, how would I account for this when calculating predicted probabilities of the hospital random effects?
    E.g., a dichotomous marker for profit status, as shown in the example code below:

    melogit neuro_imaging [omitting variables for brevity] || hrrcode: || hospital profit_status:,
    First, in the code, profit_status comes after the colon, not before it. Next, profit_status must also be included among the "fixed effects" in the model to get a proper specification. Next, I don't understand why you would want to do this. Allowing each hospital to have a "customized" effect of its for-profit status doesn't make any sense to me.

    I think you are perhaps a bit confused about how random slopes work and what they mean. Just because profit_status is a hospital-level variable is no reason to assign a random slope to it. It is perfectly reasonable to just include it among the fixed effects--which would make sense to me. Random slopes are, in effect, cross-level interactions. They say that a certain fixed effect actually has different effects depending on the higher-level entity within which the observation falls. But if that effect is only defined at the higher level, and not at the lower level, it isn't really meaningful. You can see where it might make sense to ask whether the effect of a patient's having head trauma differs from one hospital to the next. But how would it make sense to ask whether the effect of a hospital's being for-profit differs from one hospital to the next? It might make sense if some of the hospitals in your sample changed their for-profit status during the course of the study; but given that this seldom happens (and I'm guessing it doesn't in your study), you would in fact have an unidentified model, because you could arbitrarily shift some of the variation from the hospital-level intercept to that hospital's random slope on profit_status and come out with the same predictions. Most likely your model will not converge if you try to estimate it.
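
    For concreteness, the two specifications would look like this (covariates elided as in the original post; the second line shows the random-slope syntax that, for the reasons above, is ill-advised here):
    Code:
    * profit_status as a fixed effect only -- the sensible specification:
    melogit neuro_imaging i.profit_status [other covariates omitted] || hrrcode: || hospital:, or
    
    * random slope on profit_status -- correct syntax, but likely unidentified here:
    melogit neuro_imaging i.profit_status [other covariates omitted] || hrrcode: || hospital: profit_status, or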



    • #3
      Regarding your first question, I overlooked another important issue. In your formula, you exponentiate the sum of the constant term and the two random intercepts. But that does not take you to the probability metric: it takes you to the odds metric, which you are then comparing with probabilities (crude rates). Apples to oranges. To get to the probability metric from the sum of the constant term and random intercepts you need -invlogit()-, not -exp()-.

      Added: Sorry--you did do -invlogit()-. My error: I was looking at just the part of the code that showed without scrolling!
      Last edited by Clyde Schechter; 26 Dec 2018, 14:43.



      • #4
        Clyde,

        Thank you for your helpful clarifications and for improving my understanding of random slopes.

        Regarding your response to my first question: I understand that my current approach sets all covariates to 0, and this likely accounts for the pattern depicted in the graph. My overall goal is to generate the equivalent of the results of the -margins- command: the average predicted probability of receiving a test if every patient in the cohort were treated at hospital 1 vs. hospital 2 vs. hospital 3, and so on. Thus, I do not want the results of a model that sets all other covariates to 0.

        If I had run a single-level logistic regression and included hospital as a fixed effect, I could run the logistic regression followed by -margins i.newahaid- to generate average predicted probabilities... but my understanding is that this is not possible with random effects.

        I also understand that, for studying adjusted predicted probabilities in the context of comparing hospitals, a random-effects model is preferable to a fixed-effects model, as discussed here: https://www.annalsthoracicsurgery.or...14)02203-6/pdf

        My second question, which I did not clearly state before, was an idea for how to address this issue of covariates being set to 0. My thought was to transform the covariates so they are centered on the hospital mean. For example, for gender I would generate the following variable:
        Code:
        egen gender_means = mean(gender), by(newahaid)
        gen gmc_gender = gender - gender_means
        Then I would use the centered covariates in the regression, so that my formula for adjustment would implicitly set all covariates to the hospital mean. But I am still not sure whether this would accurately achieve my goal of generating the average predicted probability of the outcome for each hospital. When I've run this, I see a similar pattern of all adjusted rates being higher than crude rates.

        Sincerely,
        Tim



        • #5
          If I had run a single-level logistic regression and included hospital as a fixed effect, I could run the logistic regression followed by -margins i.newahaid- to generate average predicted probabilities... but my understanding is that this is not possible with random effects.
          Correct.

          I also understand that, for studying adjusted predicted probabilities in the context of comparing hospitals, a random-effects model is preferable to a fixed-effects model, as discussed here: https://www.annalsthoracicsurgery.or...14)02203-6/pdf
          Well, I agree with the conclusion, especially for this particular purpose, but I could also raise good arguments for using a fixed-effects model. It really depends on what you want to use the results for. So just don't draw any more general conclusions from the article. I've noticed that fixed- vs. random-effects modeling is an area where there is considerable division among disciplines, and a tendency within a discipline to rigidly use one or the other. But, again, depending on the intended purpose, either might be suitable.

          My second question, which I did not clearly state before, was an idea for how to address this issue of covariates being set to 0. My thought was to transform the covariates so they are centered on the hospital mean. For example, for gender I would generate the following variable:
          Code:
          egen gender_means = mean(gender), by(newahaid)
          gen gmc_gender = gender - gender_means
          Then I would use the centered covariates in the regression, so that my formula for adjustment would implicitly set all covariates to the hospital mean. But I am still not sure whether this would accurately achieve my goal of generating the average predicted probability of the outcome for each hospital. When I've run this, I see a similar pattern of all adjusted rates being higher than crude rates.
          Well, this approach, as a whole, will not get you there. Remember that invlogit() is a nonlinear function. Your approach is to calculate a mean of (centered) xb + random effects and then apply invlogit() to that. But that does not get you what -predict- would calculate, which is the mean of invlogit(xb + random effects). The order of taking means and applying invlogit() makes a difference because of the nonlinearity.
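
          A quick numeric illustration of that point (made-up linear predictors of 0 and 4):
          Code:
          display (invlogit(0) + invlogit(4))/2   // mean of the probabilities: ~0.741
          display invlogit((0 + 4)/2)             // probability at the mean: ~0.881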

          To get something similar to what -margins- could get you, you need to emulate -margins- itself: you cannot get it by applying invlogit() to something calculated from the -melogit- results. I'm going to give you the steps for getting what -margins- would give you if it could do this directly. I'm not going to do it with the -atmeans- constraint because with categorical predictors it's much more complicated. So this will be adjusted to the overall joint distribution of the predictors in the entire sample, rather than at sample means.

          1. Run -predict hrr_effects hosp_effects, reffects- to get the random effects in variables.
          2. Clone each of those variables: let's call them hrr_orig and hosp_orig, so as to preserve the original values
          3. Create a predicted outcome variable, modeled_outcome, initialized to missing value.
          4. Loop over hospitals:
          4a. Replace hrr_effects and hosp_effects by the current hospital's values of hrr_orig and hosp_orig in all observations of the estimation sample from melogit.
          4b. Run -predict xb, xb- and then -gen temp_pr = invlogit(xb + hrr_effects + hosp_effects) if e(sample)-. Drop xb.
          4c. -summ temp_pr, meanonly- and then replace modeled_outcome = r(mean) in observations of the current hospital only.
          At this point the variable modeled_outcome will, in each observation, contain an adjusted predicted probability for its hospital (the same value in all observations of a given hospital) that is analogous to what -margins- would do for you.
          5. (Optional) Restore the values of hrr_effects and hosp_effects to their original values and drop hrr_orig and hosp_orig.

          This is somewhat complicated, but doable. That said, let me emphasize something I said back in #2. If your purpose is to compare the hospitals, you don't have to go through all that trouble. Your original version of adjusted predictions will work perfectly well for that purpose, even though they are all "too high." You're generating a lot of work for yourself to achieve a cosmetic refinement of your results. In fact, in educational research, where this kind of model has been in use much longer than it has been in health care, they don't even bother with adjusted predictions. They do their comparisons by looking directly at the random intercepts themselves, often displaying them in a "caterpillar plot." Remember that in these models the random intercepts capture all of the idiosyncratic effect of the nesting units in the model. They are already "adjusted" for everything else.

          (If you are not familiar with caterpillar plots, there is a very succinct explanation at http://www.metafor-project.org/doku....terpillar_plot.) You can make a caterpillar plot in Stata by superimposing a line plot of the random effects with an -rcap- plot of the upper and lower confidence limits.
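
          For example, a rough sketch along those lines (untested; variable names assumed, keeping one row per hospital and using -scatter- for the point estimates):
          Code:
          predict hrr_effects hosp_effects, reffects reses(hrr_se hosp_se)
          preserve
          egen tag = tag(hospital)
          keep if tag
          sort hosp_effects
          gen rank = _n
          gen lb = hosp_effects - 1.96*hosp_se
          gen ub = hosp_effects + 1.96*hosp_se
          twoway (rcap ub lb rank) (scatter hosp_effects rank, msize(vsmall)), ///
              legend(off) ytitle("Hospital random intercept (log odds)") xtitle("Hospital rank")
          restore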



          • #6
            Clyde,

            Thank you again for the detailed explanation.

            For your suggested steps, I am not totally clear on how to carry out steps 4a and 4b.
            Are you suggesting that, for each hospital, the melogit regression needs to be rerun in order to generate new fixed-effect predictions (xb)? Or rather that, for each hospital, the command -gen temp_pr = invlogit(xb + hrr_effects + hosp_effects) if e(sample)- needs to be rerun, substituting the original hrr_effects and hosp_effects? I am not sure how to rerun a prediction substituting a previously predicted value.

            Second, to clarify my intent regarding your remarks about whether this additional step is needed: the goal is both to compare hospitals and to convey the magnitude of the estimated outcome frequency. Simply saying that the hospital with the highest testing rate conducts twice as many imaging tests as the hospital with the lowest testing rate is useful information. However, it does not convey the additional useful information of being able to say that the highest-testing hospital images 80% of patients compared to the lowest-testing hospital imaging 40% of patients (or, alternatively, 20% vs. 10%). This is why I am concerned about accurately estimating the absolute, and not just the relative, comparison between hospitals.

            Sincerely,
            Tim



            • #7
              For your suggested steps, I am not totally clear on how to carry out steps 4a and 4b.
              Are you suggesting that, for each hospital, the melogit regression needs to be rerun in order to generate new fixed-effect predictions (xb)? Or rather that, for each hospital, the command -gen temp_pr = invlogit(xb + hrr_effects + hosp_effects) if e(sample)- needs to be rerun, substituting the original hrr_effects and hosp_effects? I am not sure how to rerun a prediction substituting a previously predicted value.
              No, the -melogit- does not get re-run. The code will look something like this:

              Code:
              melogit ....
              predict hrr_effects hosp_effects, reffects
              clonevar hrr_orig = hrr_effects
              clonevar hosp_orig = hosp_effects
              gen model_prediction = .
              
              levelsof hospital, local(hosps)
              foreach h of local hosps {
                  // set every observation's random effects to this hospital's values
                  summ hrr_orig if hospital == `h', meanonly
                  replace hrr_effects = r(mean) if e(sample)
                  summ hosp_orig if hospital == `h', meanonly
                  replace hosp_effects = r(mean) if e(sample)
                  // average the resulting predicted probability over the whole estimation sample
                  predict xb if e(sample), xb
                  gen temp_prediction = invlogit(xb + hrr_effects + hosp_effects) if e(sample)
                  summ temp_prediction, meanonly
                  replace model_prediction = r(mean) if hospital == `h'
                  drop xb temp_prediction
              }
              Not tested: there may be some errors here, but this is the gist of it. The results in variable model_prediction will represent the model's prediction of what each hospital's rate would be if all of the hospitals had the same distribution of predictors as actually describes the entire estimation sample. So these can be compared from one hospital to another because all of the variation attributable to the predictors has now been eliminated from these.

              This is the reason I am concerned about accurately estimating the absolute and not just relative comparison between hospitals.
              But you are trying to do two different things with one statistic that is only appropriate for one of those purposes, and there isn't any single statistic that serves both purposes. You are simply creating the illusion of absolute comparison, not the reality. An adjusted rate is not actually the real rate of any hospital, except occasionally by coincidence. The only valid absolute numbers here are the crude rates. Adjusted rates are artificial and lend themselves to comparisons that "factor out" those sources of variation that are explained by your predictors. But adjusted rates should not be interpreted as if they were absolute rates. They are only useful for relative comparisons. This is true regardless of what method of adjustment is used.



              • #8
                Clyde,

                Thank you again for the coding explanation. To obtain standard errors for these estimates, I assume I would run a similar looped process for the reses() results?

                I do agree with you that the most informative approach is to present both crude rates and adjusted predicted probabilities for relative comparisons. However, I have also found that journals often prefer to depict adjusted rates in manuscripts, saving crude rates for appendices, due to concerns that crude rates do not adequately convey to readers the potential importance of case-mix differences between hospitals (I suppose this is most prominent in studies of hospital readmission rates, where appropriate adjustment is quite controversial).

                Sincerely,
                Tim



                • #9
                  To obtain standard errors for these estimates, I assume I would run a similar looped process for the reses() results?
                  No, the standard errors do not combine by averaging over observations in that way. The calculation of the standard errors of these rates is done by another method that I do not know well enough to code myself.

                  However, I have also found that journals often prefer to depict adjusted rates in manuscripts, saving crude rates for appendices, due to concerns that crude rates do not adequately convey to readers the potential importance of case-mix differences between hospitals (I suppose this is most prominent in studies of hospital readmission rates, where appropriate adjustment is quite controversial).
                  Well, I certainly agree that crude rates are difficult to interpret in this context. Whether they should be relegated to an appendix depends on their relevance to the main goals of your study. I generally try to include them in the main article to provide a sense of the actual observed frequencies--something you cannot reliably get from the adjusted rates.

                  It is indeed quite controversial how to adjust statistics describing hospitals in light of their differing case-mixes. It is unlikely that a consensus on that will emerge during the lifetime of anyone reading this Forum. But that is another reason why anchoring any kind of adjusted rates by presenting the crude rates as context becomes even more important.

                  And again, I think it is misleading to treat adjusted rates as if they were actual rates that can be understood in absolute terms. They should be used only for the purposes of comparing hospitals to each other. And again, that can be done just as easily using the random intercepts themselves, with much less effort. The fact that the random intercepts are on the log odds scale rather than the probability scale is, in my view, a feature not a bug--the difference in scale is so blatant that it deters people from viewing them as if they were real rates.



                  • #10
                    I'll try my luck replying to this post after more than 5 years; I came across it while trying to do something similar (i.e., adjusting diagnosis rates per country (level 2) for individual- and country-level predictors). I gather from the previous replies that focusing on the random intercepts would simplify matters. However, I am trying to model cascades of care, looking cross-sectionally at where chronic patients are lost along the care pathway (e.g., X% get diagnosed, X% are in treatment, etc.). To follow how this is reported in the literature, I aim to model this similarly by calculating predicted probabilities from the random country intercepts. I think it would be more straightforward to model countries as fixed effects and just use the -margins- command, but that does not allow modeling characteristics at the country level.

                    @Tim Anderson: did you ever succeed at this? It would be a great help if you could share your experience.

                    For now, I tried to replicate the code above. However, I think some errors were in there.
                    Code:
                    melogit ....
                    predict hrr_effects hosp_effects, reffects
                    clonevar hrr_orig = hrr_effects
                    clonevar hosp_orig = hosp_effects
                    gen model_prediction = .
                    
                    levelsof hospital, local(hosps)
                    foreach h of local hosps {
                        summ hrr_orig if hospital == `h', meanonly
                        replace hrr_effects = r(mean) if e(sample)
                        summ hosp_orig if hospital == `h', meanonly
                        replace hosp_effects = r(mean) if e(sample)
                        predict xb if e(sample), xb
                        gen temp_prediction = invlogit(xb + hrr_effects + hosp_effects) if e(sample)
                        summ temp_prediction, meanonly
                        replace model_prediction = r(mean) if hospital == `h'
                        drop xb temp_prediction
                    }
                    1) I think summarizing the hospital random effects per hospital is unnecessary in a random-intercept model with only two levels, as they are already subject-specific...
                    2) Shouldn't the average of temp_prediction be taken per subject? If not, you don't get subject-specific (i.e., hospital-specific) predictions.

                    I applied this code to my own case and would like to ask whether someone can confirm that this approach is correct, so that the probabilities can be interpreted as what each country's rate would be if all countries had the same distribution of predictors as the sample.

                    Code:
                    melogit dm_diagnosed age b1.gender || country:
                    predict dia1_u, reffects reses(dia1_se)
                    gen dia1 = .
                    levelsof country, local(cntry)
                    
                    foreach c of local cntry {
                        predict xb if e(sample), xb
                        gen temp_dia1 = invlogit(xb + dia1_u)
                        summ temp_dia1 if e(sample) & country == `c', meanonly
                        replace dia1 = r(mean) if country == `c'
                        drop xb temp_dia1
                    }
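
                    For reference, a more literal two-level adaptation of the loop in #7 would assign each country's intercept to every observation before averaging over the whole estimation sample (untested sketch; dia1_orig and dia1_adj are made-up names so as not to clash with the variables above):
                    Code:
                    levelsof country, local(cntry)
                    clonevar dia1_orig = dia1_u
                    gen dia1_adj = .
                    predict xb if e(sample), xb        // fixed portion; identical in every iteration
                    
                    foreach c of local cntry {
                        // give every observation this country's random intercept
                        summ dia1_orig if country == `c', meanonly
                        replace dia1_u = r(mean) if e(sample)
                        gen temp_dia1 = invlogit(xb + dia1_u) if e(sample)
                        // average over the whole estimation sample; store for this country
                        summ temp_dia1, meanonly
                        replace dia1_adj = r(mean) if country == `c'
                        drop temp_dia1
                    }
                    replace dia1_u = dia1_orig         // restore the original random effects
                    drop xb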
                    Last edited by Wilfried Pasmans; 30 Sep 2024, 10:11.

