  • Multilevel multinomial model - gsem slows down considerably after adding random slope

    Dear statalisters,

    I would very much appreciate your help with my analysis.

    Dataset:
    1500 respondents nested in 70 clusters; 19 independent variables in the full model (of different types: categorical, continuous, interaction terms). I tried to rescale them so that they take values from -10 to 93 (some are not discrete; omitting the one variable with negative values did not solve my problem).
    The dependent variable has 3 categories, and the respondents are not distributed equally across them.
    Since I have 2 variables with about 30% missing values, I am using multiple imputation. The methods used are pmm, logit, mlogit and regress; one of the variables has 3 categories, so the augment option is included.

    code:
    mi set flong

    mi register imputed volba migranti_2011_bezS migrantiS_squared vek_squared_rescale ses egal lidr egalitarstvi euroskepticismus_rek apatie anti_migration spk pohlavi_rek vek_rek bydliste_2kat religiozita_rek cynismus zamestnani_new vzdelani_3k fear

    mi impute chained ///
        (mlogit) volba ///
        (regress) migranti_2011_bezS ///
        (pmm, knn(10)) vek_squared_rescale ///
        (regress) ses ///
        (pmm, knn(10)) lidr ///
        (pmm, knn(10)) egal ///
        (pmm, knn(10)) egalitarstvi ///
        (pmm, knn(10)) euroskepticismus_rek ///
        (pmm, knn(10)) apatie ///
        (pmm, knn(10)) anti_migration ///
        (pmm, knn(10)) spk ///
        (logit) pohlavi_rek ///
        (pmm, knn(10)) vek_rek ///
        (logit) bydliste_2kat ///
        (logit) religiozita_rek ///
        (pmm, knn(10)) cynismus ///
        (pmm, knn(10)) vzdelani_3k ///
        (mlogit) zamestnani_new ///
        (pmm, knn(10)) fear ///
        [pweight=vaha], add(5) rseed(54321) savetrace(trace1, replace) augment

    Goal:
    In a nutshell, explaining the differences between voters, voters of the radical right and non-voters.

    When I use a model with a random intercept only, it converges in a minute or so (depending on how many imputations I did), but after adding a random slope gsem slows down enormously. It took Stata more than 2 hours to fit a simplified model (16 variables) with random slopes, even though I used only 5 imputations (which is not enough) and the intpoints(3) option.

    The model that runs with no issues has the same code as below except for "1.amm#M2[okres_cd]@1", which is the random slope part. The amm variable is binary: 1 indicates negative attitudes towards migrants above the median and 0 the rest. The model behaves the same way even when I use the continuous form (anti_migration), with values ranging from 0 to 10. The reason for adding random slopes is that I want to test whether the effect of negative feelings towards migrants varies between areas.
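
    As a reading aid, the random-slope term described above can be written out as a linear predictor. This is a sketch of how the path specification is usually read, not output from the model. The symbols \eta, \beta_k, u_{1j} and u_{2j} are notation introduced here, and taking the first category of volba as the base outcome is an assumption.

    ```latex
    % i = respondent, j = district (okres_cd), k = non-base category of volba
    % u_{1j} stands for M1[okres_cd] and u_{2j} for M2[okres_cd]; "@1" fixes both loadings to 1
    \eta_{ijk} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta}_k + u_{1j} + u_{2j}\,\mathrm{amm}_{ij},
    \qquad k = 2, 3

    \Pr(\mathrm{volba}_{ij} = k) = \frac{\exp(\eta_{ijk})}{1 + \sum_{l=2}^{3}\exp(\eta_{ijl})}
    ```

    Read this way, the random slope adds a second district-level random effect, so the likelihood for each district involves a two-dimensional integral instead of a one-dimensional one. If the integration grid is a product of q points per dimension, that means roughly q^2 evaluations per district rather than q (9 instead of 3 with intpoints(3)), before counting any extra iterations the harder optimization may need.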

    full model with random slopes:
    Code:
    mi estimate, dots cmdok: gsem (i.volba <- i.man migranti_2011_bezS migrantiS_squared ///
        vek_rek vek_squared_rescale i.religiozita i.bydliste ses i.vzdelani_3k ///
        i.zamestnani_new egalitarstvi i.amm egal euroskepticismus_rek apatie cynismus ///
        spk fear lidr M1[okres_cd]@1 1.amm#M2[okres_cd]@1, mlogit) [pweight=vaha]

    I tried to modify my code using the difficult option, simplifying the model, and trying different intmethod and technique settings, but with no success. So I wonder if there is something wrong with my code that causes such a staggering difference between the models with and without the random slope.

    Edit: I have Stata 16 MP.
    Last edited by Filip Safr; 17 Nov 2019, 07:51.

  • #2
    gsem is slow and using MI and random slopes does not help any. I don't think 2 hours is that bad -- I have seen models that take far longer than that.

    You might try running the model without mi and use the results as start values. The gsem manual has sections on improving start values and getting convergence.
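
    One way to read this suggestion, sketched under assumptions: fit the random-slope model once on the original, unimputed records (m = 0, which in flong style are the rows with _mi_m == 0), save the coefficient vector, and hand it to gsem through the from() option when running under mi estimate. The macro name xvars and the matrix name b0 are hypothetical, the covariate list is copied from the model in post #1, and whether this shortens the run time for this model is untested.

    ```stata
    * Covariate list copied from the model in post #1 (macro name is hypothetical)
    local xvars i.man migranti_2011_bezS migrantiS_squared vek_rek ///
        vek_squared_rescale i.religiozita i.bydliste ses i.vzdelani_3k ///
        i.zamestnani_new egalitarstvi i.amm egal euroskepticismus_rek ///
        apatie cynismus spk fear lidr

    * Step 1: fit the model once on the original data only (m = 0)
    gsem (i.volba <- `xvars' M1[okres_cd]@1 1.amm#M2[okres_cd]@1, mlogit) ///
        if _mi_m == 0 [pweight=vaha]

    * Step 2: keep the estimated coefficient vector (matrix name is hypothetical)
    matrix b0 = e(b)

    * Step 3: refit on the imputed datasets, starting every fit from b0
    mi estimate, dots cmdok: ///
        gsem (i.volba <- `xvars' M1[okres_cd]@1 1.amm#M2[okres_cd]@1, mlogit) ///
        [pweight=vaha], from(b0)
    ```

    Start values only tell the optimizer where to begin; each imputed dataset is still fit to its own maximum, so Step 1 is a one-off cost rather than a change to the model.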
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    StataNow Version: 18.5 MP (2 processor)

    WWW: https://www3.nd.edu/~rwilliam


    • #3
      Thank you for your reply. I will definitely look into starting values. I am concerned that using results from the non-imputed data could bias the final estimates, but that concern probably comes from my not understanding the starting values option well enough. I agree that 2 hours would not be that bad for a final model. When I increased the number of imputed datasets, gsem seemed to need roughly the same amount of time to converge on each of them (I interrupted the computation after a few while experimenting). So if I use, let's say, 50 instead of 5, the time needed will increase considerably and it could take days. Therefore I am looking for every possible improvement.
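
      To put a number on this before committing to a long run, one option is to time the model on the first two imputations and extrapolate linearly. A minimal sketch, assuming the data are already mi set and imputed as in post #1; the timer slot 1 and the target of 50 imputations are illustrative choices, and the linear projection relies on the observation above that each imputed dataset takes roughly the same time.

      ```stata
      * Time the random-slope model on the first two imputations only
      timer clear 1
      timer on 1
      mi estimate, nimputations(2) cmdok: gsem (i.volba <- i.man migranti_2011_bezS ///
          migrantiS_squared vek_rek vek_squared_rescale i.religiozita i.bydliste ses ///
          i.vzdelani_3k i.zamestnani_new egalitarstvi i.amm egal euroskepticismus_rek ///
          apatie cynismus spk fear lidr M1[okres_cd]@1 1.amm#M2[okres_cd]@1, mlogit) ///
          [pweight=vaha]
      timer off 1

      * r(t1) holds the elapsed seconds recorded in timer slot 1
      quietly timer list 1
      display "seconds per imputation: " r(t1)/2
      display "projected hours for 50 imputations: " 50*(r(t1)/2)/3600
      ```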


      • #4
        I tried the starting values option, but it did not speed up the computation. Do you think it is justifiable to use the nonrtolerance option, knowing that my model will converge otherwise (but only after many hours, or days once I add more imputed datasets) and that it converges without random slopes?
