
  • Probit: Can I restrict the number of iterations?

    I tried this code (Y is dependent and other variables are independent):

    probit Y local_income_PPPX local_wealth_PPPX birthyear MS EDU PHS_dummy FHR


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(id_household local_income_PPPX local_wealth_PPPX birthyear) double MS float Y double EDU float(PHS_dummy FHR)
     1 166728.34  89150.55   1952 6 0 0 1  .4040404
     2 12111.018 420.52145   1939 4 0 1 0  .6521739
     3  20185.03  51.05861 1951.5 1 0 1 1  .5925926
     4  8477.712  50592.04   1930 6 0 0 0  .9411765
     5 11606.392  7148.865   1924 1 0 1 0      .625
     6  53490.33 290330.53   1952 1 1 1 1  .4081633
     7  25231.29 26913.373   1942 1 0 1 1 .27826086
     8 22809.084 24642.557   1954 3 0 1 1 .04301075
     9  20185.03 420.52145   1924 6 0 0 0  .4736842
    10 15138.771  76871.32   1937 1 0 0 1  .0441989
    12   35323.8 252.31287   1932 6 0 0 0  .3287671
    13 30277.543  677123.6 1966.5 1 0 1 1 .03652968
    14 15138.771 105130.36   1950 6 0 0 1  .1818182
    15 18166.525  6728.343 1942.5 1 0 1 1 .13636364
    16 18166.525  260723.3   1952 1 0 1 1 .27906978
    17 15138.771  586635.5   1952 1 0 1 1 .07655502
    18  74282.02 129359.22   1959 1 0 0 1   .516129
    19  246055.5  925.1472   1936 1 0 1 0  .3361345
    20 179646.77  76955.42   1949 1 0 1 1 .35928145
    21 11606.392 10092.515   1950 1 0 1 1 .20168068
    end


    The problem is that the command ran for hundreds of iterations before I stopped it. Is this normal? And can I restrict the number of iterations?

    Maybe I should somehow include the ID (there is exactly one observation per id_household)?

    Thanks for any suggestions.

  • #2
    My short answer is No.

    1. Is this all the data? I wouldn't fit a model with 7 predictors to a dataset with only 21 observations. If this is all there is, only very simple models can be entertained.

    2. On the whole a model that can't be fitted easily is not a good model, and accepting a model before it has converged will not impress anyone more experienced.

    3. You need to simplify the model. For an utterly anonymous Y and a mix of economic and other predictors it is hard to make suggestions, but even with many more observations the usual reaction to a model not converging is to try something much simpler.

    4. The model may also be struggling for other reasons. It is common to find that income and wealth should be treated on logarithmic scale.

    Although it's a trivial detail affecting none of the above I am puzzled at holding an indicator variable as double.
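
    A minimal sketch of point 4 and of the storage-type remark, on simulated data. The names inc, wealth, edu and risky are made up for illustration and are not the SHARE variables; whether logs suit the real data is an assumption to check, since zero or negative amounts have no logarithm.

    ```stata
    * Simulated stand-ins for income, wealth, an indicator and the outcome
    clear
    set seed 12345
    set obs 1000
    generate double inc    = exp(rnormal(10, 1))
    generate double wealth = exp(rnormal(10, 2))
    generate double edu    = runiform() < 0.4
    generate byte   risky  = (rnormal() + 0.3*(ln(wealth) - 10)) > 1

    * ln() returns missing for zero or negative amounts, so count those first
    count if inc <= 0 | wealth <= 0

    generate double ln_inc    = ln(inc)
    generate double ln_wealth = ln(wealth)

    * A 0/1 indicator does not need double precision; compress demotes it
    compress edu

    probit risky ln_inc ln_wealth edu
    ```

    With real survey data the count is often not zero, and then some decision about those households is needed before taking logs.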



    • #3
      Hi Nick,
      thanks for the remarks. The data are drawn from SHARE, a European survey on Health, Ageing and Retirement. Wave 6 actually contains about 43,000 observations (the listing above is only an excerpt). Y is an indicator of whether the household is invested in risky assets (1) or not (0), while the other variables contain health indicators and similar risks, plus some control variables such as MS (marital status).

      I tried it with probit Y local_income_PPPX local_wealth_PPPX birthyear MS EDU PHS_dummy FHR, iter(20)

      for 20 iterations, but something is still not working: I get a warning that convergence was not achieved.

      Probit regression                               Number of obs     =     43,424
                                                      LR chi2(6)        =    2388.74
                                                      Prob > chi2       =     0.0000
      Log likelihood = -16093.496                     Pseudo R2         =     0.0691

      -----------------------------------------------------------------------------------
                      Y |      Coef.   Std. Err.        z   P>|z|    [95% Conf. Interval]
      ------------------+----------------------------------------------------------------
      local_income_PPPX |  -2.77e-07    1.83e-07    -1.52   0.130   -6.35e-07    8.13e-08
      local_wealth_PPPX |   5.55e-07    1.69e-08    32.81   0.000    5.22e-07    5.89e-07
              birthyear |    -.00259    .0008659    -2.99   0.003   -.0042871   -.0008929
                    EDU |   .5050002    .0214513    23.54   0.000    .4629564     .547044
                    OHS |   .2522709    .0874518     2.88   0.004    .0808685    .4236734
                    FHR |  -.7113694     .040149   -17.72   0.000   -.7900601   -.6326787
                  _cons |   3.593199     1.68631     2.13   0.033    .2880913    6.898306
      -----------------------------------------------------------------------------------
      Note: 8 failures and 0 successes completely determined.
      Warning: convergence not achieved


      The methodology is drawn from an article in Banking & Finance, with some slight modifications. Some of the results match what the authors found on an earlier dataset, but others are quite different, and I am wondering whether that is due to the STATA non-convergence issue.
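
      Whether a capped fit actually converged can also be checked from the stored results rather than only from the printed warning. A minimal sketch on simulated data (x and y are hypothetical names); iter(20) above is an abbreviation of the iterate() option used here:

      ```stata
      * Simulated data so the example runs on its own
      clear
      set seed 2019
      set obs 2000
      generate double x = rnormal()
      generate byte   y = (0.5*x + rnormal()) > 0

      * Cap the optimizer at 20 iterations
      probit y x, iterate(20)

      * e(converged) is 1 if the convergence criteria were met, 0 otherwise;
      * e(ic) is the number of iterations performed
      display "converged = " e(converged) "   iterations = " e(ic)
      if e(converged) == 0 {
          display as error "fit did not converge; do not interpret these estimates"
      }
      ```

      If e(converged) is 0, the coefficient table is only a snapshot of where the optimizer stopped, so a comparison with published results is not meaningful until the fit converges.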



      • #4
        The authors used income and wealth in a similar way (but in units of 1,000 Euro), plus income squared or wealth squared.

        EDU is 1 for higher education and 0 for lower education.
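
        A specification of that kind (amounts in thousands plus squared terms) could be sketched with factor-variable notation, which builds the squares without extra variables. This uses simulated data with made-up names (inc_k, wealth_k, edu, risky); it is not the authors' code and not the SHARE data.

        ```stata
        * Simulated amounts, already expressed in thousands of Euro
        clear
        set seed 42
        set obs 5000
        generate double inc_k    = 100*runiform()
        generate double wealth_k = 500*runiform()
        generate byte   edu      = runiform() < 0.3
        generate byte   risky    = (-1 + 0.006*wealth_k - 0.000006*wealth_k^2 ///
                                    + 0.5*edu + rnormal()) > 0

        * c.x##c.x enters x and x squared; i.edu marks edu as categorical
        probit risky c.inc_k##c.inc_k c.wealth_k##c.wealth_k i.edu

        * Average marginal effects that account for the squared terms
        margins, dydx(inc_k wealth_k)
        ```

        Because the squares are declared through ##, margins knows that wealth_k and its square move together, which it would not know if the square were a separately generated variable.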



        • #5
          So, my point #1 does not bite. Otherwise I think my comments are the same. Low explanatory power seems characteristic of work in this field, and for good reasons. People who are closer to what you do may have other suggestions.

          See also https://www.statalist.org/forums/help#spelling



          • #6
            Nevertheless thanks for the input.

            I absolutely agree on the low explanatory power within some of these publications.



            • #7
              Michael: In my experience, it's pretty hard to get an ordinary binary probit not to converge if there are no data issues. Eyeballing your estimates (please in future use CODE delimiters as per FAQ!), it looks as if the scaling of "local_income_PPPX" and "local_wealth_PPPX" should be looked into. Your coefficient estimates are tiny tiny (ditto associated SEs) and I suspect precision issues (and maybe collinearity?) may be causing problems. In what units have you got income and wealth?
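
              One way to look into both points, sketched on simulated data with hypothetical names (income_eur, wealth_eur, risky): check how strongly the two monetary variables are correlated, then refit with amounts in thousands.

              ```stata
              * Simulated Euro amounts; income is built to correlate with wealth
              clear
              set seed 7
              set obs 5000
              generate double wealth_eur = 500000*runiform()
              generate double income_eur = 20000 + 0.1*wealth_eur + 10000*rnormal()
              generate byte   risky = (-1 + 0.000004*wealth_eur + rnormal()) > 0

              * How strongly do the two monetary predictors move together?
              correlate income_eur wealth_eur

              * Same information, expressed in thousands of Euro
              generate double income_k = income_eur/1000
              generate double wealth_k = wealth_eur/1000

              probit risky income_k wealth_k
              ```

              Dividing a predictor by 1,000 multiplies its coefficient and standard error by 1,000 and leaves the z statistics and the fit unchanged, so the only thing at stake is how well the optimizer copes with the numbers.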



              • #8
                Thanks Stephen.

                Local income and local wealth are measured in Euro and adjusted for purchasing power parity (a Euro in Poland buys more than a Euro in Switzerland).

                Scaling: the authors did the same: "Monetary amounts are PPP-adjusted and in 1000 Euros." (In my case the amounts are in Euro, not 1000 Euro.)
                Last edited by Michael Craig; 29 Apr 2019, 02:53. Reason: typo



                • #9
                  Sometimes adding the -difficult- option will work miracles.

                  My general advice in situations like this is to start small (e.g. only one independent variable) and build up. You may be able to identify a variable that is causing you grief. You can then try to figure out why and what to do about it.

                  Stata does 16,000 iterations by default, which is kind of absurd. If your 20 is too little you can increase it. But limiting the # of iterations will not get you the correct answer, it will just help you to get an incorrect answer more quickly.
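
                  A sketch of the build-up approach, on simulated data with placeholder predictors x1, x2 and x3 (hypothetical names): add one variable at a time and report whether each fit converged, then try -difficult- on the full model.

                  ```stata
                  * Simulated data so the example runs on its own
                  clear
                  set seed 99
                  set obs 3000
                  generate double x1 = rnormal()
                  generate double x2 = rnormal()
                  generate double x3 = rnormal()
                  generate byte   y  = (0.4*x1 - 0.3*x2 + 0.2*x3 + rnormal()) > 0

                  * Grow the predictor list one variable at a time
                  local rhs
                  foreach v of varlist x1 x2 x3 {
                      local rhs `rhs' `v'
                      quietly probit y `rhs', iterate(50)
                      display "`rhs'" _col(30) "converged = " e(converged) ///
                          "  iterations = " e(ic)
                  }

                  * -difficult- uses a different stepping algorithm in nonconcave regions
                  probit y x1 x2 x3, difficult
                  ```

                  The first variable whose addition turns converged from 1 to 0 is the one to inspect for scaling, collinearity or perfect prediction.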
                  -------------------------------------------
                  Richard Williams, Notre Dame Dept of Sociology
                  StataNow Version: 19.5 MP (2 processor)

                  WWW: https://www3.nd.edu/~rwilliam



                  • #10
                    A step-by-step approach is probably a good idea. I will give it a try. Thanks for the input.



                    • #11
                      "Stata does 16,000 iterations by default, which is kind of absurd."

                      Richard didn't mean quite what he said. 16,000 is Stata's default maximum -- it is delighted to report convergence as soon as it can -- and it is a deliberately absurd maximum. If your model really needs 16,000 iterations, Stata tried hard but you're trying too hard, although whether the model or the data or the researcher is at fault is not so clear.





                      • #12
                        Thanks to Nick Cox for the clarification. I wonder if there has ever been a model that converged on iteration 15,987? I usually opt to not show the iteration log but it can be helpful if you are trying to decide whether to stop a job.
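
                        For reference, a short sketch of showing or hiding the iteration log and of looking up the current iteration maximum, on simulated data (x and y are hypothetical names):

                        ```stata
                        * Simulated data so the example runs on its own
                        clear
                        set seed 321
                        set obs 2000
                        generate double x = rnormal()
                        generate byte   y = (0.5*x + rnormal()) > 0

                        * The session-wide maximum number of iterations
                        display "current maximum iterations: " c(maxiter)

                        * Print the log likelihood at every iteration
                        probit y x, log

                        * Same fit with the iteration log suppressed
                        probit y x, nolog
                        ```

                        A log likelihood that keeps changing only in the last digits, or repeated "not concave" messages, is usually the sign that waiting longer will not help.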

