  • Sudden r(430) error--cannot compute an improvement

    I am running a two-stage hurdle model with the community-contributed nehurdle command in Stata 12. Everything was working fine until this afternoon, but now when I run the command with the same variables and data, I get an r(430) error. I tried using just one regressor as a diagnostic, but still get the same error. I have also tried a different version of Stata (v11), but to no avail. Please help.

    Example data and the command I am running are given below.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(total_member child_age child_gender tuition) int tuition_amount
     4 12 1 .   .
     4  8 2 2   0
     4 10 1 2   0
     4  8 1 2   0
     6 14 1 2   0
     6  9 2 2   0
     5 11 1 2   0
     5  8 2 2   0
     5 12 1 2   0
     4 10 1 2   0
     4  8 2 2   0
     3 14 1 2   0
     5 14 2 2   0
     5 13 1 2   0
     4 15 1 2   0
     4 10 1 2   0
     4 12 2 2   0
     5  9 2 2   0
     5  8 2 2   0
     5 13 2 2   0
     8  5 2 2   0
     8  3 1 .   .
     7 10 2 2   0
     7 13 2 2   0
     5  7 1 2   0
     5  6 2 2   0
     5  8 2 2   0
     7 11 2 2   0
     7  4 2 .   .
     7  9 2 2   0
     8 13 2 2   0
     8 11 1 2   0
     8 13 2 2   0
     4 13 2 2   0
     4 15 1 2   0
     5 14 2 2   0
     5 15 1 2   0
     4 13 2 2   0
     3 16 1 2   0
     4 13 1 2   0
     3  . . .   .
     2  . . .   .
     5  . . .   .
     6 16 2 2   0
     4  5 2 2   0
     5  5 2 2   0
     6  5 1 2   0
     2  . . .   .
     3  . . .   .
     5  . . .   .
     4  9 1 1 300
     4 10 1 1 300
     3 16 2 2   0
     4  7 2 1 300
     4  5 1 1 300
     4 11 2 1 260
     4 11 2 2   0
     2  . . .   .
     3 11 1 2   0
     3 11 2 2   0
     3  . . .   .
     4  . . .   .
     2  . . .   .
     5  6 1 2   0
     5  5 1 2   0
     5  3 1 2   0
     4  . . .   .
    12 11 2 .   .
    12 13 2 2   0
    12  8 2 .   .
    12  6 1 .   .
    12 12 2 2   0
    12 14 1 2   0
     2  . . .   .
     8  7 2 2   0
     3  . . .   .
     6 10 1 .   .
     6  3 1 .   .
     6  6 1 .   .
     4  . . .   .
     5  3 2 2   0
     5  5 1 2   0
     3  . . .   .
     3  3 1 .   .
     .  . . .   .
     4  . . .   .
     3  . . .   .
     4  . . .   .
     2  . . .   .
     7  5 2 .   .
     2  . . .   .
     4 10 1 .   .
     4 14 1 .   .
     9  9 2 .   .
     9  8 2 .   .
     4  4 2 2   0
     5 14 2 2   0
     5  6 1 2   0
     5  4 2 2   0
     5 10 2 2   0
    end
    
    replace tuition_amount=0 if tuition==2
    nehurdle tuition_amount child_age child_gender, trunc select(child_gender)
    Last edited by Parul Gupta; 10 Jul 2020, 10:39.

  • #2
    Just worked for me in Stata 16.1. On the other hand, your real dataset may be bigger but also more problematic than what you show here.

    Note the scope to specify ml_options such as difficult.

    You're asked to say where community-contributed commands come from: in this case


    SJ-19-1 st0550 . . . . Estimation methods in the presence of corner solutions
    . . . . . . . . . . . . . . . . . . . . . . . . . A. Sanchez-Penalver
    (help nehurdle, nehurdle postestimation, nehtests if installed)
    Q1/19 SJ 19(1):87--111
    provides maximum likelihood estimators for linear, exponential,
    homoskedastic, and heteroskedastic tobit; truncated hurdle; and
    type II tobit models that involve explained variables with
    corner solutions




    • #3
      Yes, it is working with a smaller dataset, but my actual model needs to run on 400,000 observations. I simply can't understand what bug crept in within 4 hours. Is there a way to 'reset' Stata 12? I have tried uninstalling and reinstalling nehurdle from ssc.
      Anyway, I'll use the difficult option and report back. Thank you.



      • #4
        Update: it is not working with the difficult option either. As a diagnostic, I used the same variables in a logit model (which also uses MLE), and it is working, so it doesn't sound like an issue with the variables involved.



        • #5
          The logic isn't convincing here. Data can be fine for one model and very awkward to fit to another model. Also, if there is a bug in code it doesn't get added while the code is running; the code is laid down in advance like a railway track in front of a train. The code won't be affected by your running it, so reinstallation will be harmless but ineffective.

          Unfortunately, I can't suggest better ideas. The program author should be better able to advise. But it seems that, as it were, this is a family of models designed for awkward data. You have lots of missing values. In your data example, 36 / 100 observations are omitted. Is it a similar fraction for the full dataset?
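
          A minimal sketch for answering that question on the full dataset, using the variable names from #1. It counts the observations with complete data on the model variables, shows which variables are missing together, and summarizes the positive outcomes that the truncated part of the model has to fit. The helper variable nmiss is a hypothetical name introduced here for illustration.

          ```stata
          * Number of missing values per observation across the model variables
          * (nmiss is a hypothetical helper name, not part of the original data)
          egen byte nmiss = rowmiss(tuition_amount child_age child_gender)

          * Observations that can enter the estimation sample
          count if nmiss == 0

          * Which variables tend to be missing together
          misstable patterns tuition_amount child_age child_gender

          * Spread of the strictly positive outcomes
          summarize tuition_amount if tuition_amount > 0 & !missing(tuition_amount), detail
          ```

          If the positive outcomes turn out to take only a handful of distinct values, that alone may make the truncated part hard to fit, whatever the sample size.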



          • #6
            Thanks for the reply. I have contacted the program author as well, hoping for a resolution.

            To answer your question, yes, the dependent variable has a lot of zeros (around 70% of the sample), which is one of the reasons to use this model. There are some missing observations for some regressors as well. These are valid concerns, but the model was running on this exact data with the same variables until yesterday, hence I am at a total loss to understand what has happened in the interim. Does Stata "misbehave" while using global macros? I have gone back and checked the variable names as well just to ensure that there is no conflict with system variables. I can't think of other possible sources of a bug entering the code.



            • #7
              Again, bugs don't enter Stata code in the way that a person can catch an infection or a computer can be affected by malware. The problem is that an iterative calculation can fail to find an optimum, just as someone in hilly country in mist can miss the summit they are trying to reach.

              Global macros can be dangerous, but you would have to explain how you are using them or how the program uses them to get a specific comment.
              None is visible in your code in #1.

              A competent program won't use any of your variable names for its own purposes.

              Otherwise it is hard to be more precise (or less banal) than advice we already give in our FAQ:

              A model may not converge or fit well because it doesn't suit the data, or if you prefer the data don't suit the model. It can be very hard to advise on such cases, especially if presented generally.



              • #8
                Yes, software is not human and can't 'catch' an infection, but when a command successfully used a day earlier stops working all of a sudden, it sounds like a bug. To reiterate, the same command with the same options and the same variables was working on the same dataset (600,000+ observations) yesterday.

                I aimed to provide a minimal working example, hence didn't include all the global macros in my post. I am using child controls, household controls and community controls in my model, and have used global macros to store them:
                Code:
                global student "i.child_gender i.govt i.primary i.middle"
                I have a couple of such macros, and one macro has another macro nested inside it:

                Code:
                global participation "wealth $student"
                Can the factor notation be causing a problem?
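
                One way to check what the nested macro actually passes to the command is to print its expanded contents before running the model. This is only a sketch using the two macro names defined above; it assumes both globals have already been defined in the current session.

                ```stata
                * Print the expanded contents of each global
                display "$student"
                display "$participation"

                * List both macros as Stata stores them
                macro list student participation
                ```

                Since $student is expanded at the moment participation is defined, student must be defined first; otherwise participation would contain only wealth.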
                Last edited by Parul Gupta; 11 Jul 2020, 02:07.



                • #9
                  it sounds like a bug
                  Not to me. That's what can happen with iterative calculations. They can go on and on and then stop because the algorithm can't make any further progress. This is explained at various levels in the manual and in the Stata Press book on maximum likelihood fitting.

                  I wish I could help more, but that is as far as I can go without just restating the same broad ideas, which are standard. Your dataset is too big to post here and your minimum working example fails to show the problem because it works fine, so you are caught between a rock and a hard place. I am confident on your behalf that using globals has nothing to do with the problem.



                  • #10
                    1) Try looking into the ml_options for the nehurdle model. You might consider (a) reducing the maximum number of iterations, (b) specifying the difficult option (I know you tried this), (c) specifying a different technique: Newton-Raphson (the default), Berndt-Hall-Hall-Hausman, Davidon-Fletcher-Powell, or Broyden-Fletcher-Goldfarb-Shanno.

                    2) If the model worked yesterday, can you think of anything that has changed? If possible, look back into your Stata history to see what, if anything, is different.
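
                    A rough sketch of what (a) to (c) could look like with the command from #1. It assumes that nehurdle passes Stata's standard maximize options (iterate(), difficult, technique()) through to ml, as the mention of ml_options in #2 suggests; I have not checked each one against the nehurdle help file, and the iteration counts are arbitrary.

                    ```stata
                    * (a) Cap the number of iterations (50 is an arbitrary choice)
                    nehurdle tuition_amount child_age child_gender, trunc select(child_gender) iterate(50)

                    * (b) Use the alternative stepping rule for nonconcave regions
                    nehurdle tuition_amount child_age child_gender, trunc select(child_gender) difficult

                    * (c) Try a different optimizer, here BFGS
                    nehurdle tuition_amount child_age child_gender, trunc select(child_gender) technique(bfgs)

                    * (c) Or alternate algorithms: 10 BHHH iterations, then 10 Newton-Raphson, repeating
                    nehurdle tuition_amount child_age child_gender, trunc select(child_gender) technique(bhhh 10 nr 10) difficult
                    ```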



                    • #11
                      Thank you Chris and Nick, I will re-examine the specification and the ml options. I can't seem to spot what changed between the time the model was converging and when it stopped doing so.
