  • Bootstrap error in Control Function: insufficient observations to compute bootstrap standard errors

    Dear Statalisters,

I am trying to bootstrap the standard errors in a control function approach:
1) First stage, OLS regression: $x = $control + $iv + i.ind_county_ym,
then predict the residuals as vh.
2) Second stage, probit regression (Y is 0/1): $Y = $x + $control + vh + i.ind_county_ym

Following is my code. It reports the error "insufficient observations to compute bootstrap standard errors, no results will be saved".

    Code:
     
cap pr drop cre_cf
pr cre_cf, rclass
    cap drop vh

    * first stage: OLS with absorbed fixed effects, keep residuals
    reghdfe $x $control $iv, absorb(ind_county_ym) resid
    predict vh, resid
    return scalar b_iv = _b["$iv"]

    * second stage: probit including the first-stage residual vh
    probit $Y $x $control vh i.ind_county_ym
    return scalar b_ppp = _b["$x"]

    drop vh
end

    * START Bootstrap
    xtset, clear
    ereturn clear
    bootstrap r(b_iv) r(b_ppp), reps(2000) seed(123) cluster(ind_county_ym) nodrop : cre_cf
    I read other posts and tried adding the -nodrop- option and -ereturn clear-; none of them works.
    However:
    when I change the second stage to OLS, that is, "reghdfe $Y $x $control vh, absorb(ind_county_ym)", it works;
    or, when I run the probit directly without bootstrap, it works.


    Any idea what is going on? Thank you very much!


  • #2
    To start debugging this, add the -noisily- option to your -bootstrap- command, and turn down the reps to a small number, say 5, so you get only a manageable amount of output. Then you will be able to see what is happening in greater detail. Perhaps then it will become clear what the problem is.

    Added: If I had to make a guess what's going wrong, I'd speculate that your variable $Y is almost always 0 (or almost always 1). In that case, nearly all of your bootstrap samples will have $Y exclusively 0 (resp. exclusively 1)--which makes a -probit- regression impossible.
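    In command form, the debugging run suggested above might look like this (a sketch, reusing the program and options from #1):

    ```stata
    * few reps, full output from each replication, for debugging only
    bootstrap r(b_iv) r(b_ppp), reps(5) seed(123) ///
        cluster(ind_county_ym) nodrop noisily : cre_cf
    ```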
    Last edited by Clyde Schechter; 13 Mar 2024, 16:04.



    • #3
      Originally posted by Clyde Schechter
      Hi Clyde,

      Thank you so much for the very helpful suggestion! I tried -noisily-. It reports "convergence not achieved" in the probit. Could you please give some suggestions on the convergence problem? Thank you so much!

      I think the problem comes from "i.ind_county_ym": when I drop "i.ind_county_ym" and run "probit $Y $x $control vh", it converges.
      My data: I have 35,282 obs; the mean of $Y is 0.0134; 1 variable in $x, 1 variable in $iv, 10 variables in $control; and 75 distinct values of "ind_county_ym" (so 74 dummies are estimated for "i.ind_county_ym"), with 100 to 2,647 obs within each ind_county_ym.

      Here is part of the iteration log:

      Code:
      probit $Y  $x $control vh i.ind_county_ym 
      
      Iteration 0:  Log likelihood = -2509.4067  (not concave)
      Iteration 1:  Log likelihood = -2447.2889  (not concave)
      Iteration 2:  Log likelihood = -2434.7419  (not concave)
      ...
      Iteration 299: Log likelihood = -2389.0423  (not concave)
      Iteration 300: Log likelihood = -2389.0395  (not concave)
      convergence not achieved
      Many thanks again!!



      • #4
        Well, convergence problems are difficult to solve. They occasionally respond to tweaking some of the options controlling the estimation process itself. But in my experience, they usually require changing the model by removing some variables. In your case, that iteration log actually suggests that the process was still moving ahead, although slowly, and still in a non-concave region of the likelihood. I say that because even at the very end, the log likelihood still appears to be increasing. It's possible that had it been allowed to run longer, it would have eventually converged. So maybe specifying -iterate(500)- will get you there.

        Another approach to troubleshooting non-convergence is to re-run it specifying -iterate(100)-, which will cause probit to terminate estimation after 100 iterations, and show you interim results. These interim results are not valid, and you cannot use them for your analysis. But you may find clues in the output about variables that are making the estimation difficult. Clues include unreasonably large standard errors, or standard errors unreasonably close to zero, or outlandishly large or small coefficients. Removing such variables from the model can sometimes unblock things and let convergence happen.
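        As Stata commands, those two diagnostics might look like this (a sketch using the model from #1):

        ```stata
        * give the maximizer more room to converge
        probit $Y $x $control vh i.ind_county_ym, iterate(500)

        * or stop after 100 iterations and inspect the interim (invalid) estimates
        * for huge or near-zero standard errors and outlandish coefficients
        probit $Y $x $control vh i.ind_county_ym, iterate(100)
        ```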

        If you find no clues about problematic variables, then you just have to experiment with things. It's usually better to work the other way: start with the minimum possible model, -probit $Y $x-. If that converges, add in one "control" variable at a time. If you're still on a roll, add in vh. It may be that at some point along the way, adding a variable will lead to non-convergence. Then you can omit that variable and continue trying to add others, and so on, doing the best you can. Of course, you have already found that removing i.ind_county_ym unblocks the estimation--and that may be the solution to your problem, though I would imagine you would not be very happy with that.
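        One way to sketch that build-up in code (assuming $control holds the ten control variables as a space-separated list):

        ```stata
        * start minimal and add one control at a time, watching for non-convergence
        probit $Y $x
        local rhs $x
        foreach v of global control {
            local rhs `rhs' `v'
            capture noisily probit $Y `rhs'
            display as text "after adding `v': converged = " e(converged)
        }
        * then try adding vh, and finally i.ind_county_ym
        ```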



        • #5
          If you're just interested in effect sizes and not predictions, you can switch out probit for OLS. Or try logit to see if it will converge.
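          For example (a sketch with the same right-hand side as in #1):

          ```stata
          * linear probability model with the fixed effects absorbed
          reghdfe $Y $x $control vh, absorb(ind_county_ym) vce(cluster ind_county_ym)

          * or a logit with the same specification
          logit $Y $x $control vh i.ind_county_ym
          ```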



          • #6
            Originally posted by Clyde Schechter
            Hi Clyde,

            Got it. A million thanks to your incredibly helpful suggestions! I greatly appreciate it!



            • #7
              Originally posted by George Ford
              Hi George,

              It seems that probit is more commonly recommended in the control function setting, since its normality assumption fits that framework better than logit would. However, now that it does not converge, I will try logit as well. Thank you so much for the suggestion!



              • #8
                as its normality assumption would work better than logit in this setting
                I think this kind of reasoning for choosing a link function is overrated. The -logit- and -normal- distributions are really very similar. They differ appreciably only in the far tails, far enough out that only very large samples even reach that territory. No, the difference between -probit- and -logit- regressions in most situations is pretty much reduced to a scale factor on the coefficients.
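                A quick way to see that scale factor in your own data (a sketch; the ratio shown is for the first coefficient only):

                ```stata
                * fit both links on the same specification and compare coefficients
                quietly probit $Y $x $control vh
                matrix P = e(b)
                quietly logit $Y $x $control vh
                matrix L = e(b)
                display "logit/probit ratio for $x: " L[1,1]/P[1,1]   // typically around 1.6-1.8
                ```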

                Moreover, in your situation, you are trying to fit a "fixed effects probit" model--but what you are doing is an unconditional model, not a bona fide conditional fixed-effects regression. Unlike -probit-, the logistic regression has a true conditional fixed-effects implementation.
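                In Stata that conditional estimator is -xtlogit, fe- (a sketch; note it drops groups in which $Y never varies):

                ```stata
                * bona fide conditional fixed-effects logit; no probit analogue exists
                xtset ind_county_ym
                xtlogit $Y $x $control vh, fe
                ```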

                If in your discipline it is conventional to use -probit- for this kind of work, I suppose it is reasonable to decide not to swim upstream. But from an abstract statistical perspective, it is hard to make a real case for preferring -probit- over -logit- here.



                • #9
                  And, I suspect that a linear probability model will give you near exactly the same marginal effects as probit or logit.
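                  For instance, comparing the two (a sketch; -margins- gives the average marginal effect from the probit, while the LPM coefficient on $x is a marginal effect directly):

                  ```stata
                  quietly probit $Y $x $control vh i.ind_county_ym
                  margins, dydx($x)

                  quietly reghdfe $Y $x $control vh, absorb(ind_county_ym)
                  * the coefficient on $x here is directly comparable to the margins output
                  ```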
