I want to get a better understanding, and possibly some suggestions, of why Newton-Raphson consistently yields somewhat different (better?) results than either bfgs or dfp in a maximum likelihood routine I've written. The issue, essentially, is the consequence of choosing nr versus dfp versus bfgs in -optimize_init_technique(S, "whatever")-.
Context: I have a nonstandard maximum likelihood estimation routine (i.e., can't work with ML) that I implemented with -optimize()- and a d1 evaluator. It works pretty well, I think, but I'm trying to speed it up, and I discovered that for larger problems, Newton-Raphson takes 50 to 100 times longer than bfgs or dfp. (The number of parameters being estimated equals _N, so methods that use the Hessian are expensive.) Using bfgs or dfp yields parameter estimates that are similar to each other and are on average within +/- 1% of what nr gives. The final LL from nr is consistently somewhat larger than what bfgs or dfp reaches.
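For concreteness, here is a stripped-down sketch of the kind of setup I mean. The evaluator below is just a toy concave quadratic standing in for my real likelihood, and the 5 starting values stand in for the _N parameters; only the -optimize()- calls themselves mirror what I'm actually doing:

    mata:
    // toy d1 evaluator (a stand-in for my real routine):
    // f(b) = -0.5*(b - 1)(b - 1)', with analytic gradient -(b - 1)
    void mydeval(todo, b, f, g, H)
    {
        f = -0.5 * (b :- 1) * (b :- 1)'
        if (todo >= 1) g = -(b :- 1)
    }

    S = optimize_init()
    optimize_init_evaluator(S, &mydeval())
    optimize_init_evaluatortype(S, "d1")    // analytic gradient; Hessian computed numerically
    optimize_init_technique(S, "nr")        // swap in "bfgs" or "dfp" to compare
    optimize_init_params(S, J(1, 5, 0))     // 5 parameters here; _N of them in my problem
    b = optimize(S)
    b
    optimize_result_value(S)                // final LL
    end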
Question: There really isn't any external criterion for the accuracy of the parameter estimates in my situation. Is Newton-Raphson to be regarded as the gold standard for accuracy, even if it is slower? Might there be any way to tweak -optimize()- so that bfgs or dfp gives results closer to those from nr? I've tried adjusting the parameter tolerance as a convergence criterion (the one that matters in this case), but that didn't change anything.
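In case it helps to see exactly what I mean by adjusting the tolerances, this continues the toy sketch above (same mydeval()); the specific tolerance values here are made up for illustration, not the ones I actually used:

    mata:
    S = optimize_init()
    optimize_init_evaluator(S, &mydeval())
    optimize_init_evaluatortype(S, "d1")
    optimize_init_technique(S, "bfgs")      // or "dfp"
    optimize_init_params(S, J(1, 5, 0))
    optimize_init_conv_ptol(S, 1e-8)        // tighter parameter-change tolerance
    optimize_init_conv_vtol(S, 1e-9)        // tighter objective-change tolerance
    b = optimize(S)
    optimize_result_value(S)                // final LL, to compare against the nr run
    end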
On the more practical side, my thinking is that any data I have for this routine (categorical attitude responses from humans) is much less accurate and precise than the +/- 1% variation in the parameter estimates, so I suppose there's no compelling reason to prefer any one of nr, bfgs, or dfp.
I realize this is a pretty technical question, and please don't mistake me for someone who has much theoretical knowledge about optimization techniques, but I'm hoping this happens to fall within someone's area of expertise.