  • npregress (kernel and series) takes too long to compute

    Hello

    Version: Latest (MP)

    I am trying to run a nonparametric regression on my data, but it takes forever to compute the results. I have about 500,000 observations.
    If I use npregress kernel on only 3% of the sample, I get results within a few minutes. If I use 10%, it already takes more than half an hour. I don't know how long the full run takes because I never let it finish.

    I have 8 dependent variables and did not compute standard errors. (code: npregress kernel agemom dum rosla $xzmob2 $xqvar). Another user also observed that npregress is slow with large datasets but small samples: https://www.statalist.org/forums/for...-small-samples
    However, even if I drop all missing values and keep only the needed sample, npregress is still very slow. Is there any way I can solve this problem?

  • #2
    Hi Albion,
    There isn't any way around that time problem. npregress fits nonparametric models, which basically means it needs to estimate a large number of models to find the appropriate one (in terms of balancing bias and variance).
    Unfortunately, even though nonparametric methods require a lot of data for inference, they are also quite slow with lots of data. The only alternative I can think of is to reduce the model complexity and rely on a more parametric approach.
    Perhaps the main point to consider: do you really need a fully nonparametric model, or would a parametric model (linear regression) do just as well?
    HTH
    F
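
To illustrate why the run time in this thread grows so quickly with sample size, here is a minimal sketch of leave-one-out cross-validation for a kernel regression bandwidth. It is written in plain Python and is not Stata's implementation of npregress: the function names (`loo_cv_score`, `select_bandwidth`, `simulate`), the Gaussian kernel, the local-constant estimator, the single regressor, and the bandwidth grid are all assumptions made for illustration. The point it shows is that every candidate bandwidth requires comparing each observation with every other observation, so the work is roughly n × (n − 1) kernel evaluations per candidate. If that quadratic scaling holds, going from 3% to 10% of a sample multiplies the work by about (10/3)² ≈ 11, which is in line with "a few minutes" turning into "more than half an hour", and the full sample would cost about (100/3)² ≈ 1,100 times the 3% run.

```python
import math
import random


def loo_cv_score(x, y, h):
    """Leave-one-out CV error of a local-constant (Nadaraya-Watson) fit.

    Returns (mean squared prediction error, number of kernel evaluations).
    """
    n = len(x)
    sse = 0.0
    evals = 0
    for i in range(n):
        num = 0.0
        den = 0.0
        for j in range(n):
            if i == j:
                continue  # leave observation i out of its own prediction
            u = (x[i] - x[j]) / h
            w = math.exp(-0.5 * u * u)  # Gaussian kernel weight (unnormalized)
            num += w * y[j]
            den += w
            evals += 1
        pred = num / den if den > 0 else 0.0
        sse += (y[i] - pred) ** 2
    return sse / n, evals


def select_bandwidth(x, y, grid):
    """Pick the bandwidth in `grid` with the lowest leave-one-out CV error."""
    best_h, best_score, total_evals = None, math.inf, 0
    for h in grid:
        score, evals = loo_cv_score(x, y, h)
        total_evals += evals
        if score < best_score:
            best_h, best_score = h, score
    return best_h, total_evals


def simulate(n, seed=0):
    """Hypothetical data: a noisy sine curve on [0, 1]."""
    rng = random.Random(seed)
    x = [rng.uniform(0, 1) for _ in range(n)]
    y = [math.sin(2 * math.pi * xi) + rng.gauss(0, 0.3) for xi in x]
    return x, y


if __name__ == "__main__":
    grid = [0.02, 0.05, 0.1, 0.2, 0.5]
    for n in (100, 200, 400):
        x, y = simulate(n)
        h, total = select_bandwidth(x, y, grid)
        print(f"n={n}  chosen h={h}  kernel evaluations={total}")
```

Doubling n in this sketch roughly quadruples the kernel evaluation count. With several regressors, each with its own bandwidth chosen by a numerical optimizer rather than a fixed grid, the number of candidate bandwidths evaluated would be larger still.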

    • #3
      Thank you for the answer, Fernando. The parametric model is most likely fine; I just wanted to make sure that the results were robust. Either way, you saved me quite a bit of time, so thank you again.
