  • Restrictions at specific points in the regression

    Hi, I want to fit a polynomial model (>4 degrees) to the data. However, I need to impose some restrictions on the fitted model. In my data, the independent variable (x) moves continuously between 0 and 2. I need to impose two restrictions: (i) f(0)=0, and (ii) f(2)=0, where f() is the estimated polynomial function.
    In Stata, when estimating the model with the regress command, the first restriction can be achieved with the nocons option. But how can I impose the second restriction when using the regress command?

    Thanks in advance for your suggestions.
    Regards,
    Jorge

  • #2
    So if you are fitting a polynomial model with the constraints f(0) = 0 and f(2) = 0, then it follows that both x and x-2 are factors of this polynomial. So use -nl- to fit a polynomial that explicitly contains the factors x and x-2. Something like this:
    Code:
    nl (y = (x)*(x-2)*({c3}*x + {c4}*x*x + {c5}*x*x*x)), initial(c3 1 c4 1 c5 1)
    predict yhat
    graph twoway line yhat y x, sort
    That will give you a fifth-degree polynomial that is constrained to have roots at 0 and 2. You can go to a higher degree if you like by adding more terms to the model.

    Now, this is not using the -regress- command. However, if you were to multiply it out and re-arrange terms, you would see that the expression on the right hand side here is equivalent to the way you would express a fifth degree polynomial in -regress-, and you would also see that the 6 coefficients (including the constant term that would appear in regress) are linear functions of c3, c4, and c5. So this approach is equivalent to using -regress y c.x##c.x##c.x##c.x##c.x- with some linear constraints imposed on the coefficients. If you are ambitious and enjoy solving systems of linear equations, you could even spell out what those linear constraints are and implement it that way. But I don't recommend doing that because it's tedious and error-prone. Doing it with -nl- is both easier, and also makes it clear what the point of the calculations is, whereas the constraints you would have to use with -regress- would be rather opaque.

    By the way, you can't actually specify constraints with -regress-. You have to use -cnsreg- instead to do that--although -cnsreg- is nothing other than -regress- with constraints.
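    For what it's worth, here is a sketch of what the -cnsreg- version might look like. The two constraints themselves can be written down directly by evaluating the quintic b0 + b1*x + ... + b5*x^5 at the two required roots: f(0) = 0 gives _cons = 0, and f(2) = 0 gives _cons + 2*b1 + 4*b2 + 8*b3 + 16*b4 + 32*b5 = 0. (It is mapping the b's back to the c's of the -nl- parameterization that is the tedious part.) The generated variable names x2-x5 below are just for illustration, and this assumes y and x already exist in the data:
    Code:
    gen x2 = x^2
    gen x3 = x^3
    gen x4 = x^4
    gen x5 = x^5
    constraint 1 _cons = 0
    constraint 2 _cons + 2*x + 4*x2 + 8*x3 + 16*x4 + 32*x5 = 0
    cnsreg y x x2 x3 x4 x5, constraints(1 2)
    Even so, the -nl- version has the advantage that the roots at 0 and 2 are visible right in the model expression.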



    • #3
      Some additional thoughts:

      I don't know what your purpose in doing this ultimately is. But assuming you are looking to get a good fit of the polynomial model to the data, then actually constraining the polynomial to have roots at x = 0 and 2 probably isn't very important. If the data themselves have y close to 0 at x = 0 and x = 2, an unconstrained polynomial regression will probably turn out to have roots, if not exactly at 0 and 2, pretty close to that. A more important concern when fitting polynomials to data is the degree. You have said only that the degree is > 4. Each degree in the polynomial forces additional "wiggling" in the polynomial, and the absence of higher degree terms forces the polynomial to "stop wiggling" at lower values of |x|. If the actual data generating process is quadratic but you fit a 5th degree polynomial to the data, even if you do constrain it to be exactly right at x = 0 and x = 2, the fitted polynomial will "wiggle" excessively, and may not fit the data overall as well as an unconstrained fifth-degree polynomial regression.

      I've experimented with the approach I suggested in #2 and just fitting an unconstrained fifth-degree polynomial under some different data circumstances, and in general the unconstrained fifth-degree polynomial provides a somewhat closer fit to the data overall, and also ends up meeting the slightly relaxed requirement for roots close to 0 and 2 if the data really support that condition. But the situation is worse if the real data generating process is of a different degree than the fitted polynomial.

      If the data generating process truly is a polynomial of the same degree as you are fitting, and if the signal to noise ratio is high, then both approaches produce the very same curve, at least as far as one can see in a graph, and both fit very well.

      Also, I noticed an error in the code I posted in #2. It should be
      Code:
      nl (y = (x)*(x-2)*({c2} + {c3}*x + {c4}*x*x + {c5}*x*x*x)), initial(c2 1 c3 1 c4 1 c5 1)
      The additional {c2} term is necessary so that we impose the f(0) = 0 & f(2) = 0 constraints without inadvertently imposing another. An unconstrained quintic polynomial has 6 coefficients, so 6 df for fitting. If we then require roots at x = 0 and 2, we have two constraints on the coefficients, which leaves four degrees of freedom, hence the need for {c2}.
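      To make the comparison concrete, here is the kind of simulation I mean. The data generating process, seed, and sample size are invented purely for illustration (a cubic with true roots at 0 and 2, plus a little noise, deliberately of lower degree than the fitted quintic):
      Code:
      clear
      set obs 500
      set seed 12345
      generate x = 2*runiform()
      generate y = x*(x - 2)*(3 - x) + rnormal(0, 0.05)
      * constrained quintic, roots forced at 0 and 2
      nl (y = (x)*(x-2)*({c2} + {c3}*x + {c4}*x*x + {c5}*x*x*x)), initial(c2 1 c3 1 c4 1 c5 1)
      predict yhat_con
      * unconstrained quintic for comparison
      regress y c.x##c.x##c.x##c.x##c.x
      predict yhat_unc
      graph twoway (line yhat_con x, sort) (line yhat_unc x, sort)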


      Finally, I'll also note that there are very few real-world data generating processes that are polynomial at all. Unless you're dealing with one of those, you are skating on thin ice fitting polynomials, and usually polynomial models are only used to proxy some kind of curvilinear shape.
      Last edited by Clyde Schechter; 27 Sep 2024, 21:25.

