  • Error message "Insufficient observations" for regression

    Hello everybody,

    I am building a regression model with a DV (q_tot), an IV (rdalliances), a moderator (munificence), and some controls, which looks like this:
    Code:
    xtreg q_tot c.rdalliances##c.rdalliances##c.munificence rdi_w adi_w ln_emp1_w lev i.fyear, re vce(robust)
    I have tested this with the utest command for an (inverted) U-shape and got a significant result (overall p = .0034):
    Code:
    .         utest rdalliances rdalliances_2
    (107,260 missing values generated)
    
    Specification: f(x)=x^2
    Extreme point:  25.46572
    
    Test:
         H1: Inverse U shape
     vs. H0: Monotone or U shape
    
    -------------------------------------------------
                     |   Lower bound      Upper bound
    -----------------+-------------------------------
    Interval         |           0               57
    Slope            |    .3105406        -.3845433
    t-value          |    3.356228        -2.706786
    P>|t|            |    .0003952         .0033975
    -------------------------------------------------
    
    Overall test of presence of a Inverse U shape:
         t-value =      2.71
         P>|t|   =     .0034
    Now I would like to split my data at the extreme point (25.466) and check the slopes of two linear regressions, as described in Haans et al. (2016), "Thinking about U: Theorizing and testing U- and inverted U-shaped relationships in strategy research", to check the robustness of my inverted U-shape.

    For this I use the following code:
    Code:
        xtreg q_tot rdalliances rdi_w adi_w ln_emp1_w lev i.fyear if rdalliances <=  24.64405 , re vce(robust)
        estimates store model_linear_RDA_split1
        xtreg q_tot rdalliances rdi_w adi_w ln_emp1_w lev i.fyear if rdalliances > 24.64405 , re vce(robust)
        estimates store model_linear_RDA_split2
    The first regression, on the lower part of the data, works, but I always get the error message "Insufficient observations" for the second regression. A possible reason might be that the data set contains only 10 values above 25 for rdalliances, out of a total of more than 110,000 observations.

    Does anyone know why this message appears and what I can do to get the second regression results?

    Thanks in advance,
    Hanna
    Last edited by Hanna Marie; 08 Jan 2022, 14:31.

  • #2
    Hanna Marie, how many observations have rdalliances greater than 24.64405?



    • #3
      It may be that this turning point is quite close to the upper end of the range of rdalliances in the estimation sample, close enough that only a handful of observations meet that description. You can do something like -summ rdalliances if e(sample) & rdalliances > 24.64405- to see what's going on.

      If that is the case, then, -utest- notwithstanding, you don't really have strong evidence for a U-shaped relationship in the data. For example, the log function is monotone increasing, so it never has a turning point. But if you use the -utest- approach with a quadratic term, it will tell you that you have an (inverted) U-shaped relationship. One tip-off that this isn't quite right is that the turning point is near the upper end of the range of x. You may have something like that going on here, with the turning point so close to the end of the data that there's hardly anything to its right.
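
      The log example above can be tried on simulated data. This is only a minimal sketch: the variable names x, y, x2, the scalar tp, the sample size, and the noise level are all made up for illustration and have nothing to do with the actual data in this thread. It fits a quadratic to a relationship that is monotone by construction and reports where the fitted curve turns.

      ```stata
      * Simulated data only; x, y, x2 and tp are hypothetical names
      clear
      set seed 12345
      set obs 1000
      * Predictor on a range loosely similar to the 0-57 interval reported by -utest-
      generate double x  = runiform(1, 60)
      * Outcome that is truly monotone increasing in x
      generate double y  = ln(x) + rnormal(0, 0.5)
      generate double x2 = x^2

      regress y x x2

      * Vertex of the fitted quadratic: -b1 / (2*b2)
      scalar tp = -_b[x] / (2 * _b[x2])
      display "Fitted turning point: " scalar(tp)

      * How much of the data lies to the right of the fitted turning point?
      count if x > scalar(tp)

      * -utest- is user-written (ssc install utest); uncomment if it is installed
      * utest x x2
      ```

      In a setup like this the fitted turning point is likely to fall inside the observed range of x even though the true relationship never turns, which is why the position of the turning point and the amount of data beyond it deserve a look before the test result is trusted.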

      Added: Crossed with #2, which makes the same point.



      • #4
        Note that this discussion builds on earlier discussions over the past week at

        https://www.statalist.org/forums/for...ith-moderation

        https://www.statalist.org/forums/for...r-relationship

        https://www.statalist.org/forums/for...and-moderators

        A common theme, prominent in the discussion from yesterday, is that without seeing more details, it is difficult to diagnose the problem.



        • #5
          Hello,

          the regression data set only contains 10 values above 25 for rdalliances, out of a total of more than 96,000 observations.
          Might this be the reason why it is not possible to get a regression result for the second half of the inverse U-curve?

          Regards,
          Hanna



          • #6
            Yes, of course it is.

            You have four control variables, plus up to 16 yearly fixed effects. You should not be surprised that 10 observations is not enough to estimate the model.
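
            A rough way to see the mismatch is to count the usable observations above the cutoff and set that against the number of coefficients the model asks for. This is a sketch only: it assumes the variable names used earlier in the thread, and the local macro names cut, n_above, and k are made up for illustration.

            ```stata
            * Assumes the data from this thread are in memory; cut, n_above, k are hypothetical names
            local cut = 24.64405

            * Stata treats missing values as larger than any number, so exclude them explicitly
            count if rdalliances > `cut' & !missing(q_tot, rdalliances, rdi_w, adi_w, ln_emp1_w, lev, fyear)
            local n_above = r(N)

            * Upper bound on coefficients: 1 slope + 4 controls + up to 16 year terms + constant
            local k = 1 + 4 + 16 + 1

            display "Usable observations above the cutoff: `n_above'"
            display "Coefficients requested (upper bound): `k'"

            * Which years are actually represented above the cutoff?
            tabulate fyear if rdalliances > `cut' & !missing(rdalliances)
            ```

            If the first number is smaller than the second, there is no way to estimate the model on that subsample as specified.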

            In looking back over the previous topics you created around this analysis, nowhere did I find a description of your dependent variable q_tot. Your results suggest that q_tot either has a few large outliers that are driving the utest and perhaps the xtreg results, or that in fact q_tot is highly skewed to the left.

            Perhaps if you were to explain what you are modeling, what q_tot and your control variables represent, and provide the output of
            Code:
            summarize q_tot, detail
            we might be in a better position to advise a way forward. But as it now stands, your utest results are likely meaningless, and your xtreg results are suspect as well.

