  • Problems with regression

    Dear all,

    I have generated an age-range variable and its square. Because of collinearity I had to take an extra step, but when I run the regression I get no observations.

    I would appreciate any help. Many thanks. My code is below.


    gen byte agegroup= inrange(age, 25, 65) if !missing(age)

    generate agegroupc = agegroup - r(mean)

    generate agegroupcsq = agegroupc * agegroupc

    regress agegroupc agegroupcsq female






  • #2
    Hi Rodrigo,
    Could you post a reproducible example? See the FAQ.



    • #3
      Hi Iuri
      I would like to but I don't know how.
      My problem is the following: I have defined an age range and added a quadratic of this age range, which causes collinearity. But when I fix that and run the regression, I get no observations.



      • #4
        your problems are much bigger than that: from the code you posted in #1, age is your outcome (dependent) variable and the squared version is a predictor, which makes no sense;

        if you don't understand how to use -dataex-, then read (1) the FAQ and (2) the help file



        • #5
          Hello Rodrigo. Here is a reproducible example of what you showed in #1, but with some added frequency tables etc.

          Code:
          clear
          use http://www.stata-press.com/data/r16/nhanes2.dta
          summarize age
          generate byte agegroup= inrange(age, 25, 65) if !missing(age)
          // I assume you wanted to mean-center agegroup
          quietly summarize agegroup, meanonly
          generate agegroupc = agegroup - r(mean)
          generate agegroupcsq = agegroupc * agegroupc
          // Uncomment the next line if you need to install -fre-
          // ssc install fre  
          fre agegroup agegroupc agegroupcsq female
          regress agegroupc agegroupcsq female
          // Using factor-variable notation (c.agegroupc#c.agegroupc) rather than agegroupcsq
          regress agegroupc c.agegroupc#c.agegroupc female

          As the frequency tables show, your agegroup variables are all dichotomous. So it makes no sense to compute a quadratic term--it also ends up being dichotomous.

          The other problem is that you appear to have omitted the dependent variable from your -regress- command. As you showed it, agegroupc is the DV and agegroupcsq and female are the explanatory variables. But that model makes no sense.

          I think you should back up a bit and explain in more general terms what it is you are wanting to do.

          HTH.
          --
          Bruce Weaver
          Email: [email protected]
          Version: Stata/MP 18.5 (Windows)
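
          One possible reason for the "no observations" message in #1, which the thread does not confirm: the code in #1 uses r(mean) without a preceding -summarize-, whereas the example above adds one. A minimal sketch on made-up data, assuming nothing in the session had stored r(mean) before the centering step (the variable agegroupc_bad and the toy ages are hypothetical):

          ```stata
          clear
          set obs 10
          generate age = 20 + 5*_n                      // toy ages 25, 30, ..., 70
          generate byte agegroup = inrange(age, 25, 65) if !missing(age)

          // Run an r-class command that returns no r(mean), to mimic a session
          // in which -summarize- was never run before the centering step.
          quietly count
          generate agegroupc_bad = agegroup - r(mean)   // r(mean) is missing here
          count if missing(agegroupc_bad)

          // -regress- then has no usable observations; show the return code
          capture noisily regress agegroupc_bad age
          display _rc

          // With -summarize- first, r(mean) exists and centering works
          quietly summarize agegroup, meanonly
          generate agegroupc = agegroup - r(mean)
          count if missing(agegroupc)
          ```

          If this is what happened, the fix is the -summarize- line already shown above, although the modeling problems discussed in this post would remain.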



          • #6
            Dear Bruce many thanks,

            I want to allow for non-linearity in a wage equation, where wage is the Y variable. From what I understand, is treating age as categorical itself a way to capture nonlinearity?

            Regards.



            • #7
              Rodrigo:
              I would consider treating -age- as continuous, creating its squared term, adding both to the right-hand side of your regression equation, and checking whether a turning point (which would indicate a non-linear relationship between -age- and the regressand -wage-) exists.
              Kind regards,
              Carlo
              (StataNow 18.5)
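
              A minimal sketch of this suggestion, on simulated data rather than the poster's own. The variables age and wage, the coefficients, and the seed are all made up for illustration; the simulated wage peaks at age 40 by construction, so the estimated turning point should land near that value.

              ```stata
              clear
              set seed 12345
              set obs 1000
              // Simulated data (hypothetical): wage peaks at age 40 by construction
              generate age  = 25 + floor(41*runiform())      // integer ages 25..65
              generate wage = 5 + 0.8*age - 0.01*age^2 + rnormal()

              // Continuous age plus its square, via factor-variable notation
              regress wage c.age##c.age

              // Turning point of the fitted quadratic: -b1 / (2*b2)
              nlcom (turning_point: -_b[age] / (2*_b[c.age#c.age]))
              ```

              Using c.age##c.age instead of a hand-made squared variable lets Stata know the two terms belong together, which also matters if -margins- is used afterwards.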



              • #8
                Dear Carlo Lazzaro thank you for your point,

                However, in my case I need to define the 25-65 age range. So in this case I cannot have its quadratic form because an age range would always capture nonlinearities?

                Best regards.



                • #9
                  Rodrigo:
                  it depends on what you've planned to do.
                  If you consider the variable obtained after -inrange- as a categorical variable, its squared term does not make any sense:
                  Code:
                  use "C:\Program Files\Stata17\ado\base\a\auto.dta"
                  gen byte range_price= inrange(price, 1000, 4000) if !missing(price)
                  . regress gear_ratio i.range_price , vce(cluster foreign)
                  
                  Linear regression                               Number of obs     =         74
                                                                  F(0, 1)           =          .
                                                                  Prob > F          =          .
                                                                  R-squared         =     0.0269
                                                                  Root MSE          =     .45322
                  
                                                   (Std. err. adjusted for 2 clusters in foreign)
                  -------------------------------------------------------------------------------
                                |               Robust
                     gear_ratio | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                  --------------+----------------------------------------------------------------
                  1.range_price |   .2089177   .0239656     8.72   0.073    -.0955946      .51343
                          _cons |    2.98381   .2870979    10.39   0.061    -.6641153    6.631734
                  -------------------------------------------------------------------------------
                  .
                  Conversely, if you use the very same variable as an argument for an -if- clause, while considering the original variable as a continuous predictor, the squared term of the latter makes sense:
                  Code:
                  use "C:\Program Files\Stata17\ado\base\a\auto.dta"
                  gen byte range_price= inrange(price, 1000, 4000) if !missing(price)
                  . regress gear_ratio c.price##c.price if range_price==1, vce(cluster foreign)
                  
                  Linear regression                               Number of obs     =         11
                                                                  F(0, 1)           =          .
                                                                  Prob > F          =          .
                                                                  R-squared         =     0.1392
                                                                  Root MSE          =     .46293
                  
                                                     (Std. err. adjusted for 2 clusters in foreign)
                  ---------------------------------------------------------------------------------
                                  |               Robust
                       gear_ratio | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                  ----------------+----------------------------------------------------------------
                            price |  -.0117343   .0107118    -1.10   0.471    -.1478405    .1243719
                                  |
                  c.price#c.price |   1.70e-06   1.39e-06     1.22   0.438     -.000016    .0000194
                                  |
                            _cons |   23.22404    20.1889     1.15   0.456    -233.3003    279.7484
                  ---------------------------------------------------------------------------------
                  Kind regards,
                  Carlo
                  (StataNow 18.5)
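
                  Applied to the wage question in this thread, the second approach might look like the sketch below. It runs on simulated data; the variables age and wage, the coefficients, and the seed are hypothetical stand-ins, not the poster's data.

                  ```stata
                  clear
                  set seed 2022
                  set obs 2000
                  // Simulated data (hypothetical) covering ages outside 25-65 as well
                  generate age  = 18 + floor(63*runiform())      // integer ages 18..80
                  generate wage = 5 + 0.8*age - 0.01*age^2 + rnormal()

                  // How many observations fall in the 25-65 range?
                  count if inrange(age, 25, 65)

                  // The range restricts the sample; age itself stays continuous
                  regress wage c.age##c.age if inrange(age, 25, 65)
                  display e(N)
                  ```

                  Here the age range defines who is in the estimation sample, while the quadratic in continuous age captures the nonlinearity within that sample.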



                  • #10
                    Let's go back to basics.

                    In post #1 you ask why your regression results have no observations. Your question really isn't clear without more detail, or at a minimum it is too difficult to guess at a good answer from what you have shared. Please help us help you. Show example data. Show your code. Copy your commands and results from Stata's Results window. The Statalist FAQ provides advice on effectively posing your questions, posting data, and sharing Stata output.

                    Note especially sections 9-12 on how to best pose your question. It is particularly helpful to copy commands and output from your Stata Results window and paste them into your Statalist post using code delimiters [CODE] and [/CODE], and to use the dataex command to provide sample data, as described in section 12 of the FAQ.

                    Added in edit: this crossed with Carlo's post #9. Perhaps on reading it you now understand what you are being advised to do. In particular, his first point is that if you have a 0/1 categorical variable, its square is identical, because 0^2==0 and 1^2==1, so adding a squared term makes no sense. And in your first post you subtract some sort of mean from a categorical variable, which does not magically turn it back into a continuous variable. It still has just two values, so its square will be collinear with its value.

                    If not, do please back up to the beginning as I recommend here. We cannot understand your problem from what little information you have given us.
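
                    The point above about 0/1 variables can be checked directly. This is a small sketch on made-up data; the variable names d, dsq, dc, and dcsq are hypothetical.

                    ```stata
                    clear
                    set obs 10
                    generate byte d = (_n <= 3)          // 0/1 indicator, mean 0.3
                    generate dsq = d^2
                    assert dsq == d                      // squaring a 0/1 variable changes nothing

                    quietly summarize d, meanonly
                    generate double dc   = d - r(mean)   // centered: still only two values
                    generate double dcsq = dc^2
                    tabulate dc dcsq                     // each dc value pairs with one dcsq value
                    correlate dc dcsq                    // so the two are linearly related
                    ```

                    Because a centered indicator takes only two values, its square is an exact linear function of it, and a regression that includes both must drop one of them.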
                    Last edited by William Lisowski; 14 Mar 2022, 11:41.



                    • #11
                      Rodrigo Soares In addition to the recommendations from the other members of the list, I would recommend reading:
                      Cohen, P., Cohen, J., West, S. G., & Aiken, L. S. (2002). Applied multiple regression/correlation analysis for the behavioral sciences (3rd. ed.). Mahwah, NJ: Lawrence Erlbaum Associates. --> See ch 6 on squared predictors
                      Gardner, R. G., Harris, T. B., Li, N., Kirkman, B. L., & Mathieu, J. E. (2017). Understanding “it depends” in organizational research: A theory-based taxonomy, review, and future research agenda concerning interactive and quadratic relationships. Organizational Research Methods, 20(4), 610-638. doi:10.1177/1094428117708856 --> on how to calculate the turning point as suggested by @Carlo on #7.
