  • Why do the coefficients of the linear term and the quadratic term become insignificant after dropping the interaction term?

    Hi all,

    I used cross-sectional data and Poisson regression. I did stepwise regression, but found that the coefficients of the linear term and the quadratic term become insignificant after dropping the interaction term. How does this happen, and how do I solve this problem?

    Thanks,
    David

    Code:
    poisson income iv1t iv2 cv1 cv2 cv3 cv4 i.indcode, vce(robust)
    poisson income iv1t iv2 c.iv2#c.iv2 cv1 cv2 cv3 cv4 i.indcode, vce(robust)
    poisson income c.iv1t##c.iv2 c.iv1t#c.iv1t c.iv2#c.iv2 cv1 cv2 cv3 cv4 i.indcode, vce(robust)
    Results:
    [Attached: three screenshots of the regression output for the three models above]

  • #2
    First, in a quadratic model, the statistical significance of the linear and quadratic terms should never be looked at alone. At best the joint significance of both coefficients is meaningful; either one by itself is not.
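
    In Stata, that joint test is one line after fitting the quadratic model; a minimal sketch, using the variable names from your posted commands:

    Code:
    * joint Wald test of the linear and quadratic iv2 terms
    test iv2 c.iv2#c.iv2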

    But in your final model, which also includes an interaction between iv1t and the linear term for iv2, you will notice that the coefficient of iv2 has barely changed at all. The standard error has gone up a little bit. You have gone from slightly significant to almost significant. Nothing to write home about. Not even worth looking at, really. Nothing much has changed numerically. The quadratic iv2 coefficient, however, has changed appreciably.

    Now, all of that said, none of it matters. You are comparing apples to walnuts here. In the presence of the interaction iv1t#iv2, neither the iv2 term nor the iv2^2 term means the same thing that it does in the model without the interaction term. There is no reason, actually, to expect them to be the same, or even similar. With the interaction term, the iv2 and iv2#iv2 coefficients give you the quadratic representation of the iv2 effect conditional on iv1 = 0. If 0 is an important value of iv1, then perhaps this is of some interest. If, as is often the case, 0 isn't even within the range of observed values of iv1, then it's just a huge red herring. Remember that by putting the interaction with iv1 in, you are no longer modeling a single quadratic effect of iv2. You are modeling an effect of iv2 that is a different quadratic for each value of iv1. And so it really only makes sense to talk about the effects of iv2 at specific, well-chosen values of iv1. To see what is going on in your model, it is best to use the -margins- command, followed by -marginsplot-.
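
    For instance, a minimal sketch of that approach; the at() values below are placeholders to be replaced by well-chosen values actually observed in your data:

    Code:
    * predicted income as a function of iv2, at several representative values of iv1t
    margins, at(iv1t=(10 20 30) iv2=(0(10)100)) predict(n)
    marginsplot, xdimension(iv2)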

    In quadratic models, it is really pointless to directly interpret the coefficients of the linear and quadratic terms by themselves. The quadratic coefficient's sign tells you if you are dealing with an upright or upside down U relationship, and its magnitude tells you how wide or flat the parabola is. The linear coefficient in its own right has no meaning whatsoever. (Well, it is the slope of the parabola at the point where iv2 = 0--which may or may not be a useful number depending on your context.) The real use of the linear coefficient is to calculate -linear coeff/(2*quadratic coeff). This gives the value of iv2 where the parabola reaches its peak or nadir. It is the location of the axis of symmetry of the parabola, and your model means rather different things if that value falls squarely inside the range of observed values of iv2 (in which case you really do have a U), or beyond that range, in which case you have a slightly curvilinear relationship, but no real U. When you put an interaction term with iv1 in there, you now have different parabolas for each value of iv1. The linear coefficient for the parabola at a given value of iv1 is _b[iv2] + _b[iv1#iv2]*iv1. So, as iv1 changes, that linear coefficient changes with it (in a linear way) and therefore the center of the parabola moves as well. (Things would be more complicated still if you also interacted iv1 with iv2#iv2.)
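
    As a sketch of that calculation after your interaction model, for a hypothetical value iv1t = 20 (substitute values that actually occur in your data):

    Code:
    * iv2 location of the parabola's vertex, conditional on iv1t = 20
    nlcom -(_b[iv2] + _b[c.iv1t#c.iv2]*20) / (2*_b[c.iv2#c.iv2])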

    Comment


    • #3
      Originally posted by Clyde Schechter View Post
      First, in a quadratic model, the statistical significance of the linear and quadratic terms should never be looked at alone. At best the joint significance of both coefficients is meaningful; either one by itself is not. [...]
      Hi Clyde,

      Thanks for your explanation. I've attached the graph below. In my models, I am modeling an effect of iv1 for each value of iv2, and talking about the effects of iv1 at specific, well-chosen values of iv2. I know the models are not the same. And since I have to do the piecewise regression, does it still make sense in this case that the coefficients of the linear term and the quadratic term become insignificant after dropping the interaction term?

      Thanks,
      David

      [Attached: graph of the model's predictions]

      Comment


      • #4
        Those predictions don't look right. What is the unit of your income variable? If it is something like euros/dollars/pounds per year, then a prediction of 1.50e+10 (= 15,000,000,000) is just unrealistic. Quadratic models can easily lead to such extreme predictions. Are there many observations in your data with a large iv1t and iv2 = 0? I would start with just a scatter plot of income versus iv1t and get a feel for what is realistic. I would then try different ways of including non-linearity, e.g. also adding the interaction between iv1t and the squared term, or splines (help mkspline), to check what is going on.
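
        A minimal sketch of those checks, using the variable names from your commands; the spline knot count is just an arbitrary starting point:

        Code:
        * eyeball the raw relationship first
        scatter income iv1t
        * restricted cubic spline in iv2 as an alternative to the quadratic
        mkspline iv2_s = iv2, cubic nknots(3)
        poisson income iv1t iv2_s* cv1 cv2 cv3 cv4 i.indcode, vce(robust)
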
        ---------------------------------
        Maarten L. Buis
        University of Konstanz
        Department of history and sociology
        box 40
        78457 Konstanz
        Germany
        http://www.maartenbuis.nl
        ---------------------------------

        Comment


        • #5
          Originally posted by Maarten Buis View Post
          Those predictions don't look right. [...]
          Hi Maarten,

          Thanks for your suggestion. The income is a firm's income in yuan (not euros), and the maximum in the data is 4.90e+09. I am trying piecewise regression by including non-linearity, i.e. adding the squared term of iv2, and thinking that if it is significant, I'll then add the interaction term with iv1t. But unfortunately, adding only the squared term of iv2 makes the coefficients of iv2 and its squared term insignificant. It would be much clearer to report only the result of the best model, but I want to present it as a piecewise regression. In that case, do you think these piecewise regressions make sense if I keep the interaction term with iv1t in the final model even though the squared term of iv2 is not significant on its own?

          Thanks,
          David
          Last edited by David Lu; 24 May 2016, 02:59.

          Comment


          • #6
            Clyde already explained that you don't look at the significance of a variable and that variable squared in isolation.

            What do you mean by "piecewise regression": are you estimating separate models for different groups? It does not surprise me that the significance is different in subsets of your data, if only because that way you reduce the sample size, but also because apparently you expect the results to be different across groups (why else would you want to estimate separate models?).
            ---------------------------------
            Maarten L. Buis
            University of Konstanz
            Department of history and sociology
            box 40
            78457 Konstanz
            Germany
            http://www.maartenbuis.nl
            ---------------------------------

            Comment


            • #7
              Originally posted by Maarten Buis View Post
              Clyde already explained that you don't look at the significance of a variable and that variable squared in isolation. [...]
              Dear Maarten,

              What I mean by piecewise regression is regression as in the following example: in model 1, add only the control variables; in model 2, iv1; in model 3, iv1 and iv1sq; in model 4, iv1, iv1sq, and iv1#iv2. Since it's done step by step, it's called a stepwise or piecewise model. It doesn't actually reduce the sample size, and it doesn't compare results across groups. The purpose of using stepwise regression is to see whether the R-squared/explanatory power increases significantly when adding variables, as sketched below.
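
              A sketch of that sequence with stored estimates and likelihood-ratio tests (lrtest needs the models fit without vce(robust)):

              Code:
              * hierarchical model building: test whether each addition improves fit
              quietly poisson income cv1 cv2 cv3 cv4 i.indcode
              estimates store m1
              quietly poisson income iv1t cv1 cv2 cv3 cv4 i.indcode
              estimates store m2
              quietly poisson income c.iv1t##c.iv1t cv1 cv2 cv3 cv4 i.indcode
              estimates store m3
              lrtest m1 m2    // does adding iv1t improve fit?
              lrtest m2 m3    // does adding its square improve fit?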

              Based on your suggestion, I checked my data and found that there are not many observations with a large iv1t and iv2 = 0. Also, I started with just a scatter plot of income versus iv1t to get a feel for what is realistic (attached below). I then tried different ways of including non-linearity, e.g. also adding the interaction term with iv1t and the squared term, and splines (help mkspline), to check what is going on; the results seem to explode even more. In that case, how do you think I should deal with the extreme predictions?

              Thanks,
              David
              [Attached: two images, including the scatter plot of income versus iv1t (Graph-scatter-test.png)]

              Last edited by David Lu; 30 May 2016, 01:51.

              Comment


              • #8
                You have at least two obvious outliers, so I would check for influential observations. For example, if you estimate your model with glm instead of poisson, you estimate the exact same model, but you can directly predict the Cook's distance after glm (see the predict section of the glm postestimation documentation). How to then deal with them is an art; I would be very reluctant to remove them from your analysis, as they do seem to be legitimate observations. Instead, my first attempt would be to try to find some factors that explain those extreme values and incorporate those in your model.
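
                A minimal sketch of that check, using the model from your post (note that -predict, cooksd- needs the plain maximum-likelihood fit, without vce(robust)):

                Code:
                * the same model via glm, then Cook's distance for each observation
                glm income c.iv1t##c.iv2 c.iv2#c.iv2 cv1 cv2 cv3 cv4 i.indcode, family(poisson) link(log)
                predict cd, cooksd
                gsort -cd
                list cd income iv1t iv2 in 1/5    // the most influential observations
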
                ---------------------------------
                Maarten L. Buis
                University of Konstanz
                Department of history and sociology
                box 40
                78457 Konstanz
                Germany
                http://www.maartenbuis.nl
                ---------------------------------

                Comment


                • #9
                  Originally posted by Maarten Buis View Post
                  You have at least two obvious outliers, so I would check for influential observations. [...]
                  Hi Maarten,

                  Thank you for your further suggestion. I used glm (command attached) to estimate the exact same model, but I cannot directly predict the Cook's distance after glm, because it is not allowed with the vce(robust) option (error attached). Is there some alternative for predicting the Cook's distance of a model with robust standard errors?

                  Thanks,
                  David

                  Code:
                  . glm income c.iv1t##c.iv2 c.iv2#c.iv2 cv1 cv2 cv3 cv4 i.indcode, family(poisson) link(log) vce(robust)
                  . predict d1, cooksd
                  standardized not allowed after robust estimation
                  r(198);

                  Comment


                  • #10
                    http://blog.stata.com/2014/05/08/usi...ential-points/
                    ---------------------------------
                    Maarten L. Buis
                    University of Konstanz
                    Department of history and sociology
                    box 40
                    78457 Konstanz
                    Germany
                    http://www.maartenbuis.nl
                    ---------------------------------

                    Comment


                    • #11
                      Hi Maarten,

                      Thank you for the helpful post. Now I can obtain the dfbeta values (attached). I tried to find some factors that explain those extreme values and incorporated them in the model; the predictions explode less, but are still extreme. Even worse, adding more variables makes the coefficients less significant. So how should I deal with these results?

                      Thanks,
                      David

                      [Attached: screenshot of the Cook's distance/jackknife dfbeta results (cooksd-jacknife.png)]

                      Comment


                      • #12
                        That is the art of model building. This is something that you need to do, as you know most about the situation, the data, the way the data were collected, the research question, the aim of your study, etc. Just document what you tried, and be honest and open to the possibility that your fewer than 300 observations just don't contain the information necessary to reliably answer the question you want to answer.
                        ---------------------------------
                        Maarten L. Buis
                        University of Konstanz
                        Department of history and sociology
                        box 40
                        78457 Konstanz
                        Germany
                        http://www.maartenbuis.nl
                        ---------------------------------

                        Comment


                        • #13
                          Originally posted by Maarten Buis View Post
                          You have at least two obvious outliers, so I would check for influential observations. [...]
                          Dear Maarten,

                          I remember that last time you suggested trying to find some factors that explain those extreme values and incorporating those in the model. Just a bold guess: does it make sense to first tag those extreme values, create a dummy variable, and include it in the model to control for the effect of the outliers?

                          Thanks,
                          David

                          Comment


                          • #14
                            David:
                            First, I would check whether those "weird" values are due to a trivial error in data entry.
                            If that is not the case, it would probably be wiser to present the results of your regression with and without the extreme observations (sketched below).
                            Another option is -rreg-, but it seems to be less well regarded than it was in the past.
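
                            A sketch of that side-by-side comparison, assuming a Cook's distance variable d1 from a non-robust -glm- fit as discussed above, and a conventional 4/N cutoff (a rule of thumb, not a hard rule):

                            Code:
                            * flag high-influence observations and compare the two fits
                            generate byte influential = d1 > 4/_N if !missing(d1)
                            quietly poisson income c.iv1t##c.iv2 c.iv2#c.iv2 cv1 cv2 cv3 cv4 i.indcode, vce(robust)
                            estimates store full
                            quietly poisson income c.iv1t##c.iv2 c.iv2#c.iv2 cv1 cv2 cv3 cv4 i.indcode if !influential, vce(robust)
                            estimates store trimmed
                            estimates table full trimmed, b se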

                            Kind regards,
                            Carlo
                            (Stata 19.0)

                            Comment
