Testing whether to include a squared term

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17561
#31

01 May 2020, 07:18

Sophie:
I agree with you: no squared term for age is necessary.

Kind regards,
Carlo
(StataNow 18.5)
Comment
sophie maene

Join Date: Mar 2020

Posts: 12
#32

01 May 2020, 07:56

Thank you very much!
Comment
Latoya Sundack

Join Date: Jul 2019

Posts: 67
#33

22 May 2020, 06:55

Dear Statalist,

I am working with three rounds (waves) of UNICEF MICS data. It is a cross-sectional dataset and I would like to use survey-round fixed effects. I have generated a wave variable to identify each of the three waves/rounds.

gen wave=1 if year <=2000

gen wave=2 if year <=2006

gen wave=2 if year <=2014

Basically, these commands create a count of the total for each round/wave.

However, when I ran the regression

reg Wazs i.year i.month i.wave, robust cluster (hv001)

, the wave fixed effects are all omitted due to collinearity.

Thank you for your reply,

Kind Regards!
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 9898
#34

22 May 2020, 07:47

gen wave=1 if year <=2000
gen wave=2 if year <=2006
gen wave=2 if year <=2014

Not sure what you are doing here. As written, the second command would overwrite the first and the third would overwrite the second, but Stata will not allow you to create two or more variables with the same name in the first place, so this cannot be the actual code that you ran. If the last year in the sample is 2014, your last command creates a variable equal to 2 for all observations. Such a constant is collinear with the constant term in the regression.
Comment
Latoya Sundack

Join Date: Jul 2019

Posts: 67
#35

22 May 2020, 08:11

Dear Andrew,

Thanks. I have created these in the given round. However, I see what you mean about all observations being equal. Do you have any suggestion on how this can be done?
Thank you.

Regards!
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 9898
#36

22 May 2020, 09:43

Code:

gen wave= cond(year <=2000, 1, cond(inrange(year, 2001, 2006), 2, 3))
1 like
Comment
Latoya Sundack

Join Date: Jul 2019

Posts: 67
#37

22 May 2020, 14:16

Dear Andrew,

Thanks for the code. However, I got the same result as with the code I used before, i.e., the wave effects are omitted due to collinearity. Do you have any other suggestions?

Thanks.

Regards!
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 9898
#38

22 May 2020, 15:42

reg Wazs i.year i.month i.wave, robust cluster (hv001)

You have to choose between month, year and wave dummies. You cannot have more than one of the three, as these are collinear. In short, by including month effects, you have accounted for year effects and wave effects, as months are nested in years, which in turn are nested in waves.
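
The nesting can be checked directly. The lines below are only a sketch using the variable names from this thread; they have not been run on the poster's data. If every year falls in exactly one wave, the wave dummies are linear combinations of the year dummies. Whether month is nested as well depends on how it is coded: a year-month identifier is nested in year, whereas a calendar month running from 1 to 12 repeats across years and is not.

```stata
* Each row (year) should have nonzero counts in only one column (wave)
tabulate year wave, missing

* Keep only one of the nested sets, here the wave effects alone
regress Wazs i.wave, vce(cluster hv001)
```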
1 like
Comment
Latoya Sundack

Join Date: Jul 2019

Posts: 67
#39

23 May 2020, 05:06

Dear Andrew,

Thanks so much. It worked.

I have one more question. I have a total of 5600 observations for children with a recorded vaccination status (5000 yes, 600 no). However, when I generate the dummy for vaccines, it picks up the missing values across the entire dataset. I have a total of 7000 Wazs observations, so the dummy treats 2000 as no (i.e., the 600 and the 1400 missing observations). Is there a way to generate the dummy so that it only takes into account the 5600 observations with vaccination data?

gen Vacc =1 if Immu==1

replace Vacc=0 if mi(Vacc)

Thanks for your help. I realised this is not a message for this topic. However, I am grateful for your help with this.

Kind Regards!
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 9898
#40

23 May 2020, 08:34

Hard to say without a data example. However, if you simply want a 1/0/missing variable

Code:

gen Vacc = cond(Immu==1, 1, 0) if !missing(Immu)

If this does not work, provide a data example using dataex.
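
A quick check, assuming Immu holds the vaccination status and is missing where it was not recorded: cross-tabulate the source variable against the new dummy with missing values shown. Going by the numbers in the question, the intended result would be 5000 ones, 600 zeros and 1400 missing values among the 7000 Wazs observations.

```stata
* Compare the new dummy with the source variable, including missing values
tabulate Immu Vacc, missing

* Number of observations with a Wazs value but no vaccination dummy
count if missing(Vacc) & !missing(Wazs)
```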
1 like
Comment
Latoya Sundack

Join Date: Jul 2019

Posts: 67
#41

24 May 2020, 07:14

Dear Andrew,

Thank you very much for your reply.

Regards!
Comment
Jovana Ju

Join Date: Aug 2020

Posts: 23
#42

11 Dec 2024, 14:36

Hello everyone and thank you in advance for your help!

On theoretical grounds, I intended to include a quadratic function in the analysis. It turns out that the coefficients on X and X squared are statistically significant and indicate an inverted-U relationship. I also ran utest, which supports this conclusion. However, what creates a dilemma for me is that only a few (4 out of 560) values of X lie above the turning point. When I run the linear regression, the coefficient on X is not statistically significant. How should I interpret this?

Kind regards!
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17561
#43

15 Dec 2024, 06:43

Jovana:
as per FAQ, please share what you typed and what Stata gave you back via CODE delimiters. Thanks.
That said, as the turning point falls within the range of your X variable, I would simply accept the quadratic relationship.

Kind regards,
Carlo
(StataNow 18.5)
1 like
Comment
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2078
#44

15 Dec 2024, 07:18

Jovana: As Carlo said, it’s hard to tell without knowing specifics. But I suspect X = 0 is either not possible or it’s an extreme value in the range of X (maybe the lowest possible value). The coefficient on X measures the effect at X = 0, and if X = 0 is impossible the coefficient is meaningless. So you can center X about an interesting value — usually its sample average — before squaring it.
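
A minimal sketch of the centering step, with y and x as placeholder names rather than variables from this thread: subtract the sample mean from X, then fit the quadratic in the centered variable. The coefficient on the centered term is then the slope at the sample mean of X rather than at X = 0.

```stata
* y and x are placeholder names; replace them with your own variables
summarize x, meanonly
generate double xc = x - r(mean)

* Quadratic in centered x: the coefficient on xc is the slope at the mean of x
regress y c.xc##c.xc
```

Using factor-variable notation (c.xc##c.xc) rather than a hand-made squared variable lets margins treat the two terms as one function of xc.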

Also, one usually includes X^2 to capture a diminishing (or increasing) effect, regardless of whether there are observations on both sides of the turning point. In fact, sometimes it makes little sense to have values to the right of the turning point.
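
To see where the turning point sits relative to the data, one option (again with placeholder names y and x) is to compute it from the fitted coefficients. For b1*x + b2*x^2 the turning point is at -b1/(2*b2), and nlcom reports it with a delta-method standard error.

```stata
* y and x are placeholder names; replace them with your own variables
regress y c.x##c.x

* Turning point of b1*x + b2*x^2 is -b1/(2*b2)
nlcom -_b[x] / (2 * _b[c.x#c.x])

* Compare the estimate with the observed range and upper percentiles of x
summarize x, detail
```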
2 likes
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35097
#45

15 Dec 2024, 08:47

In ecology it's common that abundance is highest where organisms are happiest (evidently a term of art in gardening), corresponding to ideal conditions (temperature, moisture, salinity, nutrient supply, whatever), although competition, predation and other effects may be at work too.

A standard model for this phenomenon is so-called Gaussian logit, a combination of a quadratic in one predictor and logit link. Here is a simple graph to give flavour:

Code:

twoway function invlogit(0.01 * (x - 5) - (x - 5)^2), ra(0 10)

As Jeff Wooldridge implies, this model doesn't imply that the entire shape is needed. Indeed, for organisms that thrive at some environmental extreme, only one limb of the bell is needed.
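
For anyone wanting to try this, a model of this form could be fitted roughly as follows. This is a sketch under assumptions: present is a hypothetical 0/1 presence indicator and x a hypothetical environmental predictor observed over roughly 0 to 10, neither taken from this thread.

```stata
* present (0/1) and x are placeholder names, not variables from this thread
logit present c.x##c.x

* Predicted probability over a grid of x values, then plot the fitted curve
margins, at(x = (0(1)10))
marginsplot
```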

I'd be interested to know how far this model is used outside ecology, in epidemiology, economics, eschatology, campanology, or anywhere else.
2 likes
Comment
