
  • #31
    OK. Good luck moving forward. If at some time you want to return to this issue, just post back continuing the thread.



    • #32
      Originally posted by Clyde Schechter View Post
      Re #19. This does look somewhat like a quadratic relationship, in that there is a clear turning point in the middle of the data. It is rather asymmetric, however, and at least for men, the relationship looks nearly flat in the lower range of military years in country. Of course, the -loess- command does not adjust for covariates, let alone account for hierarchy, so it makes sense to follow-up on this with Simonsohn's approach.

      Which brings us to #20. Simonsohn's article is not easy to read; it took me several attempts before I felt I understood it. What you did is correct in some ways, but not in others. Here, I'll walk through the approach, first in words, and then with pseudo-code. I'm using pseudo-code here because I don't have a suitable data set to work with to write actual code that will work with your data, and I don't want to get bogged down in things like correctly typing the names of every variable and paying attention to peculiarities of syntax of particular commands. My goal here is to give you a good start in the direction of writing the code yourself by filling in the details.

      For simplicity I'm going to denote the outcome variable, non-electoral participation, as Y, and the key independent variable (which is state_leg_fsi for one analysis and civpt_mn_mil_yrs for another) as X. You should do this separately for each of the different X's, and also separately for men and women. (So at the top level you will loop over varlist state_leg_fsi civpt_mn_mil_yrs, and within that over values of female.) Within those loops (or you can just make 4 copies of the code and substitute the appropriate variables/values of sex if that's easier) you have to carry out several steps.

      Step 1. Identify an approximate turning point for the X:Y relationship. The kind of thing you did earlier in this thread is one approach, but let's do it Simonsohn's way, which means ultimately fitting a linear spline (two straight-line segments) with its knot at the nadir of the relationship (in his article, he is looking at inverted-U relationships, so he speaks of the peak). He locates that nadir by transforming X with restricted cubic splines. There is another wrinkle we have to overcome: the random effect at the country level will be problematic in later steps, since it will end up giving undue weight to the countries with the lowest values of the intercept. So I think we need to first demean the data within country and then use -regress-. Something like this will do these steps:

      Code:
      by country_lvl, sort: egen Y_mean = mean(Y)    // country-level mean of the outcome
      gen Y_demeaned = Y - Y_mean                    // demean Y within country
      mkspline X_spline = X, cubic                   // restricted cubic spline basis for X
      regress Y_demeaned c.(X_spline*) the_usual_covariates
      predict Yhat, xb                               // fitted values tracing the shape of the X:Y relationship
      Note that there are several simplifications to the original model. The quadratic terms are not used at all: the cubic spline will capture curvature, and far more flexibly. In addition, we use -regress- instead of -mixed-. This is because we are only interested in the shape of the relationship. The random intercepts and random slopes on gender don't have much effect on that, and using -regress- simplifies the later calculations. Once you have gotten this far, the minimum value of Yhat locates the Y value corresponding to any turning point. The next step is to identify the values of X that produce Yhat values within 1 standard deviation of that. The median value of that range of X's is Simonsohn's first approximation to a turning point. So,
      Code:
      summ Yhat
      local nadir = r(min)     // lowest fitted value
      local ysd = r(sd)        // SD of the fitted values
      summ X if inrange(Yhat, `nadir'-`ysd', `nadir'+`ysd'), detail
      local x_left = r(min)
      local x_right = r(max)
      local x_middle = r(p50)  // median X in the flat region: first approximation to the turning point
      Now we are ready to go ahead and do left (to the left of `x_middle') and right (to the right of `x_middle') regressions:
      Code:
      regress Y_demeaned X the_usual_covariates if X < `x_middle'
      matrix M = r(table)
      local t_left = M["t", "X"]
      regress Y_demeaned X the_usual_covariates if X > `x_middle'
      matrix M = r(table)      // refresh r(table) after the second regression
      local t_right = M["t", "X"]
      Following each regression we capture the t-statistic associated with the slope of X in these two regressions, one on each side of the provisional turning point.

      Now we use these to calculate a "more powerful" estimate of the turning point:
      Code:
      local percentile = round(100*`t_right'/(`t_left'+`t_right'))
      centile X if inrange(Yhat, `nadir'-`ysd', `nadir'+`ysd'), centile(`percentile')
      local x_middle = r(c_1)
      Now `x_middle' is the improved estimate of a turning point, and we again do regressions to the left and right of it.

      Code:
      regress Y_demeaned X the_usual_covariates if X < `x_middle'
      regress Y_demeaned X the_usual_covariates if X > `x_middle'
      To conclude that there is a U-shaped relationship, you want the coefficients of X from these two regressions to be "statistically significant" and of opposite signs. If that is not what you find, then you will need to explore other transformations of X that capture some curvilinearity without reaching a turning point.
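      For instance, a minimal way to capture and compare those two slopes (just a sketch, reusing the placeholders above and assuming a conventional 0.05 threshold) would be:
      Code:
      regress Y_demeaned X the_usual_covariates if X < `x_middle'
      matrix M = r(table)
      local b_left = M["b", "X"]
      local p_left = M["pvalue", "X"]
      regress Y_demeaned X the_usual_covariates if X > `x_middle'
      matrix M = r(table)
      local b_right = M["b", "X"]
      local p_right = M["pvalue", "X"]
      // 1 if the slopes have opposite signs and both p-values are below 0.05, 0 otherwise
      local u_shape = (sign(`b_left') != sign(`b_right')) & (`p_left' < 0.05) & (`p_right' < 0.05)
      display "opposite-signed slopes, both significant at 0.05: `u_shape'"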
      Hi Clyde, is there a test in Stata to check whether the quadratic relationship is asymmetric? With the U test we can check whether the relationship is quadratic or monotonic, but how do we check for asymmetry?



      • #33
        I'm not sure what you mean. If you fit a quadratic curve to the data, that curve is necessarily, by construction, symmetric.
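
        A quick way to see why, writing the fitted curve as f(x) = a + b*x + c*x^2 with vertex v = -b/(2c): for any offset d,

        f(v+d) - f(v-d) = 2d*(b + 2cv) = 2d*(b - b) = 0,

        so the fitted values are mirror images around the vertex, and any asymmetry can only show up in the residuals.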

        There is always, however, the question as to whether this symmetry constraint does justice to the data. If I were facing that question, I would calculate the residuals from the model and then run a test comparing the residual distributions on each side of the turning point. To do that comparison, I would not use something that just compares the means: I think one could easily have mean residual 0 on both sides but still have important differences in the residual distributions away from their means. So I'd probably do something that looks at the distributions more holistically, such as a -ranksum- test or perhaps Kolmogorov-Smirnov (-ksmirnov-). (Actually, not being a big fan of hypothesis tests, especially for judging the adequacy of models, I'd probably do something graphical like a -qqplot- and make a visual judgment.)

        I briefly experimented with these approaches. The code below generates some noisy data. y1 is built around a log curve (asymmetric) and y2 is built around an actual quadratic. I chose the vertex of the actual quadratic to be approximately in the same location as the vertex of the quadratic fit to the log data. If you run this you will see that while both the ranksum and Kolmogorov-Smirnov tests produce p-values that are lower for the log data than for the quadratic data, the results for the log data are not low enough to reject the null by traditional criteria. But the quantile-quantile plots make it obvious that the residual distributions from the truly quadratic data are essentially the same on both sides, whereas the residual distributions on both sides of the vertex in the log data are obviously different.

        Code:
        clear *
        
        set obs 100
        set seed 12345
        gen x = _n
        
        // asymmetric case: data built around a log curve
        gen y1 = log(x) + rnormal(0, 0.25)
        
        regress y1 c.x##c.x
        local vertex = -_b[x]/(2*_b[c.x#c.x])    // vertex of the fitted quadratic
        label define side 0 "Left" 1 "Right"
        gen int side:side = 0.5*(sign(x-`vertex')+1) if x != `vertex'
        predict resid, resid
        ranksum resid, by(side)
        ksmirnov resid, by(side)
        separate resid, by(side)
        qqplot resid0 resid1, name(log)
        
        
        // symmetric case: data built around an actual quadratic
        gen y2 = 0.001*(x-80)^2 + rnormal(0, 1)
        regress y2 c.x##c.x
        local vertex = -_b[x]/(2*_b[c.x#c.x])
        replace side = 0.5*(sign(x-`vertex')+1) if x != `vertex'
        drop resid*
        predict resid, resid
        ranksum resid, by(side)
        ksmirnov resid, by(side)
        separate resid, by(side)
        qqplot resid0 resid1, name(quadratic)
        There may indeed be some named test that is specifically for testing symmetry, but I don't know of any such. Perhaps somebody following along does and will speak up.



        • #34
          Just noticed this, and I note that a test for symmetry where the middle (the center of symmetry) is known is straightforward but basically irrelevant here; however, nonparametric statistics texts often deal with this, and looking in the index of each of the following will turn up at least one such test:

          Gibbons, JD (1985), Nonparametric Statistical Inference, second edition, revised and expanded, Marcel Dekker
          Siegel, S and Castellan, NJ, Jr. (1988), Nonparametric Statistics for the Behavioral Sciences, second edition, McGraw-Hill
          Hollander, M and Wolfe, DA (1999), Nonparametric Statistical Methods, second edition, Wiley
          Lehmann, EL (1975), Nonparametrics: Statistical Methods Based on Ranks, Holden-Day

          As you can see, this is not part of my everyday work (or the cites would be more recent <grin>); I'm sure that more recent books also deal with this.



          • #35
            Originally posted by Clyde Schechter View Post
            Thanks, Clyde. I will go through your suggestions. My model is quadratic in nature, and the model-free evidence shows an asymmetric, inverted U-shaped relationship. There is a U test in Stata that confirms this inverted U-shaped association, but I need to find a way to check for asymmetry. Hence the question. Thanks again!



            • #36
              Originally posted by Rich Goldstein View Post
              Sure, thanks a lot, Rich. Will check these.
