  • Testing coefficients from two different samples using xtreg

    Dear All,
    I have seen several postings on testing coefficients coming from two different xtreg models. I have the two models below and would like to test whether the coefficient of lfx in Model 1 is smaller than the coefficient of lfx in Model 2.

    Model 1:
    xtreg expshare lfx dllabprod VIX Dolratetota col leverage2 lFGDP_s ipsectoralgrowth log_GDP if year>2001 & dummy_highimport==1, vce(robust)

    Model 2:
    xtreg expshare lfx dllabprod VIX Dolratetota col leverage2 lFGDP_s ipsectoralgrowth log_GDP if year>2001 & dummy_highimport==0, vce(robust)

    I followed the recommendations in those postings and obtained the output below. As far as I understand, I should be interested in the p-value of the 2.group#c.lfx interaction (0.976). This implies that the difference between the coefficients is not statistically significant. I would really appreciate it if you could confirm whether I am on the right track.

    Have a great day!

    gen group=1 if dummy_highimport==1
    replace group=2 if dummy_highimport==0
    tab group
    xtreg expshare i.group##(c.lfx c.dllabprod c.VIX c.Dolratetota c.col c.leverage2 c.lFGDP_s c.ipsectoralgrowth c.log_GDP) if year>2001, fe robust

    xtreg expshare i.group##(c.log_industry_rer_96_99_cst c.dllabprod c.VIX c.Dolratetota c.col c.leverage2 c.lFGDP_s c.ipsectoralgrowth c.log_GDP) if year>2001, fe robust
    note: 2.group omitted because of collinearity

    Fixed-effects (within) regression               Number of obs     =     28,430
    Group variable: id                              Number of groups  =      4,872

    R-sq:                                           Obs per group:
         within  = 0.0191                                         min =          1
         between = 0.0029                                         avg =        5.8
         overall = 0.0027                                         max =          9

                                                    F(18,4871)        =      12.45
    corr(u_i, Xb)  = -0.9792                        Prob > F          =     0.0000

    (Std. Err. adjusted for 4,872 clusters in id)
    ----------------------------------------------------------------------------------------------
    | Robust
    expshare | Coef. Std. Err. t P>|t| [95% Conf. Interval]
    -----------------------------+----------------------------------------------------------------
    2.group | 0 (omitted)
    lfx | -.0531907 .0362823 -1.47 0.143 -.1243204 .0179389
    dllabprod | .0047292 .0035266 1.34 0.180 -.0021846 .011643
    VIX | .0010271 .000228 4.51 0.000 .0005803 .001474
    Dolratetotal | .0440244 .0068111 6.46 0.000 .0306716 .0573773
    col | -.0466193 .0200281 -2.33 0.020 -.0858835 -.0073551
    leverage2 | .0000302 .0002133 0.14 0.887 -.0003879 .0004482
    lFGDP_s | -.2160109 .116787 -1.85 0.064 -.4449661 .0129443
    ipsectoralgrowth | -.009061 .0087774 -1.03 0.302 -.0262686 .0081466
    log_GDP | .1570772 .0616418 2.55 0.011 .0362314 .277923
    |
    group#|
    c.lfx |
    2 | .0013076 .0435208 0.03 0.976 -.0840129 .086628
    |
    group#c.dllabprod |
    2 | .0072897 .0043555 1.67 0.094 -.0012491 .0158285
    |
    group#c.VIX |
    2 | -.0011931 .000308 -3.87 0.000 -.0017969 -.0005894
    |
    group#c.Dolratetotal |
    2 | .0019018 .0098426 0.19 0.847 -.0173941 .0211977
    |
    group#c.col |
    2 | .0029947 .0271973 0.11 0.912 -.0503242 .0563137
    |
    group#c.leverage2 |
    2 | .0117649 .0067593 1.74 0.082 -.0014864 .0250162
    |
    group#c.lFGDP_s |
    2 | .1813865 .1796721 1.01 0.313 -.1708517 .5336248
    |
    group#c.ipsectoralgrowth |
    2 | -.0188761 .0160302 -1.18 0.239 -.0503025 .0125503
    |
    group#c.log_GDP |
    2 | -.2035728 .0829576 -2.45 0.014 -.366207 -.0409385
    |
    _cons | 3.030952 2.336514 1.30 0.195 -1.54967 7.611573
    -----------------------------+----------------------------------------------------------------
    sigma_u | 1.3683954
    sigma_e | .1249707
    rho | .99172847 (fraction of variance due to u_i)
    ----------------------------------------------------------------------------------------------

  • #2
    This looks correct to me.

    I will take this opportunity to note that, in my opinion, significance testing is not the best approach in this situation. More informative, in my view, is to report the group difference in lfx coefficients (given by the group#c.lfx coefficient) and its 95% confidence interval or standard error.



    • #3
      Originally posted by Clyde Schechter:
      This looks correct to me.

      I will take this opportunity to note that, in my opinion, significance testing is not the best approach in this situation. More informative, in my view, is to report the group difference in lfx coefficients (given by the group#c.lfx coefficient) and its 95% confidence interval or standard error.
      Thank you, Clyde. I understand that I should report the parts in red (shown again below). Why do you think that is more appropriate?
      Code:
                     |             Robust
            expshare |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
       2.group#c.lfx |   .0013076   .0435208     0.03   0.976    -.0840129     .086628



      • #4
        Well, if you do a significance test, you are testing the null hypothesis that there is zero difference between the two groups. But in most research contexts (and perhaps yours is an exception--there are some) that null hypothesis is a ridiculous straw-man. Of course, two different groups are going to differ on almost anything you measure to some degree. So if the null hypothesis is a straw man, why bother testing it? After all, you already know it's false. What you really want to know is something more like "is the difference large enough to matter?" Well, a p-value doesn't tell you that because the p-value is a mashup into a single statistic of the sample size, the noisiness of the data, and the true effect size. By contrast, a 95% confidence interval tells you, in this case, that your best estimate of the mean difference between groups is 0.0013076, and, given the vagaries of noise, and sample size, the proposition that the true value is somewhere between -0.0840129 and +0.086628 is plausible, because 95% of intervals calculated in this way actually do contain the correct value. So now you have an estimate of the group difference along with a sense of how precise that estimate is. If all of the values contained in the confidence interval are big enough to matter for practical purposes, then you can confidently assert that there is a meaningful group difference. If none of them are large enough to matter, then you can confidently assert that the difference between the groups is too small to matter. If the confidence interval spans differences that include both large enough to matter and too small to matter, then your conclusion must be tentative: my best estimate matters (or doesn't, as the case may be), but the data are compatible with the opposite being true.

        The p-value gives you none of that; it just tells you whether something you already know to be false is incompatible with your data.
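
        The arithmetic behind that interval is just estimate ± (critical value) × (standard error). A quick sketch in Python (not Stata) that rebuilds the reported bounds from the coefficient and robust standard error above, using the 1.96 normal critical value as an approximation to Stata's t quantile with 4,871 degrees of freedom:

```python
# Rebuild the 95% CI for the 2.group#c.lfx interaction from the reported
# coefficient and robust standard error.
coef = 0.0013076   # interaction coefficient (group 2 vs. group 1)
se = 0.0435208     # robust standard error

z = 1.96           # normal critical value, approximating t(4871)
lower = coef - z * se
upper = coef + z * se

print(f"95% CI: ({lower:.7f}, {upper:.7f})")
# Agrees with Stata's (-.0840129, .086628) up to the normal-vs-t
# approximation in the 5th decimal place.
```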



        • #5
          An interesting and recent article on the relevant topic wisely highlighted by Clyde is: https://besjournals.onlinelibrary.wi...041-210X.13159
          Kind regards,
          Carlo
          (Stata 19.0)



          • #6
            I guess I see your point, but my output implies that I should accept the null hypothesis that the difference between the coefficients is zero. So should I still worry about the confidence interval when p is large?



            • #7
              You cannot accept the null hypothesis unless you have first done a formal a priori power analysis showing that you have adequate data to support that conclusion. Without that, you can only reject the null hypothesis (if p is sufficiently small) or fail to reject the null (which is not the same as accepting the null--it is more like being agnostic about it). So, your output does not, by itself, imply that you must accept the null hypothesis. By itself, it just says you cannot be confident that the null hypothesis is false--but that does not come close to making it true.

              And really, ask yourself: if you had no data available, would you consider it at all reasonable that there might be no difference whatsoever between the two groups? If so, then testing that null hypothesis takes on a small bit of reasonableness. But if not (the more usual case), then testing the null hypothesis, though often done in practice because people don't think about it, makes no real sense--what you would really want to know is whether the difference matters. That leads you back to my line of reasoning in #4.



              • #8
                Thank you very much for the explanation.
                Given my results,
                1) I fail to reject the null, which implies that the difference between the coefficients may be small...
                2) In a different setup (with a confidence interval containing only large, or only small, values), I would have more to say... but not in this situation.



                • #9
                  Nazlika:

                  Code:
                                 |             Robust
                        expshare |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                   2.group#c.lfx |   .0013076   .0435208     0.03   0.976    -.0840129     .086628
                  1) As per your results, you fail to reject the null because the bounds of the 95% CI cross zero. This means that you cannot rule out that the difference is 0 (i.e., no difference). Please note that the high p-value is simply the other face of the coin of having a 95% CI that crosses zero. Your 95% CI tells you that, if you were to select, say, 100 random samples from the population your original sample was drawn from, 95 out of 100 such 95% CIs would contain the real, fixed and unknown value of the difference you are investigating (this is a frequentist notion; Bayesians have a different take, which includes treating the population parameters as random).
                  Your wide 95% CI may reflect too small a sample size, a genuine absence of any difference in the population from which your sample was drawn, the role of other predictors, or other factors.
                  2) I fail to get your second statement.
                  Kind regards,
                  Carlo
                  (Stata 19.0)
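
                  That coverage interpretation can be illustrated with a short simulation (a sketch in Python with made-up parameters: repeated samples from a normal population and a known-variance z-interval for the mean):

```python
# Simulate the frequentist coverage of 95% confidence intervals: draw many
# samples, build a z-interval for the mean from each, and count how often
# the interval contains the true (fixed, unknown-in-practice) mean.
import math
import random

random.seed(12345)

true_mean, sigma, n = 0.0, 1.0, 50
n_reps = 2000
half_width = 1.96 * sigma / math.sqrt(n)   # known-variance z-interval

covered = 0
for _ in range(n_reps):
    sample = [random.gauss(true_mean, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    if xbar - half_width <= true_mean <= xbar + half_width:
        covered += 1

coverage = covered / n_reps
print(f"empirical coverage: {coverage:.3f}")   # close to 0.95
```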



                  • #10
                    I want to ask one more thing, if possible:
                    What's the difference between A and B?

                    A)
                    gen group=1 if dummy_highimport==1
                    replace group=2 if dummy_highimport==0
                    tab group
                    xtreg expshare i.group##(c.lfx c.dllabprod c.VIX c.Dolratetota c.col c.leverage2 c.lFGDP_s c.ipsectoralgrowth c.log_GDP) if year>2001, fe robust
                    and look at the coefficient (p-value, confidence interval, etc.) of c.lfx to see whether the impact of lfx is higher in high-import than in low-import firms?


                    B)
                    gen IO_lfx=dummy_highimport*lfx
                    xtreg expshare lfx dllabprod IO_lfx VIX Dolratetota lrsale col leverage2 lFGDP_s ipsectoralgrowth log_GDP if year>2001,fe robust
                    . lincom IO_lfx

                    ( 1) IO_lfx = 0

                    ------------------------------------------------------------------------------
                    expshare | Coef. Std. Err. t P>|t| [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                    (1) | .0858202 .0161961 5.30 0.000 .0540749 .1175656
                    ------------------------------------------------------------------------




                    • #11
                      A) is a valid model incorrectly interpreted. To see if the effect of lfx differs between high and low import you have to look at the coefficient of group#lfx, not at the coefficient of lfx.

                      B) is probably an invalid model. I say "probably" because I do not know exactly how the variable dummy_highimport is constructed. If dummy_highimport is constant within id, then the model is OK. But if dummy_highimport can vary over time within an id, then the model is invalid because dummy_highimport itself has been omitted from it. Assuming that dummy_highimport is, in fact, a time-invariant attribute of each id, so that the model is valid, the results of lincom IO_lfx will match the findings for group#lfx in model A.
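
                      The algebra behind that equivalence can be checked numerically: when a single regressor is interacted with a two-level group (and nothing else differs between the specifications), the interaction coefficient equals the difference between the two group-specific slopes. A minimal sketch in Python with made-up data, using closed-form simple-regression slopes rather than xtreg's fixed-effects estimator:

```python
# With one continuous regressor fully interacted with a two-level group,
# the pooled model's group#c.x coefficient equals (slope in group 2) minus
# (slope in group 1) from two separate per-group regressions.

def ols_slope(xs, ys):
    """Closed-form OLS slope of y on x (with an intercept)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    den = sum((x - xbar) ** 2 for x in xs)
    return num / den

# Made-up data for the two groups (illustrative values only).
x1, y1 = [1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]   # group 1
x2, y2 = [1.0, 2.0, 3.0, 4.0], [0.9, 2.5, 4.1, 5.4]   # group 2

b1 = ols_slope(x1, y1)   # slope estimated in group 1 alone
b2 = ols_slope(x2, y2)   # slope estimated in group 2 alone
interaction = b2 - b1    # what 2.group#c.x reports in the pooled model

print(f"group 1 slope = {b1:.4f}, group 2 slope = {b2:.4f}")
print(f"implied interaction coefficient = {interaction:.4f}")
```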



                      • #12
                        The dummy is constant within id. I guess I made a mistake in the previous model, as I had interacted all the independent variables. In this version, I get the same results from both models.

                        xtreg expshare i.group##(c.log_industry_rer_96_99_cst) dllabprod VIX Dolratetota col leverage2 lFGDP_s ipsectoralgrowth log_GDP if year>2001

                        Random-effects GLS regression                   Number of obs     =     28,430
                        Group variable: id                              Number of groups  =      4,872

                        R-sq:                                           Obs per group:
                             within  = 0.0155                                         min =          1
                             between = 0.1339                                         avg =        5.8
                             overall = 0.1079                                         max =          9

                                                                        Wald chi2(11)     =     925.27
                        corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                        ----------------------------------------------------------------------------------------------
                        expshare | Coef. Std. Err. z P>|z| [95% Conf. Interval]
                        -----------------------------+----------------------------------------------------------------
                        2.group | .5864936 .0745983 7.86 0.000 .4402835 .7327036
                        log_industry_rer_96_99_cst | .0441343 .0223026 1.98 0.048 .0004219 .0878466
                        |
                        group#|
                        c.log_industry_rer_96_99_cst |
                        2 | -.1267491 .016303 -7.77 0.000 -.1587024 -.0947958
                        |
                        dllabprod | .0089485 .0014072 6.36 0.000 .0061904 .0117065
                        VIX | .0002622 .0001179 2.22 0.026 .0000312 .0004932
                        Dolratetotal | .0766487 .0032862 23.32 0.000 .0702078 .0830896
                        col | -.0517889 .0078533 -6.59 0.000 -.0671811 -.0363967
                        leverage2 | .000576 .00067 0.86 0.390 -.0007372 .0018891
                        lFGDP_s | .1066267 .0097353 10.95 0.000 .0875459 .1257076
                        ipsectoralgrowth | -.0296929 .0071745 -4.14 0.000 -.0437547 -.0156312
                        log_GDP | -.0600679 .0173869 -3.45 0.001 -.0941457 -.0259901
                        _cons | -2.34548 .2685857 -8.73 0.000 -2.871898 -1.819062
                        -----------------------------+----------------------------------------------------------------
                        sigma_u | .24569202
                        sigma_e | .12510531
                        rho | .79410442 (fraction of variance due to u_i)
                        ----------------------------------------------------------------------------------------------

                        xtreg expshare log_industry_rer_96_99_cst IO_RER_PS dummy_highimport dllabprod VIX Dolratetota col leverage2 lFGDP_s ipsectoralgrowth log_GDP if year>2001

                        Random-effects GLS regression                   Number of obs     =     28,430
                        Group variable: id                              Number of groups  =      4,872

                        R-sq:                                           Obs per group:
                             within  = 0.0155                                         min =          1
                             between = 0.1339                                         avg =        5.8
                             overall = 0.1079                                         max =          9

                                                                        Wald chi2(11)     =     925.27
                        corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                        --------------------------------------------------------------------------------------------
                        expshare | Coef. Std. Err. z P>|z| [95% Conf. Interval]
                        ---------------------------+----------------------------------------------------------------
                        log_industry_rer_96_99_cst | -.0826149 .0201291 -4.10 0.000 -.1220672 -.0431625
                        IO_RER_PS | .1267491 .016303 7.77 0.000 .0947958 .1587024
                        dummy_highimport | -.5864936 .0745983 -7.86 0.000 -.7327036 -.4402835
                        dllabprod | .0089485 .0014072 6.36 0.000 .0061904 .0117065
                        VIX | .0002622 .0001179 2.22 0.026 .0000312 .0004932
                        Dolratetotal | .0766487 .0032862 23.32 0.000 .0702078 .0830896
                        col | -.0517889 .0078533 -6.59 0.000 -.0671811 -.0363967
                        leverage2 | .000576 .00067 0.86 0.390 -.0007372 .0018891
                        lFGDP_s | .1066267 .0097353 10.95 0.000 .0875459 .1257076
                        ipsectoralgrowth | -.0296929 .0071745 -4.14 0.000 -.0437547 -.0156312
                        log_GDP | -.0600679 .0173869 -3.45 0.001 -.0941457 -.0259901
                        _cons | -1.758987 .2681726 -6.56 0.000 -2.284595 -1.233378
                        ---------------------------+----------------------------------------------------------------
                        sigma_u | .24569202
                        sigma_e | .12510531
                        rho | .79410442 (fraction of variance due to u_i)
                        --------------------------------------------------------------------------------------------



                        • #13
                          I have another question. For another specification, I run a difference GMM regression instead of xtreg. I divide the sample into three categories based on firm characteristics that are constant within ids. I want to test whether the coefficients of fx in samples 1 & 2, samples 2 & 3, and samples 1 & 3 are statistically different from each other.

                          I assume that
                          1) the GMM estimates are asymptotically normally distributed, and
                          2) the three samples are independent.
                          Then I calculate the t-statistic as
                          t = (c1 - c2) / sqrt(s1^2 + s2^2),
                          where c1 and c2 are the coefficients of fx, and s1 and s2 are their standard errors.
                          I then compare the t-statistic with 1.96 to decide whether the coefficients are statistically different from each other.
                          I would appreciate it if you could share your thoughts on this approach.
                          Thanks again for your time,
                          Nazlı
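
                          That comparison is straightforward to script. A sketch in Python of the same formula (the numbers are hypothetical, for illustration only; se1 and se2 are taken to be the estimated standard errors of the two coefficients, and the normal approximation rests on the asymptotic-normality and independent-samples assumptions stated above):

```python
# Two-sided z-test for the difference between coefficients estimated on
# independent samples: z = (c1 - c2) / sqrt(se1^2 + se2^2).
import math
from statistics import NormalDist

def compare_coefs(c1, se1, c2, se2):
    """Return (z statistic, two-sided p-value) for H0: c1 == c2."""
    z = (c1 - c2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical numbers, for illustration only.
z_stat, p_value = compare_coefs(c1=0.25, se1=0.08, c2=0.10, se2=0.06)
print(f"z = {z_stat:.3f}, p = {p_value:.4f}")
# |z| > 1.96 (equivalently p < 0.05) would indicate a difference at the
# 5% level.
```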



                          • #14
                            I have a related question about testing whether coefficients of two regression models are statistically different. I'm looking at the effects of unemployment (and related characteristics) on perceived job security, and whether unemployment effects differ by mental health status. I first tried estimating two separate models for those in poor and good mental health, and then using the suest command to test for differences in the unemployment coefficient. However, I'm using fixed effects regressions and I got an error message saying that "xtreg is not supported by suest". My code is as follows:

                            Code:
                            xtreg jobsec unemp male1 age tenure contract fire1 hire1 i.j1 if mh9_q1==0, fe i(id) cluster(id)
                            est store e4
                            xtreg jobsec unemp male1 age tenure contract fire1 hire1 i.j1 if mh9_q1==1, fe i(id) cluster(id)
                            est store e5
                            
                            suest e4 e5
                            Is there an alternative command that I can use in this case to test whether the 'unemp' coefficients are statistically different in the two models? Or is the feasible alternative to use an interaction model as follows, and look at the coefficient on the mh9_q1#c.unemp interaction?

                            Code:
                            xtreg jobsec mh9_q1##(c.unemp male1 c.age c.tenure contract c.fire1 c.hire1) i.j1, cluster(id)



                            • #15
                              Ashani:
                              your last code is the way to go.
                              Then you can check what you're after via -test- and -lincom-.
                              Kind regards,
                              Carlo
                              (Stata 19.0)
