Test the statistical significance of the coefficient differences using permutation

Jae Li

Join Date: May 2017

Posts: 184
#1


15 Jun 2024, 12:19

Hi everyone, I have a query about testing whether a coefficient difference is statistically significantly different from zero using a permutation test. Here is my code:

Code:

program define permute, rclass
    reg Y X controls i.year i.sic2 if cf_indicator, vce(robust)
    return scalar small = _b[X]
    reg Y X controls i.year i.sic2 if !cf_indicator, vce(robust)
    return scalar d = _b[X] - small
end

permute cf_indicator d = r(d), strata(row) reps(3000) nodots seed(69): permute

However, the output only shows the two regression results; it does not compare their coefficients or test the statistical significance of the difference. Any suggestions will be appreciated!
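One possible contributor, offered as an assumption rather than a confirmed diagnosis: the program above is itself named permute, the same name as Stata's permute prefix command, and its second return statement refers to a bare small rather than to a stored scalar. A minimal sketch under those assumptions follows. The program name coefdiff is hypothetical, and strata(row) is left out because row is not described in the post.

```stata
* Hypothetical program name (coefdiff), chosen so it does not clash with -permute-
capture program drop coefdiff
program define coefdiff, rclass
    * Slope of X in the cf_indicator == 1 group
    regress Y X controls i.year i.sic2 if cf_indicator, vce(robust)
    tempname small
    scalar `small' = _b[X]

    * Slope of X in the cf_indicator == 0 group, then the difference
    regress Y X controls i.year i.sic2 if !cf_indicator, vce(robust)
    return scalar d = _b[X] - `small'
end

* permute shuffles cf_indicator and recomputes r(d) on each replication
permute cf_indicator d = r(d), reps(3000) nodots seed(69): coefdiff
```

The first slope is held in a temporary scalar so that it survives the second regress call; only the difference is returned in r(d), which is the name the permute expression refers to.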
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#2

16 Jun 2024, 01:39

Jae:
why not consider "FAQ: Chow tests | Stata" instead?

Kind regards,
Carlo
(Stata 19.0)
Jae Li

#3

16 Jun 2024, 02:50

@Carlo Lazzaro Hi Carlo, thank you for your advice! I tried Chow tests, but the coefficient difference is not significant, so I want to try the permutation test again. I tried the code above, and it only gives two regression results. Do you know how to test the significance of the difference after the regressions? Many thanks in advance!

Carlo Lazzaro


16 Jun 2024, 04:34

Jae:
another approach is the following one:

Code:

. sysuse auto.dta
(1978 automobile data)

. regress price mpg if foreign==0

      Source |       SS           df       MS      Number of obs   =        52
-------------+----------------------------------   F(1, 50)        =     17.05
       Model |   124392956         1   124392956   Prob > F        =    0.0001
    Residual |   364801844        50  7296036.89   R-squared       =    0.2543
-------------+----------------------------------   Adj R-squared   =    0.2394
       Total |   489194801        51  9592054.92   Root MSE        =    2701.1

------------------------------------------------------------------------------
       price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         mpg |  -329.2551   79.74034    -4.13   0.000    -489.4183   -169.0919
       _cons |   12600.54   1624.773     7.76   0.000     9337.085    15863.99
------------------------------------------------------------------------------

. estimates store A

. regress price mpg if foreign==1

      Source |       SS           df       MS      Number of obs   =        22
-------------+----------------------------------   F(1, 20)        =     13.25
       Model |  57534941.7         1  57534941.7   Prob > F        =    0.0016
    Residual |  86828271.1        20  4341413.55   R-squared       =    0.3985
-------------+----------------------------------   Adj R-squared   =    0.3685
       Total |   144363213        21   6874438.7   Root MSE        =    2083.6

------------------------------------------------------------------------------
       price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         mpg |  -250.3668   68.77435    -3.64   0.002    -393.8276    -106.906
       _cons |   12586.95   1760.689     7.15   0.000     8914.217    16259.68
------------------------------------------------------------------------------

. estimates store B

. suest A B

Simultaneous results for A, B                               Number of obs = 74

------------------------------------------------------------------------------
             |               Robust
             | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
A_mean       |
         mpg |  -329.2551   80.16093    -4.11   0.000    -486.3676   -172.1425
       _cons |   12600.54   1755.108     7.18   0.000     9160.589    16040.49
-------------+----------------------------------------------------------------
A_lnvar      |
       _cons |   15.80284   .2986031    52.92   0.000     15.21759    16.38809
-------------+----------------------------------------------------------------
B_mean       |
         mpg |  -250.3668   84.69387    -2.96   0.003    -416.3637   -84.36987
       _cons |   12586.95   2258.417     5.57   0.000     8160.534    17013.37
-------------+----------------------------------------------------------------
B_lnvar      |
       _cons |   15.28371   .2310235    66.16   0.000     14.83091    15.73651
------------------------------------------------------------------------------

. help lincom

. lincom [A_mean]mpg + [B_mean]mpg

 ( 1)  [A_mean]mpg + [B_mean]mpg = 0

------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         (1) |  -579.6219    116.614    -4.97   0.000    -808.1811   -351.0626
------------------------------------------------------------------------------

.

That said, results are what they are.

Kind regards,
Carlo
(Stata 19.0)
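Note that the lincom expression in the log above adds the two mpg slopes, so it tests whether their sum is zero. For a test of the difference, the sign between the two terms would be a minus. From the coefficients displayed above, the point estimate of the difference is -329.2551 - (-250.3668) = -78.8883. A minimal sketch on the same auto data (output not shown):

```stata
sysuse auto, clear

* Fit the same model separately in each group and store the results
quietly regress price mpg if foreign == 0
estimates store A
quietly regress price mpg if foreign == 1
estimates store B

* Combine the two sets of estimates
quietly suest A B

* Difference (not sum) of the two mpg slopes
lincom [A_mean]mpg - [B_mean]mpg

* Equivalent Wald test that the two slopes are equal
test [A_mean]mpg = [B_mean]mpg
```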


Jae Li

#5

16 Jun 2024, 12:38

@Carlo Lazzaro Hi Carlo, thank you for your advice! I also tried this approach, but the coefficient difference was not significant. Any ideas about using the permutation method? A permutation test seems to produce significant results more easily. Many thanks to you!

Joseph Coveney

Join Date: Apr 2014
Posts: 4410

17 Jun 2024, 02:35

Originally posted by Jae Li

Any ideas about using permutation method?

You can try out one or the other suggestion below. Begin at the "Begin here" comment; the stuff above that creates a phony dataset for use in illustrating the methods. I've created it with the same variable names as you show above in #1 for your convenience.

Code:

version 18.0

clear *

// seedem
set seed 1668111051

quietly set obs 250

generate double Y = rnormal()

foreach var of newlist X controls {
    generate double `var' = runiform()
}

foreach var of newlist year sic2 {
    generate byte `var' = rbinomial(4, 0.5)
}

generate byte cf_indicator = mod(_n, 2)

*
* Begin here
*
// Permutation Method 1

/* cf_indicator must be rightmost variable and not as a factor variable */

program define chow, rclass
    version 18.0
    syntax varlist(numeric fv)

    local group : word `:word count `varlist'' of `varlist'

    regress `varlist' if `group'
    tempname small
    scalar define `small' = _b[X]

    regress `varlist' if !`group'
    return scalar T = _b[X] - `small'
end

permute cf_indicator T = r(T), nodrop ///
    reps(3000) nodots: chow Y c.(X controls) i.(year sic2) cf_indicator

// Permutation Method 2
permute cf_indicator T = _b[1.cf_indicator#X], reps(3000) nodots: ///
    regress Y i.cf_indicator##c.(X controls) i.cf_indicator##(i.year sic2)

// Nonpermutation Method 1
regress Y i.cf_indicator##c.(X controls) i.cf_indicator##(i.year sic2)
test 1.cf_indicator#X

// Nonpermutation Method 2
regress Y c.(X controls) i.(year sic2) if cf_indicator
estimates store Avec

regress Y c.(X controls) i.(year sic2) if !cf_indicator
estimates store Sans

suest Avec Sans
lincom [Avec_mean]X - [Sans_mean]X

exit

Complete do-file of the above and its log file are attached if you're interested further.


Jae Li


17 Jun 2024, 14:05

@Joseph Coveney Hi Joseph, thank you so much for your suggestions! I have a few queries about your code, and further explanations will be greatly appreciated.

1. When running Permutation Method 1, what does syntax varlist(numeric fv) mean?

2. What is local macro `group' referring to? The number of variables in the varlist in the dataset? Can `group' equal cf_indicator?

3. When I run the code below, it shows an error. I tried to install it using -help chow-, but it seems to have many options; which one shall I install? Also, shall I set seed() or strata()?

Code:

permute cf_indicator T = r(T), nodrop ///
    reps(3000) nodots: chow Y c.(X controls) i.(year sic2) cf_indicator

chow command not found
r(111);

4. When running the Permutation method 2, an error occurs:

Code:

permute cf_indicator T = _b[1.cf_indicator#X], reps(3000) nodots: ///
    regress Y i.cf_indicator##c.(X controls) i.cf_indicator##(i.year sic2)

[1.cf_indicator#p_inv_cash] not found
error in expression: _b[1.cf_indicator#p_inv_cash]
r(111);

5. When running the Nonpermutation method 1, it shows an error.

Code:

regress Y i.cf_indicator##c.(X controls) i.cf_indicator##(i.year sic2)
test 1.cf_indicator#X

varlist not allowed
r(101);

However, those errors don't occur with your example data, so I've attached my data here for more information.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(Y X control1 control2 control3 cf_indicator)
 .10857251   .027367014  .1617778    .16916423  .28899306 1
.004715193    .00342154 .07875443    .04344569          0 0
 .07356971    .00768116 .21203567    .03622579  1.5344946 1
 .24486415    .02626503  .3915742    .06707548  .03948495 0
  .0974177   .009679618 .06010582    .16104294  .25834543 1
 .12402222   .002993282 .24261387   .012337638   .5338913 0
   .638522   .002508853 .26249257   .009557805  .09159508 1
 .17772473  .0005537187 .24296713  .0022789862  1.0631486 0
.065786734   .007629359  .1522691    .05010445  1.3298148 1
  .4887764   .007368368  .5235301    .01407439  .19877225 0
 .07618244   .006024898 .14484248    .04159621  .02062019 1
  .2716101   .012333836  .1950324    .06323993  1.0343405 0
 .24213625   .033409104  .4756752    .07023512   .8017449 0
 .05152088   .008231385  .3285325    .02505501  1.6265423 0
  .3198293    .07551782 .26575565     .2841626 .018619418 0
.005872167  .0020411615 .23348087   .008742307  .28517148 1
 .10852266    .08625486  .4487196     .1922244   .4391095 0
  .1723519   .015215768  .4720508    .03223333   .3995879 1
 .17524564    .07167494 .29243731    .24509507          0 0
   .334016    .08365775 .47080365    .17769137          0 1
 .26722816    .12690866  .2339996    .54234564          0 0
   .313495    .03201833  .4261253     .0751383   .4899221 0
.004715193 .00009424942  .2056838 .00024708087  .29386455 1
.023968374 .00021614583  .1628507  .0013272637   .8164231 1
   .313495    .03201833  .4261253     .0751383   .5217251 0
  .8689613   .002538206 .20308894   .012498003  1.1689584 1
 .12837756   .069280066  .4425719     .1565397  .41106445 1
 .50597847     .0943788  .5852041      .161275  .12781169 1
  .3441456   .036102075  .4487038    .08045858   .3804414 0
 .05049255   .017573243  .3590992    .04893701  .08743084 1
 .09301658   .002797694 .24960037   .011208693   .6111767 1
 .08937198    .04295685  .4445101    .09663865  1.0428007 0
  .1787726  .0016331812  .5226063    .00312507    .745356 0
  .4063446    .03537859 .58808196    .06015928  1.3759187 1
  1.979627     .1165586  .4878553    .23892044   .4857574 0
 .17712164    .02500915  .4474563    .05589182   .5667625 0
 1.4503298     .0477997  .6210141     .0769704   .6133192 1
  .3251538    .10031226  .7133143    .14062841   .7327549 0
  .4702225   .011859408   .140448    .08443984  .14947967 0
  .2132593    .23411173  .5800263     .4036226          0 1
   .248027     .1949663 .58043396     .3358975          0 1
 .07627075   .002505514  .2193786    .01142096          . 0
 .29370776  .0018260905 .53876173  .0033894214  .50912523 0
  2.079191     .0208887 .58241403   .035865724  1.6807666 1
 .21136297   .003736378  .5031057   .007426626   .6456264 1
 .13161702    .03820193 .57151175   .066843644  .00481922 1
  .4813593    .07807865 .53771377    .14520486 .007829719 0
 1.5364577   .008238988  .5049697   .016315807  1.5893137 1
 .03813507    .04468339  .6269451   .071271606  .22430553 1
 .11030934  .0024246236  .3847911   .006301142   .8458499 1
  .5206426     .5801843 .58758914     .9873978  .12108879 1
  .4771008    .07052253  .6096699     .1156733   .7563603 0
  .3057053   .016236434  .4456767   .036430966  .53288007 0
  2.079191     .4596539  .4968177     .9251963          0 1
  .5774904   .008867126  .5754652   .015408622   .2238851 0
  .2095313   .008551225  .5826758   .014675785   .9191899 0
  .2487967   .001497561 .53969705  .0027748176   .7886181 0
 .12231066    .02759775  .5842511    .04723611   .1880312 0
 .55900586   .005550019  .5972396   .009292785  1.7633833 1
  .3861424   .026005374  .4499177    .05780029   .3752838 0
  2.079191    .03263767  .5278719    .06182877   1.473944 0
  2.079191    .13133055 .58526325     .2243957   1.269677 1
   .753144    .08223213  .5790282    .14201748  .50722235 0
.036778104   .027954325  .4816376    .05804016   1.424169 1
 .17691883    .02595752  .4611311    .05629098   .7341017 0
  .4311565    .06300014 .51378274     .1226202  1.2756475 1
  .3100247    .04491757  .5508267    .08154574          0 0
  .1855146  .0043680207 .55826914    .00782422  1.2376318 1
 .11989193    .09456532   .370757    .25506008          0 1
 .19205087     .2068062  .5116492     .4041953   .6118686 1
  .4650525    .18868466  .5044149     .3740664          0 0
  .6884933   .012823677 .51965696   .024677197  .53486913 0
 .37108645  .0039139963   .558886   .007003211   .7731268 0
 1.0948026    .20888075  .5320679     .3925829  1.0697545 1
 .30136025   .016102951 .53142136    .03030166  .23610672 1
  2.079191    .04692367  .5961493    .07871129   1.328614 1
  .5839605    .02132849  .5291139    .04030983   .6367181 0
 .25997332    .04220337  .5615239    .07515863   .9653184 0
 .02022237    .01206726  .6421605    .01879166  1.0595987 1
  .3753801  .0008278707 .54682714  .0015139533  .06695514 0
 1.9198842    .06785603  .5546914    .12233113   .1781987 1
   .448255    .06595232  .6572369    .10034786  .25942507 1
  .2431905  .0039614234  .5336711   .007422968  1.2180876 1
 .51940185    .01979476  .6040931    .03276773   .7177066 1
  1.291627  .0001211069  .4901509 .00024708087    .688301 0
  .4500455   .002560475  .5207855   .004916564    .241203 0
 .08850165      .160966 .58132917    .27689305          0 0
  1.071426  .0002020549  .5816807  .0003473639   1.478687 1
  .7582617  .0020804815 .29001713    .00717365  1.0956359 1
 .29327095  .0022241888   .595995   .003731892   .4437316 0
.004715193    .11037994  .4840363    .22804064          0 1
   .205585   .006279338 .50984406   .012316193   .6374559 0
  .1014176  .0011013633 .55327094    .00199064   .4670753 1
  .6393175   .003821763 .58930624   .006485191   .6376336 0
  .7717947      .238334 .59041506    .40367195  1.1068755 1
  .2156449    .18884975 .52937466     .3567412          0 1
  .0303877    .29199174 .59272385      .492627          0 1
 1.1754621    .23910648 .55291265     .4324489  .25955012 1
 .52287555    .09570356  .5977402    .16010897  .12721778 1
  .2822584     .1243819  .4683426     .2655789          0 1
end

------------------ copy up to and including the previous line ------------------

Listed 100 out of 49576 observations

Thank you so much for your enlightenments!

Last edited by Jae Li; 17 Jun 2024, 14:08.


Joseph Coveney

#8

17 Jun 2024, 18:27

Originally posted by Jae Li

1. what does syntax varlist(numeric fv) mean?

Refer to the help file.
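As a rough illustration of what the help file describes: syntax varlist(numeric fv) parses what the user typed after the command name into the local macro `varlist', requires the variables to be numeric, and permits factor-variable notation such as i. and c. prefixes. A minimal sketch, using a hypothetical demo program named showvars and the auto dataset:

```stata
capture program drop showvars      // hypothetical demo program, not part of the thread
program define showvars
    version 18.0
    * numeric: only numeric variables are accepted
    * fv:      factor-variable notation (i., c., #, ##) is allowed
    syntax varlist(numeric fv)
    display as text "varlist holds: `varlist'"
end

sysuse auto, clear
showvars price c.mpg i.foreign     // accepted
capture noisily showvars make      // rejected, because make is a string variable
```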

2. What is local macro `group' referring to? the number of variables in the varlist in the dataset? Can `group' equal to cf_indicator?

Yes, the local macro holds the name of the cf_indicator variable.

I notice that I neglected to remove that variable from the pass-through variable list fed to regress inside the program that I show above. As it happens, it doesn't matter because cf_indicator ends up collinear with the intercept (_cons), so that regress automatically drops it and it doesn't affect the results. But it's better programming practice to explicitly remove it as in the following.

Code:

program define chow, rclass
    version 18.0
    syntax varlist(numeric fv)

    local group : word `:word count `varlist'' of `varlist'
    local varlist : list varlist - group

    regress `varlist' if `group'
    tempname small
    scalar define `small' = _b[X]

    regress `varlist' if !`group'
    return scalar T = _b[X] - `small'
end
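The two macro lines in that program can be tried in isolation. The sketch below uses a simplified, hypothetical variable list (plain names, no factor-variable notation) to show how the last word is picked out and then removed:

```stata
* Simplified, hypothetical list; cf_indicator is the rightmost word
local varlist Y X controls cf_indicator

local n : word count `varlist'           // number of words: 4
local group : word `n' of `varlist'      // last word
local varlist : list varlist - group     // remaining words

display "`group'"                        // cf_indicator
display "`varlist'"                      // Y X controls
```

So `group' holds a variable name, not a count; the count is only used to locate the last word.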

3. When I run the below code, it shows an error. I tried to install it using -help chow-, but it seems have many options, which one shall I install? Also, shall I set the seed() or strata()?

I'm not sure what you're doing here. I guess that there's also a user-written command chow? The program used above is created in the do-file.
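In other words, nothing needs to be installed: the program define chow ... end block has to be executed in the current Stata session before the permute line is run, for example by running the whole do-file rather than only the permute line. A small sketch for checking whether the program is currently in memory:

```stata
* Run the -program define chow ... end- block first, then check that chow is defined.
* If it is not in memory, the error message is shown and execution continues.
capture noisily program list chow

* When rerunning the do-file, dropping any earlier definition first avoids an
* "already defined" error at -program define chow-
capture program drop chow
```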

4. When running the Permutation method 2, an error occurs:

It seems that you were using the variable name controls to refer to more than a single variable. Refer to them individually.

5. When running the Nonpermutation method 1, it shows an error.

However, those above errors aren't in your example data so I've attached my data here for more information.

I think that the answer is the same as that for your fourth question immediately above.
Joseph Coveney

#9

17 Jun 2024, 20:04

Originally posted by Joseph Coveney

. . . it's better programming practice to explicitly remove it as in the following.

More straightforward, though, is to include it as an option.

Code:

program define chow, rclass
    version 18.0
    syntax varlist(numeric fv), group(varname numeric)

    regress `varlist' if `group'
    tempname small
    scalar define `small' = _b[X]

    regress `varlist' if !`group'
    return scalar T = _b[X] - `small'
end

permute cf_indicator T = r(T), nodrop ///
    reps(100) nodots: chow Y c.(X controls) i.(year sic2), group(cf_indicator)

I had tried this approach in one of the interim candidate setups, had trouble with it for a reason that I didn't understand at the time, and so prematurely abandoned it.

Jae Li


#10

20 Jun 2024, 03:58

@Joseph Coveney Hi Joseph, thank you so much for getting back to me!

When I run the code for Permutation Methods 1 and 2 and the version in post #9, they all generate the same result, shown below. Do you know how to interpret it? It seems to be an error. Any ideas on how to fix it?

Code:

. permute cf_indicator T = r(T), nodrop /// 
     reps(3000) nodots: chow Y c.(X control1 control2 control3) i.(year sic2) cf_indicator

Monte Carlo permutation results                 Number of observations = 49,576
Permutation variable: cf_indicator              Number of permutations =  3,000

      Command: chow Y c.(X control1 control2 control3)  i.(year sic2) cf_indicator
            T: r(T)

-------------------------------------------------------------------------------
             |                                               Monte Carlo error
             |                                              -------------------
           T |    T(obs)       Test       c       n      p  SE(p)   [95% CI(p)]
-------------+-----------------------------------------------------------------
           T | -.0419947      lower       0       0      .      .      .      .
             |                upper       0       0      .      .      .      .
             |            two-sided                      .      .      .      .
-------------------------------------------------------------------------------
Notes: For lower one-sided test, c = #{T <= T(obs)} and p = p_lower = c/n.
       For upper one-sided test, c = #{T >= T(obs)} and p = p_upper = c/n.
       For two-sided test, p = 2*min(p_lower, p_upper); SE and CI approximate.
       Some permutations led to results with missing values.

. 
end of do-file

When running the Non-permutation method 1, the result shows an error:

Code:

. test 1.cf_indicator#X
varlist not allowed
r(101);

When running the Non-permutation method 2, the result also shows an error:

Code:

. lincom [Avec_mean]X - [Sans_mean]X
weights not allowed
r(101);

I've attached the sample data in #7 in case you wanna test the data. Thank you so much for your help!


Joseph Coveney

#11

20 Jun 2024, 06:20

Originally posted by Jae Li

I've attached the sample data in #7 in case you wanna test the data. Thank you so much for your help!

Please attach the entire dataset as a .dta file (Stata dataset file).
Jae Li

#12

20 Jun 2024, 13:26

@Joseph Coveney Hi Joseph, sure, please see it attached here for your review: permutation_data.dta Many thanks to you! I look forward to hearing from you!

Joseph Coveney


#13

20 Jun 2024, 20:04

You have empty cells in the cross-classified categories involving cf_indicator and sic3. (Visible in the regression results table with Nonpermutation Method 1.)

It will be better to permute the outcome variable, Y, instead in order to avoid the solid row of red xs that you get with permute when trying to permute cf_indicator. I show how to do this below. (I rename the variables for brevity.) Complete do-file and its log file are attached.

Code:

version 18.0

clear *

// seedem
set seed 672708676

use permutation_data

rename control? co?
rename cf* cfi
foreach var of varlist _all {
    if `=strlen("`var'")' > 3 {
        local new = substr("`var'", 1, 3)
        rename `var' `new'
    }
}
rename Y out
rename X pre

*
* Begin here
*

// Permutation Method 1
program define chow, rclass
    version 18.0
    syntax varlist(numeric fv), group(varname numeric)

    regress `varlist' if `group'
    tempname small
    scalar define `small' = _b[pre]

    regress `varlist' if !`group'
    return scalar T = _b[pre] - `small'
end

permute out T = r(T), nodrop reps(100) nodots: ///
    chow out c.(pre co?) i.(yea sic), group(cfi)

// Permutation Method 2
permute out T = _b[1.cfi#pre], reps(100) nodots: ///
    regress out i.cfi##c.(pre co?) i.cfi##i.(yea sic)

// Nonpermutation Method 1
regress out i.cfi##c.(pre co?) i.cfi##i.(yea sic)
test 1.cfi#pre

// Nonpermutation Method 2
regress out c.(pre co?) i.(yea sic) if cfi
estimates store Avec

regress out c.(pre co?) i.(yea sic) if !cfi
estimates store Sans

suest Avec Sans
lincom [Avec_mean]pre - [Sans_mean]pre

exit

I limit the number of Monte Carlo permutations to 100 because the P values are so large that they don't warrant anything more (Permutation Method 1, P = 0.7; Permutation Method 2, P = 0.4; Nonpermutation Method 1, P = 0.5; Nonpermutation Method 2, P = 0.7).
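The empty cells mentioned at the top of this post can also be listed directly. The following is only a sketch, assuming the renamed variables sic and cfi from the do-file above and that cfi takes just the values 0 and 1; it lists industry codes that appear in only one of the two groups:

```stata
preserve

* One row per observed (sic, cfi) combination, with its frequency in _freq
contract sic cfi, nomiss

* An industry code present in both groups contributes two rows
bysort sic: generate byte n_groups = _N

* Codes observed in only one group are the empty cells
list sic cfi _freq if n_groups < 2, noobs

restore
```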


Jae Li


#14

22 Jun 2024, 04:59

@Joseph Coveney Hi Joseph, I really appreciate your help! It works perfectly fine now. Based on the obtained P values, I tried to calculate the t-statistic of the difference T for documentation purposes and added the lines shown in red, but there is an error. Do you know how to fix it? Many thanks for all your help!

Code:

// Permutation Method 1
program define chow, rclass
    version 18.0
    syntax varlist(numeric fv), group(varname numeric)

    regress `varlist' if `group'
    tempname small
    scalar define `small' = _b[pre]
    scalar define `small_se' = _se[pre]

    regress `varlist' if !`group'
    return scalar T = _b[pre] - `small'
    return scalar T_se = _se[pre] - `small_se'
end

permute out T = T/T_se, nodrop reps(100) nodots: ///
    chow out c.(pre co?) i.(yea sic), group(cfi)

invalid syntax
an error occurred when permute executed chow
r(198);

Last edited by Jae Li; 22 Jun 2024, 05:01.


Joseph Coveney

#15

22 Jun 2024, 20:16

Originally posted by Jae Li

I tried to calculate the t-statistics of the mean different T for documentation purpose and added the red-color codes, but there is an error. Do you possibly know how to fix it?

Subtracting regression coefficient standard errors as you do doesn't make sense to me. The modified expression list that you feed to permute seems to have a syntax error, too.

If you want the Wald test statistic for the difference between cf indicator groups of the X slope, then you can get it from one of the two nonpermutation methods that I show above in #13. Otherwise, you can bootstrap the difference (T) for an asymptotic standard error or confidence interval.
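A minimal sketch of the bootstrap option, assuming the chow program from #13 is already defined in the session and the renamed dataset is in memory. The number of replications, the seed, and the choice to resample within cfi groups are illustrative assumptions, not recommendations from the thread:

```stata
* Bootstrap the between-group difference in the pre slope returned as r(T).
* strata(cfi) resamples within each group so that group sizes stay fixed.
bootstrap T = r(T), reps(200) seed(12345) strata(cfi) nodots: ///
    chow out c.(pre co?) i.(yea sic), group(cfi)

* Percentile-based confidence interval from the same replications
estat bootstrap, percentile
```

Note that the expression refers to the returned result as r(T), which is also how results from an rclass program are referenced in a permute expression list.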