  • Test the statistical significance of the coefficient differences using permutation

    Hi everyone, I have a question about using a permutation test to check whether the difference between two coefficients is statistically different from zero. Here is my code:


    Code:
    program define permute, rclass
    
        reg Y X controls i.year i.sic2 if cf_indicator, vce(robust)
        
        return scalar small = _b[X]
    
        reg Y X controls i.year i.sic2 if !cf_indicator, vce(robust)
        
        return scalar d = _b[X] - small
    end
    
    permute cf_indicator d = r(d),  strata(row) reps(3000) nodots  seed(69): permute
    However, this only displays the two regression results; it does not compare the coefficients or test the statistical significance of the mean difference. Any suggestions will be appreciated!

  • #2
    Jae:
    why not consider FAQ: Chow tests | Stata instead?
    Kind regards,
    Carlo
    (StataNow 18.5)


    • #3
      @Carlo Lazzaro Hi Carlo, thank you for your advice! I tried Chow tests, but the coefficient difference is not significant, so I wanna try the permutation test again. I ran the code above and it only gives the two regression results. Do you know how to test the significance of the mean difference after the regressions? Many thanks in advance!


      • #4
        Jae:
        another approach is the following one:
        Code:
        . sysuse auto.dta
        (1978 automobile data)
        
        . regress price mpg if foreign==0
        
              Source |       SS           df       MS      Number of obs   =        52
        -------------+----------------------------------   F(1, 50)        =     17.05
               Model |   124392956         1   124392956   Prob > F        =    0.0001
            Residual |   364801844        50  7296036.89   R-squared       =    0.2543
        -------------+----------------------------------   Adj R-squared   =    0.2394
               Total |   489194801        51  9592054.92   Root MSE        =    2701.1
        
        ------------------------------------------------------------------------------
               price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
                 mpg |  -329.2551   79.74034    -4.13   0.000    -489.4183   -169.0919
               _cons |   12600.54   1624.773     7.76   0.000     9337.085    15863.99
        ------------------------------------------------------------------------------
        
        . estimates store A
        
        . regress price mpg if foreign==1
        
              Source |       SS           df       MS      Number of obs   =        22
        -------------+----------------------------------   F(1, 20)        =     13.25
               Model |  57534941.7         1  57534941.7   Prob > F        =    0.0016
            Residual |  86828271.1        20  4341413.55   R-squared       =    0.3985
        -------------+----------------------------------   Adj R-squared   =    0.3685
               Total |   144363213        21   6874438.7   Root MSE        =    2083.6
        
        ------------------------------------------------------------------------------
               price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
                 mpg |  -250.3668   68.77435    -3.64   0.002    -393.8276    -106.906
               _cons |   12586.95   1760.689     7.15   0.000     8914.217    16259.68
        ------------------------------------------------------------------------------
        
        . estimates store B
        
        . suest A B
        
        Simultaneous results for A, B                               Number of obs = 74
        
        ------------------------------------------------------------------------------
                     |               Robust
                     | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
        -------------+----------------------------------------------------------------
        A_mean       |
                 mpg |  -329.2551   80.16093    -4.11   0.000    -486.3676   -172.1425
               _cons |   12600.54   1755.108     7.18   0.000     9160.589    16040.49
        -------------+----------------------------------------------------------------
        A_lnvar      |
               _cons |   15.80284   .2986031    52.92   0.000     15.21759    16.38809
        -------------+----------------------------------------------------------------
        B_mean       |
                 mpg |  -250.3668   84.69387    -2.96   0.003    -416.3637   -84.36987
               _cons |   12586.95   2258.417     5.57   0.000     8160.534    17013.37
        -------------+----------------------------------------------------------------
        B_lnvar      |
               _cons |   15.28371   .2310235    66.16   0.000     14.83091    15.73651
        ------------------------------------------------------------------------------
        
        . help lincom
        
        . lincom [A_mean]mpg + [B_mean]mpg
        
         ( 1)  [A_mean]mpg + [B_mean]mpg = 0
        
        ------------------------------------------------------------------------------
                     | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
        -------------+----------------------------------------------------------------
                 (1) |  -579.6219    116.614    -4.97   0.000    -808.1811   -351.0626
        ------------------------------------------------------------------------------
        
        .
        That said, results are what they are.
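        Note that the lincom line above adds the two slopes. If the quantity of interest is the difference between the two groups' mpg coefficients, a minimal sketch (assuming the same stored estimates A and B, and that suest A B has just been run) would subtract them instead. From the two regression tables above, the point estimate of that difference is -329.2551 - (-250.3668) = -78.8883.
        Code:
        // run after -suest A B-; tests whether the mpg slopes differ between groups
        lincom [A_mean]mpg - [B_mean]mpg
        
        // the same hypothesis written as a Wald test
        test [A_mean]mpg = [B_mean]mpg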
        Kind regards,
        Carlo
        (StataNow 18.5)


        • #5
          @Carlo Lazzaro Hi Carlo, thank you for your advice! I also tried this approach, but the coefficient difference was not significant. Any ideas about using permutation method? A permutation test seems to produce significant results more easily. Many thanks to you!


          • #6
            Originally posted by Jae Li
            Any ideas about using permutation method?
            You can try out one or the other suggestion below. Begin at the "Begin here" comment; the stuff above that creates a phony dataset for illustrating the methods. I've created it with the same variable names as you show above in #1 for your convenience.
            Code:
            version 18.0
            
            clear *
            
            // seedem
            set seed 1668111051
            
            quietly set obs 250
            
            generate double Y = rnormal()
            
            foreach var of newlist X controls {
                generate double `var' = runiform()
            }
            
            foreach var of newlist year sic2 {
                generate byte `var' = rbinomial(4, 0.5)
            }
            
            generate byte cf_indicator = mod(_n, 2)
            
            *
            * Begin here
            *
            // Permutation Method 1
            
            /* cf_indicator must be rightmost variable and not as a factor variable */
            
            program define chow, rclass
                version 18.0
                syntax varlist(numeric fv)
            
                local group : word `:word count `varlist'' of `varlist'
            
                regress `varlist' if `group'
                tempname small
                scalar define `small' = _b[X]
            
                regress `varlist' if !`group'
                return scalar T = _b[X] - `small'
            end
            
            permute cf_indicator T = r(T), nodrop ///
                reps(3000) nodots: chow Y c.(X controls) i.(year sic2) cf_indicator
            
            // Permutation Method 2
            permute cf_indicator T = _b[1.cf_indicator#X], reps(3000) nodots: ///
                regress Y i.cf_indicator##c.(X controls) i.cf_indicator##(i.year sic2)
            
            // Nonpermutation Method 1
            regress Y i.cf_indicator##c.(X controls) i.cf_indicator##(i.year sic2)
            test 1.cf_indicator#X
            
            // Nonpermutation Method 2
            regress Y c.(X controls) i.(year sic2) if cf_indicator
            estimates store Avec
            
            regress Y c.(X controls) i.(year sic2) if !cf_indicator
            estimates store Sans
            
            suest Avec Sans
            lincom [Avec_mean]X - [Sans_mean]X
            
            exit
            Complete do-file of the above and its log file are attached if you're interested further.
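            As for why the code in #1 printed only two regression tables: one plausible reason (an assumption, not verified against the original data) is that the program there is itself named permute. A program defined in memory is found before the official permute ado-file, so the permute prefix call would simply run the two regressions. In addition, small is never created as a scalar inside that program; return scalar small only fills r(small). A minimal sketch of the same idea, with a hypothetical program name, coefdiff, and a temporary scalar:
            Code:
            capture program drop coefdiff
            program define coefdiff, rclass
                version 18.0
            
                regress Y X controls i.year i.sic2 if cf_indicator, vce(robust)
                tempname small
                scalar define `small' = _b[X]    // slope in the cf_indicator == 1 group
            
                regress Y X controls i.year i.sic2 if !cf_indicator, vce(robust)
                return scalar d = _b[X] - `small'    // difference between the two slopes
            end
            
            // -row- is the strata variable used in #1
            permute cf_indicator d = r(d), strata(row) reps(3000) nodots seed(69): coefdiff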

            • #7
              @Joseph Coveney Hi Joseph, thank you so much for your suggestions! I have a few questions about your code, and further explanation would be greatly appreciated.

              1. In Permutation Method 1, what does syntax varlist(numeric fv) mean?

              2. What is the local macro `group' referring to? The number of variables in the varlist in the dataset? Can `group' be equal to cf_indicator?

              3. When I run the code below, it shows an error. I tried to install it using -help chow-, but there seem to be many options; which one should I install? Also, should I set seed() or strata()?

              Code:
              permute cf_indicator T = r(T), nodrop ///
                  reps(3000) nodots: chow Y c.(X controls) i.(year sic2) cf_indicator
              
              chow command not found
              r(111);
              4. When running the Permutation method 2, an error occurs:

              Code:
              permute cf_indicator T = _b[1.cf_indicator#X], reps(3000) nodots: ///
                  regress Y i.cf_indicator##c.(X controls) i.cf_indicator##(i.year sic2)
              
              [1.cf_indicator#p_inv_cash] not found
              error in expression: _b[1.cf_indicator#p_inv_cash]
              r(111);

              5. When running the Nonpermutation method 1, it shows an error.
              Code:
              regress Y i.cf_indicator##c.(X controls) i.cf_indicator##(i.year sic2)
              test 1.cf_indicator#X
              
              varlist not allowed
              r(101);
              However, these errors don't occur with your example data, so I've attached my data here for more information.


              Code:
              * Example generated by -dataex-. For more info, type help dataex
              clear
              input float(Y X control1 control2 control3 cf_indicator)
               .10857251   .027367014  .1617778    .16916423  .28899306 1
              .004715193    .00342154 .07875443    .04344569          0 0
               .07356971    .00768116 .21203567    .03622579  1.5344946 1
               .24486415    .02626503  .3915742    .06707548  .03948495 0
                .0974177   .009679618 .06010582    .16104294  .25834543 1
               .12402222   .002993282 .24261387   .012337638   .5338913 0
                 .638522   .002508853 .26249257   .009557805  .09159508 1
               .17772473  .0005537187 .24296713  .0022789862  1.0631486 0
              .065786734   .007629359  .1522691    .05010445  1.3298148 1
                .4887764   .007368368  .5235301    .01407439  .19877225 0
               .07618244   .006024898 .14484248    .04159621  .02062019 1
                .2716101   .012333836  .1950324    .06323993  1.0343405 0
               .24213625   .033409104  .4756752    .07023512   .8017449 0
               .05152088   .008231385  .3285325    .02505501  1.6265423 0
                .3198293    .07551782 .26575565     .2841626 .018619418 0
              .005872167  .0020411615 .23348087   .008742307  .28517148 1
               .10852266    .08625486  .4487196     .1922244   .4391095 0
                .1723519   .015215768  .4720508    .03223333   .3995879 1
               .17524564    .07167494 .29243731    .24509507          0 0
                 .334016    .08365775 .47080365    .17769137          0 1
               .26722816    .12690866  .2339996    .54234564          0 0
                 .313495    .03201833  .4261253     .0751383   .4899221 0
              .004715193 .00009424942  .2056838 .00024708087  .29386455 1
              .023968374 .00021614583  .1628507  .0013272637   .8164231 1
                 .313495    .03201833  .4261253     .0751383   .5217251 0
                .8689613   .002538206 .20308894   .012498003  1.1689584 1
               .12837756   .069280066  .4425719     .1565397  .41106445 1
               .50597847     .0943788  .5852041      .161275  .12781169 1
                .3441456   .036102075  .4487038    .08045858   .3804414 0
               .05049255   .017573243  .3590992    .04893701  .08743084 1
               .09301658   .002797694 .24960037   .011208693   .6111767 1
               .08937198    .04295685  .4445101    .09663865  1.0428007 0
                .1787726  .0016331812  .5226063    .00312507    .745356 0
                .4063446    .03537859 .58808196    .06015928  1.3759187 1
                1.979627     .1165586  .4878553    .23892044   .4857574 0
               .17712164    .02500915  .4474563    .05589182   .5667625 0
               1.4503298     .0477997  .6210141     .0769704   .6133192 1
                .3251538    .10031226  .7133143    .14062841   .7327549 0
                .4702225   .011859408   .140448    .08443984  .14947967 0
                .2132593    .23411173  .5800263     .4036226          0 1
                 .248027     .1949663 .58043396     .3358975          0 1
               .07627075   .002505514  .2193786    .01142096          . 0
               .29370776  .0018260905 .53876173  .0033894214  .50912523 0
                2.079191     .0208887 .58241403   .035865724  1.6807666 1
               .21136297   .003736378  .5031057   .007426626   .6456264 1
               .13161702    .03820193 .57151175   .066843644  .00481922 1
                .4813593    .07807865 .53771377    .14520486 .007829719 0
               1.5364577   .008238988  .5049697   .016315807  1.5893137 1
               .03813507    .04468339  .6269451   .071271606  .22430553 1
               .11030934  .0024246236  .3847911   .006301142   .8458499 1
                .5206426     .5801843 .58758914     .9873978  .12108879 1
                .4771008    .07052253  .6096699     .1156733   .7563603 0
                .3057053   .016236434  .4456767   .036430966  .53288007 0
                2.079191     .4596539  .4968177     .9251963          0 1
                .5774904   .008867126  .5754652   .015408622   .2238851 0
                .2095313   .008551225  .5826758   .014675785   .9191899 0
                .2487967   .001497561 .53969705  .0027748176   .7886181 0
               .12231066    .02759775  .5842511    .04723611   .1880312 0
               .55900586   .005550019  .5972396   .009292785  1.7633833 1
                .3861424   .026005374  .4499177    .05780029   .3752838 0
                2.079191    .03263767  .5278719    .06182877   1.473944 0
                2.079191    .13133055 .58526325     .2243957   1.269677 1
                 .753144    .08223213  .5790282    .14201748  .50722235 0
              .036778104   .027954325  .4816376    .05804016   1.424169 1
               .17691883    .02595752  .4611311    .05629098   .7341017 0
                .4311565    .06300014 .51378274     .1226202  1.2756475 1
                .3100247    .04491757  .5508267    .08154574          0 0
                .1855146  .0043680207 .55826914    .00782422  1.2376318 1
               .11989193    .09456532   .370757    .25506008          0 1
               .19205087     .2068062  .5116492     .4041953   .6118686 1
                .4650525    .18868466  .5044149     .3740664          0 0
                .6884933   .012823677 .51965696   .024677197  .53486913 0
               .37108645  .0039139963   .558886   .007003211   .7731268 0
               1.0948026    .20888075  .5320679     .3925829  1.0697545 1
               .30136025   .016102951 .53142136    .03030166  .23610672 1
                2.079191    .04692367  .5961493    .07871129   1.328614 1
                .5839605    .02132849  .5291139    .04030983   .6367181 0
               .25997332    .04220337  .5615239    .07515863   .9653184 0
               .02022237    .01206726  .6421605    .01879166  1.0595987 1
                .3753801  .0008278707 .54682714  .0015139533  .06695514 0
               1.9198842    .06785603  .5546914    .12233113   .1781987 1
                 .448255    .06595232  .6572369    .10034786  .25942507 1
                .2431905  .0039614234  .5336711   .007422968  1.2180876 1
               .51940185    .01979476  .6040931    .03276773   .7177066 1
                1.291627  .0001211069  .4901509 .00024708087    .688301 0
                .4500455   .002560475  .5207855   .004916564    .241203 0
               .08850165      .160966 .58132917    .27689305          0 0
                1.071426  .0002020549  .5816807  .0003473639   1.478687 1
                .7582617  .0020804815 .29001713    .00717365  1.0956359 1
               .29327095  .0022241888   .595995   .003731892   .4437316 0
              .004715193    .11037994  .4840363    .22804064          0 1
                 .205585   .006279338 .50984406   .012316193   .6374559 0
                .1014176  .0011013633 .55327094    .00199064   .4670753 1
                .6393175   .003821763 .58930624   .006485191   .6376336 0
                .7717947      .238334 .59041506    .40367195  1.1068755 1
                .2156449    .18884975 .52937466     .3567412          0 1
                .0303877    .29199174 .59272385      .492627          0 1
               1.1754621    .23910648 .55291265     .4324489  .25955012 1
               .52287555    .09570356  .5977402    .16010897  .12721778 1
                .2822584     .1243819  .4683426     .2655789          0 1
              end
              ------------------ copy up to and including the previous line ------------------

              Listed 100 out of 49576 observations

              Thank you so much for your help!
              Last edited by Jae Li; 17 Jun 2024, 15:08.


              • #8
                Originally posted by Jae Li
                1. what does syntax varlist(numeric fv) mean?
                Refer to the help file.
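                In short, syntax varlist(numeric fv) tells the program to parse what is typed after the command name as a variable list, to accept only numeric variables, and to allow factor-variable notation such as i. and c.; the parsed list is left in the local macro `varlist'. A small illustration, using a hypothetical program name, showvars, and the auto dataset:
                Code:
                sysuse auto, clear
                
                capture program drop showvars
                program define showvars
                    version 18.0
                    syntax varlist(numeric fv)
                    display "`varlist'"    // whatever was parsed into the macro
                end
                
                showvars price c.mpg i.foreign    // factor-variable notation is accepted
                capture noisily showvars make     // make is a string variable, so syntax rejects it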

                2. What is local macro `group' referring to? the number of variables in the varlist in the dataset? Can `group' equal to cf_indicator?
                Yes, the local macro holds the name of the cf_indicator variable.

                I notice that I neglected to remove that variable from the pass-through variable list fed to regress inside the program that I show above. As it happens, it doesn't matter because cf_indicator ends up collinear with the intercept (_cons), so that regress automatically drops it and it doesn't affect the results. But it's better programming practice to explicitly remove it as in the following.
                Code:
                program define chow, rclass
                    version 18.0
                    syntax varlist(numeric fv)
                
                    local group : word `:word count `varlist'' of `varlist'
                    local varlist : list varlist - group
                
                    regress `varlist' if `group'
                    tempname small
                    scalar define `small' = _b[X]
                
                    regress `varlist' if !`group'
                    return scalar T = _b[X] - `small'
                end
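                To see what the two macro lines do in isolation, here is a small sketch with a plain variable list (the list itself is only for illustration). The first extended macro function picks off the last word; the macro list operator then removes it from the list. Run the lines together from a do-file.
                Code:
                local varlist Y X controls cf_indicator
                
                // last word of the list
                local group : word `:word count `varlist'' of `varlist'
                
                // remove that word from the list
                local varlist : list varlist - group
                
                display "`group'"      // cf_indicator
                display "`varlist'"    // Y X controls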
                3. When I run the below code, it shows an error. I tried to install it using -help chow-, but it seems have many options, which one shall I install? Also, shall I set the seed() or strata()?
                I'm not sure what you're doing here. I guess that there's also a user-written command chow? The program used above is created in the do-file.

                4. When running the Permutation method 2, an error occurs:
                It seems that you were using the variable name controls to refer to more than a single variable. Refer to them individually.

                5. When running the Nonpermutation method 1, it shows an error.

                However, those above errors aren't in your example data so I've attached my data here for more information.
                I think that the answer is the same as that for your fourth question immediately above.


                • #9
                  Originally posted by Joseph Coveney
                  . . . it's better programming practice to explicitly remove it as in the following.
                  More straightforward, though, is to include it as an option.
                  Code:
                  program define chow, rclass
                      version 18.0
                      syntax varlist(numeric fv), group(varname numeric)
                  
                      regress `varlist' if `group'
                      tempname small
                      scalar define `small' = _b[X]
                  
                      regress `varlist' if !`group'
                      return scalar T = _b[X] - `small'
                  end
                  
                  permute cf_indicator T = r(T), nodrop ///
                      reps(100) nodots: chow Y c.(X controls) i.(year sic2), group(cf_indicator)
                  I had tried this approach in one of the interim candidate setups, had trouble with it for a reason that I didn't understand at the time, and so prematurely abandoned it.


                  • #10
                    @Joseph Coveney Hi Joseph, thank you so much for getting back to me!

                    When I run the code for Permutation Methods 1 and 2 and the version in post #9, they all produce the same result, shown below. Do you know how to interpret it? It looks like an error. Any ideas on how to fix it?


                    Code:
                    . permute cf_indicator T = r(T), nodrop /// 
                         reps(3000) nodots: chow Y c.(X control1 control2 control3) i.(year sic2) cf_indicator
                    
                    Monte Carlo permutation results                 Number of observations = 49,576
                    Permutation variable: cf_indicator              Number of permutations =  3,000
                    
                          Command: chow Y c.(X control1 control2 control3)  i.(year sic2) cf_indicator
                                T: r(T)
                    
                    -------------------------------------------------------------------------------
                                 |                                               Monte Carlo error
                                 |                                              -------------------
                               T |    T(obs)       Test       c       n      p  SE(p)   [95% CI(p)]
                    -------------+-----------------------------------------------------------------
                               T | -.0419947      lower       0       0      .      .      .      .
                                 |                upper       0       0      .      .      .      .
                                 |            two-sided                      .      .      .      .
                    -------------------------------------------------------------------------------
                    Notes: For lower one-sided test, c = #{T <= T(obs)} and p = p_lower = c/n.
                           For upper one-sided test, c = #{T >= T(obs)} and p = p_upper = c/n.
                           For two-sided test, p = 2*min(p_lower, p_upper); SE and CI approximate.
                           Some permutations led to results with missing values.
                    
                    . 
                    end of do-file
                    When running the Non-permutation method 1, the result shows an error:

                    Code:
                    . test 1.cf_indicator#X
                    varlist not allowed
                    r(101);
                    When running the Non-permutation method 2, the result also shows an error:


                    Code:
                    . lincom [Avec_mean]X - [Sans_mean]X
                    weights not allowed
                    r(101);
                    I've attached the sample data in #7 in case you wanna test the data. Thank you so much for your help!


                    • #11
                      Originally posted by Jae Li
                      I've attached the sample data in #7 in case you wanna test the data. Thank you so much for your help!
                      Please attach the entire dataset as a .dta file (Stata dataset file).


                      • #12
                        @Joseph Coveney Hi Joseph, sure, please see it attached here for your review: permutation_data.dta Many thanks to you! I look forward to hearing from you!


                        • #13
                          You have empty cells in the cross-classified categories involving cf_indicator and sic3. (Visible in the regression results table with Nonpermutation Method 1.)
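
                          A quick way to see such empty cells directly is to cross-tabulate the two variables. This is only a sketch: it assumes the shortened variable names sic and cfi used in the do-file in this post, and one_group is a hypothetical helper variable.
                          Code:
                          // zero counts in either column are the empty cells
                          tabulate sic cfi
                          
                          // flag industry codes that occur in only one of the two groups
                          bysort sic (cfi): generate byte one_group = cfi[1] == cfi[_N]
                          tabulate sic cfi if one_group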

                          It will be better to permute the outcome variable, Y, instead in order to avoid the solid row of red xs that you get with permute when trying to permute cf_indicator. I show how to do this below. (I rename the variables for brevity.) Complete do-file and its log file are attached.
                          Code:
                          version 18.0
                          
                          clear *
                          
                          // seedem
                          set seed 672708676
                          
                          use permutation_data
                          
                          rename control? co?
                          rename cf* cfi
                          foreach var of varlist _all {
                              if `=strlen("`var'")' > 3 {
                                  local new = substr("`var'", 1, 3)
                                  rename `var' `new'
                              }
                          }
                          rename Y out
                          rename X pre
                          
                          *
                          * Begin here
                          *
                          
                          // Permutation Method 1
                          program define chow, rclass
                              version 18.0
                              syntax varlist(numeric fv), group(varname numeric)
                          
                              regress `varlist' if `group'
                              tempname small
                              scalar define `small' = _b[pre]
                          
                              regress `varlist' if !`group'
                              return scalar T = _b[pre] - `small'
                          end
                          
                          permute out T = r(T), nodrop reps(100) nodots: ///
                              chow out c.(pre co?) i.(yea sic), group(cfi)
                          
                          // Permutation Method 2
                          permute out T = _b[1.cfi#pre], reps(100) nodots: ///
                              regress out i.cfi##c.(pre co?) i.cfi##i.(yea sic)
                          
                          // Nonpermutation Method 1
                          regress out i.cfi##c.(pre co?) i.cfi##i.(yea sic)
                          test 1.cfi#pre
                          
                          // Nonpermutation Method 2
                          regress out c.(pre co?) i.(yea sic) if cfi
                          estimates store Avec
                          
                          regress out c.(pre co?) i.(yea sic) if !cfi
                          estimates store Sans
                          
                          suest Avec Sans
                          lincom [Avec_mean]pre - [Sans_mean]pre
                          
                          exit
                          I limit the number of Monte Carlo permutations to 100—the P value is so large that it doesn't warrant anything more (Permutation Method 1, P = 0.7; Permutation Method 2, P = 0.4; Nonpermutation Method 1, P = 0.5; Nonpermutation Method 2, P = 0.7).

                          • #14
                            @Joseph Coveney Hi Joseph, I really appreciate your help! It works perfectly fine now. Based on the obtained P values, I tried to calculate the t-statistic of the mean difference T for documentation purposes and added the code shown in red, but there is an error. Do you know how to fix it? Many thanks for your help!

                            Code:
                            // Permutation Method 1
                            program define chow, rclass
                                version 18.0
                                syntax varlist(numeric fv), group(varname numeric)
                            
                                regress `varlist' if `group'
                                tempname small
                                scalar define `small' = _b[pre]
                                scalar define `small_se' = _se[pre]
                            
                                regress `varlist' if !`group'
                                return scalar T = _b[pre] - `small'
                                return scalar T_se = _se[pre] - `small_se'
                            end
                            
                            permute out T = T/T_se, nodrop reps(100) nodots: ///
                                chow out c.(pre co?) i.(yea sic), group(cfi)
                            
                            invalid syntax
                            an error occurred when permute executed chow
                            r(198);
                            Last edited by Jae Li; 22 Jun 2024, 06:01.


                            • #15
                              Originally posted by Jae Li
                              I tried to calculate the t-statistic of the mean difference T for documentation purposes and added the code shown in red, but there is an error. Do you know how to fix it?
                              Subtracting regression coefficient standard errors as you do doesn't make sense to me. The modified expression list that you feed to permute seems to have a syntax error, too.
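
                              To make both points concrete: results returned by the program have to be referred to as r(T) and so on in the expression list, and for two regressions fit on separate groups the usual standard error of a difference in coefficients is the square root of the sum of the squared standard errors, not their difference. If a standardized statistic is what you want to permute, a hedged sketch follows. The program name chowz and the returned scalar z are hypothetical, and treating the two groups as independent is an assumption.
                              Code:
                              capture program drop chowz
                              program define chowz, rclass
                                  version 18.0
                                  syntax varlist(numeric fv), group(varname numeric)
                              
                                  regress `varlist' if `group'
                                  tempname b1 se1
                                  scalar define `b1' = _b[pre]
                                  scalar define `se1' = _se[pre]
                              
                                  regress `varlist' if !`group'
                                  return scalar T = _b[pre] - `b1'
                                  // difference divided by its SE, assuming independent groups
                                  return scalar z = (_b[pre] - `b1') / sqrt(_se[pre]^2 + `se1'^2)
                              end
                              
                              permute out z = r(z), nodrop reps(100) nodots: ///
                                  chowz out c.(pre co?) i.(yea sic), group(cfi)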

                              If you want the Wald test statistic for the difference between cf indicator groups of the X slope, then you can get it from one of the two nonpermutation methods that I show above in #13. Otherwise, you can bootstrap the difference (T) for an asymptotic standard error or confidence interval.
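
                              A minimal sketch of the bootstrap route, assuming the chow program from #13 is still in memory; the number of replications, the seed, and resampling within cfi groups are illustrative choices only:
                              Code:
                              // bootstrap SE and normal-based CI for the difference T
                              bootstrap T = r(T), nodrop strata(cfi) reps(200) nodots seed(12345): ///
                                  chow out c.(pre co?) i.(yea sic), group(cfi)
                              
                              // percentile confidence interval from the same replications
                              estat bootstrap, percentile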
