Cross correlation in panel data set

Alex Kraemer

Join Date: Feb 2024

Posts: 2
#1

Cross correlation in panel data set

08 Feb 2024, 10:17

Dear Statalist users,

I have a panel dataset with return data for 30 companies over 24 months. I need to calculate the cross-correlation between every pair of time-series data (e.g. between company 1 and company 2 over the 24 months and so on). I am no expert but I should need 435 correlation-coefficients right?

The variable I am most interested in, is the mean of all coefficients. I have read the help manual but can´t find an easy approach to tackle this problem. Would you recommend reformating the data into wide format? Would you use forvalues to calculate all values? And how would you calculate the mean?

Thank you.

Alex
Tags: panel data
Clyde Schechter

Join Date: Apr 2014

Posts: 29953
#2

08 Feb 2024, 10:40

Yes, this is one of those uncommon situations where wide layout will facilitate things, because the -corr- command requires its arguments to be in separate variables. And, yes, you will need 435 correlations to do this.

You do not say what you want to correlate across companies. Also, you do not show example data. But the following code using the online grunfeld.dta illustrates the approach. You can modify it to your actual situation just by changing the variable names, and if you are correlating more than one variable across companies, you will have to nest it inside a loop over those variables.

Code:

clear* webuse grunfeld keep company year mvalue levelsof company, local(companies) local n_companies: word count `companies' reshape wide mvalue , i(year) j(company) local wanted = 0 forvalues i = 1/`n_companies' { forvalues j = `=`i'+1'/`n_companies' { corr mvalue`i' mvalue`j' local wanted = `wanted' + r(rho) } } display as text "Average cross-company correlation = " /// as result %05.3f =`wanted'/(0.5*`n_companies'*(`n_companies'+1))

Note: This code assumes that you are only interested in the average correlation coefficient, so no effort is made to save the individual pair correlations along the way.

As an aside, I will note that average correlations like this are usually of dubious value statistically. Not sure how you plan to use it. Just sayin'.
2 likes
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35431

08 Feb 2024, 10:50

I wouldn't call those cross-correlations. They are just the P(P-1)/2 correlations between P panel series.

The Grunfeld data are 10 companies over 20 years. There may be a better way than this, but this works.

Code:

. webuse grunfeld, clear

. keep company invest year

. reshape wide invest, i(year) j(company)
(j = 1 2 3 4 5 6 7 8 9 10)

Data                               Long   ->   Wide
-----------------------------------------------------------------------------
Number of observations              200   ->   20          
Number of variables                   3   ->   11          
j variable (10 values)          company   ->   (dropped)
xij variables:
                                 invest   ->   invest1 invest2 ... invest10
-----------------------------------------------------------------------------
 
. corr invest*
(obs=20)

             |  invest1  invest2  invest3  invest4  invest5  invest6  invest7  invest8  invest9 invest10
-------------+------------------------------------------------------------------------------------------
     invest1 |   1.0000
     invest2 |   0.6322   1.0000
     invest3 |   0.8231   0.7767   1.0000
     invest4 |   0.8902   0.7352   0.7631   1.0000
     invest5 |   0.7540   0.7926   0.7474   0.8424   1.0000
     invest6 |   0.9302   0.7455   0.8426   0.9405   0.7940   1.0000
     invest7 |   0.8723   0.5991   0.7605   0.7947   0.7554   0.8409   1.0000
     invest8 |   0.8502   0.8351   0.9139   0.7877   0.8051   0.8285   0.8144   1.0000
     invest9 |   0.6745   0.5442   0.7043   0.6438   0.6176   0.6418   0.8101   0.7836   1.0000
    invest10 |   0.6973   0.7573   0.7669   0.7999   0.7496   0.8604   0.6370   0.7195   0.5071   1.0000

. ret li

scalars:
                  r(N) =  20
                r(rho) =  .6321563888920835

matrices:
                  r(C) :  10 x 10

. mata :
------------------------------------------------- mata (type end to exit) --------------------------------------------------------------------
: corr = vech(st_matrix("r(C)"))

: corr = select(corr, corr :< 1)

: corr
                  1
     +---------------+
   1 |  .6321563889  |
   2 |  .8231164512  |
   3 |  .8901535601  |
   4 |  .7539831907  |
   5 |  .9302220638  |
   6 |  .8722699093  |
   7 |  .8501530022  |
   8 |  .6744928456  |
   9 |  .6973431385  |
  10 |  .7766828078  |
  11 |   .735237833  |
  12 |  .7925663113  |
  13 |  .7454787296  |
  14 |  .5990954705  |
  15 |  .8351357306  |
  16 |  .5441934979  |
  17 |  .7573068648  |
  18 |  .7631030673  |
  19 |  .7474176986  |
  20 |  .8425762567  |
  21 |  .7604620944  |
  22 |  .9139182692  |
  23 |   .704320591  |
  24 |   .766947381  |
  25 |  .8423785522  |
  26 |  .9405411844  |
  27 |  .7946863794  |
  28 |  .7877142614  |
  29 |  .6437650392  |
  30 |  .7999414324  |
  31 |  .7939527046  |
  32 |  .7553907764  |
  33 |  .8050986933  |
  34 |  .6175799453  |
  35 |  .7496251561  |
  36 |  .8408720758  |
  37 |  .8285035939  |
  38 |  .6418491013  |
  39 |  .8604217532  |
  40 |  .8144305441  |
  41 |  .8100911216  |
  42 |  .6370341852  |
  43 |  .7836128206  |
  44 |  .7194957382  |
  45 |  .5070642493  |
     +---------------+

:  mean(corr)
  .7640529436

NOTE: Connoisseurs will note the onlap and offlap with #2.

Recently I too raised an eyebrow about interest in mean correlation. As a positive I note that some people commend averaging on the atanh() scale and back transforming.

FWIW, this is the distribution of the correlations above.

Click image for larger version

Name: corr.png
Views: 1
Size: 22.9 KB
ID: 1742561

Last edited by Nick Cox; 08 Feb 2024, 11:01.

Comment

Nick Cox

Join Date: Mar 2014

Posts: 35431
#4

08 Feb 2024, 11:03

I think Clyde has a small error in #2. The number of distinct correlations is (number of companies)(number of companies - 1) / 2
2 likes
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29953
#5

08 Feb 2024, 11:04

Nick is correct in #4. My error. Sorry. And thanks to Nick for the correction.

Last edited by Clyde Schechter; 08 Feb 2024, 11:06.
2 likes
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35431

09 Feb 2024, 05:31

Note that my code removing correlations of 1 will need modification of correlations of 1 ever occur between different variables. The code presumes that correlations of 1 only ever occur on the principal diagonal of the correlation matrix arising from correlation between each variable and itself.

That is, the vech includes the principal diagonal:

Code:

. sysuse auto, clear
(1978 automobile data)

. corr mpg weight price
(obs=74)

             |      mpg   weight    price
-------------+---------------------------
         mpg |   1.0000
      weight |  -0.8072   1.0000
       price |  -0.4686   0.5386   1.0000


. mata
------------------------------------------------- mata (type end to exit) -------------------------------
: vech(st_matrix("r(C)"))
                  1
    +----------------+
  1 |             1  |
  2 |  -.8071748589  |
  3 |  -.4685966882  |
  4 |             1  |
  5 |   .5386114626  |
  6 |             1  |
    +----------------+

and we don't want the values of 1 from that diagonal; but would 1 otherwise ever be observed?

Last edited by Nick Cox; 09 Feb 2024, 05:35.

Comment

Alex Kraemer

Join Date: Feb 2024

Posts: 2
#7

10 Feb 2024, 11:17

Thank you to both of you! This helped a lot. Nick Cox I have also tried Clyde´s version (with the corrected formula) and both versions lead to the same result. So I think your version should be fine as there is no correlation of 1 in my data. Thanks again.
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35431

11 Feb 2024, 07:57

Wanting the vech of a square matrix -- except not values from the principal diagonal -- must be a common problem, but I couldn't think of an easy way to do it except from first principles. Better solutions welcome! (Similarly, the name here is invented, and should be replaced by a standard name if there is one.)

Code:

mata :

* elements from below the principal diagonal of a square matrix in a column vector 
real matrix vecl(real matrix squaremat) { 
    real scalar i, j, K  
    real colvector out 
    
    if ((K = rows(squaremat)) != cols(squaremat)) {
        "matrix not square"
        _error(3205)
    }
    
    out = J(0, 1, .)
    
    for(j = 1; j <= K; j++) {
        for(i = j + 1; i <= K; i++) { 
            out = out \ squaremat[i, j]
        }
    } 
    
    return(out)
    
}

end

Announcement