Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Cross correlation in panel data set

    Dear Statalist users,

    I have a panel dataset with return data for 30 companies over 24 months. I need to calculate the cross-correlation between every pair of time-series data (e.g. between company 1 and company 2 over the 24 months and so on). I am no expert but I should need 435 correlation-coefficients right?

    The variable I am most interested in, is the mean of all coefficients. I have read the help manual but can´t find an easy approach to tackle this problem. Would you recommend reformating the data into wide format? Would you use forvalues to calculate all values? And how would you calculate the mean?

    Thank you.

    Alex

  • #2
    Yes, this is one of those uncommon situations where wide layout will facilitate things, because the -corr- command requires its arguments to be in separate variables. And, yes, you will need 435 correlations to do this.

    You do not say what you want to correlate across companies. Also, you do not show example data. But the following code using the online grunfeld.dta illustrates the approach. You can modify it to your actual situation just by changing the variable names, and if you are correlating more than one variable across companies, you will have to nest it inside a loop over those variables.

    Code:
    clear*
    webuse grunfeld
    
    keep company year mvalue
    levelsof company, local(companies)
    local n_companies: word count `companies'
    
    reshape wide mvalue , i(year) j(company)
    
    local wanted = 0
    forvalues i = 1/`n_companies' {
        forvalues j = `=`i'+1'/`n_companies' {
            corr mvalue`i' mvalue`j'
            local wanted = `wanted' + r(rho)
        }
    }
    
    display as text "Average cross-company correlation = " ///
        as result %05.3f =`wanted'/(0.5*`n_companies'*(`n_companies'+1))
    Note: This code assumes that you are only interested in the average correlation coefficient, so no effort is made to save the individual pair correlations along the way.

    As an aside, I will note that average correlations like this are usually of dubious value statistically. Not sure how you plan to use it. Just sayin'.

    Comment


    • #3
      I wouldn't call those cross-correlations. They are just the P(P-1)/2 correlations between P panel series.

      The Grunfeld data are 10 companies over 20 years. There may be a better way than this, but this works.

      Code:
      . webuse grunfeld, clear
      
      . keep company invest year
      
      . reshape wide invest, i(year) j(company)
      (j = 1 2 3 4 5 6 7 8 9 10)
      
      Data                               Long   ->   Wide
      -----------------------------------------------------------------------------
      Number of observations              200   ->   20          
      Number of variables                   3   ->   11          
      j variable (10 values)          company   ->   (dropped)
      xij variables:
                                       invest   ->   invest1 invest2 ... invest10
      -----------------------------------------------------------------------------
       
      . corr invest*
      (obs=20)
      
                   |  invest1  invest2  invest3  invest4  invest5  invest6  invest7  invest8  invest9 invest10
      -------------+------------------------------------------------------------------------------------------
           invest1 |   1.0000
           invest2 |   0.6322   1.0000
           invest3 |   0.8231   0.7767   1.0000
           invest4 |   0.8902   0.7352   0.7631   1.0000
           invest5 |   0.7540   0.7926   0.7474   0.8424   1.0000
           invest6 |   0.9302   0.7455   0.8426   0.9405   0.7940   1.0000
           invest7 |   0.8723   0.5991   0.7605   0.7947   0.7554   0.8409   1.0000
           invest8 |   0.8502   0.8351   0.9139   0.7877   0.8051   0.8285   0.8144   1.0000
           invest9 |   0.6745   0.5442   0.7043   0.6438   0.6176   0.6418   0.8101   0.7836   1.0000
          invest10 |   0.6973   0.7573   0.7669   0.7999   0.7496   0.8604   0.6370   0.7195   0.5071   1.0000
      
      . ret li
      
      scalars:
                        r(N) =  20
                      r(rho) =  .6321563888920835
      
      matrices:
                        r(C) :  10 x 10
      
      . mata :
      ------------------------------------------------- mata (type end to exit) --------------------------------------------------------------------
      : corr = vech(st_matrix("r(C)"))
      
      : corr = select(corr, corr :< 1)
      
      : corr
                        1
           +---------------+
         1 |  .6321563889  |
         2 |  .8231164512  |
         3 |  .8901535601  |
         4 |  .7539831907  |
         5 |  .9302220638  |
         6 |  .8722699093  |
         7 |  .8501530022  |
         8 |  .6744928456  |
         9 |  .6973431385  |
        10 |  .7766828078  |
        11 |   .735237833  |
        12 |  .7925663113  |
        13 |  .7454787296  |
        14 |  .5990954705  |
        15 |  .8351357306  |
        16 |  .5441934979  |
        17 |  .7573068648  |
        18 |  .7631030673  |
        19 |  .7474176986  |
        20 |  .8425762567  |
        21 |  .7604620944  |
        22 |  .9139182692  |
        23 |   .704320591  |
        24 |   .766947381  |
        25 |  .8423785522  |
        26 |  .9405411844  |
        27 |  .7946863794  |
        28 |  .7877142614  |
        29 |  .6437650392  |
        30 |  .7999414324  |
        31 |  .7939527046  |
        32 |  .7553907764  |
        33 |  .8050986933  |
        34 |  .6175799453  |
        35 |  .7496251561  |
        36 |  .8408720758  |
        37 |  .8285035939  |
        38 |  .6418491013  |
        39 |  .8604217532  |
        40 |  .8144305441  |
        41 |  .8100911216  |
        42 |  .6370341852  |
        43 |  .7836128206  |
        44 |  .7194957382  |
        45 |  .5070642493  |
           +---------------+
      
      :  mean(corr)
        .7640529436
      NOTE: Connoisseurs will note the onlap and offlap with #2.

      Recently I too raised an eyebrow about interest in mean correlation. As a positive I note that some people commend averaging on the atanh() scale and back transforming.

      FWIW, this is the distribution of the correlations above.
      Click image for larger version

Name:	corr.png
Views:	1
Size:	22.9 KB
ID:	1742561


      Last edited by Nick Cox; 08 Feb 2024, 12:01.

      Comment


      • #4
        I think Clyde has a small error in #2. The number of distinct correlations is (number of companies)(number of companies - 1) / 2

        Comment


        • #5
          Nick is correct in #4. My error. Sorry. And thanks to Nick for the correction.
          Last edited by Clyde Schechter; 08 Feb 2024, 12:06.

          Comment


          • #6
            Note that my code removing correlations of 1 will need modification of correlations of 1 ever occur between different variables. The code presumes that correlations of 1 only ever occur on the principal diagonal of the correlation matrix arising from correlation between each variable and itself.

            That is, the vech includes the principal diagonal:

            Code:
            . sysuse auto, clear
            (1978 automobile data)
            
            . corr mpg weight price
            (obs=74)
            
                         |      mpg   weight    price
            -------------+---------------------------
                     mpg |   1.0000
                  weight |  -0.8072   1.0000
                   price |  -0.4686   0.5386   1.0000
            
            
            . mata
            ------------------------------------------------- mata (type end to exit) -------------------------------
            : vech(st_matrix("r(C)"))
                              1
                +----------------+
              1 |             1  |
              2 |  -.8071748589  |
              3 |  -.4685966882  |
              4 |             1  |
              5 |   .5386114626  |
              6 |             1  |
                +----------------+
            and we don't want the values of 1 from that diagonal; but would 1 otherwise ever be observed?
            Last edited by Nick Cox; 09 Feb 2024, 06:35.

            Comment


            • #7
              Thank you to both of you! This helped a lot. Nick Cox I have also tried Clyde´s version (with the corrected formula) and both versions lead to the same result. So I think your version should be fine as there is no correlation of 1 in my data. Thanks again.

              Comment


              • #8
                Wanting the vech of a square matrix -- except not values from the principal diagonal -- must be a common problem, but I couldn't think of an easy way to do it except from first principles. Better solutions welcome! (Similarly, the name here is invented, and should be replaced by a standard name if there is one.)

                Code:
                mata :
                
                * elements from below the principal diagonal of a square matrix in a column vector 
                real matrix vecl(real matrix squaremat) { 
                    real scalar i, j, K  
                    real colvector out 
                    
                    if ((K = rows(squaremat)) != cols(squaremat)) {
                        "matrix not square"
                        _error(3205)
                    }
                    
                    out = J(0, 1, .)
                    
                    for(j = 1; j <= K; j++) {
                        for(i = j + 1; i <= K; i++) { 
                            out = out \ squaremat[i, j]
                        }
                    } 
                    
                    return(out)
                    
                }
                
                end

                Comment

                Working...
                X