Trouble generating random, *large* correlation matrices

Leah Bevis

Join Date: Oct 2015

Posts: 125
#1

Trouble generating random, *large* correlation matrices

21 May 2018, 17:13

Hi all,

I've written an .ado file to randomly generate a correlation matrix, in order to draw correlated vectors using drawnorm. (I looked for existing code doing this, but couldn't find anything.) I realize that for just a few vectors, it's easy to simply write in a made-up correlation matrix. But I would like to draw about 100 vectors, each standard normals, and all correlated with one another in a random fashion. It turns out that this task is much more difficult than I thought.

I have posted the code below, first for the code that uses my randcorr function, and then for randcorr.ado, which is basically just the mata sub-routine. And it works, for small correlation matrices... up to about 40x40. Diagonals are all 1, and correlation coefficients (off-diags) fall between -0.4 and 0.4, drawn from a normal distribution. But here is the problem: certain, randomly drawn correlation matrices will NOT be positive semi-definite, even though they are symmetric, full rank, and have all off-diagonal values between -1 and 1. They look like correlation matrixes, but they are not.

I dealt with this problem by checking the eigenvalues, after the construction of the correlation matrix, within the mata sub-routine of my .ado file. If the smallest eigenvalue is less than 0, I draw a new correlation matrix, so that the final correlation matrix handed back to randcorr will be PSD. That works... but as the matrix gets bigger (my parameter N) or as the vectors become more collinear (my parameter s, set in the code to be 0.1), the mata sub-routine has to draw more and more matrices before it finds a PSD one.

So, if you run the code below with N=10 through N=40, it runs quite quickly. If you run the code w/ N=43 it takes a bit, and N=45 goes forever. I suspect the non-PSD problem relates to the transitiveness of correlations... the 43rd correlation coefficient is surely defined largely by the 1st-42rd correlation coefficients, in a way that I am not modeling.

I'm wondering 2 things. (1) Has somebody else already written such a user command, such that I can give up my own effort on the matter? (2) If not, does anyone have ideas on how I can better generate the random correlation matrix, either in terms of better values (more mathematically correct) or just faster (so that I can more quickly go through all the non-PSD matrices).

Thanks! Code below, and randcorr.ado below that.

Code:

clear clear all set more off set matsize 10000 local N = 10 /* set number of IVs / size of corr matrix */ matrix M = J(1,`N',0) /* vector of means (all IVs centered at 0) */ randcorr `N' .1 /* correlation matrix generated by randcorr.ado */ global Z z1 /* global for all IVs to be drawn */ forval i = 2/`N' { global Z $Z z`i' } ** Draw the N IVs, according to correlation matrix and mean vector drawnorm $Z , n(1000) corr(corr) means(M)

Code:

program randcorr version 14.2 args n s mata: myfunction(`n',`s') end version 14.2 mata: void myfunction(n,s) { iok=0 while (iok==0) { corr = diag(J(1,n,1)) for (j=1; j<=n; j++) { for (i=j+1; i<=n; i++) { corr[i,j]= rnormal(1,1,0,s) corr[j,i]= corr[i,j]; } } rnk = rank(corr) eigenv =symeigenvalues(corr) imn = minmax(eigenv)[1,1] if (rnk ==n & imn>0) { iok=1 } } st_matrix("corr", corr) } end
Tags: None
Leah Bevis

Join Date: Oct 2015

Posts: 125
#2

22 May 2018, 12:41

Update: after posting, I found this thread here addressing the same question. All methods for this seem to come out of Lewandowski, Kurowicka, and Joe (2009). So I'll lean on their paper, and perhaps previous python code, to tackle this problem more carefully.
Comment

Rafal Raciborski (StataCorp)

StataCorp Employee

Join Date: Mar 2014
Posts: 83

23 May 2018, 11:24

Adopting MATLAB code to Mata is usually easy; here is the Mata version of vineBeta() from the linked StackExchange thread.

Code:

real matrix vineBeta(n, b)
{

    real matrix P, S, p
    real scalar i, k, m

    P = J(n,n,0)
    S = I(n)
    
    for (k=1; k<n; k++) {
        for (i=k+1; i<=n; i++) {
            P[k,i] = rbeta(1,1,b,b)
            P[k,i] = (P[k,i]-0.5)*2
            p = P[k,i];
            for (m=k-1; m>=1; m--) {
                p = p * sqrt((1-P[m,i]^2)*(1-P[m,k]^2)) + P[m,i]*P[m,k];
            }
            S[k,i] = S[i,k] = p
        }
    }
    
    // implement randperm() by hand
    p = runiform(n,1) , (1::n)
    _sort(p,1)
    p = p[.,2]
    S = S[p,p]
    
    return(S)
}

Last edited by Rafal Raciborski (StataCorp); 23 May 2018, 11:34.

Comment

Leah Bevis

Join Date: Oct 2015
Posts: 125

25 May 2018, 12:07

Hi -- thank you so much for doing the translation, this works so well! I will figure out, later, how to map the correlation matrix into a pretty square figure like the ones in the stack exchange forum. But for now, it's clear that the matrices are always PSD, and the correlation coefficients can be easily controlled by the parameter b. I've posted the ado file I made below. I'm not sure if it's written in a standard way (this is my first time writing an ado file), but it works!

Code:

program randcorr
version 14.2
    args n b 
    mata: vineBeta(`n',`b')

end

version 14.2
mata: 
void vineBeta(n, b) {

    P = J(n,n,0)
    S = I(n)
    
    for (k=1; k<n; k++) {
        for (i=k+1; i<=n; i++) {
            P[k,i] = rbeta(1,1,b,b)
            P[k,i] = (P[k,i]-0.5)*2
            p = P[k,i];
            for (m=k-1; m>=1; m--) {
                p = p * sqrt((1-P[m,i]^2)*(1-P[m,k]^2)) + P[m,i]*P[m,k];
            }
            S[k,i] = S[i,k] = p
        }
    }
    
    // implement randperm() by hand
    p = runiform(n,1) , (1::n)
    _sort(p,1)
    p = p[.,2]
    S = S[p,p]

    st_matrix("S", S)
    
}
end

Announcement