Hi all,
I've written an .ado file to randomly generate a correlation matrix, in order to draw correlated vectors using drawnorm. (I looked for existing code doing this, but couldn't find anything.) I realize that for just a few vectors, it's easy to simply write in a made-up correlation matrix. But I would like to draw about 100 vectors, each standard normals, and all correlated with one another in a random fashion. It turns out that this task is much more difficult than I thought.
I have posted the code below, first for the code that uses my randcorr function, and then for randcorr.ado, which is basically just the mata sub-routine. And it works, for small correlation matrices... up to about 40x40. Diagonals are all 1, and correlation coefficients (off-diags) fall between -0.4 and 0.4, drawn from a normal distribution. But here is the problem: certain, randomly drawn correlation matrices will NOT be positive semi-definite, even though they are symmetric, full rank, and have all off-diagonal values between -1 and 1. They look like correlation matrixes, but they are not.
I dealt with this problem by checking the eigenvalues, after the construction of the correlation matrix, within the mata sub-routine of my .ado file. If the smallest eigenvalue is less than 0, I draw a new correlation matrix, so that the final correlation matrix handed back to randcorr will be PSD. That works... but as the matrix gets bigger (my parameter N) or as the vectors become more collinear (my parameter s, set in the code to be 0.1), the mata sub-routine has to draw more and more matrices before it finds a PSD one.
So, if you run the code below with N=10 through N=40, it runs quite quickly. If you run the code w/ N=43 it takes a bit, and N=45 goes forever. I suspect the non-PSD problem relates to the transitiveness of correlations... the 43rd correlation coefficient is surely defined largely by the 1st-42rd correlation coefficients, in a way that I am not modeling.
I'm wondering 2 things. (1) Has somebody else already written such a user command, such that I can give up my own effort on the matter? (2) If not, does anyone have ideas on how I can better generate the random correlation matrix, either in terms of better values (more mathematically correct) or just faster (so that I can more quickly go through all the non-PSD matrices).
Thanks! Code below, and randcorr.ado below that.
I've written an .ado file to randomly generate a correlation matrix, in order to draw correlated vectors using drawnorm. (I looked for existing code doing this, but couldn't find anything.) I realize that for just a few vectors, it's easy to simply write in a made-up correlation matrix. But I would like to draw about 100 vectors, each standard normals, and all correlated with one another in a random fashion. It turns out that this task is much more difficult than I thought.
I have posted the code below, first for the code that uses my randcorr function, and then for randcorr.ado, which is basically just the mata sub-routine. And it works, for small correlation matrices... up to about 40x40. Diagonals are all 1, and correlation coefficients (off-diags) fall between -0.4 and 0.4, drawn from a normal distribution. But here is the problem: certain, randomly drawn correlation matrices will NOT be positive semi-definite, even though they are symmetric, full rank, and have all off-diagonal values between -1 and 1. They look like correlation matrixes, but they are not.
I dealt with this problem by checking the eigenvalues, after the construction of the correlation matrix, within the mata sub-routine of my .ado file. If the smallest eigenvalue is less than 0, I draw a new correlation matrix, so that the final correlation matrix handed back to randcorr will be PSD. That works... but as the matrix gets bigger (my parameter N) or as the vectors become more collinear (my parameter s, set in the code to be 0.1), the mata sub-routine has to draw more and more matrices before it finds a PSD one.
So, if you run the code below with N=10 through N=40, it runs quite quickly. If you run the code w/ N=43 it takes a bit, and N=45 goes forever. I suspect the non-PSD problem relates to the transitiveness of correlations... the 43rd correlation coefficient is surely defined largely by the 1st-42rd correlation coefficients, in a way that I am not modeling.
I'm wondering 2 things. (1) Has somebody else already written such a user command, such that I can give up my own effort on the matter? (2) If not, does anyone have ideas on how I can better generate the random correlation matrix, either in terms of better values (more mathematically correct) or just faster (so that I can more quickly go through all the non-PSD matrices).
Thanks! Code below, and randcorr.ado below that.
Code:
clear clear all set more off set matsize 10000 local N = 10 /* set number of IVs / size of corr matrix */ matrix M = J(1,`N',0) /* vector of means (all IVs centered at 0) */ randcorr `N' .1 /* correlation matrix generated by randcorr.ado */ global Z z1 /* global for all IVs to be drawn */ forval i = 2/`N' { global Z $Z z`i' } ** Draw the N IVs, according to correlation matrix and mean vector drawnorm $Z , n(1000) corr(corr) means(M)
Code:
program randcorr version 14.2 args n s mata: myfunction(`n',`s') end version 14.2 mata: void myfunction(n,s) { iok=0 while (iok==0) { corr = diag(J(1,n,1)) for (j=1; j<=n; j++) { for (i=j+1; i<=n; i++) { corr[i,j]= rnormal(1,1,0,s) corr[j,i]= corr[i,j]; } } rnk = rank(corr) eigenv =symeigenvalues(corr) imn = minmax(eigenv)[1,1] if (rnk ==n & imn>0) { iok=1 } } st_matrix("corr", corr) } end
Comment