  • Save correlation results as variable

    Hello,
    I have several datasets, each with 1,500 observations and 30,000 variables. For each dataset, I want to correlate the variable abc with every other variable and store the results, that is, the correlation coefficient and the p-value, as new variables.
    So in the end I'd like the following:

    var1  var2  ...  var30000  abc  variable name  no.    correlation coefficient with abc  p-value
                                    var1           1
                                    var2           2
                                    ...            ...
                                    var30000       30000
    So far I generated a variable (no) that just goes from 1 to 30,000 and then used this:

    local k = 0
    foreach x of varlist var1-var30000 {
        spearman abc `x', stats(p) pw
        local k = `k' + 1
        replace var = "`x'" if no == `k'
        replace corr = r(rho) if no == `k'
        replace pvalue = r(p) if no == `k'
        replace nobs = r(N) if no == `k'
    }

    But this takes too long.
    Is there some way to speed this up?
    It would be great if, after each correlation, I could save the variable name and the correlation result next to each other, with the name and result for the next variable simply going on the next line. I would then delete the original data (var1 to var30000), drop the insignificant variables, and search for the highly correlated ones.


  • #2

    -postfile- can be used (see help postfile). Below is an example:
    Code:
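    sysuse auto, clear
    
    local var "mpg"
    local covars "price weight length"
    
    * temporary file and handle that receive the posted results
    tempfile results
    tempname fh
    
    * one row per covariate: name, Spearman's rho, p-value, number of observations
    postfile `fh' str32 varname rho p n using "`results'"
    
    qui foreach v of varlist `covars' { 
    
        spearman `var' `v', stats(p) pw
        post `fh' ("`v'") (r(rho)) (r(p)) (r(N))
    }
    
    postclose `fh'
    
    * load the results, replacing the data in memory
    use "`results'", clear
    compress
    * save ...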
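
    For the follow-up step described in #1 (dropping insignificant variables and looking for the strongly correlated ones), the same -postfile- pattern might be extended roughly as below. This is only a sketch: the simulated data (50 variables rather than 30,000), the seed, the 0.05 cutoff, and the helper variable absrho are assumptions made for illustration and do not come from the original posts.

```stata
* Toy stand-in for the real data: abc plus var1-var50 (assumed, for illustration only)
clear
set seed 12345
set obs 1500
generate double abc = rnormal()
forvalues i = 1/50 {
    * only the first five variables are built to correlate with abc
    generate double var`i' = rnormal() + cond(`i' <= 5, 0.5*abc, 0)
}

tempfile results
tempname fh
postfile `fh' str32 varname double rho double p long n using "`results'"

quietly foreach v of varlist var1-var50 {
    * capture lets the loop continue if spearman fails for a variable
    capture spearman abc `v', stats(p) pw
    if _rc == 0 {
        post `fh' ("`v'") (r(rho)) (r(p)) (r(N))
    }
}
postclose `fh'

use "`results'", clear

* keep significant correlations only (0.05 is an assumed threshold)
keep if p < 0.05

* list the strongest correlations first, regardless of sign
generate double absrho = abs(rho)
gsort -absrho
list varname rho p n
```

    With the real data, the simulation part would be dropped and the varlist changed to var1-var30000. Note that missing p-values also fail the p < 0.05 condition, so those rows are dropped as well.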



    • #3
      Thank you so much! You are a life saver. That works perfectly.
