
  • Computing correlation between variables for each point in time

    Hello

    I have a set of time-series variables and I need to compute the correlation coefficient (rho) between them for each point in time tn, using only the information available at that time tn. For example: if my dataset had two variables, x1 and x2, I would need to compute rho for t1, t2, t3, ..., tn, ..., tN.

    I used a loop with forvalues:

    forvalues t=1/148 { //N=148 is my sample size

    correl x1 x2

    }

    This allowed me to get N=148 correlation matrices. My plan is to use the scalar r(rho) from each of them to construct a variable that gives me the correlation coefficient between x1 and x2 over time. However, because my sample will grow as new information becomes available, I don't want to fix the number of times the loop runs (currently 148). I want it to increase as the sample size increases.

    Is there any way of using a loop to create this variable? Or any other means of doing it?

    Thank you very much in advance!

  • #2
    No need for an explicit loop:

    Code:
    by time, sort: corr x1 x2
    Note: The code you show in #1 would simply calculate the correlation of x1 and x2 in the entire data set 148 times because the command inside the loop makes no reference to the loop parameter t. Also, this code will calculate the correlations and show them in the Results window (and in your log file), but it will not store them in the data set, nor as matrices. If you need to do the latter, then a different approach is needed. Post back if that is the case.

    Evidently, replace "time" in the above by the actual name of your time variable. In the future, when you are asking for help with coding, you are better positioned to get a timely and helpful response if you post an example of your Stata data using the -dataex- command. The desired code typically depends on details of the data set that are not given by a description of the data in words. For example, in this case, I guessed that you have a variable named time, but you did not say that, and I can imagine some very inconvenient and different organizations of your data where that would not be true and an entirely different approach would be needed.



    • #3
      Hello. Thank you for the quick answer, but it doesn't work. If I use the 'by' code, Stata only tries to compute the correlation between x1 and x2 row by row, i.e. using only one observation.

      My data is ordered as follows, as time series variables:

      input float time double(X1 X2)
      1 -1.192678458416566 -.6659349804284389
      2 2.8046360849786396 1.6469787197841868
      3 4.186455544323053 2.5228397660568547
      4 7.367336016300513 3.6589951982649964
      5 5.812628109372781 2.8470329971519703
      6 6.491914738617396 3.4067227498313883
      7 6.632546950365722 3.0366386965479535
      8 7.483404992022329 2.6154360636745366
      ...

      I want to compute correlation between X1 and X2 at each point in time and store it in a new variable. For now I just wanted to generate N correlation matrices, each one using a different number of observations. The first would use 2 obs, the second 3 obs and so on.

      Currently, my code is the following:

      sum X1
      local dim=r(N) //To obtain the maximum number of iterations

      forvalues t=1/`dim' {

      set more off

      local obs=_n+1

      correl X1 X2 in 1/`obs'

      }

      It only gives me N=148 identical matrices, all of them using the same first 2 observations. This is because the local obs does not vary.

      So, basically, I want the correl command to use one more observation at each iteration.

      Thank you in advance.



      • #4
        I have already found a way to do this. Thank you!
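
        The thread does not record which fix was found. As a hedged sketch of one way the loop in #3 could be repaired (an assumption, not necessarily the poster's solution): outside a per-observation expression, _n evaluates to 1, so the local obs is always 2. Using the loop index t as the upper bound of the -in- range, and _N instead of a hard-coded 148, lets the window grow by one observation per iteration and keeps working as the sample grows. The variable name rho is illustrative.

        ```stata
        * Example data copied from #3 (first 8 observations only)
        clear
        input float time double(X1 X2)
        1 -1.192678458416566 -.6659349804284389
        2 2.8046360849786396 1.6469787197841868
        3  4.186455544323053 2.5228397660568547
        4  7.367336016300513 3.6589951982649964
        5  5.812628109372781 2.8470329971519703
        6  6.491914738617396 3.4067227498313883
        7  6.632546950365722 3.0366386965479535
        8  7.483404992022329 2.6154360636745366
        end

        sort time
        gen double rho = .      // hypothetical name for the running correlation

        * Start at 2: a correlation needs at least two observations.
        * `=_N' expands to the current number of observations, so the
        * upper limit follows the sample size automatically.
        forvalues t = 2/`=_N' {
            quietly correlate X1 X2 in 1/`t'
            quietly replace rho = r(rho) in `t'
        }

        list time X1 X2 rho
        ```

        The first observation keeps a missing rho. The loop assumes the data are sorted by time, contain one observation per time point, and have no missing values in X1 or X2.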



        • #5
          Looks like you want correlations on a recursive window. You can do that with rangestat (from SSC):

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input float time double(X1 X2)
          1 -1.192678458416566 -.6659349804284389
          2 2.8046360849786396 1.6469787197841868
          3  4.186455544323053 2.5228397660568547
          4  7.367336016300513 3.6589951982649964
          5  5.812628109372781 2.8470329971519703
          6  6.491914738617396 3.4067227498313883
          7  6.632546950365722 3.0366386965479535
          8  7.483404992022329 2.6154360636745366
          end
          
          rangestat (corr) X1 X2, interval(time . time)
          
          * spot check 2 cases
          list
          corr X1 X2 in 1/7
          corr X1 X2 in 1/8



          • #6
            "I want to compute correlation between X1 and X2 at each point in time"
            That's a very confusing way of putting it. To me it does not evoke anything at all like your fuller description in #3. That's why the code I suggested was unsuitable: it solved a different problem from what you wanted and it assumed data that was not at all like what you actually have. This is why it is so important to follow the advice in the FAQ. In particular, if you want code and you don't show an example of your data, there is a high probability that what you get back will be off point. So, moral of the story learned and point made.

            Now to a solution to your problem. What you want is a running correlation. The simplest way to code this is:

            Code:
            clear
            input float time double(X1 X2)
            1 -1.192678458416566 -.6659349804284389
            2 2.8046360849786396 1.6469787197841868
            3  4.186455544323053 2.5228397660568547
            4  7.367336016300513 3.6589951982649964
            5  5.812628109372781 2.8470329971519703
            6  6.491914738617396 3.4067227498313883
            7  6.632546950365722 3.0366386965479535
            8  7.483404992022329 2.6154360636745366
            end
            
            gen start = 1
            gen end = time
            
            capture program drop one_corr
            program define one_corr
                corr X1 X2
                gen rho = r(rho)
                exit
            end
            
            rangerun one_corr, interval(time start end)
            Note: this code will not display the correlation results on the screen. Instead, it places them in a new variable, rho, in the data set. If you want to see them on the screen, add the -verbose- option to the -rangerun- command. Also, -rangerun- is written by Robert Picard and is available from SSC. If you don't have it, you will need to get it, and you will also need to get -rangestat- (Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC), as -rangerun- calls it.

            Added: Crossed with #5, which makes the same suggestion, though it codes the -interval()- option more succinctly (but less transparently).
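
            To make the comparison of the two -interval()- forms concrete, here is a minimal sketch, assuming -rangestat- is installed from SSC and that its (corr) statistic names its results corr_nobs and corr_x. In interval(time . time) the missing lower bound leaves the window open at the bottom and the upper bound is the current value of time. The explicit form spells the same window out with two bound variables. The names lo, hi, n_open and rho_open are hypothetical.

            ```stata
            * Assumes: ssc install rangestat
            clear
            input float time double(X1 X2)
            1 -1.192678458416566 -.6659349804284389
            2 2.8046360849786396 1.6469787197841868
            3  4.186455544323053 2.5228397660568547
            4  7.367336016300513 3.6589951982649964
            5  5.812628109372781 2.8470329971519703
            6  6.491914738617396 3.4067227498313883
            7  6.632546950365722 3.0366386965479535
            8  7.483404992022329 2.6154360636745366
            end

            * Succinct form from #5: open lower bound, upper bound = current time
            rangestat (corr) X1 X2, interval(time . time)
            rename (corr_nobs corr_x) (n_open rho_open)

            * Explicit form from #6: window bounds held in variables
            gen lo = 1
            gen hi = time
            rangestat (corr) X1 X2, interval(time lo hi)

            * Both windows should contain the same observations
            assert corr_nobs == n_open
            assert reldif(corr_x, rho_open) < 1e-12 if !missing(corr_x, rho_open)
            list time n_open rho_open corr_nobs corr_x
            ```

            If the two assertions pass, the two specifications describe the same recursive window on these data.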
