Good evening,
For an N dimensional vector, I want to form an N X N matrix of indicators showing whether two elements of the vector share the same value. E.g.,
In the example the first value of the vector v of 1 is unique, so it shares a value only with itself; same for the second, the third and forth elements share the same value with themselves and with each other, resulting in the 2 X 2 square of 1s on the east-south end of R.
I managed to do this in Mata using a double loop.
I am wondering whether there is some vectorised clever way to do this, which results in faster computation that the double loop I am doing below?
The two versions of the doble loop follow. One is just a direct double loop from 1 to N; in the second version I tried to be clever and to cut down the loop in half, as the R matrix is symmetric. However this does not seem to save much execution time.
In terms of the auto data I am looking for the following result, but hopefully in some vectorised way:
For an N dimensional vector, I want to form an N X N matrix of indicators showing whether two elements of the vector share the same value. E.g.,
Code:
. mat v = (1,2,4,4) . mat list v v[1,4] c1 c2 c3 c4 r1 1 2 4 4 . mat R = (1,0,0,0 \ 0, 1 , 0, 0 \ 0 , 0, 1, 1 \ 0, 0, 1, 1) . matlist R | c1 c2 c3 c4 -------------+------------------------------------------- r1 | 1 r2 | 0 1 r3 | 0 0 1 r4 | 0 0 1 1
I managed to do this in Mata using a double loop.
I am wondering whether there is some vectorised clever way to do this, which results in faster computation that the double loop I am doing below?
The two versions of the doble loop follow. One is just a direct double loop from 1 to N; in the second version I tried to be clever and to cut down the loop in half, as the R matrix is symmetric. However this does not seem to save much execution time.
Code:
sysuse auto expand 400 replace rep = 6 if missing(rep) putmata N = rep timer on 1 mata: obs = rows(N) R = I(obs) for (i=1; i<=obs; i++) { for (j=1; j<i; j++) { R[i,j]=N[i]==N[j] } } R = R + R' - I(obs) end timer off 1 timer on 2 mata: obs = rows(N) Rcorrect = I(obs) for (i=1; i<=obs; i++) { for (j=1; j<=obs; j++) { Rcorrect[i,j] = N[i]==N[j] } } end timer off 2 mata: mreldif(Rcorrect, R) timer list
Code:
. sysuse auto (1978 automobile data) . . . keep in 1/13 (61 observations deleted) . . replace rep = 6 if missing(rep) (2 real changes made) . . putmata N = rep (1 vector posted) . . . timer on 2 . mata: ------------------------------------------------- mata (type end to exit) ------------------------------------------------------------------------------------------------------ : obs = rows(N) : Rcorrect = I(obs) : for (i=1; i<=obs; i++) { > for (j=1; j<=obs; j++) { > Rcorrect[i,j] = N[i]==N[j] > } > } : N 1 +-----+ 1 | 3 | 2 | 3 | 3 | 6 | 4 | 3 | 5 | 4 | 6 | 3 | 7 | 6 | 8 | 3 | 9 | 3 | 10 | 3 | 11 | 3 | 12 | 2 | 13 | 3 | +-----+ : Rcorrect [symmetric] 1 2 3 4 5 6 7 8 9 10 11 12 13 +------------------------------------------------------------------+ 1 | 1 | 2 | 1 1 | 3 | 0 0 1 | 4 | 1 1 0 1 | 5 | 0 0 0 0 1 | 6 | 1 1 0 1 0 1 | 7 | 0 0 1 0 0 0 1 | 8 | 1 1 0 1 0 1 0 1 | 9 | 1 1 0 1 0 1 0 1 1 | 10 | 1 1 0 1 0 1 0 1 1 1 | 11 | 1 1 0 1 0 1 0 1 1 1 1 | 12 | 0 0 0 0 0 0 0 0 0 0 0 1 | 13 | 1 1 0 1 0 1 0 1 1 1 1 0 1 | +------------------------------------------------------------------+ : end ------------------
Comment