I'm interested in a set of problems in which the data from each individual is a vector of integer-valued responses, and I want to know the number of times that each distinct pair of individuals gives responses that agree with each other. Thus, I want an agreement matrix A, where A[i,j] contains the number of responses on which subject i agrees with (is identical to) subject j. It's like a similarity matrix, where similarity is the number of agreements across the response vector. I could not figure out a way to do this with -matrix dissimilarity-, which seems to implement a matching option only for binary data. I found no reasonable way to create this agreement matrix in Stata per se, and I need to work with it in Mata anyway.
The Mata code below is the best I came up with for speed. I'm interested in any suggestions or critique. Avoiding looping over measurements certainly helps, and it would be nice to avoid explicitly looping over observations as well.
Code:
// Simulate data
set seed 1243
mata mata clear
local nmeas = 15   // number of measurements
local top = 4      // measurements go from 1, ... top
local nobs = 500
local nreps = 50   // for timing only
mata: X = ceil(`top' * runiform(`nobs', `nmeas'))
//
// Count up agreements
mata:
N = rows(X)
A = J(N, N, .)     // only the upper triangle (i < j) is filled below
timer_clear(1)
timer_on(1)
for (reps = 1; reps <= `nreps'; reps++) {   // outer loop just for timing
    for (i = 1; i <= N - 1; i++) {
        for (j = i+1; j <= N; j++) {
            A[i,j] = rowsum(X[i,.] :== X[j,.])   // avoid loop over measurements
        }
    }
}
timer_off(1)
timer()
end
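One possible way to avoid the explicit loops over observations, offered only as a sketch: if every response is an integer in 1, ..., top, then the number of agreements between two subjects is the sum, over response levels, of the number of measurements on which both gave that level. Each of those counts is a cross product of 0/1 indicator matrices, so the loop runs over the (few) levels rather than over pairs of subjects. The function name agree_by_level() is hypothetical, and the last lines assume that X, A, and N from the code above are still in memory. No timing comparison is claimed here.

Code:
mata:
// Hypothetical helper: agreement counts via one cross product per response level.
// Assumes every element of X is an integer in 1, ..., top.
real matrix agree_by_level(real matrix X, real scalar top)
{
    real matrix A, D
    real scalar v

    A = J(rows(X), rows(X), 0)
    for (v = 1; v <= top; v++) {
        D = (X :== v)     // N x nmeas indicator: response equals level v
        A = A + D * D'    // [i,j] gains the number of measurements where both gave v
    }
    return(A)
}

A2 = agree_by_level(X, `top')
// Compare with the looped result on the upper triangle only (i < j);
// this should display 1 if the two methods agree.
ok = 1
for (i = 1; i <= N - 1; i++) {
    ok = ok & all(A[i, (i+1)..N] :== A2[i, (i+1)..N])
}
ok
end

Unlike the looped version, this returns the full symmetric matrix, with the number of measurements on the diagonal. It would miscount agreements on any value outside 1, ..., top (including missing values), so it is only appropriate when the response range is known in advance.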