  • From variables to row vector: calculate distance between vectors

    Hi all,
    I have a dataset like this:
    id type var1 var2 var3 var4
    1 1 0 1 2 3
    1 2 1 0 1 3
    2 1 0 0 1 2
    2 2 1 2 3 4

    [My actual dataset has var1, var2, var3, ... var599, var600]

    My final goal is to calculate, for each id, the distance between the two vectors identified by the "type" variable. For example, for id=1 I want to calculate the distance between (0 1 2 3) and (1 0 1 3).
    By distance I mean the angle between the two vectors or, as a second-best option, the Euclidean distance.


    I thought that one way could be to create a vector from the "var*" variables for each combination of "id" and "type", but I do not know how to do it.
    And maybe that is not the best way to approach the problem.
    Thank you in advance for any help you can give me.
    Last edited by Federico Cav; 21 Mar 2017, 04:51.

  • #2
    So, this would be easier if you had column vectors instead of row vectors. -reshape- (twice!) to the rescue.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(id type var1 var2 var3 var4)
    1 1 0 1 2 3
    1 2 1 0 1 3
    2 1 0 0 1 2
    2 2 1 2 3 4
    end
    
    
    //    VERIFY TWO OBS PER ID
    //    AND INLIST(TYPE, 1, 2)
    by id (type), sort: assert _N == 2 & type == _n
    
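    //    CHANGE FROM ROW VECTORS TO COLUMN VECTORS
    //    LONG: ONE OBS PER ID-TYPE-VNUM, VALUES IN var
    reshape long var, i(id type) j(vnum)
    //    WIDE: ONE OBS PER ID-VNUM; var1 IS NOW THE TYPE 1 VECTOR
    //    AND var2 THE TYPE 2 VECTOR (NOT THE ORIGINAL var1, var2)
    reshape wide var, i(id vnum) j(type)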
    
    //    CALCULATE EUCLIDEAN DISTANCE &
    //    INNER PRODUCT
    //    (sum() IS A RUNNING SUM, SO THE LAST OBS OF EACH ID, [_N],
    //    HOLDS THE TOTAL OVER ALL COMPONENTS)
    by id, sort: gen dist = sum((var1-var2)^2)
    by id: replace dist = sqrt(dist[_N])
    by id: gen inner_product = sum(var1*var2)
    by id: replace inner_product = inner_product[_N]
    
    //    CALCULATE LENGTH OF EACH VECTOR
    forvalues i = 1/2 {
        by id: gen length`i' = sum(var`i'^2)
        by id: replace length`i' = sqrt(length`i'[_N])
    }
    
    //    CALCULATE ANGLE BETWEEN VECTORS
    gen angle12 = acos(inner_product/(length1*length2))

    Your approach of using Stata matrices could also be made to work, but I think the above code is simpler and I imagine it is more efficient as well.
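
    One possible safeguard, sketched here as an addition and not part of the code above: with 600 components stored as floats, rounding could push the cosine of two (nearly) parallel vectors slightly above 1 in absolute value, in which case acos() returns missing. The lines below assume the variables inner_product, length1 and length2 created above; the names cos12_clamped, angle12_safe and angle12_deg are made up for illustration. A zero-length vector still yields a missing angle, since the cosine is undefined there.

    ```stata
    * Cosine of the angle, held in double precision
    gen double cos12_clamped = inner_product/(length1*length2)

    * Clamp to [-1, 1]; the if-condition keeps missing values missing,
    * because max() and min() ignore missing arguments
    replace cos12_clamped = max(-1, min(1, cos12_clamped)) if !missing(cos12_clamped)

    * Angle in radians, and the same angle in degrees
    gen double angle12_safe = acos(cos12_clamped)
    gen double angle12_deg  = angle12_safe*180/_pi
    ```

    If precision matters for the intermediate sums as well, declaring dist, inner_product and the length variables with gen double would be a natural companion change.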


    • #3
      Thank you very much! Fast and clear! I will try it.
      Thanks again


      • #4
        It worked. Your solution is definitely better than what I was thinking of. One final question: why did you use acos() to calculate the distance?
        Isn't the similarity measure = inner_product/(length1*length2) and the distance = 1 - (inner_product/(length1*length2))?

        Thanks !
        ciao!


        • #5
          The formula with acos() is for the angle between the two vectors. That's standard linear algebra: inner product of A with B = |A| |B| cos(angle). So solve for angle and you get precisely that formula. The distance (Euclidean) was given by the two lines of code that create the variable dist, and it is the standard Pythagorean formula.
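
          As a concrete check, here is the id = 1 pair from #1 worked by hand. This is a hand calculation added for illustration, with rounded values; the symbols A, B and θ are just labels for the type 1 vector, the type 2 vector and the angle between them.

          ```latex
          \begin{aligned}
          A &= (0, 1, 2, 3), \qquad B = (1, 0, 1, 3) \\
          A \cdot B &= 0\cdot 1 + 1\cdot 0 + 2\cdot 1 + 3\cdot 3 = 11 \\
          |A| &= \sqrt{0 + 1 + 4 + 9} = \sqrt{14}, \qquad |B| = \sqrt{1 + 0 + 1 + 9} = \sqrt{11} \\
          \cos\theta &= \frac{A \cdot B}{|A|\,|B|} = \frac{11}{\sqrt{154}} \approx 0.8864 \\
          \theta &= \arccos\!\left(\frac{11}{\sqrt{154}}\right) \approx 0.481 \text{ rad} \approx 27.6^\circ \\
          \lVert A - B \rVert &= \sqrt{(-1)^2 + 1^2 + 1^2 + 0^2} = \sqrt{3} \approx 1.732
          \end{aligned}
          ```

          For id = 1, angle12 should therefore come out near 0.481 and dist near 1.732.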

          What you call here "similarity measure" = inner_product/(length1*length2) is the same as the cosine of the angle between the vectors. It is, indeed, one metric of similarity between them. But you didn't ask for that in your original post. You asked for the angle between two vectors, or the Euclidean distance. That's what the code in #2 gives you. In any case, it's easy enough for you to calculate the cosine of the angle from the variables calculated in #2.
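
          A minimal sketch of that calculation, assuming the variables inner_product, length1 and length2 from #2 are still in memory (cos_sim and cos_dist are illustrative names, not anything created in #2):

          ```stata
          * Cosine similarity: the cosine of the angle between the two vectors
          gen double cos_sim = inner_product/(length1*length2)

          * The "1 - similarity" distance asked about in #4
          gen double cos_dist = 1 - cos_sim

          * Both are constant within id, so one observation per id shows them all
          by id, sort: list id cos_sim cos_dist if _n == 1
          ```

          Since the angle in #2 is acos() of the same ratio, cos(angle12) reproduces cos_sim up to rounding, so the two measures rank pairs of vectors in exactly opposite order of similarity.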
