  • Translating Pearson's Correlation Matrix to Distance Matrix using Matrix Dissimilarity?

    Hello everyone!

    To give some context: I am trying to reproduce the "Schwartz Value Model", which in my data (European Social Survey Round 9) consists of 21 quasi-metric items that make up ten "Basic Human Values". The items are supposed to be arranged in a quasi-circular model that can be divided into ten segments, one for each of the Basic Human Values. The model is calculated by multidimensional scaling.

    I read that the mdsmat will be "based upon the pearsons correlation matrix for the items", or that I have to "derive the distance matrix from the correlations of the items". I wondered: Is that what Matrix Dissimilarity does? Because I found an article that claimed that a distance matrix can be calculated by subtracting the correlation matrix from 1, yet the values I get from Matrix dissimilarity are not equal to that. Since I was unsure whether to use the mean centered values of the items or the uncentered items I ran both varlists (one for centered, one for uncentered) through Matrix dissimilarity and got the same results. I am unsure why that is, even though the Pearsons correlation matrix of both (correlate) varlists are unique. Can someone please explain whether there is a logical reason I am not grasping or if I need to somehow manually create my distance matrix for those 21 items? If so, how would I do that?

    Thank you so much in advance!
    Dan

  • #2
    I think one conceptual issue here is that there are many ways to calculate a distance matrix, not just one way. That is why -mdsmat- leaves it up to you to decide how to calculate your own distance matrix and why -matrix dissimilarity- gives you a number of options to measure distances between objects. You haven't clearly defined what kind of distance metric you want to use here and there are many possible distance metrics. Basically, you have a few general tools that you'd like to apply to a very specific task.
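
    As a rough illustration of how much the choice of measure matters, here is a minimal sketch on made-up data (the item1 to item5 names and the seed are placeholders of mine, not from your survey). As far as I know, -matrix dissimilarity- takes the -variables- option to compare variables rather than observations, and -dissim(oneminus)- to turn a similarity measure such as -correlation- into a dissimilarity, but please check -help matrix dissimilarity- on your version before relying on it:

    Code:
    clear
    set seed 12345
    set obs 200
    forvalues i = 1/5 {
        generate double item`i' = rnormal()
    }
    // Euclidean (L2) distances between the variables
    matrix dissimilarity D_L2 = item1-item5, variables L2
    // one minus the Pearson correlation between the variables
    matrix dissimilarity D_corr = item1-item5, variables correlation dissim(oneminus)
    matrix list D_L2
    matrix list D_corr
    The two matrices describe the same five variables but are on completely different scales, so it really is up to you to pick the measure that matches your literature.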

    The model is calculated by multidimensional scaling.
    Multidimensional scaling is a dimension reduction technique, not really a way to estimate a statistical model, and it feels a bit like you are conflating statistical modeling with a conceptual model here. I think I know what you mean, but the language you use was a little confusing for me.

    I found an article that claimed that a distance matrix can be calculated by subtracting the correlation matrix from 1
    When two variables have a positive correlation, we might say they are relatively close, and when two variables have a negative correlation, we can say they are relatively far apart. When you subtract the correlation from 1, you are essentially rescaling it so that its values lie between 0 and 2 instead of between -1 and 1. In the new matrix, perfectly positively correlated variables have a distance of 0, uncorrelated variables have a distance of 1, and perfectly negatively correlated variables have a distance of 2. If this seems appropriate to you then you don't need -matrix dissimilarity-, which can give you lots of different kinds of distances. Just use 1 - the correlation matrix as input.

    You aren't really clear about what objects you want to find the distance between. The above makes me think you want to find distances between variables, but note that it is possible to do the same thing with observations. See the -mds- command for multi-dimensional scaling for distances between observations.
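
    For comparison, a minimal sketch of the observation-level version might look like this. The data, the variable names, and the id variable are invented for illustration; -mds- needs an -id()- variable to label the observations, and I am leaving the measure at what I believe is the default (L2):

    Code:
    clear
    set seed 12345
    set obs 30
    forvalues i = 1/5 {
        generate double item`i' = rnormal()
    }
    generate id = _n
    // scale the 30 observations (not the 5 variables) in two dimensions
    mds item1-item5, id(id) dimension(2)
    Here the points in the configuration are respondents, which is probably not what you want if the goal is to place the 21 items on a circle.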

    Since I was unsure whether to use the mean centered values of the items or the uncentered items I ran both varlists (one for centered, one for uncentered) through Matrix dissimilarity and got the same results. I am unsure why that is, even though the Pearsons correlation matrix of both (correlate) varlists are unique.
    The explanation for this depends on the details of what you did, but remember, there are lots of ways to calculate a dissimilarity matrix and -matrix dissimilarity- gives you many options. Just because the correlation changes when you mean center doesn't necessarily mean another distance measure will.
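
    One possible explanation, and this is only a guess about what you did: if you centered each item on the respondent's own mean across items and then asked for Euclidean distances between variables, the respondent mean cancels out of every item-to-item difference, so the distances cannot change even though the correlations do. A minimal sketch with made-up data (all names here are hypothetical):

    Code:
    clear
    set seed 2024
    set obs 500
    forvalues i = 1/4 {
        generate double item`i' = runiform()
    }
    // center each item on the respondent's own mean across the four items
    egen double pmean = rowmean(item1-item4)
    forvalues i = 1/4 {
        generate double c_item`i' = item`i' - pmean
    }
    // L2 distances between variables, before and after centering
    matrix dissimilarity D_raw = item1-item4, variables L2
    matrix dissimilarity D_cen = c_item1-c_item4, variables L2
    // relative difference between the two matrices; should be essentially zero
    display mreldif(D_raw, D_cen)
    // the correlation matrices, by contrast, differ
    correlate item1-item4
    correlate c_item1-c_item4
    If you centered in some other way or used a different measure, the reason may be different, so treat this as one candidate rather than the answer.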


    • #3
      Daniel Schaefer

      Thank you so much! I am a bit of a Newbie with MDS (as you figured) and appreciate your insight greatly.

      I think one conceptual issue here is that there are many ways to calculate a distance matrix, not just one way.
      I understand that -matrix dissimilarity- offers many distance metrics, but sadly I have not yet been able to work out from the literature I was given which one to use. I will look into this some more.

      If this seems appropriate to you then you don't need -matrix dissimilarity-, which can give you lots of different kinds of distances. Just use 1 - the correlation matrix as input.
      That seems to be exactly what I need from where I'm at right now. How would I use 1 - correlation matrix as input? I cannot seem to find a way to input it. I tried:

      Code:
      matrix input matname = 1 - mat_correlations  // invalid syntax
      matrix define matname = 1 - mat_correlations  // conformability error
      mdsmat 1 - mat_correlations...  // matrix operators that return matrices not allowed in this context
      What am I missing?

      Thank you so much, again! This is a massive help.
      Dan


      • #4
        Hi Dan,

        I am a bit of a Newbie with MDS (as you figured) and appreciate your insight greatly.
        Sure, me too. Let's hope a real expert joins us. In the meantime, suppose I randomly generate a dataset like this:

        Code:
        clear
        set obs 1000
        local numvars = 21
        
        forv i = 1/`numvars'{
            gen var`i' = runiform()
        }
        To take one minus the correlation matrix, you want something like this:

        Code:
        corr var*
        matrix define wanted = J(`numvars', `numvars', 1) - r(C)
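        The last line above creates an n by n matrix of 1s and subtracts the correlation matrix from it. Stata's matrix expressions do not subtract a matrix from a scalar elementwise, which is why 1 - mat_correlations gave you a conformability error. Obviously, you can replace the `numvars' macro with the number of variables in your correlation matrix if you prefer.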
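
        To get from there to the scaling itself, something along these lines might work. This is a sketch under assumptions, not a tested recipe for your data: the matrix names C and D are mine, I copy the variable names onto the distance matrix by hand so the points are labeled by item, and I believe -mdsmat- has an -s2d(oneminus)- option that does the 1 - correlation conversion for you when you hand it a similarity matrix (see -help mdsmat-):

        Code:
        clear
        set seed 12345
        set obs 1000
        forvalues i = 1/21 {
            generate double var`i' = runiform()
        }
        quietly correlate var*
        matrix C = r(C)
        // one minus the correlation matrix, sized from C itself
        matrix D = J(rowsof(C), colsof(C), 1) - C
        // carry the variable names over so the points are labeled by item
        local names : rownames C
        matrix rownames D = `names'
        matrix colnames D = `names'
        // classical MDS on the distance matrix, two dimensions
        mdsmat D, dimension(2)
        // alternative: pass the correlations and let mdsmat convert them
        mdsmat C, s2d(oneminus) dimension(2)
        With random uniform data the configuration will be meaningless, of course. The point is only the sequence of steps.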
