  • Choice of similarity measure in binary cluster analysis when co-presence should weigh more than co-absence

    I am doing a cluster analysis with a set of binary variables. Given the nature of the variables and the research question, co-absence is not as strong an indication of similarity as co-presence. However, co-absence is still an indication of similarity (just not as much as co-presence is). For example, one of my variables indicates whether the person has been punished for a criminal offense, and 10 percent of the persons in the dataset have been.

    To me (and here I might be wrong) it seems as if there are three types of similarity measures available in Stata:
    1) co-absence is as important as co-presence - e.g. the simple matching binary similarity coefficient (Zubin 1938; Sokal and Michener 1958) - 'matching'
    2) co-absence is not at all an indication of similarity - e.g. the Jaccard (1901, 1908) binary similarity coefficient - 'jaccard'
    3) matches - whether they come in the form of co-absence or co-presence - are given more weight than non-matches - e.g. the Sneath and Sokal (1962) binary similarity coefficient - 'sneath'
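
    To make the three types concrete, the coefficients can be written in terms of the usual 2x2 counts for a pair of observations. The symbols are introduced here for illustration: a is the number of variables where both observations are 1 (co-presence), d the number where both are 0 (co-absence), and b and c the two kinds of mismatch. The formulas below are the standard definitions as I understand them from Stata's documentation of these measures.

```latex
\begin{align*}
S_{\text{matching}} &= \frac{a + d}{a + b + c + d} \\[4pt]
S_{\text{jaccard}}  &= \frac{a}{a + b + c} \\[4pt]
S_{\text{sneath}}   &= \frac{2(a + d)}{2(a + d) + b + c}
\end{align*}
```

    In 'matching' and 'sneath', a and d always enter together as a + d, so co-presence and co-absence count equally; in 'jaccard', d does not appear at all.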

    So what I am looking for is a similarity coefficient that gives more weight to co-presence than to co-absence, but still includes co-absence in measuring similarity. Maybe I am getting something fundamentally wrong here; if so, I would much appreciate any help.
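
    One way to write down the kind of coefficient being asked for is to down-weight co-absence by a factor between 0 and 1. This is only a sketch: the weight λ and the name S_λ are hypothetical, introduced here for illustration, and this is not offered as a built-in Stata measure. As above in spirit, for a pair of observations a counts variables where both are 1, d counts variables where both are 0, and b and c count the two kinds of mismatch.

```latex
\[
S_{\lambda} = \frac{a + \lambda d}{a + \lambda d + b + c},
\qquad 0 \le \lambda \le 1
\]
% lambda = 1 recovers simple matching: (a + d) / (a + b + c + d)
% lambda = 0 recovers Jaccard:         a / (a + b + c)
```

    For example, with a = 2, b = 1, c = 1, d = 6, simple matching gives 8/10 = 0.8, Jaccard gives 2/4 = 0.5, and λ = 0.5 gives 5/7, about 0.71, so co-absence still counts but less than co-presence.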

  • #2
    I don't have an answer for you, partly because I think cluster analysis is vastly oversold. But I would remark that k binary variables jointly define 2^k classes without any need for similarity measures or agglomeration algorithms. In practice, it's likely that many of those 2^k classes don't occur or are very rare. Alternatively, the explosion of 2^k is a bit of a warning against expecting simple structure.

    findname and groups are community-contributed commands from the Stata Journal and need to be installed before use.


    Code:
    . webuse nlswork, clear
    (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
    
    . findname, all(inlist(@, 0, 1) | missing(@))
    msp       nev_mar   collgrad  not_smsa  c_city    south     union
    
    
    . groups `r(varlist)', order(high)
    
      +--------------------------------------------------------------------------------+
      | msp   nev_mar   collgrad   not_smsa   c_city   south   union   Freq.   Percent |
      |--------------------------------------------------------------------------------|
      |   1         0          0          0        0       0       0    2024     10.53 |
      |   1         0          0          1        0       1       0    1585      8.24 |
      |   1         0          0          1        0       0       0    1083      5.63 |
      |   1         0          0          0        0       1       0     997      5.19 |
      |   1         0          0          0        1       1       0     997      5.19 |
      |--------------------------------------------------------------------------------|
      |   1         0          0          0        1       0       0     980      5.10 |
      |   0         1          0          0        1       0       0     575      2.99 |
      |   1         0          0          0        0       0       1     548      2.85 |
      |   0         0          0          0        0       0       0     507      2.64 |
      |   0         1          0          0        0       0       0     487      2.53 |
      |--------------------------------------------------------------------------------|
      |   1         0          1          0        0       0       0     470      2.44 |
      |   1         0          0          0        1       0       1     437      2.27 |
      |   0         0          0          0        1       0       0     426      2.22 |
      |   0         1          0          0        1       1       0     417      2.17 |
      |   0         0          0          1        0       1       0     391      2.03 |
      |--------------------------------------------------------------------------------|
      |   0         0          0          0        1       1       0     366      1.90 |
      |   1         0          0          1        0       0       1     320      1.66 |
      |   0         1          0          1        0       1       0     317      1.65 |
      |   1         0          1          0        0       1       0     296      1.54 |
      |   0         1          0          0        1       0       1     287      1.49 |
      |--------------------------------------------------------------------------------|
      |   0         0          0          1        0       0       0     277      1.44 |
      |   1         0          1          0        0       0       1     276      1.44 |
      |   0         0          0          0        0       1       0     264      1.37 |
      |   0         0          0          0        1       0       1     261      1.36 |
      |   1         0          1          0        1       0       0     243      1.26 |
      |--------------------------------------------------------------------------------|
      |   1         0          0          1        0       1       1     221      1.15 |
      |   1         0          1          0        1       1       0     219      1.14 |
      |   1         0          0          0        1       1       1     214      1.11 |
      |   0         1          0          1        0       0       0     203      1.06 |
      |   0         0          0          0        0       0       1     197      1.02 |
      |--------------------------------------------------------------------------------|
      |   0         1          1          0        1       0       0     197      1.02 |
      |   1         0          1          1        0       0       0     193      1.00 |
      |   1         0          1          1        0       1       0     183      0.95 |
      |   1         0          0          0        0       1       1     170      0.88 |
      |   0         1          0          0        0       1       0     163      0.85 |
      |--------------------------------------------------------------------------------|
      |   0         1          0          0        0       0       1     152      0.79 |
      |   0         1          1          0        0       0       0     149      0.78 |
      |   1         0          1          0        1       0       1     141      0.73 |
      |   0         0          0          0        1       1       1     122      0.63 |
      |   1         0          1          1        0       0       1     122      0.63 |
      |--------------------------------------------------------------------------------|
      |   0         1          1          0        1       1       0     110      0.57 |
      |   0         0          1          0        0       0       0     108      0.56 |
      |   0         0          1          0        1       1       0     107      0.56 |
      |   0         1          0          0        1       1       1     107      0.56 |
      |   0         0          1          0        1       0       0      95      0.49 |
      |--------------------------------------------------------------------------------|
      |   0         1          1          0        1       0       1      93      0.48 |
      |   1         0          1          0        1       1       1      82      0.43 |
      |   0         0          0          1        0       1       1      79      0.41 |
      |   1         0          1          0        0       1       1      78      0.41 |
      |   0         0          0          1        0       0       1      77      0.40 |
      |--------------------------------------------------------------------------------|
      |   0         1          1          0        0       0       1      73      0.38 |
      |   0         1          1          1        0       1       0      72      0.37 |
      |   0         0          0          0        0       1       1      67      0.35 |
      |   0         0          1          0        0       1       0      63      0.33 |
      |   0         1          0          1        0       0       1      59      0.31 |
      |--------------------------------------------------------------------------------|
      |   0         1          1          1        0       0       0      50      0.26 |
      |   0         1          0          1        0       1       1      47      0.24 |
      |   0         1          0          0        0       1       1      44      0.23 |
      |   0         1          1          0        0       1       0      44      0.23 |
      |   0         0          1          0        1       0       1      43      0.22 |
      |--------------------------------------------------------------------------------|
      |   0         0          1          1        0       1       0      41      0.21 |
      |   0         1          1          1        0       0       1      38      0.20 |
      |   0         0          1          0        0       0       1      33      0.17 |
      |   1         0          1          1        0       1       1      32      0.17 |
      |   0         0          1          1        0       0       0      20      0.10 |
      |--------------------------------------------------------------------------------|
      |   0         1          1          0        1       1       1      17      0.09 |
      |   0         0          1          0        0       1       1      15      0.08 |
      |   0         0          1          0        1       1       1      14      0.07 |
      |   0         0          1          1        0       0       1      11      0.06 |
      |   0         0          1          1        0       1       1      11      0.06 |
      |--------------------------------------------------------------------------------|
      |   0         1          1          0        0       1       1      11      0.06 |
      |   0         1          1          1        0       1       1       6      0.03 |
      +--------------------------------------------------------------------------------+
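
    The same kind of tabulation of the observed 0/1 patterns can be sketched with official commands only, for readers who do not have findname and groups installed. This is a minimal sketch, not the method used above: it assumes the seven indicator names listed by findname in the output, and it assumes observations with a missing value on any of them should be dropped (the nomiss option).

```stata
* Sketch using official Stata commands only (contract, gsort, list)
webuse nlswork, clear

* Collapse to one record per distinct 0/1 pattern; _freq holds the count.
* nomiss drops observations with a missing value on any listed variable
* (an assumption of this sketch).
contract msp nev_mar collgrad not_smsa c_city south union, nomiss

* After contract, _N is the number of patterns that actually occur,
* to be compared with the 2^7 = 128 that are possible in principle
display "occupied patterns: " _N " of " 2^7

* Show the ten most frequent patterns
gsort -_freq
list in 1/10, noobs
```

    Comparing the number of occupied patterns with 2^k gives a quick sense of how much structure there is before any similarity measure is chosen.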


