  • Ward cluster analysis after sequence data OM (sqom): Problems generating specified number of groups

    Hello Statalisters,

    I'm using Stata/IC 15.1 and have sequence data in the following long format:
    Code:
    ID  episode  element
    1      1       1  
    1      4       2
    1      3       3
    2      7       1
    2      7       2
    2      5       3

    ID identifies the person, episode identifies the episode, and element is the position in the sequence
    (e.g. the second position in the first person's sequence is episode 4). That is, each row corresponds to one episode
    in a person's sequence.
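
    For reproducibility, here is a minimal sketch that recreates the six example rows above. The values are only the illustrative ones from the listing, not my real data.

```stata
* Recreate the six example rows (illustrative values only)
clear
input ID episode element
1 1 1
1 4 2
1 3 3
2 7 1
2 7 2
2 5 3
end

* One block per person, rows in the order entered
list, sepby(ID)
```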

    I'm using the sq package, installed by ssc install sq.

    First, I run:

    sqset episode ID element

    to designate the dataset as sequence data.

    Now I'm trying to run a Ward cluster analysis in order to group episodes based on a full distance matrix.

    The commands I run are:

    Code:
    sqom, full
    sqclusterdat
    clustermat wardslinkage SQdist, name(wards) add
    cluster tree wards, cutnumber(20)
    sqclusterdat, return


    Based on the dendrogram, I'm trying to create grouping variables for, say, three clusters. As this fails because of ties, I run:

    cluster gen group3 = gr(3), name(wards) ties(more)

    My understanding is that this should create more than three groups because of ties. However, running

    tab group3

    produces the following output:

    Code:
    ===============================
    group3 Freq.    Percent   Cum.
    -------------------------------    
    1      10,474     82.54   82.54
    3       2,207     17.39   99.94
    4           8      0.06  100.00
    -------------------------------    
    Total  12,689    100.00
    ===============================
    That is, Stata has created only three groups, but they are numbered 1, 3 and 4.

    I don't really understand what's going on here. Has Stata created four groups (as I would expect) but not assigned any episodes to group 2?
    Is it possible for Ward's linkage to produce empty clusters? If so, how would I adequately deal with this situation?

    Running:

    cluster gen group4 = gr(4), name(wards) ties(more)

    tab group4


    produces the exact same results as the command for three groups.

    Running

    cluster gen group5 = gr(5), name(wards) ties(more)

    tab group5


    produces

    Code:
    ================================================
         sq_gr5 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |      4,132       32.56       32.56
              3 |      6,342       49.98       82.54
              5 |      2,207       17.39       99.94
              6 |          8        0.06      100.00
    ------------+-----------------------------------
          Total |     12,689      100.00
    ================================================
    That is, there are four groups, whereas I expected at least six. Does that imply that groups 2 and 4 don't contain any episodes?
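
    In case it helps to pin down the question, here is a small sketch of how I would check which codes are actually in use and recode them to consecutive integers. This only makes sense if the gaps turn out to be purely a numbering matter, which is an assumption on my part and not something I have confirmed. The data are toy values mimicking the codes 1, 3 and 4 from the first table, and group3_c is a hypothetical variable name.

```stata
* Toy data mimicking the observed codes 1, 3 and 4 (not my real data)
clear
input group3
1
1
3
4
end

* List the distinct codes that actually occur
levelsof group3, local(used)
display "codes in use: `used'"

* group3_c (hypothetical name): same grouping, recoded to 1, 2, 3, ...
egen group3_c = group(group3)

* Cross-tabulate old against new codes to confirm the mapping
tabulate group3 group3_c
```

    If this is appropriate, the same two steps should carry over to group5.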

    Any assistance is highly appreciated!

    Best regards,
    Bernd