  • Davies-Bouldin Index - Determining the Optimal Number of Clusters in a Cluster Analysis

    Hello,

    I would like to use the Davies-Bouldin Index to determine the optimal number of clusters in a cluster analysis I am doing.

    Does anyone know if there is a command that implements this particular index in Stata? Or does code exist somewhere that can be applied in Stata?

    I know there’s a command that calculates the Calinski–Harabasz pseudo-F stopping-rule index, and I have already used it. But I’m trying to reproduce the methodology of an article that uses the Davies-Bouldin Index.

    Here are the steps to follow:

    1) Build different models by partitioning my dataset into k clusters, for k = 2 to 15:

    forvalues i = 2/15 {
        cluster kmeans varlist, k(`i') name(cluster_`i')
    }

    * My varlist is composed of 20 variables.

    2) For each model, I would like to calculate the following validity index and choose the k that minimizes it:

    Validity_`i' = Intra_`i'/Inter_`i'

    Validity is the ratio of within-cluster scatter to between-cluster separation. The objective is to minimize this measure, since a good partition has small within-cluster scatter and large between-cluster separation.
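
For reference, the usual textbook form of the Davies-Bouldin index can be written as follows. This is the standard definition with Euclidean distances, not necessarily the exact Intra/Inter ratio used in the article; the symbols (c_i for the centroid of cluster C_i, S_i for its scatter, M_ij for the centroid separation) are notation introduced here for illustration.

```latex
S_i = \frac{1}{|C_i|} \sum_{x \in C_i} \lVert x - c_i \rVert ,
\qquad
M_{ij} = \lVert c_i - c_j \rVert ,
\qquad
DB_k = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} \frac{S_i + S_j}{M_{ij}}
```

Smaller values of DB_k indicate compact, well-separated clusters, so the candidate k with the lowest value would be preferred.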

    Another point is that I’m working with survey data, so if it’s possible to take the weights into account, that would be ideal!
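
A minimal Mata sketch of how such an index might be computed after the loop in step 1, assuming the standard Davies-Bouldin definition (average distance to the cluster centroid as scatter, Euclidean distance between centroids as separation). The function name dbindex(), the local macro vars, and the scalar db are hypothetical names introduced here, not existing Stata commands. The optional weight only enters the centroids and the scatter; it does not change how cluster kmeans forms the clusters.

```stata
* Sketch only: dbindex() is a hypothetical helper, not a built-in command.
mata:
// vars: space-separated variable names; gvar: cluster id variable;
// wvar: weight variable name, or "" for equal weights.
real scalar dbindex(string scalar vars, string scalar gvar, string scalar wvar)
{
    real matrix    X, C, Xi
    real colvector g, w, ok, ids, S, wi, d
    real scalar    k, i, j, R, worst, db

    X = st_data(., tokens(vars))
    g = st_data(., gvar)
    w = (wvar == "" ? J(rows(X), 1, 1) : st_data(., wvar))

    // keep observations with a cluster id, complete data, and a weight
    ok = (g :< .) :& (rowmissing(X) :== 0) :& (w :< .)
    X  = select(X, ok)
    g  = select(g, ok)
    w  = select(w, ok)

    ids = uniqrows(g)
    k   = rows(ids)
    C   = J(k, cols(X), .)
    S   = J(k, 1, .)

    // (weighted) centroid and mean distance to centroid for each cluster
    for (i = 1; i <= k; i++) {
        Xi = select(X, g :== ids[i])
        wi = select(w, g :== ids[i])
        C[i, .] = mean(Xi, wi)
        d = sqrt(rowsum((Xi :- C[i, .]) :^ 2))
        S[i] = mean(d, wi)
    }

    // for each cluster take the worst (largest) similarity ratio, then average
    db = 0
    for (i = 1; i <= k; i++) {
        worst = 0
        for (j = 1; j <= k; j++) {
            if (j != i) {
                R = (S[i] + S[j]) / sqrt(sum((C[i, .] - C[j, .]) :^ 2))
                if (R > worst) worst = R
            }
        }
        db = db + worst
    }
    return(db / k)
}
end

* Placeholder: replace varlist with the 20 clustering variables.
local vars varlist
forvalues i = 2/15 {
    mata: st_numscalar("db", dbindex("`vars'", "cluster_`i'", ""))
    display "k = `i'   DB = " %9.4f scalar(db)
}
```

To use survey weights, the name of the weight variable would be passed as the third argument instead of "". This relies on cluster kmeans storing the group assignment in a variable named after name() when generate() is not specified.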

    Thanks for your help!

    Alexandre Parent
    Employment and Social Development Canada