Davies-Bouldin Index - Determining the Optimal Number of Clusters in a Cluster Analysis

Alexandre Parent

05 Aug 2015, 08:55

Hello,

I would like to use the Davies-Bouldin Index to determine the optimal number of clusters in a cluster analysis I am doing.

Does anyone know whether there is a command that implements this particular index in Stata? Or is there code available somewhere that could be used in Stata?

I know there’s a command that calculates the Calinski–Harabasz pseudo-F stopping-rule index, and I have already used it. But I’m trying to reproduce the methodology of an article that implements the Davies-Bouldin Index.

Here are the steps to follow:

1) Build different models by partitioning my data into k clusters, for k = 2 to 15:

forvalues i = 2/15 {
    cluster kmeans varlist, k(`i') name(cluster_`i')
}

* My varlist is composed of 20 variables.

2) For each model, I would like to calculate the following validity index, and then choose the model that minimizes it:

Validity_`i' = Intra_`i'/Inter_`i'

Validity is the ratio of within-cluster scatter to between-cluster separation. The objective is to minimize this measure, since a good partition has small within-cluster scatter and large between-cluster separation.
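For reference, the Davies-Bouldin index is usually written as below. The notation is an assumption on my part (k clusters C_i with centroids c_i and sizes n_i, Euclidean norm), and the article's Intra/Inter ratio may differ in its details:

```latex
\[
DB_k = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} \frac{S_i + S_j}{M_{ij}},
\qquad
S_i = \frac{1}{n_i} \sum_{x \in C_i} \lVert x - c_i \rVert,
\qquad
M_{ij} = \lVert c_i - c_j \rVert
\]
```

Here S_i is the within-cluster scatter of cluster i and M_ij is the separation between clusters i and j, so smaller values of DB_k indicate more compact, better-separated clusters.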

Another point: I am working with survey data, so ideally the calculation would take the sampling weights into account.
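To make the request concrete, here is a rough, untested sketch of what I have in mind. dbindex is a hypothetical program name, not an existing Stata command, and the sketch assumes the usual Davies-Bouldin definition with Euclidean distances (mean distance to the cluster centroid as scatter, distance between centroids as separation). It ignores the survey weights, and varlist is again a placeholder for my 20 variables:

```stata
* Hypothetical helper, NOT a built-in command: unweighted Davies-Bouldin
* index for the grouping variable given in by(), Euclidean distances assumed.
capture program drop dbindex
program define dbindex, rclass
    syntax varlist(numeric), BY(varname)
    marksample touse                  // drops obs with missing values in varlist
    markout `touse' `by'              // ... and with a missing cluster id
    preserve
    quietly keep if `touse'

    * Distance from each observation to its own cluster centroid
    tempvar dist ctr
    quietly generate double `dist' = 0
    foreach v of local varlist {
        quietly egen double `ctr' = mean(`v'), by(`by')
        quietly replace `dist' = `dist' + (`v' - `ctr')^2
        drop `ctr'
    }
    quietly replace `dist' = sqrt(`dist')

    * One row per cluster: centroid coordinates and mean scatter S_i
    collapse (mean) `varlist' `dist', by(`by')
    tempname C S sep worst total
    mkmat `varlist', matrix(`C')
    mkmat `dist', matrix(`S')
    local k = rowsof(`C')
    local p = colsof(`C')

    * For each cluster, take the worst (largest) ratio (S_i + S_j) / M_ij
    scalar `total' = 0
    forvalues i = 1/`k' {
        scalar `worst' = 0
        forvalues j = 1/`k' {
            if `i' != `j' {
                scalar `sep' = 0
                forvalues c = 1/`p' {
                    scalar `sep' = `sep' + (`C'[`i',`c'] - `C'[`j',`c'])^2
                }
                scalar `worst' = max(`worst', (`S'[`i',1] + `S'[`j',1]) / sqrt(`sep'))
            }
        }
        scalar `total' = `total' + `worst'
    }
    restore
    return scalar db = `total' / `k'  // average of the worst-case ratios
end

* Usage, once the cluster_2 ... cluster_15 variables exist:
forvalues i = 2/15 {
    dbindex varlist, by(cluster_`i')
    display "k = `i'   Davies-Bouldin = " %9.4f r(db)
}
```

The model with the smallest r(db) would then be the candidate for the optimal k. I do not know whether this matches the article's Intra/Inter measure exactly, nor how the weights should enter the centroids and scatter terms.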

Thanks for your help!

Alexandre Parent
Employment and Social Development Canada