I am trying to determine the optimal number of clusters to use in a k-means clustering problem. I am clustering on three variables: vstd_schl_cls, nstd_schl_cls, and size_std_cls (standardized values of verbal scores, numeric scores, and class sizes, respectively). I ran -cluster kmeans- once for each value of k from 1 to 20 and stored each solution.
A sample of my data is below:
When I try to find the optimal number of clusters by running an ANOVA of each variable on each stored cluster variable, I get the error "varlist not allowed". This is my code:
Why am I getting this error? I have also tried "foreach v of local list2", "foreach v of local `list2'", and "foreach v of varlist `list2'".
Is my syntax wrong?
Code:
local list2 "vstd_schl_cls nstd_schl_cls size_std_cls"
forvalues k = 1(1)20 {
    cluster kmeans `list2', k(`k') start(random(123)) name(cs`k') mea(abs) keepcen
}
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input double(vstd_schl_cls nstd_schl_cls size_std_cls) byte cs1 double(cs2 cs3 cs4 cs5)
.5896847248077393 .08571480214595795 1.014149785041809 1 2 3 3 4
1.527132511138916 1.0911247730255127 1.014149785041809 1 2 3 4 4
.27720215916633606 1.6656447649002075 1.014149785041809 1 2 2 4 2
.9021673202514648 1.6656447649002075 1.014149785041809 1 1 1 1 1
.9021673202514648 .9474948048591614 1.014149785041809 1 1 2 2 1
end
Code:
matrix WSS = J(20,5,.)
matrix colnames WSS = k WSS log(WSS) eta-squared PRE
* WSS for each clustering
set trace on
forvalues k = 1(1)20 {
    scalar ws`k´ = 0
    foreach v of list2 {
        quietly anova `v´ cs`k´
        scalar ws`k´ = ws`k´ + e(rss)
    }
    matrix WSS[`k´, 1] = `k´
    matrix WSS[`k´, 2] = ws`k´
    matrix WSS[`k´, 3] = log(ws`k´)
    matrix WSS[`k´, 4] = 1 - ws`k´/WSS[1,2]
    matrix WSS[`k´, 5] = (WSS[`k´-1,2] - ws`k´)/WSS[`k´-1,2]
}
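For comparison, here is a sketch of the same loop with the two things that look most suspicious changed. This is a guess at the cause, not a confirmed diagnosis. First, local macro references in Stata are closed with a plain apostrophe ('), while the loop above closes them with an acute accent (´), which can happen when code is pasted from a PDF. Second, -foreach- needs a list type after "of", so "foreach v of list2" becomes "foreach v of local list2". The sketch assumes cs1 through cs20 already exist from the -cluster kmeans- loop and that the whole block is run in one go, since a local defined in an earlier, separate run of the do-file editor is no longer available. The guard around the PRE column is an addition of mine to avoid referring to row 0 when k = 1.

```stata
* Sketch only: assumes cs1-cs20 were created by the -cluster kmeans- loop
* and that this whole block is executed together.
local list2 "vstd_schl_cls nstd_schl_cls size_std_cls"

matrix WSS = J(20, 5, .)
matrix colnames WSS = k WSS log(WSS) eta-squared PRE

* WSS for each clustering
forvalues k = 1(1)20 {
    scalar ws`k' = 0
    * "of local" takes the macro name without quotes
    foreach v of local list2 {
        quietly anova `v' cs`k'
        * accumulate the residual sum of squares across the three variables
        scalar ws`k' = ws`k' + e(rss)
    }
    matrix WSS[`k', 1] = `k'
    matrix WSS[`k', 2] = ws`k'
    matrix WSS[`k', 3] = log(ws`k')
    matrix WSS[`k', 4] = 1 - ws`k'/WSS[1,2]
    * PRE compares k with k-1, so it stays missing for k = 1
    if `k' > 1 {
        matrix WSS[`k', 5] = (WSS[`k'-1,2] - ws`k')/WSS[`k'-1,2]
    }
}
matrix list WSS
```

If the error persists with straight apostrophes, running the block with "set trace on" should show which line the macro expansion fails on.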