
  • Cluster analysis with mixed variables

    Dear all,

    I am new to cluster analysis in Stata and would like to start with a simple question. Is it possible to do cluster analysis with categorical data in Stata? If so, what is the approach? For example, should I treat the categorical variables as factors?

    Many thanks in advance for your help.

    Riccardo

  • #2
    Riccardo,

    This depends to some extent on what sort of cluster analysis you intend to do: hierarchical or k-means/k-medians. There are several measures that will accommodate binary or mixed data (see http://www.stata.com/manuals13/mvmea...measure_option). You would presumably have to construct a set of dummy codes from the categorical variables before using them in a cluster command, however. How well clusters are recovered from multi-categorical data analyzed as a set of dummy codes is not something there is much research on.
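
    As a minimal sketch of the dummy-coding approach described above, assuming a dataset with hypothetical variables region (multi-category), smoker (0/1), and income (continuous), and choosing the Gower measure, average linkage, and four groups purely for illustration:

    ```stata
    * Hypothetical variables: region (multi-category), smoker (0/1), income (continuous)
    * Expand the multi-category variable into one 0/1 dummy per category
    tabulate region, generate(region_)

    * Hierarchical (average-linkage) clustering on Gower dissimilarities
    cluster averagelinkage region_* smoker income, measure(Gower) name(gower_avg)

    * Inspect the top of the dendrogram, then cut the tree into 4 groups (4 is arbitrary)
    cluster dendrogram gower_avg, cutnumber(10)
    cluster generate grp = groups(4), name(gower_avg)
    tabulate grp
    ```

    Note that two observations in different categories of region disagree on two of the dummies, so a multi-category variable can weigh more heavily than a single binary one; whether that is acceptable depends on the application.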

    Another option is a latent class cluster analysis, where you could treat the data more naturally as categorical variables. It is achievable using the plugin at http://methodology.psu.edu/downloads/lcastata or gllamm (SSC).
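
    A further route, assuming Stata 15 or later, is the official gsem command with its lclass() option. This is a different tool from the plugin and gllamm mentioned above, so treat the following as a rough sketch only; the indicators y1-y4 are hypothetical 0/1 variables, and the class name C and the choice of two classes are arbitrary:

    ```stata
    * Hypothetical 0/1 indicators y1-y4; latent class variable C with 2 classes (arbitrary)
    gsem (y1 y2 y3 y4 <- ), logit lclass(C 2)

    * Estimated class shares and class-specific item probabilities
    estat lcprob
    estat lcmean

    * Posterior class-membership probabilities for each observation
    predict cpost*, classposteriorpr
    ```

    In practice one would refit with different numbers of classes and compare the fits before settling on a solution.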

    Alternatively, the recursive partitioning program chaid (SSC) also produces clustering for categorical variables that you might consider trying.

    - joe
    Joseph Nicholas Luchman, Ph.D., PStat® (American Statistical Association)
    ----
    Research Fellow
    Fors Marsh

    ----
    Version 18.0 MP

    • #3
      Joe,

      Thank you very much. I opted for the Gower distance. Some of my categorical variables have a large number of categories, which makes the calculation quite slow, but I cannot find a more reasonable alternative. LCA looks rather like factor analysis; I will look into it to understand precisely what it does.

      Many thanks again.
      Riccardo

      • #4
        Dear all,
        I'm trying to do a latent class cluster analysis (exploratory latent class analysis) in Stata for Mac. Unfortunately, the available GLLAMM manuals do not explain how to do exactly this kind of cluster analysis with the tool, and it seems that I won't be able to use the LCA plugin since it only runs on Windows.
        I would appreciate any suggestions for my problem.
        Thanks in advance.
        Basak
