  • Error running a kmeans cluster analysis - error message: factor variables and time-series operators not allowed

    Dear Members,

    I am trying to learn how to perform a cluster analysis. I wish to apply it to the level of agreement with a set of statements, measured on a Likert scale that goes from 1 to 4.

    I have 20 variables each indicating how a certain feature of electric cars is perceived as a barrier to their purchase. These 20 variables can take only the following values: 1, 2, 3, 4. With respect to the proposed statement, 1 indicates that the individual completely disagrees with it, 2 that she partially disagrees, 3 that she partially agrees, and 4 that she totally agrees. They are stored in my database in the following fashion (the image indicates one of the 20 variables).
    [Image: data_features.PNG]

    My idea is to run a cluster analysis and check whether I can group individuals into meaningful clusters.

    I tried to run the following command

    Code:
    cluster k planning anxiety k(3)
    but I got the following error message:

    factor variables and time-series operators not allowed
    r(101);

    I am stuck at this point and I got no results.

    This is probably a trivial issue, and I do apologise if this is the case. Searching this forum and online, I was not able to find a solution to this problem.

    I would be very grateful if any of you could provide me with an insight.

    Marco
    Last edited by Marco Giansoldati; 14 Oct 2019, 10:51.

  • #2
    The "k(3)" is an option and needs to be preceded by a comma, which separates the variable list from the options.


    • #3
      I think you should do something like
      Code:
      cluster kmeans planning anxiety, k(3) name(cl1)
      cluster kmedians planning anxiety, k(3) name(cl2)
      HTH


      • #4
        If I follow this correctly, then your two variables fall into a 4 x 4 classification. A table of cross-combinations showing up to 16 frequencies -- or any graph showing the same -- is going to keep all the information and is likely to be far more informative than a cluster analysis.
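
        A minimal sketch of such a cross-tabulation follows. The data are simulated stand-ins (an assumption for illustration only); only the variable names planning and anxiety come from this thread.

        ```stata
        * Simulated stand-in data: two items taking the values 1-4
        clear
        set seed 12345
        set obs 200
        generate planning = ceil(4*runiform())
        generate anxiety  = ceil(4*runiform())

        * 4 x 4 table of joint frequencies (up to 16 cells)
        tabulate planning anxiety

        * One possible graphical counterpart:
        * distribution of planning within each level of anxiety
        histogram planning, discrete frequency by(anxiety)
        ```

        With real data, drop the simulation lines and run only the tabulate and histogram commands.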


        • #5
          Thank you very much Rich Goldstein. My apologies for the mistake. Many thanks FernandoRios for the precious help.


          • #6
            Dear Nick Cox, thank you very much for your post.

            I have 20 "barriers" to the purchase of an electric car (from practicality to driving pleasure), each of which can take the value 1, 2, 3, or 4. My idea was to run a cluster analysis, for example as follows

            Code:
            cluster k practicality-driv_pleas, k(3) name(cluster1)
            and then look at the socio-economic characteristics of the respondents. These encompass, for example, gender (two levels), education (3 levels), occupation (3 levels), self-declared level of expertise with electric cars (2 levels), electric car's driving experience (two levels), etc.

            The numbers of levels given in brackets reflect the fact that I had to group some levels in order to perform a chi-square test with sufficient cell counts.
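
            A hedged sketch of how cluster membership could be set against one such characteristic. The data are simulated, cluster1 here is only a placeholder for the grouping variable that cluster kmeans would create, and gender is a hypothetical variable name.

            ```stata
            * Simulated stand-in data (assumption: not the survey data)
            clear
            set seed 12345
            set obs 200
            * Placeholder for the grouping variable created by cluster kmeans
            generate cluster1 = ceil(3*runiform())
            * Hypothetical two-level socio-economic characteristic
            generate gender = ceil(2*runiform())

            * Cross-tabulation with row percentages, expected counts,
            * and Pearson's chi-square test
            tabulate cluster1 gender, chi2 row expected
            ```

            The expected option shows the expected cell frequencies, which helps in judging whether the cell counts are large enough for the chi-square test.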

            So far I have performed a series of cross-tabulations, but I am not completely sure I have understood your kind suggestion.

            Do you think it would be useful to run the following command:

            Code:
            tabstat practicality-driv_pleas, by(cluster1)
            ?

            I would actually be interested in performing a sort of graphical analysis of the clusters to better visualize if they carry a message.
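
            One possible sketch of such a graphical comparison, assuming simulated data and only four stand-in items (cost and charging are hypothetical names; practicality and driv_pleas are taken from the commands above). Because the simulated values are random, the resulting clusters carry no meaning; the point is only the sequence of commands.

            ```stata
            * Simulated stand-ins for the barrier items (values 1-4)
            clear
            set seed 12345
            set obs 200
            foreach v in practicality cost charging driv_pleas {
                generate `v' = ceil(4*runiform())
            }

            * k-means with 3 groups; a seeded random start makes the run reproducible
            cluster kmeans practicality-driv_pleas, k(3) name(cluster1) start(krandom(12345))

            * Mean of each item within each cluster
            tabstat practicality-driv_pleas, by(cluster1) statistics(mean)

            * The same means as a bar chart, one group of bars per cluster
            graph bar (mean) practicality-driv_pleas, over(cluster1) ///
                ytitle("Mean agreement (1-4)") legend(rows(1))
            ```

            Run this from a do-file, since the /// line continuation is not accepted in the Command window.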

            Many thanks Nick.
