
  • Drawing a four cluster graph using a dataset and "Total" columns

    Dear Stata community,

    I hope to get some help with this one. I have been trying to solve my problem without finding a solution, and I am not sure it is feasible: I doubt that anything clear can be obtained from the data I have.

    Here are my data:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str5 influencedpendance byte(f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13 f14 f15 f16 f17 f18 f19 f20) int total
    "F1"    0 0  0  1 0 1  1 0 1 0 1 0  1  0 1 0 1 0 1  0   9
    "F2"    1 0  1  0 0 0  1 1 0 0 1 1  0  1 0 0 1 0 0  1   9
    "F3"    1 0  0  1 1 0  1 1 0 0 1 0  1  1 0 0 1 1 1  0  11
    "F4"    0 1  1  0 0 1  1 0 1 0 0 0  0  1 1 0 0 1 0  0   8
    "F5"    1 1  1  1 0 1  1 1 0 0 1 0  0  1 0 0 1 1 1  0  12
    "F6"    0 0  0  0 0 0  1 1 1 0 0 0  1  1 0 0 0 1 0  1   7
    "F7"    1 1  1  0 1 0  0 0 0 0 1 1  1  0 0 0 0 1 1  0   9
    "F8"    0 0  1  1 1 0  1 0 0 0 1 1  0  0 0 0 1 1 0  0   8
    "F9"    1 0  1  1 0 1  1 0 0 0 0 1  1  1 1 0 0 1 1  1  12
    "F10"   0 0  0  0 0 0  0 0 0 0 0 1  0  0 0 0 0 0 0  0   1
    "F11"   0 0  1  0 0 0  1 0 0 0 0 0  1  1 0 0 0 0 1  0   5
    "F12"   0 1  0  1 0 0  0 0 1 1 0 0  0  0 1 0 0 0 0  1   6
    "F13"   0 1  1  1 0 0  0 0 0 0 1 0  0  1 0 1 1 0 0  1   8
    "F14"   0 0  1  0 0 0  0 0 0 0 1 0  1  0 0 1 0 0 0  1   5
    "F15"   0 0  1  1 0 0  0 0 0 0 0 1  0  1 0 0 1 1 0  1   7
    "F16"   0 1  0  0 0 1  0 0 0 0 0 0  1  1 1 0 1 1 0  1   8
    "F17"   1 1  0  0 0 0  0 1 0 0 0 0  1  1 0 0 0 0 0  1   6
    "F18"   1 0  1  1 0 1  0 1 1 0 0 0  1  1 0 0 0 0 1  1  10
    "F19"   1 0  1  1 0 1  1 0 1 0 0 0  0  0 0 0 1 0 0  0   7
    "F20"   0 0  0  0 0 0  0 0 0 0 0 1  0  1 0 0 0 0 0  0   2
    "TOTAL" 8 7 12 10 3 7 10 6 6 1 8 7 10 13 5 2 9 9 7 10 150
    end
    The data concern 20 factors that influence each other and depend on each other; some experts call this an "influence-dependence matrix". As you can see, if a factor influences another factor the entry is 1, and otherwise it is 0 (so the data are binary). At the end, I computed the totals for each factor for both Influence and Dependence.

    My goal is the following: I want to draw a four-cluster graph using the total Influence and the total Dependence of each factor (F1 to F20) as the coordinates of each point on the graph. I want Influence on the x axis and Dependence on the y axis. I would like 4 clusters defined by the degree of total Influence and total Dependence of each factor: the first cluster would hold the factors that are low on both Influence and Dependence, and so on for the other 3 clusters.

    I hope my explanation is clear. I would like to get a clear graph from these data, although it seemed to me almost impossible to do.

    Thanks very much for the help.

  • #2
    Dear Aziz,

    Have you considered using simple correspondence analysis? The official Stata command camat could be used.
    After loading your example data from #1, run:
    Code:
    * Setup
    findit matselrc
    * STB-56 dm79.  Yet more matrix commands
    * First install the user community provided package of matrix commands by Nick Cox (to be able to use the command matselrc):
    net install dm79, replace
    h matselrc
    * Then, use the internal Stata command mkmat to create a matrix from your variables:
    mkmat f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13 f14 f15 f16 f17 f18 f19 f20, matrix(C) rowprefix(F) obs
    mat list C
    * But we need to remove the last row holding the sum value of each column using matselrc:
    matselrc C MC, row(1/20) col(1/20)
    mat list MC
    * Next, run:
    camat MC, plot
    Use the command cabiplot to create the biplot with more control of visualization options:
    Code:
    cabiplot , origin xsiz(5) ysiz(5) legend(pos(2) ring(0) col(1))
    graph export "CA_Cluster.png", width(600) height(600) replace
    which results in:
    [Image: CA_Cluster.png]

    Furthermore, to inspect the scaling of the categories of the variables, we can plot their projections with the Stata command caprojection:
    Code:
    caprojection
    graph export "CA_Cluster_Projection.png", width(960) height(610) replace
    which results in:
    [Image: CA_Cluster_Projection.png]

    Consult the documentation (ca postestimation plots) for further options and examples.

    http://publicationslist.org/eric.melse



    • #3
      ericmelse Dear Mr. Melse;

      Thanks for the detailed explanation and the graphs.

      The thing is that my goal here is not to use correspondence analysis; that technique doesn't give me what I want.

      First, in your explanation, I see that you've removed the "Total" row and column, yet those totals are exactly what I want to work with.

      This exercise is basically an "Influence-Dependence" exercise (if you're familiar with the notion). My goal is to use the Influence total and the Dependence total to draw a graph in which the factors (F1 to F20) are plotted with their total Influence and total Dependence as coordinates. It is basically a clustering technique, and I wish to get 4 clusters so that I can tell whether a chosen factor has a big Influence or a big Dependence or not. I guess it is basically going to be a scatter plot, with the graph divided into 4 clusters.

      Again, thanks for the previous help Mr. Melse, I really hope that this further explanation of mine was clear.



      • #4
        ericmelse Dear Mr. Melse;

        As you can see, the data example I've provided is a double-entry table, and I guess my explanations more or less refer to Principal Component Analysis (PCA), so I want to apply that technique to the total Influence and total Dependence of each factor.



        • #5
          I am not especially clear what you seek here, but I tried just treating your data as a 3 x 400 array and shuffling rows and columns indexed by F and f according to their means over the indicator. Here myaxis and tabplot are from the Stata Journal.

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str5 influencedpendance byte(f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13 f14 f15 f16 f17 f18 f19 f20) int total
          "F1"    0 0  0  1 0 1  1 0 1 0 1 0  1  0 1 0 1 0 1  0   9
          "F2"    1 0  1  0 0 0  1 1 0 0 1 1  0  1 0 0 1 0 0  1   9
          "F3"    1 0  0  1 1 0  1 1 0 0 1 0  1  1 0 0 1 1 1  0  11
          "F4"    0 1  1  0 0 1  1 0 1 0 0 0  0  1 1 0 0 1 0  0   8
          "F5"    1 1  1  1 0 1  1 1 0 0 1 0  0  1 0 0 1 1 1  0  12
          "F6"    0 0  0  0 0 0  1 1 1 0 0 0  1  1 0 0 0 1 0  1   7
          "F7"    1 1  1  0 1 0  0 0 0 0 1 1  1  0 0 0 0 1 1  0   9
          "F8"    0 0  1  1 1 0  1 0 0 0 1 1  0  0 0 0 1 1 0  0   8
          "F9"    1 0  1  1 0 1  1 0 0 0 0 1  1  1 1 0 0 1 1  1  12
          "F10"   0 0  0  0 0 0  0 0 0 0 0 1  0  0 0 0 0 0 0  0   1
          "F11"   0 0  1  0 0 0  1 0 0 0 0 0  1  1 0 0 0 0 1  0   5
          "F12"   0 1  0  1 0 0  0 0 1 1 0 0  0  0 1 0 0 0 0  1   6
          "F13"   0 1  1  1 0 0  0 0 0 0 1 0  0  1 0 1 1 0 0  1   8
          "F14"   0 0  1  0 0 0  0 0 0 0 1 0  1  0 0 1 0 0 0  1   5
          "F15"   0 0  1  1 0 0  0 0 0 0 0 1  0  1 0 0 1 1 0  1   7
          "F16"   0 1  0  0 0 1  0 0 0 0 0 0  1  1 1 0 1 1 0  1   8
          "F17"   1 1  0  0 0 0  0 1 0 0 0 0  1  1 0 0 0 0 0  1   6
          "F18"   1 0  1  1 0 1  0 1 1 0 0 0  1  1 0 0 0 0 1  1  10
          "F19"   1 0  1  1 0 1  1 0 1 0 0 0  0  0 0 0 1 0 0  0   7
          "F20"   0 0  0  0 0 0  0 0 0 0 0 1  0  1 0 0 0 0 0  0   2
          "TOTAL" 8 7 12 10 3 7 10 6 6 1 8 7 10 13 5 2 9 9 7 10 150
          end
          
          gen F = real(substr(inf, 2, .))
          myaxis newy=F if inf != "TOTAL", sort(mean total)
          drop total 
          drop if inf == "TOTAL"
          reshape long f, i(F) j(x)
          
          myaxis newx=x, sort(mean f)
          tabplot newy newx [w=f], aspect(1) ytitle(, orient(horiz)) scheme(s1color) xtitle(f) subtitle("") note("")

          [Image: fftabplot.png]



          • #6
            Are you looking for something like this?

            Code:
            rename influencedpendance var
            rename total influence
            gen int dependence = .
            forval i = 1/20 {
                replace dependence = f`i'[21] in `i'
            }
            
            
            sum influence in 1/20, meanonly
            local mean_inf = r(mean)
            sum dependence in 1/20, meanonly
            local mean_dep = r(mean)
            
            scatter dependence influence in 1/20, mlabel(var) xline(`mean_inf') yline(`mean_dep') scheme(s1color)
            which produces:
            [Image: Screenshot 2022-10-30 at 4.36.05 PM.png]
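
            If the four groups are also wanted as a variable (for tabulating or colouring the points), the mean-split idea above could be extended roughly as follows. This is only a sketch under stated assumptions: the totals are typed in directly from the example in #1, with the "total" column taken as influence and the "TOTAL" row as dependence (the reading used in #6), and the variable name quadrant, its value labels, and the choice of the means as cut points are illustrative, not part of the original posts.

```stata
* Totals copied from the -dataex- example in #1.
* Assumption (as in #6): "total" column = influence, "TOTAL" row = dependence.
clear
input str3 var byte(influence dependence)
"F1"   9  8
"F2"   9  7
"F3"  11 12
"F4"   8 10
"F5"  12  3
"F6"   7  7
"F7"   9 10
"F8"   8  6
"F9"  12  6
"F10"  1  1
"F11"  5  8
"F12"  6  7
"F13"  8 10
"F14"  5 13
"F15"  7  5
"F16"  8  2
"F17"  6  9
"F18" 10  9
"F19"  7  7
"F20"  2 10
end

* Cut points: the means, as in #6 (medians or scale midpoints are other possible choices)
summarize influence, meanonly
local mean_inf = r(mean)
summarize dependence, meanonly
local mean_dep = r(mean)

* quadrant (hypothetical name): 1 = low/low, 2 = high influence only,
* 3 = high dependence only, 4 = high/high
generate byte quadrant = 1 + (influence > `mean_inf') + 2 * (dependence > `mean_dep')
label define quadrant 1 "Low inf., low dep." 2 "High inf., low dep." ///
    3 "Low inf., high dep." 4 "High inf., high dep."
label values quadrant quadrant

tabulate quadrant
list var influence dependence quadrant, noobs

* One scatter layer per quadrant, so each group gets its own marker colour
twoway (scatter dependence influence if quadrant == 1, mlabel(var)) ///
       (scatter dependence influence if quadrant == 2, mlabel(var)) ///
       (scatter dependence influence if quadrant == 3, mlabel(var)) ///
       (scatter dependence influence if quadrant == 4, mlabel(var)), ///
       xline(`mean_inf') yline(`mean_dep') ///
       xtitle("Total influence") ytitle("Total dependence") ///
       legend(order(1 "Low inf., low dep." 2 "High inf., low dep." ///
                    3 "Low inf., high dep." 4 "High inf., high dep."))
```

            With these totals both means are 7.5, so no factor falls exactly on a cut line, and the four groups hold 5, 5, 4 and 6 factors respectively.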

            Last edited by Hemanshu Kumar; 30 Oct 2022, 05:07.

