Drawing a four cluster graph using a dataset and "Total" columns

Aziz Essouaied

28 Oct 2022, 14:43

Dear Stata community,

I hope someone can help with this one. I've been trying to solve my problem without success, and I'm not sure it's even feasible: I doubt anything clear can be obtained from the data I have.

Here is my data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str5 influencedpendance byte(f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13 f14 f15 f16 f17 f18 f19 f20) int total
"F1"    0 0  0  1 0 1  1 0 1 0 1 0  1  0 1 0 1 0 1  0   9
"F2"    1 0  1  0 0 0  1 1 0 0 1 1  0  1 0 0 1 0 0  1   9
"F3"    1 0  0  1 1 0  1 1 0 0 1 0  1  1 0 0 1 1 1  0  11
"F4"    0 1  1  0 0 1  1 0 1 0 0 0  0  1 1 0 0 1 0  0   8
"F5"    1 1  1  1 0 1  1 1 0 0 1 0  0  1 0 0 1 1 1  0  12
"F6"    0 0  0  0 0 0  1 1 1 0 0 0  1  1 0 0 0 1 0  1   7
"F7"    1 1  1  0 1 0  0 0 0 0 1 1  1  0 0 0 0 1 1  0   9
"F8"    0 0  1  1 1 0  1 0 0 0 1 1  0  0 0 0 1 1 0  0   8
"F9"    1 0  1  1 0 1  1 0 0 0 0 1  1  1 1 0 0 1 1  1  12
"F10"   0 0  0  0 0 0  0 0 0 0 0 1  0  0 0 0 0 0 0  0   1
"F11"   0 0  1  0 0 0  1 0 0 0 0 0  1  1 0 0 0 0 1  0   5
"F12"   0 1  0  1 0 0  0 0 1 1 0 0  0  0 1 0 0 0 0  1   6
"F13"   0 1  1  1 0 0  0 0 0 0 1 0  0  1 0 1 1 0 0  1   8
"F14"   0 0  1  0 0 0  0 0 0 0 1 0  1  0 0 1 0 0 0  1   5
"F15"   0 0  1  1 0 0  0 0 0 0 0 1  0  1 0 0 1 1 0  1   7
"F16"   0 1  0  0 0 1  0 0 0 0 0 0  1  1 1 0 1 1 0  1   8
"F17"   1 1  0  0 0 0  0 1 0 0 0 0  1  1 0 0 0 0 0  1   6
"F18"   1 0  1  1 0 1  0 1 1 0 0 0  1  1 0 0 0 0 1  1  10
"F19"   1 0  1  1 0 1  1 0 1 0 0 0  0  0 0 0 1 0 0  0   7
"F20"   0 0  0  0 0 0  0 0 0 0 0 1  0  1 0 0 0 0 0  0   2
"TOTAL" 8 7 12 10 3 7 10 6 6 1 8 7 10 13 5 2 9 9 7 10 150
end

It concerns 20 factors that influence, and depend on, each other; some experts call this an "influence-dependence matrix". If a factor influences another factor, the cell is 1; otherwise it is 0, so the matrix is binary. Finally, I computed each factor's totals for influence and for dependence.
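
The totals described above are simply row and column sums of the 0/1 cells, so they can be checked directly in Stata. A minimal sketch, assuming the -dataex- example above is in memory, that the variable total holds the row totals and that the last observation ("TOTAL") holds the column totals; rowsum is a hypothetical variable name:

```stata
* Sketch: assumes the -dataex- example above is in memory
* (observations 1-20 = factors F1-F20, observation 21 = "TOTAL").
* rowsum is a hypothetical name for the recomputed row totals.
egen int rowsum = rowtotal(f1-f20)

* Row totals: each factor's sum across f1-f20 should equal its -total-
assert rowsum == total if influencedpendance != "TOTAL"

* Column totals: the sum of f`j' over the 20 factors should equal the TOTAL row
forvalues j = 1/20 {
    summarize f`j' if influencedpendance != "TOTAL", meanonly
    assert r(sum) == f`j'[21]
}
```

Both sets of assertions pass on the data above, so the total variable and the TOTAL row are consistent with the cells.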

My goal is the following: I want to draw a four-cluster graph in which each factor (F1 to F20) is a point whose coordinates are its total influence and its total dependence, with influence on the x axis and dependence on the y axis. I'd like the four clusters to reflect each factor's level of total influence and total dependence: the first cluster would hold the factors that are low on both influence and dependence, and so on for the other three clusters.

I hope my explanation is clear. I'd really like a clear graph for this, but it seems to me that it may be impossible to produce one.

Thanks very much for the help.

ericmelse

28 Oct 2022, 23:39

Dear Aziz,

Have you considered using simple correspondence analysis? The official Stata command camat could be used.
After loading the example data from #1, run:

Code:

* Setup
* matselrc is part of dm79 ("Yet more matrix commands", STB-56), a community-contributed
* package of matrix commands by Nick Cox. Locate it with:
findit matselrc
* or install it directly from the STB-56 archive:
net install dm79, from(http://www.stata.com/stb/stb56) replace
help matselrc
* Then use the official Stata command mkmat to create a matrix from your variables:
mkmat f1-f20, matrix(C) rowprefix(F) obs
mat list C
* Remove the last row, which holds the column totals, using matselrc:
matselrc C MC, row(1/20) col(1/20)
mat list MC
* Next, run:
camat MC, plot
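
If you would rather not install dm79, one possible alternative (a sketch, not part of the original reply) is to restrict mkmat to the 20 factor observations so that the TOTAL row never enters the matrix. It assumes the TOTAL row is observation 21, as in #1; MC2 is a hypothetical matrix name:

```stata
* Sketch: assumes the example data from #1 is in memory, with the
* TOTAL row as observation 21. MC2 is a hypothetical matrix name.
* Restricting mkmat to observations 1-20 leaves out the TOTAL row,
* so matselrc is not needed; rownames() labels the rows F1-F20.
mkmat f1-f20 in 1/20, matrix(MC2) rownames(influencedpendance)
matrix list MC2
camat MC2, plot
```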

Use the command cabiplot to create the biplot with more control over the display options:

Code:

cabiplot , origin xsiz(5) ysiz(5) legend(pos(2) ring(0) col(1))
graph export "CA_Cluster.png", width(600) height(600) replace

which results in:

[Image: CA_Cluster.png]

Furthermore, to inspect how the categories of the variables are scaled on each dimension, we can use the Stata command caprojection:

Code:

caprojection
graph export "CA_Cluster_Projection.png", width(960) height(610) replace

which results in:

[Image: CA_Cluster_Projection.png]

Consult the documentation for further options and examples ([MV] ca postestimation plots).

http://publicationslist.org/eric.melse

Aziz Essouaied

29 Oct 2022, 00:47

ericmelse Dear Mr. Melse,

Thanks for the detailed explanation and the graphs.

The thing is that my goal here is not to use correspondence analysis; that technique doesn't give me what I want.

First, I see that in your explanation you removed the "Total" row and column, yet those totals are exactly what I want to work with.

This is basically an "influence-dependence" exercise (if you're familiar with the notion). My goal is to use the influence totals and the dependence totals to draw a graph on which the factors F1 to F20 are plotted with their total influence and total dependence as coordinates. It is basically a clustering technique, and I want 4 clusters so that I can tell whether a given factor has high influence, high dependence, or neither. I guess it will basically be a scatter plot, but one divided into 4 clusters.
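
One way to read "a clustering technique with 4 clusters" literally is k-means clustering on the two totals, which official Stata provides as cluster kmeans. This is a swapped-in technique, not something proposed in the thread. The sketch assumes a fresh copy of the example data from #1 is in memory, with influence taken as the row total and dependence as the column total; inf_tot, dep_tot and km4 are hypothetical names:

```stata
* Sketch: assumes a fresh copy of the example data from #1 is in memory.
* inf_tot, dep_tot and km4 are hypothetical names.
* Influence = row total (variable total); dependence = column total,
* stored in the TOTAL row (observation 21).
generate int inf_tot = total
generate int dep_tot = .
forvalues i = 1/20 {
    quietly replace dep_tot = f`i'[21] in `i'
}

* k-means with 4 groups on the 20 factors; the seed makes the
* randomly chosen starting centres reproducible
set seed 2022
cluster kmeans inf_tot dep_tot in 1/20, k(4) name(km4) start(krandom)

* km4 holds each factor's cluster number
list influencedpendance inf_tot dep_tot km4 in 1/20, noobs
scatter dep_tot inf_tot in 1/20, mlabel(km4) xtitle("Influence") ytitle("Dependence")
```

Unlike a split at fixed cut-offs, k-means groups factors by distance to the cluster centres, so its four groups need not line up with "low/high influence" and "low/high dependence", and the cluster numbers are arbitrary labels. With integer totals and several tied points, the result can also depend on the starting centres, hence the seed.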

Again, thanks for your earlier help, Mr. Melse. I hope this further explanation is clear.

Aziz Essouaied

29 Oct 2022, 02:25

ericmelse Dear Mr. Melse,

As you can see, the example data I've provided is a two-way (double-entry) table. I guess what I'm describing is close to principal component analysis (PCA), so I want to apply that technique to each factor's total influence and total dependence.
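
For reference, a minimal PCA sketch on the two totals, assuming a fresh copy of the example data from #1 is in memory, with influence as the row total and dependence as the column total (inf_tot, dep_tot, pc1 and pc2 are hypothetical names). With only two variables, pca (which by default analyses the correlation matrix) can only return components with loadings of plus or minus 1/sqrt(2) on each standardized total, that is, their sum and their difference, so the scores are essentially a 45-degree rotation of the influence-dependence scatter rather than a set of clusters:

```stata
* Sketch: assumes a fresh copy of the example data from #1 is in memory.
* inf_tot, dep_tot, pc1 and pc2 are hypothetical names.
generate int inf_tot = total
generate int dep_tot = .
forvalues i = 1/20 {
    quietly replace dep_tot = f`i'[21] in `i'
}

* PCA of the two totals over the 20 factors (TOTAL row excluded)
pca inf_tot dep_tot in 1/20
* Component scores for each factor
predict pc1 pc2 in 1/20, score
list influencedpendance inf_tot dep_tot pc1 pc2 in 1/20, noobs
```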

Nick Cox

30 Oct 2022, 03:51

I am not especially clear what you seek here, but I tried just treating your data as a 3 x 400 array and shuffling rows and columns indexed by F and f according to their means over the indicator. Here myaxis and tabplot are from the Stata Journal.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str5 influencedpendance byte(f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13 f14 f15 f16 f17 f18 f19 f20) int total
"F1"    0 0  0  1 0 1  1 0 1 0 1 0  1  0 1 0 1 0 1  0   9
"F2"    1 0  1  0 0 0  1 1 0 0 1 1  0  1 0 0 1 0 0  1   9
"F3"    1 0  0  1 1 0  1 1 0 0 1 0  1  1 0 0 1 1 1  0  11
"F4"    0 1  1  0 0 1  1 0 1 0 0 0  0  1 1 0 0 1 0  0   8
"F5"    1 1  1  1 0 1  1 1 0 0 1 0  0  1 0 0 1 1 1  0  12
"F6"    0 0  0  0 0 0  1 1 1 0 0 0  1  1 0 0 0 1 0  1   7
"F7"    1 1  1  0 1 0  0 0 0 0 1 1  1  0 0 0 0 1 1  0   9
"F8"    0 0  1  1 1 0  1 0 0 0 1 1  0  0 0 0 1 1 0  0   8
"F9"    1 0  1  1 0 1  1 0 0 0 0 1  1  1 1 0 0 1 1  1  12
"F10"   0 0  0  0 0 0  0 0 0 0 0 1  0  0 0 0 0 0 0  0   1
"F11"   0 0  1  0 0 0  1 0 0 0 0 0  1  1 0 0 0 0 1  0   5
"F12"   0 1  0  1 0 0  0 0 1 1 0 0  0  0 1 0 0 0 0  1   6
"F13"   0 1  1  1 0 0  0 0 0 0 1 0  0  1 0 1 1 0 0  1   8
"F14"   0 0  1  0 0 0  0 0 0 0 1 0  1  0 0 1 0 0 0  1   5
"F15"   0 0  1  1 0 0  0 0 0 0 0 1  0  1 0 0 1 1 0  1   7
"F16"   0 1  0  0 0 1  0 0 0 0 0 0  1  1 1 0 1 1 0  1   8
"F17"   1 1  0  0 0 0  0 1 0 0 0 0  1  1 0 0 0 0 0  1   6
"F18"   1 0  1  1 0 1  0 1 1 0 0 0  1  1 0 0 0 0 1  1  10
"F19"   1 0  1  1 0 1  1 0 1 0 0 0  0  0 0 0 1 0 0  0   7
"F20"   0 0  0  0 0 0  0 0 0 0 0 1  0  1 0 0 0 0 0  0   2
"TOTAL" 8 7 12 10 3 7 10 6 6 1 8 7 10 13 5 2 9 9 7 10 150
end

gen F = real(substr(inf, 2, .))
myaxis newy=F if inf != "TOTAL", sort(mean total)
drop total 
drop if inf == "TOTAL"
reshape long f, i(F) j(x)

myaxis newx=x, sort(mean f)
tabplot newy newx [w=f], aspect(1) ytitle(, orient(horiz)) scheme(s1color) xtitle(f) subtitle("") note("")

[Image: fftabplot.png]

Hemanshu Kumar

30 Oct 2022, 04:57

Are you looking for something like this?

Code:

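* Assumes the -dataex- example from #1 is in memory (observation 21 = TOTAL)
rename influencedpendance var
* Row totals measure each factor's influence
rename total influence
* Column totals, stored in the TOTAL row, measure each factor's dependence
gen int dependence = .
forval i = 1/20 {
    replace dependence = f`i'[21] in `i'
}

* Means over the 20 factors (TOTAL row excluded) give the reference lines
sum influence in 1/20, meanonly
local mean_inf = r(mean)
sum dependence in 1/20, meanonly
local mean_dep = r(mean)

* Label each point with its factor name and draw the two reference lines
scatter dependence influence in 1/20, mlabel(var) xline(`mean_inf') yline(`mean_dep') scheme(s1color)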

which produces:

[Image: Screenshot 2022-10-30 at 4.36.05 PM.png]

Last edited by Hemanshu Kumar; 30 Oct 2022, 05:07.
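
The reference lines at the means split the plot above into four quadrants. The sketch below, an addition rather than part of the reply, turns that split into an explicit cluster variable and draws each quadrant in its own color. It assumes the code above has just been run (so var, influence and dependence exist); quadrant is a hypothetical variable name, and using the means as cut-offs simply follows the plot above (medians or fixed thresholds would be equally defensible):

```stata
* Sketch: run after the code above, so var, influence and dependence exist.
* quadrant is a hypothetical variable name; cut-offs are the means.
summarize influence in 1/20, meanonly
local mean_inf = r(mean)
summarize dependence in 1/20, meanonly
local mean_dep = r(mean)

* 1 = low/low, 2 = high influence only, 3 = high dependence only, 4 = high/high
generate byte quadrant = 1 + (influence > `mean_inf') + 2*(dependence > `mean_dep') in 1/20
label define quadrant 1 "Low infl., low dep." 2 "High infl., low dep."
label define quadrant 3 "Low infl., high dep." 4 "High infl., high dep.", add
label values quadrant quadrant
tabulate quadrant

* One scatter per quadrant, so each cluster gets its own marker color
local plots
forvalues q = 1/4 {
    local plots `plots' (scatter dependence influence if quadrant == `q', mlabel(var))
}
twoway `plots', xline(`mean_inf') yline(`mean_dep') xtitle("Influence") ytitle("Dependence") ///
    legend(order(1 "Low/low" 2 "High influence" 3 "High dependence" 4 "High/high"))
```

With the data from #1 both means are 7.5 (150/20), so no factor sits exactly on a cut-off line; with other data, a factor whose total equals the mean would fall into the "low" group because the comparison is strict.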
