exploratory factor analysis

Nafisa Riaz

Join Date: Dec 2024

Posts: 7
#1

11 Dec 2024, 23:37

How can I do factor analysis for 3 dichotomous and 2 ordinal variables?

Joseph Coveney

Join Date: Apr 2014

Posts: 4410
#2

12 Dec 2024, 00:18

This FAQ on UCLA's website gives a step-by-step recipe.

Nafisa Riaz

Join Date: Dec 2024

Posts: 7
#3

12 Dec 2024, 02:39

Using this, the following issues arise.
. polychoric mm_social_connection_PC mm_social_connection_TC aversion_score HC_Awareness
could not calculate numerical derivatives
missing values encountered
could not calculate numerical derivatives
missing values encountered

Polychoric correlation matrix

                          mm_social_connection_PC  mm_social_connection_TC  aversion_score  HC_Awareness
mm_social_connection_PC                         1
mm_social_connection_TC                 .45738509                        1
         aversion_score                -.28393653               -.28893077               1
           HC_Awareness                -.14912885                .00857793               .             1

. do "C:\Users\Zaifi\AppData\Local\Temp\STD434_000000.tmp"

. display r(sum_w)
4131

.
end of do-file

. do "C:\Users\Zaifi\AppData\Local\Temp\STD434_000000.tmp"

. global N = r(sum_w)

.
end of do-file

. do "C:\Users\Zaifi\AppData\Local\Temp\STD434_000000.tmp"

. matrix r = r(R)

.
end of do-file

. do "C:\Users\Zaifi\AppData\Local\Temp\STD434_000000.tmp"

. factormat r, n($N) factor(1)
matrix r has missing values
r(504);

Could you please recommend a way out?

Thanking you in anticipation.

Nafisa Riaz

Join Date: Dec 2024

Posts: 7
#4

12 Dec 2024, 02:46

. sum mm_social_connection_PC mm_social_connection_TC aversion_score HC_Awareness

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
mm_social~PC |      4,131     .033406    .1339081          0          1
mm_social~TC |      4,131    .0083111    .0741614          0          1
aversion_s~e |      4,131    .4686517    .4990767          0          1
HC_Awareness |      4,131    .5514403    .4974071          0          1

These are my variables; there are no missing values. Further, 2 variables are standardized, while 2 variables are binary in nature.

Nick Cox

Join Date: Mar 2014

Posts: 35698
#5

12 Dec 2024, 03:48

What best to do may be a matter of judgment but it certainly depends on your data, which naturally we can't see.

Please show the results after

Code:

contract mm_social_connection_PC mm_social_connection_TC aversion_score HC_Awareness
dataex

-- where you should follow FAQ Advice #12 and copy and paste data between delimiters

Code:

like this

The problem is not missing data. It's missing correlations.

Last edited by Nick Cox; 12 Dec 2024, 03:51.

Nafisa Riaz

Join Date: Dec 2024
Posts: 7
#6

15 Dec 2024, 22:38

contract mm_social_connection_PC mm_social_connection_TC aversion_score HC_Awareness

. dataex

----------------------- copy starting from the next line -----------------------

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(aversion_score HC_Awareness mm_social_connection_PC mm_social_connection_TC) int _freq
0 0        0        0 1666
0 1        0        0  279
1 1        0        0 1855
0 0        0 .3333333    6
0 1        0 .3333333    6
1 1        0 .3333333    2
0 0        0 .6666667    7
0 1        0 .6666667    7
1 1        0 .6666667    7
0 0        0        1    3
0 1        0        1    1
0 0 .3333333        0  119
0 1 .3333333        0   23
1 1 .3333333        0   42
0 0 .3333333 .3333333    7
0 1 .3333333 .3333333    2
1 1 .3333333 .3333333    1
1 1 .3333333 .6666667    1
0 0 .6666667        0   35
0 1 .6666667        0   13
1 1 .6666667        0   18
0 0 .6666667 .3333333    2
0 0 .6666667 .6666667    2
1 1 .6666667 .6666667    1
0 1 .6666667        1    1
0 0        1        0    6
0 1        1        0    6
1 1        1        0    9
0 1        1        1    4
end

------------------ copy up to and including the previous line ------------------

Listed 29 out of 29 observations

.


Nafisa Riaz

Join Date: Dec 2024

Posts: 7
#7

15 Dec 2024, 22:55

May I use delimiters? Kindly explain the procedure again. Also, would it resolve the missing-correlation problem?

Joseph Coveney

Join Date: Apr 2014
Posts: 4410
#8

16 Dec 2024, 00:07

Originally posted by Nafisa Riaz

could you please recommend a way out?

A couple of possibilities.

Code:

version 18.0

clear *

quietly input float(aversion_score HC_Awareness mm_social_connection_PC mm_social_connection_TC) int _freq
<redacted for brevity>
end

*
* Begin here
*
rename (aversion_score HC_Awareness mm_social_connection_PC ///
    mm_social_connection_TC) (avs hca scp sct)
foreach var of varlist sc? {
    quietly replace `var' = round(3 * `var')
}

assert _freq > 0
quietly expand _freq
drop _freq

// Your problem is here
tabulate avs hca
tetrachoric avs hca

// Either exclude one of the collinear indicators . . .
polychoric avs sc?
factormat r(R), n(`r(N)') ml nolog

// . . . or resort to a couple of fudge factors in order to get the crank to turn
tetrachoric avs hca, zeroadjust
tempname rho
scalar define `rho' = r(Rho)[2, 2]
polychoric _all
local n `r(N)'
tempname Rho
matrix define `Rho' = r(R)
matrix define `Rho'[2, 1] = `rho'
matrix define `Rho'[1, 2] = `rho'
factormat `Rho', n(`r(N)') forcepsd ml nolog

exit

If it were me, I'd go with the former.

Do-file and log file are attached if you're interested further.
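For anyone following along, the cross-tabulation that the comment "Your problem is here" points to can be reproduced without the full dataset. This is a minimal sketch, assuming the three cell counts below, which are tallied by hand from the _freq column of the -dataex- listing in #6 (they are not copied from the attached log). Note that it starts with clear, so it replaces any data in memory.

```stata
* Cell counts tallied from the -dataex- listing in #6 (an assumption of this sketch)
clear
input byte(aversion_score HC_Awareness) int _freq
0 0 1853
0 1  342
1 1 1936
end

* 2 x 2 table of the two binary indicators, weighted by frequency
tabulate aversion_score HC_Awareness [fweight = _freq]

* Rows with aversion_score = 1 and HC_Awareness = 0
count if aversion_score == 1 & HC_Awareness == 0
```

The listing contains no row with aversion_score equal to 1 and HC_Awareness equal to 0, so one cell of the 2 x 2 table is empty. That is the situation the zeroadjust option of tetrachoric, used in the second approach above, is meant for.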


Joseph Coveney

Join Date: Apr 2014
Posts: 4410
#9

16 Dec 2024, 01:22

I wrote:

Code:

scalar define `rho' = r(Rho)[2, 2]

Sorry, took the wrong element there. Try this instead:

Code:

version 18.0

clear *

quietly input float(aversion_score HC_Awareness mm_social_connection_PC mm_social_connection_TC) int _freq
<redacted>
end

*
* Begin here
*
rename (aversion_score HC_Awareness mm_social_connection_PC ///
    mm_social_connection_TC) (avs hca scp sct)
foreach var of varlist sc? {
    quietly replace `var' = round(3 * `var')
}

assert _freq > 0
quietly expand _freq
drop _freq

// Your problem is here
tabulate avs hca
tetrachoric avs hca

// Either exclude one of the collinear indicators . . .
polychoric avs sc?
factormat r(R), n(`r(N)') factors(1)

// . . . or resort to a couple of fudge factors in order to get the crank to turn
tetrachoric avs hca, zeroadjust
tempname rho
scalar define `rho' = r(Rho)[2, 1]
polychoric _all
tempname Rho
matrix define `Rho' = r(R)
matrix define `Rho'[2, 1] = `rho'
matrix define `Rho'[1, 2] = `rho'
factormat `Rho', n(`r(N)') forcepsd factors(1)

exit

Also, the maximum likelihood estimator will still drop one of the two indicator variables as collinear, and so I switched to the default method. (And removed a line of dead code.)


Nick Cox

Join Date: Mar 2014
Posts: 35698

#10

16 Dec 2024, 03:28

Thanks for #6. I don't understand #7.

I know essentially nothing about tetrachoric and polychoric correlation. Out of curiosity I just pushed the data through a PCA. Here pcacoefsave is from SSC and tabplot is from the Stata Journal. I find correlations between PCs and variables easier to think about than loadings.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(aversion_score HC_Awareness mm_social_connection_PC mm_social_connection_TC) int _freq
0 0        0        0 1666
0 1        0        0  279
1 1        0        0 1855
0 0        0 .3333333    6
0 1        0 .3333333    6
1 1        0 .3333333    2
0 0        0 .6666667    7
0 1        0 .6666667    7
1 1        0 .6666667    7
0 0        0        1    3
0 1        0        1    1
0 0 .3333333        0  119
0 1 .3333333        0   23
1 1 .3333333        0   42
0 0 .3333333 .3333333    7
0 1 .3333333 .3333333    2
1 1 .3333333 .3333333    1
1 1 .3333333 .6666667    1
0 0 .6666667        0   35
0 1 .6666667        0   13
1 1 .6666667        0   18
0 0 .6666667 .3333333    2
0 0 .6666667 .6666667    2
1 1 .6666667 .6666667    1
0 1 .6666667        1    1
0 0        1        0    6
0 1        1        0    6
1 1        1        0    9
0 1        1        1    4
end

ds _freq, not 

local vars `r(varlist)'

corr `vars' [fw=_freq]

pca `vars' [fw=_freq]

foreach v of local vars { 
    local lbl : subinstr local v "_" " ", all
    label var `v' "`lbl'"
}

pcacoefsave using pca_results, replace 

use pca_results, clear 

gen sign = sign(corr)

gen where = -0.1 

tabplot varlabel PC [iw=corr], ///
    subtitle(correlations between variables and PCs) sep(sign) ///
    bar1(color(stc2)) bar2(color(stc1)) showval(format(%04.3f)) ytitle("") ///
    addplot(scatter where PC, ms(none) mlabel(eigenvalue) ///
        mlabformat(%4.3f) mlabsize(medium) mlabcolor(magenta) mlabpos(0) ///
        ymla(-0.1 "eigenvalues", labcolor(magenta) tlc(none) labsize(medium)))

[Image: pcaresulta.png (tabplot of correlations between variables and PCs)]

I'd appreciate any critique of this brute force method. I'm often unclear about the benefits of seeking latent variables as compared with just using the data you have.


Nafisa Riaz

Join Date: Dec 2024
Posts: 7

#11

16 Dec 2024, 22:24


Joseph, can you please explain why this line of your code in #9 (marked red) gives the error below?

. do "C:\Users\Zaifi\AppData\Local\Temp\STD2a48_000000.tmp"

. matrix define `Rho'[2, 1] = `rho'
invalid syntax
r(198);

end of do-file


Nafisa Riaz

Join Date: Dec 2024

Posts: 7
#12

16 Dec 2024, 22:38

Originally posted by Nick Cox

I'd appreciate any critique of this brute force method. I'm often unclear about the benefits of seeking latent variables as compared with just using the data you have.

@Nick, I highly appreciate your insight, sir.
As the name suggests, latent variables are not observable, so it is definitely tricky to extract a universal type of variable. But what else can a researcher do to understand the ground realities and transform them into solid evidence? This is my understanding; I might be wrong.

Joseph Coveney

Join Date: Apr 2014

Posts: 4410
#13

16 Dec 2024, 23:32

Originally posted by Nafisa Riaz

can you please guide why code(marked red ) giving the error below:

. do "C:\Users\Zaifi\AppData\Local\Temp\STD2a48_000000.tmp"

What you show suggests that you're running an incomplete do-file, neglecting to define one or both of the temporary names (`rho' and `Rho').
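A likely reading of the r(198) error, sketched under the assumption that only the highlighted line was selected and run: names created by tempname are local macros, and local macros exist only within the do-file run that defines them. The matrix values below are made up purely for illustration.

```stata
* Run all of these lines together, in a single do-file run
tempname Rho
matrix define `Rho' = I(2)        // 2 x 2 identity matrix, illustrative only
matrix define `Rho'[2, 1] = 0.5   // works: `Rho' expands to a valid matrix name
matrix define `Rho'[1, 2] = 0.5
matrix list `Rho'
```

If a line such as the third one is selected and run on its own, `Rho' expands to nothing, and Stata reads the command as matrix define [2, 1] = 0.5, which is not valid syntax.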

The code does not give an error when the complete do-file is run; see the log file attached in #9.

Nick Cox

Join Date: Mar 2014

Posts: 35698
#14

17 Dec 2024, 01:20

I do use PCA from time to time but rarely report its results. I often find that what it shows allows me to approach analysis in different ways, or convinces me that different candidate predictors may all be needed. Mushing variables together often loses information while not affording extra insight.