  • exploratory factor analysis

    How do I do factor analysis for 3 dichotomous and 2 ordinal variables?

  • #2
    This FAQ on UCLA's website gives a step-by-step recipe.



    • #3
      Using this, the following issues arise.
      . polychoric mm_social_connection_PC mm_social_connection_TC aversion_score HC_Awareness
      could not calculate numerical derivatives
      missing values encountered
      could not calculate numerical derivatives
      missing values encountered


      Polychoric correlation matrix

                               mm_social_connection_PC  mm_social_connection_TC  aversion_score  HC_Awareness
      mm_social_connection_PC                        1
      mm_social_connection_TC                .45738509                        1
      aversion_score                        -.28393653               -.28893077               1
      HC_Awareness                          -.14912885                .00857793               .             1

      . do "C:\Users\Zaifi\AppData\Local\Temp\STD434_000000.tmp"

      . display r(sum_w)
      4131

      .
      end of do-file

      . do "C:\Users\Zaifi\AppData\Local\Temp\STD434_000000.tmp"

      . global N = r(sum_w)

      .
      end of do-file

      . do "C:\Users\Zaifi\AppData\Local\Temp\STD434_000000.tmp"

      . matrix r = r(R)

      .
      end of do-file

      . do "C:\Users\Zaifi\AppData\Local\Temp\STD434_000000.tmp"

      . factormat r, n($N) factor(1)
      matrix r has missing values
      r(504);


      could you please recommend a way out?

      Thanking you in anticipation.



      • #4

        . sum mm_social_connection_PC mm_social_connection_TC aversion_score HC_Awareness

            Variable |        Obs        Mean    Std. dev.       Min        Max
        -------------+---------------------------------------------------------
        mm_social~PC |      4,131     .033406    .1339081          0          1
        mm_social~TC |      4,131    .0083111    .0741614          0          1
        aversion_s~e |      4,131    .4686517    .4990767          0          1
        HC_Awareness |      4,131    .5514403    .4974071          0          1

        These are my variables; there are no missing values. Further, two variables are standardized, while two are binary in nature.



        • #5
          What best to do may be a matter of judgment but it certainly depends on your data, which naturally we can't see.

          Please show the results after

          Code:
          contract mm_social_connection_PC mm_social_connection_TC aversion_score HC_Awareness
          
          dataex
          -- where you should follow FAQ Advice #12 and copy and paste data between delimiters

          Code:
          like this
          The problem is not missing data. It's missing correlations.
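
          A minimal sketch of one way to confirm this, assuming the polychoric estimates reported in #3 are retyped by hand into a matrix (the name R is hypothetical). matmissing() returns 1 when any element of a matrix is missing, which is the condition that factormat rejected with r(504) in #3.

          ```stata
          * Sketch only: values are retyped from the polychoric output in #3;
          * the matrix name R is an assumption made for this illustration.
          matrix define R = ( 1        ,  .45738509, -.28393653, -.14912885 \ ///
                               .45738509,  1        , -.28893077,  .00857793 \ ///
                              -.28393653, -.28893077,  1        ,  .         \ ///
                              -.14912885,  .00857793,  .        ,  1         )
          local names mm_social_connection_PC mm_social_connection_TC aversion_score HC_Awareness
          matrix rownames R = `names'
          matrix colnames R = `names'

          * 1 means that at least one element of R is missing
          display matmissing(R)

          * The only missing element is the aversion_score / HC_Awareness correlation
          matrix list R
          ```

          The single missing entry is the correlation between aversion_score and HC_Awareness, so that pair of variables is the place to look.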
          Last edited by Nick Cox; 12 Dec 2024, 04:51.



          • #6
            contract mm_social_connection_PC mm_social_connection_TC aversion_score HC_Awareness

            . dataex

            ----------------------- copy starting from the next line -----------------------
            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input float(aversion_score HC_Awareness mm_social_connection_PC mm_social_connection_TC) int _freq
            0 0        0        0 1666
            0 1        0        0  279
            1 1        0        0 1855
            0 0        0 .3333333    6
            0 1        0 .3333333    6
            1 1        0 .3333333    2
            0 0        0 .6666667    7
            0 1        0 .6666667    7
            1 1        0 .6666667    7
            0 0        0        1    3
            0 1        0        1    1
            0 0 .3333333        0  119
            0 1 .3333333        0   23
            1 1 .3333333        0   42
            0 0 .3333333 .3333333    7
            0 1 .3333333 .3333333    2
            1 1 .3333333 .3333333    1
            1 1 .3333333 .6666667    1
            0 0 .6666667        0   35
            0 1 .6666667        0   13
            1 1 .6666667        0   18
            0 0 .6666667 .3333333    2
            0 0 .6666667 .6666667    2
            1 1 .6666667 .6666667    1
            0 1 .6666667        1    1
            0 0        1        0    6
            0 1        1        0    6
            1 1        1        0    9
            0 1        1        1    4
            end
            ------------------ copy up to and including the previous line ------------------

            Listed 29 out of 29 observations

            .



            • #7
              May I use delimiters? Kindly explain the procedure again. Also, would it resolve the missing-correlation problem?



              • #8
                Originally posted by Nafisa Riaz
                could you please recommend a way out?
                A couple of possibilities.
                Code:
                version 18.0
                
                clear *
                
                quietly input float(aversion_score HC_Awareness mm_social_connection_PC mm_social_connection_TC) int _freq
                <redacted for brevity>
                end
                
                *
                * Begin here
                *
                rename (aversion_score HC_Awareness mm_social_connection_PC ///
                    mm_social_connection_TC) (avs hca scp sct)
                foreach var of varlist sc? {
                    quietly replace `var' = round(3 * `var')
                }
                
                assert _freq > 0
                quietly expand _freq
                drop _freq
                
                // Your problem is here
                tabulate avs hca
                tetrachoric avs hca
                
                // Either exclude one of the collinear indicators . . .
                polychoric avs sc?
                factormat r(R), n(`r(N)') ml nolog
                
                // . . . or resort to a couple of fudge factors in order to get the crank to turn
                tetrachoric avs hca, zeroadjust
                tempname rho
                scalar define `rho' = r(Rho)[2, 2]
                polychoric _all
                local n `r(N)'
                tempname Rho
                matrix define `Rho' = r(R)
                matrix define `Rho'[2, 1] = `rho'
                matrix define `Rho'[1, 2] = `rho'
                factormat `Rho', n(`r(N)') forcepsd ml nolog
                
                exit
                If it were me, I'd go with the former.
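
                A minimal sketch of why the two binary indicators are collinear, assuming the frequencies listed in #6 are summed over the two social-connection scores. The three totals below are those sums, not additional data, and only three of the four cells of the 2 x 2 table turn out to be occupied.

                ```stata
                * Sketch only: the three rows are the _freq totals from #6, summed over
                * the two social-connection scores (an assumption of this illustration).
                clear
                input byte(aversion_score HC_Awareness) int _freq
                0 0 1853
                0 1  342
                1 1 1936
                end

                * The cell aversion_score == 1 & HC_Awareness == 0 is empty
                tabulate aversion_score HC_Awareness [fweight = _freq]

                * No pattern with aversion_score == 1 and HC_Awareness == 0 is observed
                count if aversion_score == 1 & HC_Awareness == 0
                ```

                With nobody in the aversion_score = 1, HC_Awareness = 0 cell, everyone with aversion_score = 1 also has HC_Awareness = 1. That empty cell is presumably what the zeroadjust option in the second approach works around.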

                Do-file and log file are attached if you're interested further.
                Attached Files



                • #9
                  I wrote:
                  Code:
                  scalar define `rho' = r(Rho)[2, 2]
                  Sorry, took the wrong element there. Try this instead:
                  Code:
                  version 18.0
                  
                  clear *
                  
                  quietly input float(aversion_score HC_Awareness mm_social_connection_PC mm_social_connection_TC) int _freq
                  <redacted>
                  end
                  
                  *
                  * Begin here
                  *
                  rename (aversion_score HC_Awareness mm_social_connection_PC ///
                      mm_social_connection_TC) (avs hca scp sct)
                  foreach var of varlist sc? {
                      quietly replace `var' = round(3 * `var')
                  }
                  
                  assert _freq > 0
                  quietly expand _freq
                  drop _freq
                  
                  // Your problem is here
                  tabulate avs hca
                  tetrachoric avs hca
                  
                  // Either exclude one of the collinear indicators . . .
                  polychoric avs sc?
                  factormat r(R), n(`r(N)') factors(1)
                  
                  // . . . or resort to a couple of fudge factors in order to get the crank to turn
                  tetrachoric avs hca, zeroadjust
                  tempname rho
                  scalar define `rho' = r(Rho)[2, 1]
                  polychoric _all
                  tempname Rho
                  matrix define `Rho' = r(R)
                  matrix define `Rho'[2, 1] = `rho'
                  matrix define `Rho'[1, 2] = `rho'
                  factormat `Rho', n(`r(N)') forcepsd factors(1)
                  
                  exit
                  Also, the maximum likelihood estimator will still drop one of the two indicator variables as collinear, and so I switched to the default method. (And removed a line of dead code.)
                  Attached Files



                  • #10
                    Thanks for #6. I don't understand #7.

                    I know essentially nothing about tetrachoric and polychoric correlation. Out of curiosity I just pushed the data through a PCA. Here pcacoefsave is from SSC and tabplot is from the Stata Journal. I find correlations between PCs and variables easier to think about than loadings.


                    Code:
                    * Example generated by -dataex-. For more info, type help dataex
                    clear
                    input float(aversion_score HC_Awareness mm_social_connection_PC mm_social_connection_TC) int _freq
                    0 0        0        0 1666
                    0 1        0        0  279
                    1 1        0        0 1855
                    0 0        0 .3333333    6
                    0 1        0 .3333333    6
                    1 1        0 .3333333    2
                    0 0        0 .6666667    7
                    0 1        0 .6666667    7
                    1 1        0 .6666667    7
                    0 0        0        1    3
                    0 1        0        1    1
                    0 0 .3333333        0  119
                    0 1 .3333333        0   23
                    1 1 .3333333        0   42
                    0 0 .3333333 .3333333    7
                    0 1 .3333333 .3333333    2
                    1 1 .3333333 .3333333    1
                    1 1 .3333333 .6666667    1
                    0 0 .6666667        0   35
                    0 1 .6666667        0   13
                    1 1 .6666667        0   18
                    0 0 .6666667 .3333333    2
                    0 0 .6666667 .6666667    2
                    1 1 .6666667 .6666667    1
                    0 1 .6666667        1    1
                    0 0        1        0    6
                    0 1        1        0    6
                    1 1        1        0    9
                    0 1        1        1    4
                    end
                    
                    ds _freq, not 
                    
                    local vars `r(varlist)'
                    
                    corr `vars' [fw=_freq]
                    
                    pca `vars' [fw=_freq]
                    
                    foreach v of local vars { 
                        local lbl : subinstr local v "_" " ", all
                        label var `v' "`lbl'"
                    }
                    
                    pcacoefsave using pca_results, replace 
                    
                    use pca_results, clear 
                    
                    gen sign = sign(corr)
                    
                    gen where = -0.1 
                    
                    tabplot varlabel PC [iw=corr], ///
                        subtitle(correlations between variables and PCs) sep(sign) ///
                        bar1(color(stc2)) bar2(color(stc1)) showval(format(%04.3f)) ytitle("") ///
                        addplot(scatter where PC, ms(none) mlabel(eigenvalue) mlabformat(%4.3f) ///
                            mlabsize(medium) mlabcolor(magenta) mlabpos(0) ///
                            ymla(-0.1 "eigenvalues", labcolor(magenta) tlc(none) labsize(medium)))
                    [Figure: pcaresulta.png (output of the tabplot command above)]

                    I'd appreciate any critique of this brute force method. I'm often unclear about the benefits of seeking latent variables as compared with just using the data you have.



                    • #11
                      Originally posted by Joseph Coveney
                      I wrote:
                      Code:
                      scalar define `rho' = r(Rho)[2, 2]
                      Sorry, took the wrong element there. Try this instead:
                      Code:
                      version 18.0
                      
                      clear *
                      
                      quietly input float(aversion_score HC_Awareness mm_social_connection_PC mm_social_connection_TC) int _freq
                      <redacted>
                      end
                      
                      *
                      * Begin here
                      *
                      rename (aversion_score HC_Awareness mm_social_connection_PC ///
                          mm_social_connection_TC) (avs hca scp sct)
                      foreach var of varlist sc? {
                          quietly replace `var' = round(3 * `var')
                      }
                      
                      assert _freq > 0
                      quietly expand _freq
                      drop _freq
                      
                      // Your problem is here
                      tabulate avs hca
                      tetrachoric avs hca
                      
                      // Either exclude one of the collinear indicators . . .
                      polychoric avs sc?
                      factormat r(R), n(`r(N)') factors(1)
                      
                      // . . . or resort to a couple of fudge factors in order to get the crank to turn
                      tetrachoric avs hca, zeroadjust
                      tempname rho
                      scalar define `rho' = r(Rho)[2, 1]
                      polychoric _all
                      tempname Rho
                      matrix define `Rho' = r(R)
                      matrix define `Rho'[2, 1] = `rho'
                      matrix define `Rho'[1, 2] = `rho'
                      factormat `Rho', n(`r(N)') forcepsd factors(1)
                      
                      exit
                      Also, the maximum likelihood estimator will still drop one of the two indicator variables as collinear, and so I switched to the default method. (And removed a line of dead code.)
                      Joseph, can you please explain why the code (marked red) gives the error below?

                      . do "C:\Users\Zaifi\AppData\Local\Temp\STD2a48_000000.tmp"

                      . matrix define `Rho'[2, 1] = `rho'
                      invalid syntax
                      r(198);


                      end of do-file



                      • #12



                        Originally posted by Nick Cox
                        I'd appreciate any critique of this brute force method. I'm often unclear about the benefits of seeking latent variables as compared with just using the data you have.

                        @Nick, I highly appreciate your insight, sir.
                        As the name suggests, latent variables are not observable, so it is definitely tricky to extract a universal type of variable. But what else can a researcher do to understand the ground realities and transform them into solid evidence? This is my understanding; I might be wrong.



                        • #13
                          Originally posted by Nafisa Riaz
                          can you please explain why the code (marked red) gives the error below:

                          . do "C:\Users\Zaifi\AppData\Local\Temp\STD2a48_000000.tmp"
                          What you show suggests that you're running an incomplete do-file, neglecting to define one or both of the temporary names (`rho' and `Rho').

                          The code does not give an error when the complete do-file is run—see the log file attached there in #9.
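
                          A minimal sketch of the scoping issue, using only built-in commands (0.5 is a placeholder value, not an estimate from this thread). Names created by tempname are held in local macros, and local macros vanish when the do-file that defined them ends. If the replacement line is selected and run by itself, `Rho' and `rho' both expand to nothing, leaving matrix define [2, 1] = with no matrix name and no value, which would explain the invalid syntax error. Running the declarations and the replacements in one pass avoids that.

                          ```stata
                          * Sketch only: 0.5 is a placeholder, not a value from the thread.
                          * Run all of these lines together, in a single do-file execution.
                          tempname rho Rho
                          scalar define `rho' = 0.5
                          matrix define `Rho' = I(2)

                          * Overwrite both off-diagonal elements so the matrix stays symmetric
                          matrix define `Rho'[2, 1] = `rho'
                          matrix define `Rho'[1, 2] = `rho'

                          matrix list `Rho'
                          ```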



                          • #14
                            I do use PCA from time to time but rarely report its results. I often find that what it shows allows me to approach analysis in different ways, or convinces me that different candidate predictors may all be needed. Mushing variables together often loses information while not affording extra insight.
