  • How to read into Stata a variance - covariance mtx (originally computed within Mplus)

    I'm finding that the strengths of Mplus and Stata complement each other when computing FA, CFA, ESEM, and CSEM analyses based on two split-half samples of 351,000 patients and 97 variables.

    I'm using my Stata 13.1 perpetual hex core license.

    Mplus has successfully written variance-covariance matrices to a covs.txt file, based on (1) Bayes or (2) Expectation Maximization estimators, each a 30-hour run. I can study PSR and trace plots for each MCMC chain to assess convergence and stability.

    I have also run these estimators in Stata, but I don't have the ability (available in Mplus) to graphically study MCMC chain mixing and examine possibly problematic estimates of covariances and variances.

    The frustration is that Mplus cannot compute EFA or ESEM analysis with the ML estimator on a variance - covariance matrix; a correlation matrix is required!

    Factormat in Stata will compute an FA from a variance-covariance matrix input (also called covariance matrix input). However, factormat requires that my covariance matrix first be placed into a Stata matrix.

    An advantage of Stata is that I can display eigenvalues, eigenvectors, and the determinant, and assess whether the matrix is positive definite. I can then combine collinear variables with principal components analysis.
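A minimal sketch of those checks, assuming a small toy matrix C (hypothetical values, not taken from the Mplus file). The same commands should apply to the 97 x 97 matrix once it exists in Stata:

```stata
* Toy 3x3 covariance matrix (hypothetical values, for illustration only)
matrix C = (1, .9, .7 \ .9, 1, .6 \ .7, .6, 1)

* Eigenvectors go to X; eigenvalues go to the row vector v, largest first
matrix symeigen X v = C
matrix list v

* A determinant at or near zero suggests collinear variables
display "determinant = " det(C)

* The last eigenvalue is the smallest; it must be > 0 for positive definiteness
scalar mineig = v[1, colsof(v)]
if mineig > 0 {
    display "positive definite"
}
else {
    display "not positive definite: smallest eigenvalue = " mineig
}
```

Checking only the smallest eigenvalue is enough here because a symmetric matrix is positive definite exactly when all of its eigenvalues are positive.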

    The Mplus covs.txt file contains 4753 elements with a whitespace (probably tab) delimiter. It can be seen as a lower triangle matrix with 97 variances on the diagonal and 4656 off-diagonal covariances. The file can be read into the Stata editor row-wise, starting from the left, going to the right, and then moving down the rows from top to bottom. I've used the import command. I suspect that using the data editor is not necessarily the best way to solve the problem.

    The Stata editor displays the data as six columns all as V1 (variable 1)

    The covariance input can also be seen as a one-dimensional vector of 4753 elements.

    The question is: How do I read in the .txt file and produce in Stata a lower triangle covariance matrix, say matrix Covs?

    I suspect one solution is to use the mkmat command, but I don't yet see a way to give it the proper arguments to make the matrix I want.

    Factormat allows me to add names to the matrix, once I have built the matrix Covs.

    Also, I have used the _getcovcorr command with other commands to change the covariance matrix into a correlation matrix successfully.
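The syntax of _getcovcorr is not shown in this thread, so the sketch below swaps in a different tool: Stata's documented corr() matrix function, which returns the correlation matrix of a covariance matrix. The matrix Covs here is a toy 2x2 example with hypothetical values:

```stata
* Toy covariance matrix: variances 4 and 9, covariance 3 (hypothetical values)
matrix Covs = (4, 3 \ 3, 9)

* corr() rescales each covariance by the two standard deviations,
* so the off-diagonal becomes 3 / (2 * 3) = .5
matrix R = corr(Covs)
matrix list R
```

The diagonal of R is 1 by construction, which is a quick sanity check after any covariance-to-correlation conversion.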

    Thanks to the Statalist team for strategy suggestions and code suggestions; I'll post the final code we arrive at.




  • #2
    There are a few ways to go about this, and I have some experience bringing results from Mplus into Stata. However, the details strongly depend on precisely what your data look like. So I would ask that you read the FAQ rules and supply us with a usable data example. In this case, I suspect that covs.txt is a results file in which each "block of lines" is equivalent to one set of estimates. You will probably need to use -infix- or -infile-, and this is the most tedious aspect of data import from Mplus because it writes multi-line observations.

    We don't need the whole file, even a toy example that replicates the key features, say two replicates/estimates from a 3x3 matrix.



    • #3
      Thank you, Leonardo, for your support and question. I'm going to display a three-by-three example from a UCLA forum on Stata. This is a square (full) variance-covariance matrix,
      in this case a standardized variance-covariance matrix:

      1 .9 .7
      .9 1 .6
      .7 .6 1


      You will note that the top triangle of the matrix is actually redundant.

      Here would be a lower triangle matrix, that is structurally similar to the august302023bayescovs.txt file I just placed into my Stata directory from an Mplus run covariance matrix output:

      1
      .9 1
      .7 .6 1

      This could correspond to three variable names such as race age marital status. The names for the columns are the same as the names for the rows, in the same order.

      It could be this will not work, and that you have to initially create a full, or square matrix like the top one.

      Here is actual data as I received it in my data editor of the first five rows from Mplus


      . import delimited c:\chest\aug302023bayescovs.txt
      (1 var, 951 obs)


      v1
      0.10144859E+00 -0.66104774E-01 0.24455941E+00 -0.16166649E-02 -0.80306507E-02
      0.13776392E-01 -0.10722875E-01 -0.54454581E-01 -0.13130958E-02 0.85201854E-01
      -0.22969721E-01 -0.11575400E+00 -0.28096750E-02 -0.18678494E-01 0.16027997E+00
      0.14528202E-02 0.12793232E-03 0.23382596E-03 0.67499592E-03 -0.24950242E-02
      0.42814092E-01 -0.14386454E-01 -0.68730341E-02 0.94831028E-04 0.42607063E-03
      (etc on through 4753 numbers)

      Ignore the (1 var, 951 obs) message; it has no meaning for us in this context. Also ignore the v1 header.

      The data is space delimited.

      Now, to compute the number of covariances in this array, we have (N) 97 multiplied by (N-1) 96, divided by 2, which equals 4656.
      To this we add the number of variances in the array above, which is 97.
      The total number of elements in the data from Mplus is therefore 4753, as noted above.
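The same count can be checked in Stata. This is a small sketch of the arithmetic only; the local macro names are my own:

```stata
local n = 97

* Off-diagonal covariances in a lower triangle: n(n-1)/2
local ncov = `n' * (`n' - 1) / 2

* Total elements including the n variances: n(n+1)/2
local ntot = `n' * (`n' + 1) / 2

display "covariances: `ncov'   variances: `n'   total: `ntot'"

* With five values per line, the file needs ceil(total/5) lines
display "lines of five values: " ceil(`ntot' / 5)

* Going the other way: recover n from the number of elements k
local k = `ntot'
display "n recovered from k: " (sqrt(8 * `k' + 1) - 1) / 2
```

For 97 variables this gives 4656 covariances, 4753 elements in total, and 951 lines of five values, which matches the 951 observations that import delimited reported.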

      It is in LOWER triangle form (NOT a full matrix) with variances in the diagonal and covariances off diagonal in the lower triangle.

      So to read this in for the purposes of making a matrix, you read the first upper left number, move through the row, then
      read the next row down, left to right, and so forth, to fill a matrix looking like this


      var
      cov var
      cov cov var
      cov cov cov var
      cov cov cov cov var
      etc

      Let me know if this was understandable, and if you need anything else from me.

      Best, Pete




      • #4
        Based on what you show, I'm assuming that the data set you are starting from begins like this:
        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input str80 v1
        "0.10144859E+00 -0.66104774E-01 0.24455941E+00 -0.16166649E-02 -0.80306507E-02"
        "0.13776392E-01 -0.10722875E-01 -0.54454581E-01 -0.13130958E-02 0.85201854E-01"
        "-0.22969721E-01 -0.11575400E+00 -0.28096750E-02 -0.18678494E-01 0.16027997E+00"
        "0.14528202E-02 0.12793232E-03 0.23382596E-03 0.67499592E-03 -0.24950242E-02"   
        "0.42814092E-01 -0.14386454E-01 -0.68730341E-02 0.94831028E-04 0.42607063E-03"  
        end
        Then you can create the matrix M you need with:
        Code:
        split v1, gen(var) destring
        drop v1
        
        local r 1
        local c 1
        
        matrix M = I(97)
        
        forvalues o = 1/`=_N' {
            forvalues v = 1/5 {
                matrix M[`r', `c'] = var`v'[`o']
                if `r' != `c' {
                    matrix M[`c', `r'] = M[`r', `c']
                }
                local ++c
                if `c' > `r' {
                    local c = 1
                    local ++r
                }
            }
        }
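One edge case worth guarding against: with 97 variables there are 4753 values, so the last imported line holds only three of them and var4 and var5 are missing there. The loop above would then try to write to row 98, which would likely stop with an error. A sketch of a guarded variant, shown on a toy 3x3 case (six values, so the second line holds just one); the toy data and the local macro n are assumptions for illustration:

```stata
clear
input str80 v1
"1.0 0.9 1.0 0.7 0.6"
"1.0"
end

split v1, gen(var) destring
drop v1

local n 3                 // dimension of the target matrix
matrix M = I(`n')
local r 1
local c 1

forvalues o = 1/`=_N' {
    forvalues v = 1/5 {
        * stop once the matrix is full or the line runs out of values
        if `r' > `n' | missing(var`v'[`o']) continue, break
        matrix M[`r', `c'] = var`v'[`o']
        matrix M[`c', `r'] = var`v'[`o']
        local ++c
        if `c' > `r' {
            local c 1
            local ++r
        }
    }
}

matrix list M
```

The only change from the loop above is the guard line; everything else fills the lower triangle row by row and mirrors it in the same way.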



        • #5
          Here's an alternative to what Clyde has shown. I only used the first three rows to show how to turn them into a 5x5 matrix, but this extends directly to whatever size of matrix you have. You'll end up with a symmetric Mata or Stata matrix and then you can use that however you wish.

          Code:
          clear *
          cls
          
          cd "c:\tmp\cov"
          
          import delimited covs.txt, delim(" ") clear case(lower)
          list
          
          * number each set of observations and row number within each observation to preserve the structure
          * of the imported data.
          gen `c(obs_t)' row = _n
          order row, first
          sort row
          
          * flatten the data so that all values are recorded into a single observation.
          * renumber those variables to be sequentially ordered.
          gen byte set = 1   // nuisance variable used only for reshape to work
          reshape wide v* , i(set) j(row)
          drop set
          rename (v#) (v#), renumber
          
          mata:
            // read the data into Mata
            X = st_data(., "v*", .)
           
            // stripe the vector to a symmetric matrix
            V = J(5,5,.)
            counter = 0
            for (j=1; j<=5; j++) { // loop over columns
              for (i=1; i<=5; i++) { // loop over rows
                if (i<=j) {
                  counter++
                  V[i, j] = X[counter]
                  V[j, i] = V[i, j]
                }
              }
            }
            V
           
            // push the matrix to Stata
            st_matrix("V", V)
          end
          
          matlist V
          Selected output:

          Code:
          :   V
          [symmetric]
                            1              2              3              4              5
              +----------------------------------------------------------------------------+
            1 |     .10144859                                                              |
            2 |   -.066104774      .24455941                                               |
            3 |  -.0016166649   -.0080306507     .013776392                                |
            4 |   -.010722875    -.054454581   -.0013130958     .085201854                 |
            5 |   -.022969721       -.115754    -.002809675    -.018678494      .16027997  |
              +----------------------------------------------------------------------------+
          
          . matlist V
          
                       |        c1         c2         c3         c4         c5
          -------------+------------------------------------------------------
                    r1 |  .1014486                                             
                    r2 | -.0661048   .2445594                                  
                    r3 | -.0016167  -.0080307   .0137764                       
                    r4 | -.0107229  -.0544546  -.0013131   .0852019            
                    r5 | -.0229697   -.115754  -.0028097  -.0186785     .16028
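A possible variation on the same idea, sketched under assumptions: read the text file directly in Mata with cat(), tokens(), and strtoreal(), skip the reshape step, and infer the dimension n from the number of values. The toy file and all names below are hypothetical; for real use, the path to the Mplus file would replace the temporary file:

```stata
* Write a toy lower-triangle file: 6 values = 3x3 matrix (hypothetical data)
tempfile toy
tempname fh
file open `fh' using "`toy'", write text replace
file write `fh' "1.0 0.9 1.0 0.7 0.6" _n
file write `fh' "1.0" _n
file close `fh'

mata:
    // read all lines, split on whitespace, and collect into one row vector
    lines = cat(st_local("toy"))
    x = J(1, 0, .)
    for (k = 1; k <= rows(lines); k++) {
        x = x, strtoreal(tokens(lines[k]))
    }

    // k = n(n+1)/2 values implies n = (sqrt(8k+1) - 1)/2
    n = (sqrt(8 * cols(x) + 1) - 1) / 2
    if (n != floor(n)) _error("element count is not n(n+1)/2")

    // fill the lower triangle row by row and mirror it
    V = J(n, n, .)
    counter = 0
    for (i = 1; i <= n; i++) {
        for (j = 1; j <= i; j++) {
            counter++
            V[i, j] = x[counter]
            V[j, i] = x[counter]
        }
    }

    // push the matrix to Stata
    st_matrix("V", V)
end

matrix list V
```

Inferring n from the element count means nothing has to be edited when the number of variables changes, and the check on n catches a truncated or padded file early.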



          • #6
            Leonardo and Clyde, thanks for your fine work. Truly outstanding efforts from Leonardo and Clyde!

            I'm going to test both versions over this Labor Day weekend. Give me through Tuesday Sept 4, 2023 to get back to you all about the results of my tests.

            I have a Medtronics C++ Senior Programmer here to help me on debugging, and will post about any issues we run into over the next several days. Please standby if possible.

            I need a couple of immediate answers from you before I start testing.

            1) I assume I paste the code into a do file and execute from a do file (not the command line). Is that correct?

            2) Clyde, I'm not familiar with dataex, and how to set up the input file to work with your code.
            Leonardo's code shows how I can use the import command (which I have used) to get a hold of my data file that his code addresses.
            At my limited level of ability, that works better for me at this time. Leonardo, thank you for showing us how to tell Stata to use a white space delimiter.


            3) In Leonardo's code, am I correct to just substitute 97 for the numbers 5 I see, to get the code to work with all of my 97 variables, in my monstrous input file?

            4) Likewise Clyde, should I substitute 97 for the 5 in your code?

            5) Clyde, am I correct that you have already initialized matrix M to contain 97 columns and 97 rows by matrix M = I(97) ?

            For others who are following, I'd like to make a few points. Mplus limits estimators when reading in summary data such as a covariance matrix or a correlation matrix to GLS, ML, and ULS (generalized least squares, maximum likelihood, unweighted least squares). Factormat in Stata allows iterated principal factors, not available in Mplus. Initial tests show that ipf in Stata and ML in Mplus produce virtually identical factor loadings, a good outcome. In medical research, where patients' health is involved, it is good to get congruence from two different statistical packages, such as Stata and Mplus.

            Mplus will only write out a covariance matrix after Bayes or Expectation Maximization. Yet Mplus requires a correlation matrix for summary data, and will not read in a covariance matrix, a limitation. _getcovcorr in Stata is required to convert the Mplus covariance matrix back into a correlation matrix, which Mplus can then read in to construct path models. _getcovcorr in Stata can also place your matrix into a lower triangle, an upper triangle, or a full symmetric (Hermitian) matrix. After developing an SEM or path model from a correlation or covariance matrix input, it is best to confirm the model with raw data read into your SEM package.



            • #7
              Originally posted by Peter Hoon
              I'm going to test both versions over this Labor Day weekend. Give me through Tuesday Sept 4, 2023 to get back to you all about the results of my tests.

              I have a Medtronics C++ Senior Programmer here to help me on debugging, and will post about any issues we run into over the next several days. Please standby if possible.

              I need a couple of immediate answers from you before I start testing.
              Please understand that this is a forum made up of Stata users who freely volunteer their time to help out and post when they can. We are not paid staff members of Stata Corp. What you wrote can be perceived as an imposition on both Clyde's and my time. While I do not believe that is how the message was intended, others may take it as such, and your message may be perceived as rude or demanding. For myself, I help when and where I can. Many of us are based in the US and Canada, and we are coming up to a long weekend. I will not be much interested to check this forum for urgent issues, so please understand that future help may be slow to arrive, if it comes at all.

              Originally posted by Peter Hoon
              3) In Leonardo's code, am I correct to just substitute 97 for the numbers 5 I see, to get the code to work with all of my 97 variables, in my monstrous input file?
              Yes, that's right. It should work fine with that substitution. In previous work, my coefficients were more limited in number and I was interested in Monte Carlo simulations, so I opted for a different method than the one I demonstrated above: it used -infile- or -infix-, which allows a more direct input from a CSV file to the wide layout that I've shown above. The downside, however, is that creating the dictionary file takes much more typing and some time to set up.



              • #8
                1) I assume I paste the code into a do file and execute from a do file (not the command line). Is that correct?
                Correct.

                2) Clyde, I'm not familiar with dataex, and how to set up the input file to work with your code.
                Leonardo's code shows how I can use the import command (which I have used) to get a hold of my data file that his code addresses.
                At my limited level of ability, that works better for me at this time. Leonardo, thank you for showing us how to tell Stata to use a white space delimiter.
                Do familiarize yourself with -dataex-. If you are using Stata version 14.2 or later, it is part of your official Stata. If still using an earlier one get it from SSC. Read -help dataex- to learn how to use it--it's very simple. It is the best way to show what a Stata data set looks like.
                In writing my post at #4, I first created a Stata data set that matched, as I understood it, the data example you showed in the middle of #3. Then I used -dataex- to turn it into something that could be posted and used. If I did it correctly, you already have this data set: it is what you got from importing the MPlus output into Stata. And while I agree with Leonardo that changing your -import- command to get a more convenient result in Stata is a good idea, I didn't feel confident enough about the details of the MPlus output to propose a specific command. And since you already had a Stata import of the MPlus output, I figured we should work from there. In short, just start with the import file you already have and apply the rest of my code.

                4) Likewise Clyde, should I substitute 97 for the 5 in your code?
                No! In Leonardo's code he is "targeting" a 5x5 matrix for illustration. My code targets a 97x97 matrix. The 5 in my code reflects the fact that each line of the Stata-imported MPlus output you showed in #3 contains 5 numbers. So don't change that 5 unless the actual Stata file you imported contains some other number of numbers in its observations.

                5) Clyde, am I correct that you have already initialized matrix M to contain 97 columns and 97 rows by matrix M = I(97) ?
                Yes.
                Last edited by Clyde Schechter; 31 Aug 2023, 15:26.



                • #9
                  Gentlemen, Clyde and Leonardo, I'm very sorry about my comment. I extend my apologies. I am very excited about getting this project toward completion, and did not think ahead to the consequences of my standby comment.
                  Thanks for your comments.
                  Everyone have a great week, I'll get to work, and we'll talk again in the future at your convenience.



                  • #10
                    Clyde and Leonardo: Here is the run output showing a names problem I haven't solved. Running
                    Clyde's code. Now using 90 variables.
                    Please realize that stata displays a lower triangle, but in fact, stores the matrix as a full symmetric mtx.

                    Suggestions appreciated:


                    1. (/v# option or -set maxvar-) 5000 maximum variables

                    . do "C:\chest\read Mplus var - cov data to form Stata mtx.do"

                    . /*
                    > Clyde Schechter's code for reading in a var - cov mtx written out
                    > by Mplus as a .txt file. Mplus's .txt file is all one vector, rowise, from left
                    > white space delimited, representing
                    > a lower triangle variance - covariance matrix. The code below
                    > gets the .txt file into a matrix that can be read into
                    > Stata factor analysis program factormat. 9/7/2023.
                    > The Matrix M looks like this:
                    > var
                    > cov var
                    > cov cov var
                    > cov cov cov var etc. Though Stata displays a lower triangle, the mtx M
                    > is in fact stored as a full symmetric matrix.
                    > */
                    . clear

                    . cd c:\chest
                    c:\chest

                    . set more off

                    . import delimited c:\Mplus\sep72023standbayescovs.txt
                    (1 var, 819 obs)

                    . /*Clyde's code follows*/
                    . split v1, gen(var) destring
                    variables born as string:
                    var1 var2 var3 var4 var5
                    var1 has all characters numeric; replaced as double
                    var2 has all characters numeric; replaced as double
                    var3 has all characters numeric; replaced as double
                    var4 has all characters numeric; replaced as double
                    var5 has all characters numeric; replaced as double

                    . drop v1

                    .
                    . local r 1

                    . local c 1

                    .
                    . matrix M = I(90) /*Set this value to the num of variables you have*/

                    .
                    . forvalues o = 1/`=_N' {
                    2. forvalues v = 1/5 {
                    3. matrix M[`r', `c'] = var`v'[`o']
                    4. if `r' != `c' {
                    5. matrix M[`c', `r'] = M[`r', `c']
                    6. }
                    7. local ++c
                    8. if `c' > `r' {
                    9. local c = 1
                    10. local ++r
                    11. }
                    12. }
                    13. }

                    .
                    . matrix list M/*list your matrix to confirm accuracy of import and read in*/

                    symmetric M[90,90]
                    c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12 c13 c14 c15 c16 c17 c18 c19 c20
                    r1 1.0009118
                    r2 -.42002171 1.0014574
                    r3 -.04280465 -.13911006 1.0005674
                    r4 -.11558969 -.37713748 -.03835486 1.0006192
                    r5 -.18028732 -.58547007 -.05956859 -.16077259 1.0018269
                    r6 .02240978 .00025084 .00948534 .01138361 -.02927459 1.0006201
                    r7 -.04524086 -.01322443 .00099151 .0012906 .05115434 -.51524085 1.0007675
                    r8 .13881011 .22103494 -.07552929 .09155974 -.42923755 .09309907 -.17002208 1.0005052

                    and so forth for 4095 (90*91/2) elements

                    Here is the Stata error message, which to me says that Stata needs a way to get 90 names in place for each column
                    and corresponding row, which I have not succeeded in setting up properly. Factormat below has been tested on a previous
                    matrix with names. I tried to give the matrix names in factormat, v1-v90, but it did not work:

                    timer clear

                    . timer on 1

                    .
                    . factormat M, n(7786) shape(full) names(v1-v90) citerate(10) ipf factors(8)

                    name conflict: row and column names of M should match
                    r(198);

                    end of do-file

                    r(198);



                    • #11
                      Code:
                      // BUILD A LIST OF NAMES TO APPLY TO BOTH ROWS AND COLUMNS
                      local names
                      forvalues i = 1/90 { // OR WHATEVER THE DIMENSION OF THE MATRIX IS
                          local names `names' var`i'
                      }
                      
                      // RENAME THE ROWS AND COLUMNS
                      matrix rownames M = `names'
                      matrix colnames M = `names'
                      Now M will have the same names in the rows as columns.

                      Another way of doing it that might be more useful is, instead of building a list of names that reads var1 var2 ... var90, I presume given their provenance that these are actual variables that you used in your MPlus analysis. Nothing in this thread hints at what those variables' names are. But if you can scrape up a list of the real variable names, use that instead--it'll probably make the factormat output more useful to you that way.
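A short sketch of that second approach, using a toy 3x3 matrix and the illustrative names race, age, and marital mentioned in #3 (stand-ins, not the real Mplus variable names). It also confirms that row and column names agree, which is what the factormat error in #10 was complaining about:

```stata
* Toy symmetric matrix (hypothetical values)
matrix M = (1, .9, .7 \ .9, 1, .6 \ .7, .6, 1)

* Real variable names, in the same order as the rows of the matrix
local names race age marital

matrix rownames M = `names'
matrix colnames M = `names'

* Pull the names back out and confirm that rows and columns match
local rn : rownames M
local cn : colnames M
assert "`rn'" == "`cn'"

matrix list M
```

If the assert passes silently, the names match and the "name conflict" message should not recur for that matrix.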



                      • #12
                        Clyde and Leonardo, thank you for your help. Clyde, with the recent code that builds names into matrix M, I found
                        that I could use factormat (and other Stata routines) to substitute in my own names. Here is a successful
                        factormat run (with documentation to help others) based on Clyde's code. Leonardo,
                        I don't have time to test your code, but I'm sure it would also lead to success.
                        Unfortunately, Statalist doesn't give me enough space to post this lengthy run. Factormat ran fast and successfully.
                        Best from Paul aka "Peter".



                        • #13
                          Clyde, Leonardo, and multivariate researchers in biology, sociology, econometrics, medicine, and psychology: I am
                          posting code (a do-file) I'm using daily to read in a covariance matrix from Mplus (in this case, Bayes estimation). I hope this
                          code will assist others in their work. If you have suggestions for improving the documentation below,
                          please let me know. Best to all from Pete.



                          /*
                          Clyde Schechter's code for reading in a var - cov mtx written out
                          by Mplus as a .txt file. Mplus's .txt file is all one
                          single dimensioned vector,
                          white space delimited, representing
                          a lower triangle variance - covariance matrix. Most likely,
                          SAS, SPSS, R, and AMOS also write out variance - covariance matrices.
                          The code below
                          gets the .txt file that Mplus outputs into a matrix that can be read into
                          Stata factor analysis program factormat, 9/13/2023.
                          Stata's factormat program starts with "summary data",
                          a variance - covariance matrix, which several authors
                          (Kline, Bollen and Bentler) recommend
                          (in lieu of a correlation matrix) for SEM and factor analysis work.
                          The Matrix M looks like this as it is read into Stata:
                          var
                          cov var
                          cov cov var
                          cov cov cov var etc.

                          Stata factormat with quartimax rotation runs in just a few seconds,
                          a great advantage in gaining a rapid understanding of latent constructs.
                          Stata _getcovcorr will alter the covariance matrix back into a correlation matrix,
                          that Mplus (and others) can read and use to build ESEM or CSEM models. Even though
                          a covariance matrix is recommended, Mplus strangely often requires
                          a correlation matrix for ESEM, etc.

                          An important point to remember, that is often confusing
                          in Stata:
                          The lower triangle matrix is read into Stata correctly.
                          Yet the matrix is stored as a full Hermitian symmetric matrix by
                          Stata. The Stata command
                          matrix list M
                          DISPLAYS a lower triangle
                          matrix! You are led to believe it is stored as such, but it is not.
                          Again, it is stored as full symmetric Hermitian.

                          A covariance, SEM or FA model on a hex core machine
                          running at 3.6 GHz,
                          turbo boosted,
                          can take up to 60 hours in Mplus, SAS, SPSS or Stata from raw data,
                          when observations number seven hundred thousand. When summary
                          data is used as input, FA, SEM, CFA, CSEM in Stata
                          may take only A FEW SECONDS from a matrix
                          of dimension 90! The importance/advantage of having the capability
                          of reading in a var - cov mtx from SAS, Mplus, Stata, SPSS, AMOS shown
                          in the output below, becomes
                          clear for multivariate researchers who have big data:
                          */

                          cd c:\chest
                          set more off
                          clear
                          // import delimited c:\Mplus\sep92023standiwbayescovs.txt
                          import delimited c:\Mplus\sep82023standbayescovs.txt
                          // import delimited c:\Mplus\sep102023unstandiwbayescovs.txt
                          /*import delimited c:\Mplus\sep112023unstandiwfixmeansbayescovs.txt*/
                          // import delimited c:\Mplus\sep132023standiw60bayescovs.txt
                          // Clyde's code follows
                          split v1, gen(var) destring
                          drop v1

                          local r 1
                          local c 1

                          matrix M = I(90) /*Set this value to the num of variables you have*/

                          forvalues o = 1/`=_N' {
                              forvalues v = 1/5 {
                                  matrix M[`r', `c'] = var`v'[`o']
                                  if `r' != `c' {
                                      matrix M[`c', `r'] = M[`r', `c']
                                  }
                                  local ++c
                                  if `c' > `r' {
                                      local c = 1
                                      local ++r
                                  }
                              }
                          }




                          // BUILD A LIST OF NAMES TO APPLY TO BOTH ROWS AND COLUMNS by Clyde Schechter
                          local names
                          forvalues i = 1/90 { // OR WHATEVER THE DIMENSION OF THE MATRIX IS
                              local names `names' var`i'
                          }

                          // RENAME THE ROWS AND COLUMNS Clyde Schechter's code
                          matrix rownames M = `names'
                          matrix colnames M = `names'

                          matrix list M/*list your matrix to confirm accuracy of import and read in.
                          As Clyde points out, the number of variables in your data set
                          is equal to the dimension of the matrix*/

                          matrix symeigen X v = M
                          matrix list v/*list eigenvalues to spot a non positive definite
                          variance - covariance mtx. In this case last eigenvalue is neg*/


                          timer clear
                          timer on 1


                          /*
                          Test run of factor analysis using factormat of matrix M with names.
                          Note: matrix names are case sensitive, so M and m are different matrices.

                          WARNING: the MATRIX IN THIS TEST IS NOT POSITIVE DEFINITE, and so researcher
                          Pete must get back to work in
                          Mplus to create a positive definite matrix! Stata allows, however,
                          the forcepsd option, to complete a test FA run. Virtually no other stat
                          package such as SPSS, MPLUS, SAS, (and probably R) has such a capability.
                          It is unknown whether the 15 factor test solution below will be similar to,
                          or quite different from, a matrix that has been corrected to positive definite.
                          Usually, when the last eigenvalue is slightly negative, you will get identical
                          solutions.*/

                          factormat M, n(7786) shape(full) citerate(15) ipf factors(15) forcepsd ///
                          names(SINGLE MARRIED SEPARATE ///
                          DIVORCED WIDOWED HISORG IRIBE ///
                          YR_BIRTH SEQ_NUM MDXRCMP ORGRISK FORORG ///
                          CENTRAL UPPERIN LOWERIN UPPEROUT LOWEROUT AXILTAIL ///
                          OVERLAP LATERAL GRADE EOD10_SZ ///
                          EOD10_EX EOD10_NE EOD10SRG LYMPHMIS CSTUMSIZ CSEXTEN ///
                          LYMPOD10 CS5SITE CS6SITE DAJCCT DAJCCN DAJCCSTG ///
                          SURGPRIF SURGSITF SRGDTMIS NUMNODES NO_SURG RADIATN ///
                          RADMIS RAD_SURG SS_SURG MASTMIS SURGSITE ADISMIS ///
                          ICD9V10V UNSPPSM EPITHPSM SQUAMPSM ADENCAR ///
                          CYTMUSER DUCTLOB HST_STGA A3SEERSG FIRSTPRM CTYMEDN ///
                          CTYPOV CTYPOV18 POVBIRTH CTYINCID HSEDSTAT COLEDUST ///
                          DXCTYCOL DXCTYHS DXSTRISK CODPUB STAT_REC SUMM2K ///
                          DETHCLSS CSTSEVAL CSRGEVAL CSMETVAL INTPRIM ERSTATUS ///
                          PRSTATUS SRVTIMON INSRECPB ADJTM6VL CSMETDX CS7SITE ///
                          HER2 BRST_SUB METBONPB METBRPB METLVPUB METLGPUB ///
                          T_VALUE ED10NDPN M_VALUE)


                          // FACTORMAT OPTIONS TO EXPERIMENT WITH
                          // about 5 iterations keep the uniquenesses positive
                          // factormat M, n(7786) shape(full) ipf factors(25)

                          /*
                          citerate(25) ipf only
                          ipf iterated principal factors
                          pf principal factors
                          ml maximum likelihood
                          pcf principal component
                          */

                          /*forcepsd*/

                          timer off 1
                          timer list 1


                          /*ipf citerate(2) factors(18)*/
                          /*Or factor (varlist), pf factors(10)*/



                          timer clear
                          timer on 1


                          /*test of quartimax rotation for 15 factors*/
                          rotate, quartimax norm factors(15) blanks(.23)

                          /*rotate, oblimin factors(10)*/

                          timer off 1
                          timer list 1



                          • #14
                            Clyde and Leonardo:

                            As of Nov 25 2024, I am working with a new subset of 87 variables (instead of the 90 variables in my last post above).

                            The code Clyde developed only works if the number of values in the file, n(n+1)/2 for n variables, is evenly divisible by 5 (true for 90 variables, not for 87). That is because each row of the imported data holds five columns of numbers. In my new case of 87 variables, the last row of data to be read in contains only three space-delimited numbers, and Clyde's code expects all five columns to hold numbers. The code fails.
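
A quick check of that arithmetic, as a rough sketch: the lower triangle of an n-variable covariance matrix, diagonal included, holds n(n+1)/2 values, so the number left over on the last row is that count modulo 5. The local names `nvars`, `ncols` and `nelem` are made up for this illustration.

```stata
// nvars, ncols and nelem are illustrative local names, not from the original code
local ncols 5
foreach nvars in 90 87 {
    // number of values in the lower triangle, diagonal included
    local nelem = `nvars' * (`nvars' + 1) / 2
    display "nvars = `nvars': `nelem' values, " floor(`nelem'/`ncols') " full rows, " mod(`nelem', `ncols') " left over"
}
```

For 90 variables this reports 4095 values in 819 full rows with 0 left over; for 87 variables it reports 3828 values in 765 full rows with 3 left over.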

                            Would it be possible to alter the code below to read in the data when the last row has fewer than five numbers?

                            I have installed dataex, and have tried to get the following code into dataex for Clyde and other Statalist members.
                            My apologies; I have not succeeded. Will keep trying.

                            Here is the code that hopefully could be modified to work with 87 variables. Call on me to supply more information if needed to assist you.

                            // change directory if needed
                            cd c:\chest

                            set more off
                            clear

                            // Use of Stata's import command
                            import delimited c:\Mplus\sep82023standbayescovs.txt

                            split v1, gen(var) destring
                            drop v1

                            local r 1
                            local c 1

                            // Set this value to the number of variables you have.
                            // Clyde points out that this number is also the dimension of the matrix.
                            matrix M = I(90)

                            // Clyde Schechter's code:
                            forvalues o = 1/`=_N' {
                                forvalues v = 1/5 {
                                    matrix M[`r', `c'] = var`v'[`o']
                                    if `r' != `c' {
                                        matrix M[`c', `r'] = M[`r', `c']
                                    }
                                    local ++c
                                    if `c' > `r' {
                                        local c = 1
                                        local ++r
                                    }
                                }
                            }




                            // BUILD A LIST OF NAMES TO APPLY TO BOTH ROWS AND COLUMNS by Clyde Schechter
                            local names
                            forvalues i = 1/90 { // OR WHATEVER THE DIMENSION OF THE MATRIX IS
                                local names `names' var`i'
                            }

                            // RENAME THE ROWS AND COLUMNS (Clyde Schechter's code)
                            matrix rownames M = `names'
                            matrix colnames M = `names'

                            // list your matrix to confirm accuracy of import and read in.
                            // As Clyde points out, the number of variables in your data set
                            // is equal to the dimension of the matrix
                            matrix list M

                            // list eigenvalues to spot a non positive definite
                            // variance - covariance mtx. One or more could be negative
                            // If you have neg eigenvalues, you may need to combine
                            // collinear variables by principal components analysis.
                            matrix symeigen X v = M
                            matrix list v
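
With close to 90 eigenvalues in the listing, it may be easier to count the negative ones than to scan for them by eye. The following is only a sketch under stated assumptions: `D` is a made-up 2 x 2 matrix that is not positive definite (its eigenvalues are 3 and -1), and the names `evecs`, `evals` and `nneg` are chosen for this illustration. Substitute your own matrix for `D`.

```stata
// D is a made-up symmetric matrix that is not positive definite
matrix D = (1, 2 \ 2, 1)

// eigenvectors go to evecs; eigenvalues, largest first, go to the row vector evals
matrix symeigen evecs evals = D

// count the eigenvalues that are below zero
local nneg 0
forvalues j = 1/`=colsof(evals)' {
    if evals[1, `j'] < 0 {
        local ++nneg
    }
}
display "negative eigenvalues: `nneg'"

// the last element is the smallest eigenvalue
display "smallest eigenvalue: " evals[1, colsof(evals)]
```

For the toy matrix the count is 1. A count of 0 on the real matrix would mean no eigenvalue is negative, although eigenvalues at or very near zero would still deserve a look.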








                            • #15
                              Without sample input, I'm not sure what I'm being asked to deal with here. Is it still the case that the Mplus input comes in 5 columns but is to be read as a lower triangular matrix like
                              var
                              cov var
                              cov cov var
                              ...
                              etc.

                              So I've written some code that creates a toy data set (with random numbers for the variances and covariances) that looks like the Mplus output after you have -import-ed it into Stata and renamed the variables. You already have the real Mplus output, so you will need only the code from // CREATE THE MATRIX on down. To develop and test, I used 17 "variables" rather than 87, but the code should work for any number of variables, at least up to the point where memory is exhausted.
                              Code:
                              //    CREATE A TOY DATA SET THAT LOOKS LIKE AN MPLUS COVARIANCE MATRIX
                              //    WITH 17 VARIABLES, READ SNAKEWISE, AND LAID OUT IN 5 COLUMNS
                              clear*
                              set seed 1234
                              
                              set obs 1
                              local ncols 5
                              local nvars 17
                              
                              local varnum 1
                              local obs_no 1
                              local col_no 1
                              
                              forvalues i = 1/`ncols' {
                                  gen var`i' = .
                              }
                              
                              local max_count = (`nvars'+1)*(`nvars')/2
                              local counter 1
                              
                              while `counter' <= `max_count' {
                                  replace var`col_no' = runiform() in `obs_no'
                                  if `col_no' < `ncols' {
                                      local ++col_no
                                  }
                                  else {
                                      local col_no 1
                                      insobs 1, after(_N)
                                      local ++obs_no
                                      
                                  }
                                  local ++counter
                                  
                              }
                              
                              
                              
                              //    CREATE THE MATRIX
                              //    (nvars and ncols are set here so this section can be run on its own)
                              local nvars 17
                              local ncols 5
                              local r 1
                              local c 1
                              
                              
                              matrix M = I(`nvars')
                              
                              forvalues o = 1/`=_N' {
                                  forvalues v = 1/`ncols' {
                                      matrix M[`r', `c'] = var`v'[`o']
                                      if `r' != `c' {
                                          matrix M[`c', `r'] = M[`r', `c']
                                      }
                                      local ++c
                                      if `c' > `r' {
                                          local c = 1
                                          if `r' < `nvars' {
                                              local ++r
                                          }
                                          else {
                                              continue, break
                                          }
                                      }
                                  }
                              }
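
To see the short-last-row case on something small enough to check by hand, here is a rough sketch that feeds the same loop a made-up 3-variable example: 6 values laid out 5 per row, so the second row holds a single number followed by missings. The data values and the checks at the end are illustrative assumptions, not part of the real Mplus output.

```stata
// Toy input: lower triangle of a 3 x 3 covariance matrix (6 values),
// laid out 5 per row, so the last row holds only one value.
clear
input var1 var2 var3 var4 var5
1   .5  2   .25 .75
3   .   .   .   .
end

// nvars is the dimension of the matrix; ncols is the number of values per row
local nvars 3
local ncols 5
local r 1
local c 1
matrix M = I(`nvars')

forvalues o = 1/`=_N' {
    forvalues v = 1/`ncols' {
        matrix M[`r', `c'] = var`v'[`o']
        if `r' != `c' {
            matrix M[`c', `r'] = M[`r', `c']
        }
        local ++c
        if `c' > `r' {
            local c = 1
            if `r' < `nvars' {
                local ++r
            }
            else {
                // the triangle is full, so skip the trailing missings
                continue, break
            }
        }
    }
}

matrix list M

// Sanity checks on the result
assert rowsof(M) == `nvars' & colsof(M) == `nvars'
assert issymmetric(M)
assert !matmissing(M)
assert M[3, 2] == .75 & M[2, 3] == .75
```

The same three structural checks (dimension, symmetry, no missing values) can be run on the real matrix after setting `nvars' to 87.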
