How to read into Stata a variance-covariance matrix (originally computed within Mplus)

Peter Hoon

Join Date: Dec 2016

Posts: 13
#1

30 Aug 2023, 12:37

I'm finding that the strengths of Mplus and Stata complement each other when computing FA, CFA, ESEM and CSEM analyses based on two split-half samples of 351,000 patients and 97 variables.

I'm using my Stata 13.1 perpetual hex core license.

Mplus has successfully written variance-covariance matrices to covs.txt files, based on (1) Bayes or (2) Expectation Maximization estimators, each a 30-hour run. I can study PSR and trace plots for each MCMC chain to assess convergence and stability.

I have also run these estimators in Stata, but I don't have the ability (available in Mplus) to graphically study MCMC chain mixing and examine possibly problematic estimates of covariances and variances.

The frustration is that Mplus cannot compute EFA or ESEM analysis with the ML estimator on a variance - covariance matrix; a correlation matrix is required!

Factormat in Stata will compute a FA from variance-covariance matrix input (aka covariance matrix input). However, factormat requires that my covariance matrix first be stored as a Stata matrix.

An advantage of Stata is that I can display eigenvalues, eigenvectors, and the determinant, and assess whether the matrix is positive definite. I can subsequently combine collinear variables with principal components analysis.
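
As a rough illustration of those checks (a sketch on a toy 3 x 3 matrix, not the actual 97-variable matrix), the eigenvalues and determinant can be listed as follows. The matrix names C, X and v are arbitrary choices for this example.

```stata
* toy symmetric matrix standing in for the real covariance matrix
matrix C = (1, .9, .7 \ .9, 1, .6 \ .7, .6, 1)

* eigenvectors go to X, eigenvalues (largest first) to the row vector v
matrix symeigen X v = C
matrix list v

* a positive definite matrix has all eigenvalues > 0 and a positive determinant
display "determinant         = " det(C)
display "smallest eigenvalue = " v[1, colsof(v)]
```

A smallest eigenvalue at or below zero would flag a matrix that is not positive definite.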

The Mplus covs.txt file contains 4753 elements with a whitespace delimiter (spaces, possibly tabs). It can be seen as a lower triangle matrix with 97 variances on the diagonal and 4656 off-diagonal covariances. The file can be read into the Stata editor row-wise: left to right within each row, then down the rows from top to bottom. I've used the import command. I suspect that using the data editor is not necessarily the best way to solve the problem.

The Stata editor displays the data as six columns all as V1 (variable 1)

The covariance input can also be seen as a one-dimensional vector of 4753 elements.

The question is: How do I read in the .txt file and produce in Stata a lower triangle covariance matrix, say matrix Covs?

I suspect one solution is to use the mkmat command, but I don't yet see a way to give it the proper arguments to make the matrix I want.

Factormat allows me to add names to the matrix, once I have built the matrix Covs.

Also, I have used the _getcovcorr command with other commands to change the covariance matrix into a correlation matrix successfully.
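
For readers who prefer a documented route to the same conversion, Stata's matrix function corr() turns a variance matrix into a correlation matrix. This is a minimal sketch on a made-up 2 x 2 covariance matrix; the numbers and the names C and R are illustrative only and do not come from the Mplus output.

```stata
* made-up covariance matrix: variances 4 and 9, covariance 2
matrix C = (4, 2 \ 2, 9)

* corr() rescales by the standard deviations (2 and 3 here)
matrix R = corr(C)
matrix list R
```

The off-diagonal element of R should equal 2/(2*3), and the diagonal should be 1.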

Thanks to the Statalist team for strategy suggestions and code suggestions; I'll post the final code we arrive at.
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2407
#2

30 Aug 2023, 12:50

There are a few ways to go about this, and I have some experience bringing results from Mplus into Stata. However, the details depend strongly on precisely what your data look like. So I would ask that you read the FAQ rules and supply us with a usable data example. In this case, I suspect that covs.txt is your results file, such that each "block of lines" is equivalent to one set of estimates. You will probably need to use -infix- or -infile-, and this is the most tedious aspect of data import from Mplus because it writes multi-line observations.

We don't need the whole file; even a toy example that replicates the key features will do, say two replicates/estimates from a 3x3 matrix.
Peter Hoon

Join Date: Dec 2016

Posts: 13
#3

30 Aug 2023, 17:00

Thank you Leonardo for your support and question. I'm going to display a three-by-three example from a UCLA forum on Stata. This is a square (full) variance-covariance matrix,
in this case a standardized variance-covariance matrix:

1 .9 .7
.9 1 .6
.7 .6 1

You will note that the top triangle of the matrix is actually redundant.

Here would be a lower triangle matrix, that is structurally similar to the august302023bayescovs.txt file I just placed into my Stata directory from an Mplus run covariance matrix output:

1
.9 1
.7 .6 1

This could correspond to three variable names such as race age marital status. The names for the columns are the same as the names for the rows, in the same order.

It could be this will not work, and that you have to initially create a full, or square matrix like the top one.

Here is actual data as I received it in my data editor of the first five rows from Mplus

. import delimited c:\chest\aug302023bayescovs.txt
(1 var, 951 obs)

v1
0.10144859E+00 -0.66104774E-01 0.24455941E+00 -0.16166649E-02 -0.80306507E-02
0.13776392E-01 -0.10722875E-01 -0.54454581E-01 -0.13130958E-02 0.85201854E-01
-0.22969721E-01 -0.11575400E+00 -0.28096750E-02 -0.18678494E-01 0.16027997E+00
0.14528202E-02 0.12793232E-03 0.23382596E-03 0.67499592E-03 -0.24950242E-02
0.42814092E-01 -0.14386454E-01 -0.68730341E-02 0.94831028E-04 0.42607063E-03
(etc on through 4753 numbers)

Ignore the (1 var, 951 obs) message; it has no meaning for us in this context. Also, ignore v1.

The data is space delimited.

Now, to compute the number of covariances in this array, we have N(N-1)/2 = 97 × 96 / 2 = 4656.
We have to add to this the number of variances in the array above, which is 97.
The total number of elements in the data from Mplus is therefore 4753, as noted above.
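
The count can be checked in Stata itself. This is only a small sketch; the local macro names are arbitrary, and the 5 numbers per line is taken from the listing above.

```stata
* elements in the lower triangle (diagonal included) of an n x n matrix
local n 97
local offdiag = `n' * (`n' - 1) / 2
local total   = `n' * (`n' + 1) / 2

display "off-diagonal covariances:  `offdiag'"
display "variances on the diagonal: `n'"
display "total elements in file:    `total'"
display "lines of 5 numbers needed: " ceil(`total' / 5)
```

For n = 97 this gives 4656 covariances, 4753 elements in total, and 951 lines, which agrees with the 951 observations that -import delimited- reported.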

It is in LOWER triangle form (NOT a full matrix) with variances in the diagonal and covariances off diagonal in the lower triangle.

So to read this in for the purposes of making a matrix, you read the first upper left number, move through the row, then
read the next row down, left to right, and so forth, to fill a matrix looking like this

var
cov var
cov cov var
cov cov cov var
cov cov cov cov var
etc
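
To make the reading order concrete, here is a small sketch (not from Mplus or Stata documentation, just arithmetic) that maps the position k of a number in the file to its row and column in the lower triangle. The macro names k, r and c are arbitrary.

```stata
* position k (counting from 1) in the row-wise lower triangle
local k 9

* row r is the smallest r with r(r+1)/2 >= k; column c is the offset within that row
local r = ceil((sqrt(8*`k' + 1) - 1) / 2)
local c = `k' - `r' * (`r' - 1) / 2

display "element `k' -> row `r', column `c'"
```

With k = 9 this reports row 4, column 3, so the ninth number in the listing above (-0.13130958E-02) would be the covariance between the fourth and third variables.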

Let me know if this was understandable, and if you need anything else from me.

Best, Pete

Clyde Schechter

Join Date: Apr 2014
Posts: 30192
#4

30 Aug 2023, 17:57

Based on what you show, I'm assuming that the data set you are starting from begins like this:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str80 v1
"0.10144859E+00 -0.66104774E-01 0.24455941E+00 -0.16166649E-02 -0.80306507E-02"
"0.13776392E-01 -0.10722875E-01 -0.54454581E-01 -0.13130958E-02 0.85201854E-01"
"-0.22969721E-01 -0.11575400E+00 -0.28096750E-02 -0.18678494E-01 0.16027997E+00"
"0.14528202E-02 0.12793232E-03 0.23382596E-03 0.67499592E-03 -0.24950242E-02"   
"0.42814092E-01 -0.14386454E-01 -0.68730341E-02 0.94831028E-04 0.42607063E-03"  
end

Then you can create the matrix M you need with:

Code:

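* split the single string variable into numeric variables var1-var5
split v1, gen(var) destring
drop v1

* current row and column of the lower triangle being filled
local r 1
local c 1

* start from a 97 x 97 identity matrix; the values are filled in below
matrix M = I(97)

forvalues o = 1/`=_N' {         // each imported line
    forvalues v = 1/5 {         // each of the 5 numbers on that line
        matrix M[`r', `c'] = var`v'[`o']
        if `r' != `c' {
            matrix M[`c', `r'] = M[`r', `c']   // mirror into the upper triangle
        }
        local ++c
        if `c' > `r' {          // past the diagonal: move to the next row
            local c = 1
            local ++r
        }
    }
}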
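
A possible pre-check before running a loop like this (a sketch, not part of the code above): count the non-missing numbers that were imported and compare the count with n(n+1)/2. The toy data below stand in for a 3 x 3 lower triangle laid out 5 numbers per line; the variable name nvals is hypothetical.

```stata
clear
input var1 var2 var3 var4 var5
1  .9  1  .7  .6
1  .   .  .   .
end

local n 3    // dimension of the matrix you expect

* number of non-missing values on each line, then summed over all lines
egen nvals = rownonmiss(var1-var5)
summarize nvals, meanonly
display "numbers found: " r(sum) ", numbers expected: " `n' * (`n' + 1) / 2
assert r(sum) == `n' * (`n' + 1) / 2
```

If the assertion fails, the dimension passed to I() does not match the file, and it is better to find that out before filling the matrix.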


Leonardo Guizzetti

Join Date: Jul 2016
Posts: 2407
#5

30 Aug 2023, 18:52

Here's an alternative to what Clyde has shown. I only used the first three rows to show how to turn them into a 5x5 matrix, but this extends directly to whatever size of matrix you have. You'll end up with a symmetric Mata or Stata matrix and then you can use that however you wish.

Code:

clear *
cls

cd "c:\tmp\cov"

import delimited covs.txt, delim(" ") clear case(lower)
list

* number each set of observations and row number within each observation to preserve the structure
* of the imported data.
gen `c(obs_t)' row = _n
order row, first
sort row

* flatten the data so that all values are recorded into a single observation.
* renumber those variables to be sequentially ordered.
gen byte set = 1   // nuisance variable used only for reshape to work
reshape wide v* , i(set) j(row)
drop set
rename (v#) (v#), renumber

mata:
  // read the data into Mata
  X = st_data(., "v*", .)
 
  // stripe the vector to a symmetric matrix
  V = J(5,5,.)
  counter = 0
  for (j=1; j<=5; j++) { // loop over columns
    for (i=1; i<=5; i++) { // loop over rows
      if (i<=j) {
        counter++
        V[i, j] = X[counter]
        V[j, i] = V[i, j]
      }
    }
  }
  V
 
  // push the matrix to Stata
  st_matrix("V", V)
end

matlist V

Selected output:

Code:

:   V
[symmetric]
                  1              2              3              4              5
    +----------------------------------------------------------------------------+
  1 |     .10144859                                                              |
  2 |   -.066104774      .24455941                                               |
  3 |  -.0016166649   -.0080306507     .013776392                                |
  4 |   -.010722875    -.054454581   -.0013130958     .085201854                 |
  5 |   -.022969721       -.115754    -.002809675    -.018678494      .16027997  |
    +----------------------------------------------------------------------------+

. matlist V

             |        c1         c2         c3         c4         c5
-------------+------------------------------------------------------
          r1 |  .1014486                                             
          r2 | -.0661048   .2445594                                  
          r3 | -.0016167  -.0080307   .0137764                       
          r4 | -.0107229  -.0544546  -.0013131   .0852019            
          r5 | -.0229697   -.115754  -.0028097  -.0186785     .16028
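
The same striping can be wrapped in a Mata function that takes the dimension as an argument, so there are no literal 5s to edit. This is a sketch only: tri2sym() is a hypothetical name, not an official Mata function, and the toy vector is the 3 x 3 example from #3.

```stata
capture mata: mata drop tri2sym()

mata:
// hypothetical helper: rebuild a symmetric n x n matrix from a vector x
// holding the lower triangle row by row (diagonal included)
real matrix tri2sym(real vector x, real scalar n)
{
    real matrix V
    real scalar i, j, k

    if (length(x) < n*(n+1)/2) _error(3200)   // too few elements for n
    V = J(n, n, .)
    k = 0
    for (i=1; i<=n; i++) {          // rows
        for (j=1; j<=i; j++) {      // columns up to the diagonal
            k++
            V[i, j] = x[k]
            V[j, i] = x[k]
        }
    }
    return(V)
}

V = tri2sym((1, .9, 1, .7, .6, 1), 3)
V
st_matrix("V", V)
end

matrix list V
```

For the real data one would pass the vector read with st_data() and the actual dimension, for example tri2sym(X, 97).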


Peter Hoon

Join Date: Dec 2016

Posts: 13
#6

31 Aug 2023, 13:40

Leonardo and Clyde, thanks for your fine work. Truly outstanding efforts from Leonardo and Clyde!

I'm going to test both versions over this Labor Day weekend. Give me through Tuesday Sept 4, 2023 to get back to you all about the results of my tests.

I have a Medtronics C++ Senior Programmer here to help me on debugging, and will post about any issues we run into over the next several days. Please standby if possible.

I need a couple of immediate answers from you before I start testing.

1) I assume I paste the code into a do file and execute from a do file (not the command line). Is that correct?

2) Clyde, I'm not familiar with dataex, and how to set up the input file to work with your code.
Leonardo's code shows how I can use the import command (which I have used) to get a hold of my data file that his code addresses.
At my limited level of ability, that works better for me at this time. Leonardo, thank you for showing us how to tell Stata to use a white space delimiter.

3) In Leonardo's code, am I correct to just substitute 97 for the numbers 5 I see, to get the code to work with all of my 97 variables, in my monstrous input file?

4) Likewise Clyde, should I substitute 97 for the 5 in your code?

5) Clyde, am I correct that you have already initialized matrix M to contain 97 columns and 97 rows by matrix M = I(97) ?

For others who are following, I'd like to make a few points. When reading in summary data such as a covariance matrix or a correlation matrix, Mplus limits the estimators to GLS, ML, and ULS (generalized least squares, maximum likelihood, unweighted least squares). Factormat in Stata allows iterated principal factors, which is not available in Mplus. Initial tests show that ipf in Stata and ML in Mplus produce virtually identical factor loadings, a good outcome. In medical research, where patients' health is involved, it is good to get congruence from two different statistical packages, such as Stata and Mplus.

Mplus will only write out a covariance matrix after Bayes or Expectation Maximization. Yet Mplus requires a correlation matrix for summary data, and will not read in a covariance matrix, a limitation. The _getcovcorr command in Stata is needed to convert the Mplus covariance matrix back into a correlation matrix, which Mplus can then read in to construct path models. _getcovcorr can also place your matrix into a lower triangle, an upper triangle, or a full symmetric matrix. After developing a SEM or path model from correlation or covariance matrix input, it is best to confirm the model with raw data read into your SEM package.
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2407
#7

31 Aug 2023, 14:16

Originally posted by Peter Hoon

I'm going to test both versions over this Labor Day weekend. Give me through Tuesday Sept 4, 2023 to get back to you all about the results of my tests.

I have a Medtronics C++ Senior Programmer here to help me on debugging, and will post about any issues we run into over the next several days. Please standby if possible.

I need a couple of immediate answers from you before I start testing.

Please understand that this is a forum made up of Stata users who freely volunteer their time to help out and post when they can. We are not paid staff members of Stata Corp. What you wrote can be perceived as an imposition on both Clyde's and my time. While I do not believe that is how the message was intended, others may take it as such, and your message may be perceived as rude or demanding. For myself, I help when and where I can. Many of us are based in the US and Canada, and we are coming up to a long weekend. I will not be much interested to check this forum for urgent issues, so please understand that future help may be slow to arrive, if it comes at all.

Originally posted by Peter Hoon

3) In Leonardo's code, am I correct to just substitute 97 for the numbers 5 I see, to get the code to work with all of my 97 variables, in my monstrous input file?

Yes, that's right. It should work fine with that substitution. In previous work, my coefficients were more limited in number and I was interested in Monte Carlo simulations, so I opted for a different method from the one demonstrated above: I used -infile- or -infix-, which allows a more direct input from a CSV file to the wide layout that I've shown above. The downside, however, is that creating the dictionary file takes much more typing and some time to set up.
Clyde Schechter

Join Date: Apr 2014

Posts: 30192
#8

31 Aug 2023, 14:23

1) I assume I paste the code into a do file and execute from a do file (not the command line). Is that correct?

Correct.

2) Clyde, I'm not familiar with dataex, and how to set up the input file to work with your code.
Leonardo's code shows how I can use the import command (which I have used) to get a hold of my data file that his code addresses.
At my limited level of ability, that works better for me at this time. Leonardo, thank you for showing us how to tell Stata to use a white space delimiter.

Do familiarize yourself with -dataex-. If you are using Stata version 14.2 or later, it is part of your official Stata. If still using an earlier one get it from SSC. Read -help dataex- to learn how to use it--it's very simple. It is the best way to show what a Stata data set looks like.
In writing my post at #4, I first created a Stata data set that matched, as I understood it, the data example you showed in the middle of #3. Then I used -dataex- to turn it into something that could be posted and used. If I did it correctly, you already have this data set: it is what you got from importing the MPlus output into Stata. And while I agree with Leonardo that changing your -import- command to get a more convenient result in Stata is a good idea, I didn't feel confident enough about the details of the MPlus output to propose a specific command. And since you already had a Stata import of the MPlus output, I figured we should work from there. In short, just start with the import file you already have and apply the rest of my code.

4) Likewise Clyde, should I substitute 97 for the 5 in your code?

No! In Leonardo's code he is "targeting" a 5x5 matrix for illustration. My code targets a 97x97 matrix. The 5 in my code reflects the fact that each line of the Stata-imported MPlus output you showed in #3 contains 5 numbers. So don't change that 5 unless the actual Stata file you imported contains some other number of numbers in its observations.

5) Clyde, am I correct that you have already initialized matrix M to contain 97 columns and 97 rows by matrix M = I(97) ?

Yes.

Last edited by Clyde Schechter; 31 Aug 2023, 14:26.
Peter Hoon

Join Date: Dec 2016

Posts: 13
#9

31 Aug 2023, 16:48

Gentlemen, Clyde and Leonardo, I'm very sorry about my comment. I extend my apologies. I am very excited about getting this project toward completion, and did not think ahead to the consequences of my standby comment.
Thanks for your comments.
Everyone have a great week, I'll get to work, and we'll talk again in the future at your convenience.
Peter Hoon

Join Date: Dec 2016

Posts: 13
#10

07 Sep 2023, 21:06

Clyde and Leonardo: Here is the run output, from running Clyde's code, showing a names problem I haven't solved. I am now using 90 variables.
Please note that Stata displays a lower triangle but in fact stores the matrix as a full symmetric matrix.

Suggestions appreciated:

1. (/v# option or -set maxvar-) 5000 maximum variables

. do "C:\chest\read Mplus var - cov data to form Stata mtx.do"

. /*
> Clyde Schechter's code for reading in a var - cov mtx written out
> by Mplus as a .txt file. Mplus's .txt file is all one vector, rowise, from left
> white space delimited, representing
> a lower triangle variance - covariance matrix. The code below
> gets the .txt file into a matrix that can be read into
> Stata factor analysis program factormat. 9/7/2023.
> The Matrix M looks like this:
> var
> cov var
> cov cov var
> cov cov cov var etc. Though Stata displays a lower triangle, the mtx M
> is in fact stored as a full symmetric matrix.
> */
. clear

. cd c:\chest
c:\chest

. set more off

. import delimited c:\Mplus\sep72023standbayescovs.txt
(1 var, 819 obs)

. /*Clyde's code follows*/
. split v1, gen(var) destring
variables born as string:
var1 var2 var3 var4 var5
var1 has all characters numeric; replaced as double
var2 has all characters numeric; replaced as double
var3 has all characters numeric; replaced as double
var4 has all characters numeric; replaced as double
var5 has all characters numeric; replaced as double

. drop v1

.
. local r 1

. local c 1

.
. matrix M = I(90) /*Set this value to the num of variables you have*/

.
. forvalues o = 1/`=_N' {
2. forvalues v = 1/5 {
3. matrix M[`r', `c'] = var`v'[`o']
4. if `r' != `c' {
5. matrix M[`c', `r'] = M[`r', `c']
6. }
7. local ++c
8. if `c' > `r' {
9. local c = 1
10. local ++r
11. }
12. }
13. }

.
. matrix list M/*list your matrix to confirm accuracy of import and read in*/

symmetric M[90,90]
c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12 c13 c14 c15 c16 c17 c18 c19 c20
r1 1.0009118
r2 -.42002171 1.0014574
r3 -.04280465 -.13911006 1.0005674
r4 -.11558969 -.37713748 -.03835486 1.0006192
r5 -.18028732 -.58547007 -.05956859 -.16077259 1.0018269
r6 .02240978 .00025084 .00948534 .01138361 -.02927459 1.0006201
r7 -.04524086 -.01322443 .00099151 .0012906 .05115434 -.51524085 1.0007675
r8 .13881011 .22103494 -.07552929 .09155974 -.42923755 .09309907 -.17002208 1.0005052

and so forth for all 4095 (90*91/2) elements of the lower triangle

Here is the Stata error message, which to me says that Stata needs a way to get 90 names in place for each column
and corresponding row, which I have not succeeded in setting up properly. The factormat command below has been tested on a previous
matrix with names. I tried to give the matrix names in factormat, v1-v90, but it did not work:

timer clear

. timer on 1

.
. factormat M, n(7786) shape(full) names(v1-v90) citerate(10) ipf factors(8)

name conflict: row and column names of M should match
r(198);

end of do-file

r(198);
Clyde Schechter

Join Date: Apr 2014

Posts: 30192
#11

07 Sep 2023, 22:14

Code:

// BUILD A LIST OF NAMES TO APPLY TO BOTH ROWS AND COLUMNS
local names
forvalues i = 1/90 { // OR WHATEVER THE DIMENSION OF THE MATRIX IS
    local names `names' var`i'
}

// RENAME THE ROWS AND COLUMNS
matrix rownames M = `names'
matrix colnames M = `names'

Now M will have the same names in the rows as columns.

Another way of doing it that might be more useful: instead of building a list of names that reads var1 var2 ... var90, use the real variable names. I presume, given their provenance, that these are actual variables that you used in your MPlus analysis. Nothing in this thread hints at what those variables' names are. But if you can scrape up a list of the real variable names, use that instead; it'll probably make the factormat output more useful to you that way.
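
A sketch of that second approach on a toy 3 x 3 matrix, assuming the real names are typed into a local macro in the same order as the rows of the matrix. The three names shown are placeholders for whatever your actual variables are called.

```stata
* toy matrix standing in for the 90 x 90 matrix M
matrix M = I(3)

* real variable names, in the same order as the rows/columns of M
local names SINGLE MARRIED SEPARATE

* the number of names must equal the dimension of the matrix
assert `: word count `names'' == rowsof(M)

matrix rownames M = `names'
matrix colnames M = `names'
matrix list M
```

Because rows and columns receive the same list, the row and column names match afterwards.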
Peter Hoon

Join Date: Dec 2016

Posts: 13
#12

08 Sep 2023, 18:21

Clyde and Leonardo, thank you for your help. Clyde, with the recent code that builds names into matrix M, I found
that I could use factormat (and other Stata routines) and substitute in my own names. I now have a successful
factormat run (with documentation to help others) based on Clyde's code. Leonardo,
I haven't had time to test your code, but I'm sure it would also lead to success.
Unfortunately, Statalist doesn't give me enough space to post this lengthy run. Factormat ran fast and successfully.
Best from Paul aka "Peter".
Peter Hoon

Join Date: Dec 2016

Posts: 13
#13

13 Sep 2023, 15:53

Clyde, Leonardo, and multivariate researchers in biology, sociology, econometrics, medicine, and psychology: I am
posting code (a do-file) I'm using daily to read in a covariance matrix from Mplus (in this case, Bayes estimation). I hope this
code will assist others in their work. If you have suggestions for improving the documentation below,
please let me know. Best to all from Pete.

/*
Clyde Schechter's code for reading in a var - cov mtx written out
by Mplus as a .txt file. Mplus's .txt file is all one
single dimensioned vector,
white space delimited, representing
a lower triangle variance - covariance matrix. Most likely,
SAS, SPSS, R, and AMOS also write out variance - covariance matrices.
The code below
gets the .txt file that Mplus outputs into a matrix that can be read into
Stata factor analysis program factormat, 9/13/2023.
Stata's factormat program starts with "summary data",
a variance - covariance matrix, which several authors
(Kline, Bollen and Bentler) recommend
(in lieu of a correlation matrix) for SEM and factor analysis work.
The Matrix M looks like this as it is read into Stata:
var
cov var
cov cov var
cov cov cov var etc.

Stata factormat with quartimax rotation runs in just a few seconds,
a great advantage in gaining a rapid understanding of latent constructs.
Stata _getcovcorr will convert the covariance matrix back into a correlation matrix
that Mplus (and others) can read and use to build ESEM or CSEM models. Even though
a covariance matrix is recommended, Mplus strangely often requires
a correlation matrix for ESEM, etc.

An important point to remember, that is often confusing
in Stata:
The lower triangle matrix is read into Stata correctly.
Yet the matrix is stored as a full Hermitian symmetric matrix by
Stata. The Stata command
matrix list M
DISPLAYS a lower triangle
matrix! You are led to believe it is stored as such, but it is not.
Again, it is stored as full symmetric Hermitian.

A covariance, SEM or FA model on a hex core machine
running at 3.6 GHz,
turbo boosted,
can take up to 60 hours in Mplus, SAS, SPSS or Stata from raw data,
when observations number seven hundred thousand. When summary
data is used as input, FA, SEM, CFA, CSEM in Stata
may take only A FEW SECONDS from a matrix of
dimension 90! The importance/advantage of having the capability
of reading in a var - cov mtx from SAS, Mplus, Stata, SPSS, AMOS shown
in the output below, becomes
clear for multivariate researchers who have big data:
*/

cd c:\chest
set more off
clear
// import delimited c:\Mplus\sep92023standiwbayescovs.txt
import delimited c:\Mplus\sep82023standbayescovs.txt
// import delimited c:\Mplus\sep102023unstandiwbayescovs.txt
/*import delimited c:\Mplus\sep112023unstandiwfixmeansbayescovs.txt*/
// import delimited c:\Mplus\sep132023standiw60bayescovs.txt
// Clyde's code follows
split v1, gen(var) destring
drop v1

local r 1
local c 1

matrix M = I(90) /*Set this value to the num of variables you have*/

forvalues o = 1/`=_N' {
forvalues v = 1/5 {
matrix M[`r', `c'] = var`v'[`o']
if `r' != `c' {
matrix M[`c', `r'] = M[`r', `c']
}
local ++c
if `c' > `r' {
local c = 1
local ++r
}
}
}

// BUILD A LIST OF NAMES TO APPLY TO BOTH ROWS AND COLUMNS by Clyde Schechter
local names
forvalues i = 1/90 { // OR WHATEVER THE DIMENSION OF THE MATRIX IS
local names `names' var`i'
}

// RENAME THE ROWS AND COLUMNS Clyde Schechter's code
matrix rownames M = `names'
matrix colnames M = `names'

matrix list M /*list your matrix to confirm accuracy of import and read in.
As Clyde points out, the number of variables in your data set
is equal to the dimension of the matrix*/

matrix symeigen X v = M
matrix list v/*list eigenvalues to spot a non positive definite
variance - covariance mtx. In this case last eigenvalue is neg*/

timer clear
timer on 1

/*
Test run of factor analysis using factormat of matrix M with names.
Note: Stata matrix names are case sensitive, so M and m are different matrices.

WARNING: the MATRIX IN THIS TEST IS NOT POSITIVE DEFINITE, and so researcher
Pete must get back to work in
Mplus to create a positive definite matrix! Stata allows, however,
the forcepsd option, to complete a test FA run. Virtually no other stat
package such as SPSS, MPLUS, SAS, (and probably R) has such a capability.
It is unknown whether the 15 factor test solution below will be similar to,
or quite different from, a matrix that has been corrected to positive definite.
Usually, when the last eigenvalue is slightly negative, you will get identical
solutions.*/

factormat M, n(7786) shape(full) citerate(15) ipf factors(15) forcepsd ///
names(SINGLE MARRIED SEPARATE ///
DIVORCED WIDOWED HISORG IRIBE ///
YR_BIRTH SEQ_NUM MDXRCMP ORGRISK FORORG ///
CENTRAL UPPERIN LOWERIN UPPEROUT LOWEROUT AXILTAIL ///
OVERLAP LATERAL GRADE EOD10_SZ ///
EOD10_EX EOD10_NE EOD10SRG LYMPHMIS CSTUMSIZ CSEXTEN ///
LYMPOD10 CS5SITE CS6SITE DAJCCT DAJCCN DAJCCSTG ///
SURGPRIF SURGSITF SRGDTMIS NUMNODES NO_SURG RADIATN ///
RADMIS RAD_SURG SS_SURG MASTMIS SURGSITE ADISMIS ///
ICD9V10V UNSPPSM EPITHPSM SQUAMPSM ADENCAR ///
CYTMUSER DUCTLOB HST_STGA A3SEERSG FIRSTPRM CTYMEDN ///
CTYPOV CTYPOV18 POVBIRTH CTYINCID HSEDSTAT COLEDUST ///
DXCTYCOL DXCTYHS DXSTRISK CODPUB STAT_REC SUMM2K ///
DETHCLSS CSTSEVAL CSRGEVAL CSMETVAL INTPRIM ERSTATUS ///
PRSTATUS SRVTIMON INSRECPB ADJTM6VL CSMETDX CS7SITE ///
HER2 BRST_SUB METBONPB METBRPB METLVPUB METLGPUB ///
T_VALUE ED10NDPN M_VALUE)

// FACTORMAT OPTIONS TO EXPERIMENT WITH
// about 5 iters keeps uniquenesses positive
// factormat M, n(7786) shape(full) ipf factors(25)

/*
citerate(25) ipf only
ipf iterated principal factors
pf principal factors
ml maximum likelihood
pcf principal component
*/

/*forcepsd*/

timer off 1
timer list 1

/*ipf citerate(2) factors(18)*/
/*Or factor (varlist), pf factors(10)*/

timer clear
timer on 1

/*test of quartimax rotation for 15 factors*/
rotate, quartimax norm factors(15) blanks(.23)

/*rotate, oblimin factors(10)*/

timer off 1
timer list 1
Peter Hoon

Join Date: Dec 2016

Posts: 13
#14

25 Nov 2024, 12:58

Clyde and Leonardo:

As of Nov 25 2024, I am now working with a new subset of variables that numbers 87 (instead of the 90 variables in my last post above).

The code Clyde developed only works if the number of elements in the lower triangle, n(n+1)/2 for n variables, is evenly divisible by 5. That is because the imported file holds five numbers in each row. With 90 variables there are 4095 elements, which fill 819 rows exactly. In my new case of 87 variables there are 3828 elements, so the last row of data to be read in contains only three space-delimited numbers, and Clyde's code is expecting that all five columns have numbers. The code fails.

Would it be possible to alter the code below to read in data when the last row of data has fewer than five numbers?
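
A quick way to see whether a given number of variables leaves a short last row is to take the element count modulo 5. This is a small sketch; the 5 assumes Mplus keeps writing five numbers per line.

```stata
* how many numbers are left over on the last line of the Mplus file?
foreach n in 90 87 {
    local total = `n' * (`n' + 1) / 2
    display "n = `n': `total' elements, " mod(`total', 5) " left over on the last line"
}
```

For n = 90 nothing is left over, while n = 87 leaves 3 numbers on the final line.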

I have installed dataex, and have tried to get the following code into dataex for Clyde and other Stata list members.
My apologies; I have not succeeded. Will keep trying.

Here is the code that hopefully could be modified to work with 87 variables. Call on me to supply more information if needed to assist you.

// change directory if needed
cd c:\chest

set more off
clear

// Use of Stata's import command
import delimited c:\Mplus\sep82023standbayescovs.txt

split v1, gen(var) destring
drop v1

local r 1
local c 1

// Set this value to the number of variables you have.
// Clyde points out that this number is also the dimension of the matrix.
matrix M = I(90)

// Clyde Schechter's code:
forvalues o = 1/`=_N' {
forvalues v = 1/5 {
matrix M[`r', `c'] = var`v'[`o']
if `r' != `c' {
matrix M[`c', `r'] = M[`r', `c']
}
local ++c
if `c' > `r' {
local c = 1
local ++r
}
}
}

// BUILD A LIST OF NAMES TO APPLY TO BOTH ROWS AND COLUMNS by Clyde Schechter
local names
forvalues i = 1/90 { // OR WHATEVER THE DIMENSION OF THE MATRIX IS
local names `names' var`i'
}

// RENAME THE ROWS AND COLUMNS Clyde Schechter's code
matrix rownames M = `names'
matrix colnames M = `names'

// list your matrix to confirm accuracy of import and read in.
// As Clyde points out, the number of variables in your data set
// is equal to the dimension of the matrix
matrix list M

// list eigenvalues to spot a non positive definite
// variance - covariance mtx. One or more could be negative
// If you have neg eigenvalues, you may need to combine
// collinear variables by principal components analysis.
matrix symeigen X v = M
matrix list v

Clyde Schechter

Join Date: Apr 2014
Posts: 30192

#15

25 Nov 2024, 17:13

Without sample input, I'm not sure what I'm asked to deal with here. Is it still the case that the MPlus input comes in 5 columns but is read as a triangular matrix like
var
cov var
cov cov var
...
etc.

So I've written some code that creates a toy data set (with random numbers for the covariances) that will look like the input of the MPlus output after you have -import-ed it to Stata and renamed the variables. You already have the real MPlus output, so you will need only the code from // CREATE THE MATRIX on down. To develop and test, I used 17 "variables" rather than 87, but it should actually work for any number of variables, at least up to the point where memory is exhausted.

Code:

//    CREATE A TOY DATA SET THAT LOOKS LIKE AN MPLUS COVARIANCE MATRIX
//    WITH `nvars' VARIABLES (17 HERE), READ SNAKEWISE, AND LAID OUT IN 5 COLUMNS
clear*
set seed 1234

set obs 1
local ncols 5
local nvars 17

local varnum 1
local obs_no 1
local col_no 1

forvalues i = 1/`ncols' {
    gen var`i' = .
}

local max_count = (`nvars'+1)*(`nvars')/2
local counter 1

while `counter' <= `max_count' {
    replace var`col_no' = runiform() in `obs_no'
    if `col_no' < `ncols' {
        local ++col_no
    }
    else {
        local col_no 1
        insobs 1, after(_N)
        local ++obs_no
        
    }
    local ++counter
    
}



//    CREATE THE MATRIX
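local nvars 17    // set to the number of variables in your matrix (e.g. 87)
local ncols 5     // numbers per imported line; must be defined if you start from here
local r 1
local c 1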


matrix M = I(`nvars')

forvalues o = 1/`=_N' {
    forvalues v = 1/`ncols' {
        // once the triangle is full, skip any leftover (missing) cells or lines
        if `r' > `nvars' continue, break
        matrix M[`r', `c'] = var`v'[`o']
        if `r' != `c' {
            matrix M[`c', `r'] = M[`r', `c']
        }
        local ++c
        if `c' > `r' {
            local c = 1
            local ++r
        }
    }
}
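
An alternative sketch that sidesteps the 5-column layout entirely is to read every number from the text file into one Mata vector and fill the triangle from that. It assumes the file is space delimited (Mata's tokens() splits on blanks by default, so tabs would need extra handling). The file name toycovs.txt and the toy 3 x 3 values are made up for the illustration.

```stata
* write a toy file: 3 x 3 lower triangle, 5 numbers per line, short last line
tempname fh
file open `fh' using toycovs.txt, write text replace
file write `fh' "0.10E+01 0.90E+00 0.10E+01 0.70E+00 0.60E+00" _n
file write `fh' "0.10E+01" _n
file close `fh'

mata:
n = 3                                     // dimension of the matrix
lines = cat("toycovs.txt")                // one string per line of the file
x = strtoreal(tokens(invtokens(lines')))  // all numbers as one row vector
assert(length(x) == n*(n+1)/2)            // right number of elements?

V = J(n, n, .)
k = 0
for (i=1; i<=n; i++) {                    // rows
    for (j=1; j<=i; j++) {                // columns up to the diagonal
        k++
        V[i, j] = x[k]
        V[j, i] = x[k]
    }
}
st_matrix("M", V)                         // hand the result to Stata as M
end

matrix list M
```

Because the numbers are counted rather than read column by column, it makes no difference how many of them sit on the last line.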
