Background:
I am working on a longitudinal study that uses the MOS-HIV, a 35-item measure whose items are ordinal categorical variables with response ranges of 1-3, 1-5, or 1-6. The final scores (mental health and physical health summary scores) can be computed with the scoring coefficients from the patient population of the original validation study. Because our sample is quite different (the original patient population was North Americans with HIV; our sample is East Africans with HIV), we would like to use scoring coefficients derived from our own sample and apply them across all time points (5 visits total).
To get to this point, we ran a two-factor confirmatory factor analysis with a varimax rotation on the baseline data and produced the two summary scores with no problem. As an extra check, we compared them with scores from the primary scoring method (using the Roche patient population described in the first paragraph) to make sure the two summary scores from each method were highly correlated. We detected no issues. For the subsequent visits, I used matrix2dta to store the scoring coefficients from baseline, transformed the result, and merged it into the dataset containing all subsequent data (excluding baseline) without any issue.
Actions thus far:
Before scoring the subsequent visits, I wanted to check my method by reproducing the baseline summary scores (which I had created with -predict-) manually from the item values and the scoring coefficients. I did this only for Factor 1: if the "manual" scores match the predicted scores for one of the two outcomes at one visit, I assume I have replicated the calculation. I standardized the 35 variables, following the helpfile for factor postestimation (page 351), which says: "The table with scoring coefficients informs us that the factor is obtained as a weighted sum of standardized versions of headroom, rear_seat, and trunk with weights 0.28, 0.27, and 0.46."
For my first calculation, I multiplied each of the 35 rotated scoring coefficients by the corresponding standardized variable, using baseline data only, and then summed the 35 products. The resulting scores were not equal to the predicted scores, nor were they correlated with them (I tested the correlation in case the scores were the same but somehow on a different scale).
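To make the calculation above concrete, here is a minimal sketch of it, written in Python for illustration rather than in Stata. The item names, the responses, and the second visit are made up; the three weights are simply borrowed from the manual's quoted example. It also shows one assumption worth checking: when baseline coefficients are applied to a later visit, the items are standardized with the baseline means and SDs, not with that visit's own.

```python
from statistics import mean, stdev

def standardize(values, center=None, scale=None):
    """Return z-scores; defaults to the sample mean and SD (n-1 divisor)."""
    m = mean(values) if center is None else center
    s = stdev(values) if scale is None else scale
    return [(v - m) / s for v in values]

def factor_score(items, coefs, centers=None, scales=None):
    """Weighted sum of standardized items, one score per respondent.

    items: dict mapping item name -> list of raw responses
    coefs: dict mapping item name -> scoring coefficient
    centers, scales: optional dicts of means and SDs to standardize with
    """
    n = len(next(iter(items.values())))
    scores = [0.0] * n
    for name, values in items.items():
        c = None if centers is None else centers[name]
        s = None if scales is None else scales[name]
        z = standardize(values, c, s)
        for i in range(n):
            scores[i] += coefs[name] * z[i]
    return scores

# Hypothetical data: 3 items, 4 respondents (not MOS-HIV data).
baseline = {"item1": [1, 2, 3, 2], "item2": [2, 4, 5, 5], "item3": [1, 3, 6, 2]}
coefs = {"item1": 0.28, "item2": 0.27, "item3": 0.46}

baseline_scores = factor_score(baseline, coefs)

# Follow-up visit: reuse the baseline means and SDs (an assumption, see above).
centers = {k: mean(v) for k, v in baseline.items()}
scales = {k: stdev(v) for k, v in baseline.items()}
visit2 = {"item1": [2, 2, 3, 1], "item2": [3, 4, 4, 5], "item3": [2, 3, 5, 2]}
visit2_scores = factor_score(visit2, coefs, centers, scales)
```

The order of the coefficients has to match the order of the items exactly; a misaligned merge of the coefficient matrix would produce scores that are uncorrelated with the predicted ones.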
For my second calculation, I did the same with the unrotated scoring coefficients: I multiplied each of the 35 coefficients by the corresponding standardized variable, using baseline data only, and then summed the products. These scores were also not equal to the predicted scores, nor were they correlated with them.
For further attempts, I repeated the above steps after specifying the Bartlett scoring method for the predict command (instead of the default regression method). These also didn't work ("work" meaning match the summary scores that Stata produces via the predict command).
The bottom-line question is: what are the "under the hood" calculations for the predict command, so that I can use the scoring coefficients from the baseline visit to replicate the scoring at subsequent visits in my longitudinal study?
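For reference, the textbook regression method takes the scoring coefficients as B = R^{-1} L, where R is the correlation matrix of the items and L is the loading matrix, and computes the scores as the standardized items times B. Whether -predict- matches this in every detail is an assumption to check against the Methods and formulas section of the manual. A minimal sketch for one factor and two items, in Python, with a hypothetical correlation and hypothetical loadings:

```python
def regression_scoring_coefs(r, loadings):
    """Solve R b = loadings for the 2x2 correlation matrix R = [[1, r], [r, 1]].

    Uses the closed-form inverse of R: (1 / (1 - r^2)) * [[1, -r], [-r, 1]].
    """
    l1, l2 = loadings
    det = 1.0 - r * r
    b1 = (l1 - r * l2) / det
    b2 = (l2 - r * l1) / det
    return b1, b2

# Hypothetical inputs, not estimates from any real data.
r = 0.42
loadings = (0.7, 0.6)
b1, b2 = regression_scoring_coefs(r, loadings)

# The factor score for a respondent with standardized item values z1, z2:
z1, z2 = 0.5, -1.0
score = b1 * z1 + b2 * z2
```

With 35 items the same idea needs a general matrix inverse or solver, but the structure of the calculation is unchanged.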
This is my first post and I read instructions and tried to be as detailed as possible. Please let me know if I can provide more information.