Correlation Matrix Restricting Observations

Macon Overcast

15 Mar 2022, 12:24

Hi, I am using the following code to estimate correlations among variables that are not normally distributed. My code is as follows, but with generic variable names:

spearman x1 x2 x3 x4, star(0.05)

I am running into restricted observations in this test because x4 has fewer observations than x1, x2, and x3. To my understanding, because spearman is a rank correlation, this restriction is mathematically necessary. However, I would like to use my full set of observations, because my dataset is small and x4 is of interest. Is there a statistically sound way around this using rank correlation?

Rich Goldstein

15 Mar 2022, 12:42

Your situation is not completely clear to me, but looking at the help file, I see the "pw" option, which I think is what you want; see

Code:

help spearman
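A minimal sketch of what the pw option changes, assuming Stata's shipped auto dataset as a stand-in for x1 to x4 (the variables below are an illustrative choice, not the original poster's data). In that dataset rep78 is missing for 5 of the 74 cars, so it plays the role of x4.

```stata
* Illustration only: the shipped auto dataset stands in for x1-x4.
* rep78 is nonmissing for 69 of 74 cars, so it plays the role of x4.
sysuse auto, clear

* Default: casewise deletion. Every coefficient in the matrix uses only
* the observations that are nonmissing on all four variables.
spearman price mpg weight rep78, star(0.05)

* pw: pairwise deletion. Each coefficient uses every observation that is
* nonmissing on that particular pair; obs shows the N behind each cell.
spearman price mpg weight rep78, pw star(0.05) stats(rho obs p)

* Check the sample size behind two individual pairs
spearman price mpg
display "price-mpg:   N = " r(N)
spearman price rep78
display "price-rep78: N = " r(N)
```

Without pw, all six coefficients rest on the 69 complete cases; with pw, pairs not involving rep78 use all 74 cars, so the Ns differ from cell to cell.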
Nick Cox

15 Mar 2022, 13:23

I agree with Rich Goldstein. You seem to be asking for pairwise calculations.

The restriction you mention would bite with any kind of correlation. If one variable has missing values where the other doesn't, those observations cannot be included in the calculation. This applies with measurements and ranks alike.
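To see that the restriction is not specific to ranks, here is a sketch comparing Pearson and Spearman on the same pair, again assuming the auto dataset purely for illustration (rep78 has 5 missing values).

```stata
* Illustration only, using the shipped auto dataset.
sysuse auto, clear

* Pearson correlation on the measurements: the 5 cars with missing
* rep78 cannot contribute to the price-rep78 correlation.
correlate price rep78
display "Pearson,  price-rep78: N = " r(N)

* Spearman correlation on the ranks: the same 5 cars are dropped.
spearman price rep78
display "Spearman, price-rep78: N = " r(N)

* pwcorr gives Pearson correlations with pairwise deletion;
* obs reports the number of observations behind each pair.
pwcorr price mpg weight rep78, obs sig star(0.05)
```

Both commands report N = 69 for this pair, because an observation needs values on both variables whichever coefficient is computed.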
Macon Overcast

15 Mar 2022, 21:34

Thank you, Rich and Nick. Nick, your comment addresses my concern; I will just live with the limited number of observations. My data are not paired; I apologize for the confusion there. Great to know about the pairwise option at any rate!
Nick Cox

16 Mar 2022, 02:30

I don't follow your comment on pairing. Any kind of correlation requires two variables to be measured for each observation, e.g. person or place or time.

FWIW, I don't think non-normal marginal distributions are much of a barrier to using Pearson correlation. When Pearson correlation is useful, it is a measure of linearity. If you want P-values or confidence intervals, use simulation or bootstrapping. If those don't apply (e.g., independence is violated), then that is a bigger deal than non-normality, and inference is probably off the table anyway. If the correlation is bumped up (or down) by outliers and/or skewness, then that is what it is, or it means you would be better off working on a transformed scale.
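A sketch of these ideas in Stata, assuming the auto dataset, 999 replications, and an arbitrary seed of 2022 (all illustrative choices). A permutation test is used here as one form of simulation for a P-value; that is my choice of technique, and it assumes observations are exchangeable, so it is subject to the same independence caveat.

```stata
* Illustration only: auto dataset, reps and seed chosen arbitrarily.
sysuse auto, clear

* Bootstrap: resample cars with replacement, recompute Pearson r each time,
* then report percentile confidence intervals.
bootstrap rho=(r(rho)), reps(999) seed(2022): correlate price mpg
estat bootstrap, percentile

* Permutation P-value: shuffle mpg relative to price, recompute r each time,
* and compare the observed r with that reference distribution.
permute mpg rho=(r(rho)), reps(999) seed(2022): correlate price mpg

* Transformed scale: price is right-skewed in these data,
* so compare the correlation on the log scale.
generate lnprice = ln(price)
correlate lnprice mpg
```

Comparing the coefficient on the raw and log scales is one quick way to see how much skewness is driving the result.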