  • How to best calculate correlations for observations that are not independent of each other?

    Hey everyone,

    Quick disclaimer upfront: I have also cross-posted this on Stack Overflow.

    I have the following case. My data set consists of 78 policy documents (= my observations). These were written by 50 different country governments (in the period between 2005 and 2020). While 27 countries have written only one policy document, 23 countries have written multiple policy documents. In the latter case, these same-country different-policy documents have usually been written years apart by different governments/administrations and different ministries. Nevertheless, I reckon there is probably a risk that these same-country observations are not independent of each other. My overarching question is, therefore: How would you calculate correlations in this case? More specifically:
    1. The Pearson correlation assumes independence of the observations and is therefore not suitable here, correct? Or could one credibly argue that the observations are independent after all, since they were usually published many years (and therefore governments) apart and by different ministries?
    2. Would "within-participants correlation" (Bland & Altman 1995 a & b) or "repeated measures correlation" (= RMCORR in R and Stata) be more suitable? Or is something else more appropriate?
    3. Furthermore: Would I otherwise have to take into account any time effects when running correlations in my setting and, if so, how?
    Thank you very much for your advice!
    Last edited by Nicolai Schulz; 16 Mar 2022, 02:27.

  • #2
    Thanks for flagging the cross-posting on Stack Overflow (SO). I post there too from time to time and my view is that it's off-topic there as a primarily statistical question. Your post may not be closed: long story short, there aren't many people following the Stata tag who also have enough "reputation" to be able to vote to close. Explainer: "reputation" in SO is a points total depending on votes received and other contributions.

    I share your concerns about dependence here, but the question seems -- in part -- to answer itself. Why are correlations of interest at all if they lack an interpretation? I regard almost all correlations as just descriptive statistics anyway. People routinely calculate correlations for different countries or different times when dependence of some kind is pervasive: it's a routine conspiracy to point out the problem in better texts and then ignore it.
    Last edited by Nick Cox; 16 Mar 2022, 02:47.


    • #3
      Many thanks for your quick reply and clarification, Nick!

      Two quick follow-ups to your response wrt dependence, if I may (apologies if they should be obvious and I am a bit slow on the uptake):
      1. You write "Why are correlations of interest at all if they lack an interpretation?" Could you perhaps briefly elaborate on what you mean by they "lack an interpretation"? Do you mean that they lack a strong underlying statistical theory for useful interpretation? Or do you refer to authors not interpreting the correlations they find? Or something else?
      2. You write "it's a routine conspiracy to point out the problem in better texts and then ignore it." Do you refer to the practice that (journal) peer-reviewers point out potential dependence issues, and that authors can choose to ignore this without much repercussion (that is, reviewers will not reject their publication for the existence of that issue)?
      And perhaps just a quick additional note: exactly, we would treat the correlations in our analysis as preliminary, exploratory, and visually efficient descriptive statistics of potential associations between several variables. Nothing more. Definitely no claim of causality or strong validity of the association. Nevertheless, I would want to compute these descriptive statistics as well as possible. And I was rather shocked to see that in my first trials the results of corr vs rmcorr can be extremely different; hence, choosing the "correct" option seemed relevant to me.

      Once again, many thanks!


      • #4
        If there is a conspiracy it's mostly among textbook writers and teachers (me too, on the latter front). The problem is that when correlations are introduced it is proper to mention some assumptions (or ideal conditions), but not easy or a good idea pedagogically to talk about what to do when assumptions are violated (although "use a model appropriate to the generating process" is vacuous wording). So, my emphasis is always that a correlation is descriptive, and that you should look at the scatter plot too.

        You're saying that correlations are hard to interpret because of dependence issues. I agree, but "why bother with them?" is easy to ask and harder to answer.

        I am being a little (or even a lot) flippant here, but I don't have an easy solution for you. But I can't be surprised that calculations based on quite different views of what is being correlated give different answers.
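
To make the last point concrete, here is a minimal Python sketch with made-up numbers (not the poster's data; the country labels and values are purely hypothetical). It compares the ordinary pooled Pearson correlation, which treats all documents as exchangeable, with a within-country correlation computed after subtracting each country's own means. The latter is, as far as I understand it, the idea behind repeated measures correlation, though this sketch is not the rmcorr implementation and does no inference.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical (country, x, y) rows; country "D" has a single document.
rows = [
    ("A", 1, 5), ("A", 2, 4), ("A", 3, 3),
    ("B", 4, 8), ("B", 5, 7), ("B", 6, 6),
    ("C", 7, 11), ("C", 8, 10),
    ("D", 10, 13),
]

def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def demean_by_group(rows):
    """Subtract each country's own mean from its x and y values."""
    groups = defaultdict(list)
    for country, x, y in rows:
        groups[country].append((x, y))
    xs, ys = [], []
    for pairs in groups.values():
        gx = sum(p[0] for p in pairs) / len(pairs)
        gy = sum(p[1] for p in pairs) / len(pairs)
        for x, y in pairs:
            xs.append(x - gx)
            ys.append(y - gy)
    return xs, ys

# View 1: pool all documents and ignore which country wrote them.
pooled_r = pearson([r[1] for r in rows], [r[2] for r in rows])

# View 2: correlate only the variation within countries.
within_x, within_y = demean_by_group(rows)
within_r = pearson(within_x, within_y)

print(f"pooled r = {pooled_r:.3f}, within-country r = {within_r:.3f}")
```

In this contrived example the pooled correlation is strongly positive while the within-country correlation is negative, because the first is driven by differences between countries and the second only by changes within them. Note also that a country with a single document contributes nothing after demeaning, which would matter when 27 of 50 countries have only one document.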


        • #5
          Many thanks, Nick!
