  • Correlation

    Hi, I want to compute correlations in my data to study the mismatch between education and occupation. My mismatch variable is coded 1-3: 1 for undereducation, 2 for match, 3 for overeducation. How should I correlate this definition of mismatch with gender, income, age, and so on? Please help me! Thank you!
    Last edited by Damian Chroboczek; 18 Apr 2024, 12:36.

  • #2
    It is possible to do correlations (both Pearson and Spearman; see the help files for -corr-, -pwcorr-, and -spearman-) between a 3-level ordinal variable and continuous variables. There are also single-statistic summaries of the association between two categorical variables, such as Goodman-Kruskal gamma, Kendall's tau-b, and Cramér's V, which can be obtained from the output of -tabulate- when the corresponding options (gamma, taub, V) are specified; see -help tabulate twoway-. Finally, there is a special correlation coefficient for use with ordinal variables, both continuous and discrete (so age and income qualify, but not gender): the polychoric correlation coefficient. It can be calculated in Stata using Stas Kolenikov's -polychoric- package. (Run -findit polychoric- and click through the links to install it if you want to go this route.)
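
    As a rough sketch of the commands just mentioned, assuming hypothetical variable names mismatch (coded 1-3), income, age, and gender (the original question does not give the actual names), something like the following could be tried. The -polychoric- line additionally assumes the user-written package has been installed and accepts a plain variable list.
    Code:
    * Pearson correlations, with p-values and the number of observations per pair
    pwcorr mismatch income age, sig obs
    * Spearman rank correlations, with p-values
    spearman mismatch income age, stats(rho obs p)
    * Cross tab with Goodman-Kruskal gamma, Kendall's tau-b, and Cramér's V
    tabulate gender mismatch, gamma taub V
    * Polychoric correlations (user-written; install first via -findit polychoric-)
    polychoric mismatch income age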

    But these are typically not the most informative way of presenting the relationships. In my opinion, you will be better serving your audience if you simply present a table with the mean and standard deviation of age in each of the three education mismatch levels. You might do the same for income, although depending on how that is distributed in your sample, a median and interquartile range might be more appropriate than a mean and standard deviation. Similarly, as for the relationship with gender, a simple cross tab will show a much clearer picture than any single-number summary statistic.


    • #3
      Thank you very much for your answer and tips, but I didn't understand much. I am a student in the Faculty of Economics, and I'm writing a thesis about mismatch in the labour market. I just want to know how to simply compute a correlation, with a p-value, between income and my definition of mismatch, gender, age, etc.


      • #4
        Well, I don't think I would do correlations for these variables in the first place. I know different disciplines have different traditions and ways of doing things, but I don't think economists would do that either. I'd be much more inclined to do things like:
        Code:
        tabstat income, by(mismatch) statistics(p25 median p75)
        tab gender mismatch, col
        tabstat age, by(mismatch) statistics(mean sd)
        to show the relationships between mismatch and these other variables.

        If I were forced to calculate some statistic with "correlation" in its name, for simplicity I would probably do:
        Code:
        spearman mismatch income age, stats(rho obs p)
        I would not compute any kind of correlation with gender: correlation coefficients assume variables that are at least ordinal, and gender is only nominal.
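
        If a p-value for the relationship with gender is still wanted, one possible route (a sketch only, using the same variable names as in the code above) is a chi-squared test of independence on the cross tab, with Cramér's V as a single-number summary of the strength of association:
        Code:
        * Column percentages, chi-squared test of independence, and Cramér's V
        tab gender mismatch, col chi2 V
        The chi-squared p-value tests whether the distribution of mismatch differs by gender; it does not indicate a direction the way a correlation sign would.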
