
  • Partial and semipartial correlation with categorical variables?

    Dear Statalist,
    When I want to look at partial or semipartial correlations, I use the pcorr command:
    . sysuse auto

    . pcorr price mpg weight foreign
    (obs=74)

    Partial and semipartial correlations of price with

                   Partial   Semipartial      Partial   Semipartial   Significance
       Variable |    Corr.         Corr.      Corr.^2       Corr.^2          Value
    ------------+-----------------------------------------------------------------
            mpg |   0.0352        0.0249       0.0012        0.0006         0.7693
         weight |   0.5488        0.4644       0.3012        0.2157         0.0000
        foreign |   0.5402        0.4541       0.2918        0.2062         0.0000

    However, this is not possible when there are categorical variables, e.g. if I wanted to incorporate age as a category, i.age.
    Is there a way to get partial and semipartial correlations with categorical variables (or to adjust for their effect)?

    The regress command gives the overall R-squared value for the entire model, but not for the individual variables.
    Is there a way to get this with pcorr, or by doing further analysis after the regress command?
    I am using Stata 13.1 on a Windows 7 64-bit PC.
    Kind Regards
    Dennis Nielsen

  • #2
    Age as a category seems like a poor example: ordinarily we have age measured as a (quasi-)continuous variable, and although it is common to see it converted into categories, that is typically bad statistical practice. To go back to the example using auto.dta, rep78 is a categorical variable. The command -pcorr-, as you have observed, does not support factor-variable notation, but all you have to do is expand those indicators yourself. The simplest approach is probably to use -tab- with the -generate- option.

    Code:
    . tab rep78, gen(rep78_)

         Repair |
    Record 1978 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |          2        2.90        2.90
              2 |          8       11.59       14.49
              3 |         30       43.48       57.97
              4 |         18       26.09       84.06
              5 |         11       15.94      100.00
    ------------+-----------------------------------
          Total |         69      100.00

    . pcorr price mpg weight rep78_*
    (obs=69)

    Partial and semipartial correlations of price with

                   Partial   Semipartial      Partial   Semipartial   Significance
       Variable |    Corr.         Corr.      Corr.^2       Corr.^2          Value
    ------------+-----------------------------------------------------------------
            mpg |  -0.0912       -0.0728       0.0083        0.0053         0.4733
         weight |   0.3852        0.3317       0.1484        0.1100         0.0017
        rep78_1 |  (dropped)
        rep78_2 |   0.0498        0.0396       0.0025        0.0016         0.6960
        rep78_3 |   0.0962        0.0768       0.0093        0.0059         0.4494
        rep78_4 |   0.1410        0.1132       0.0199        0.0128         0.2663
        rep78_5 |   0.2202        0.1794       0.0485        0.0322         0.0804



    • #3
      Hi Clyde,

      Thanks so much for your answer. I have become quite fond of the tab, gen() option.

      Kind Regards
      Dennis



      • #4
        I have the exact same question as Dennis Nielsen. I'm trying to find a way to incorporate categorical variables into pcorr. However, when I tried the tab and gen approach and ran the pcorr command, Stata didn't always use my baseline group, and selected its own baseline group (var_4 rather than var_1, for example). Why does this happen, and is there a way round it, please?

        Thank you very much.



        • #5
          You can't control how Stata makes this choice. But you can deprive it of the need to make a choice by simply omitting one of the variables from your call to -pcorr-. To force var_1 to be the reference category:

          Code:
          pcorr [other variables] var_2-var_[number of last category here]
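
          For instance, using the rep78_* indicators created with -tab, gen()- in #2 above, forcing rep78_1 to be the reference category would look like this (a sketch with the auto data):

          Code:
          sysuse auto, clear
          tab rep78, gen(rep78_)
          pcorr price mpg weight rep78_2-rep78_5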



          • #6
            Dear Dr Schechter,

            Thanks so much for your reply.
            When you said var_2-var_4, you don't mean var_2 MINUS var_4?
            Would it be correct to type in pcorr [other variables] var_2 var_3 var_4 ?

            Thanks again!



            • #7
              See help varlist for guidance on referring to sets of variables. Clyde does NOT mean "minus"; he means var_2 through var_4.
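
              In varlist notation, a hyphen means "through": it selects every variable from the first named to the last, in the order they appear in the dataset. Assuming the indicators sit next to each other (as they do when created by -tab, gen()-), the two commands below are equivalent:

              Code:
              pcorr [other variables] var_2-var_4
              pcorr [other variables] var_2 var_3 var_4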



              • #8
                Good afternoon,
                Instead of starting a new topic, I thought I would wake this one up; you may consider it a sign of a thorough search for an answer before asking the same question again.

                (I think that) I understand the meaning of partial (semipartial) correlation coefficients (like those for mpg or weight in the example above), but I would like to know how to interpret the coefficients of the categorical variables, like those of rep78_2 or rep78_3, for example. What does the value 0.0498 tell me? What is the meaning of its square?

                A line of explanation would be great; thanks in advance!

                Regards,
                Piotr Lewczuk

                PS It's great we can use factor variables now: . pcorr price mpg weight i.rep78



                • #9
                  I've never been a great fan of partial and semi-partial correlations. Basically they are estimates of what the correlation between y and x would be if the effects of the other variables on y and x (or, for semi-partial correlation, just on x) were somehow neutralized. (For example, what might be observed in a controlled experiment where other variables were held fixed.)

                  Personally, I don't find the squares of these coefficients very useful. I find correlation coefficients, unsquared, to be a more natural metric of association. In the context of linear regression, the square takes on additional importance because it tells you the proportion of variance explained by the model--but outside of that context I don't usually look at the squares of correlations. I suppose you could say that the squared partial correlation estimates the proportion of y variance that would be accounted for by x if all the other effects were held constant. But I don't find that information useful. YMMV.



                  • #10
                    Thank you, but this does not answer my question. I understand the meaning of a correlation coefficient between two continuous variables; I would somehow understand a coefficient of correlation between a categorical variable with 5 categories (i.rep78) and a continuous variable (price); but I do not understand the meaning of a coefficient of correlation between one category of a categorical variable (2.rep78) and a continuous variable. What kind of association (of what with what) does the correlation coefficient between rep78_2 and price describe?
                    Thanks in advance for commenting.
                    Regards,
                    P. Lewczuk



                    • #11
                      Oh, I see you meant something different from what I thought.

                      So 2.rep78 is not a category of a variable. It is a 0/1 dichotomous variable that distinguishes category 2 of rep78 from all other categories of rep78. So this correlation coefficient means the same thing as any correlation coefficient involving a dichotomous variable. If the other variable were normally distributed, you could think of it as a rather obscure, indirect measure of the standardized mean difference in the outcome between rep78 = 2 and rep78 != 2 subsets of the data. In particular, if the correlation coefficient were 1 (or -1) it would mean that rep78 (considered as a dichotomy, 2 vs all other categories) completely separates the values of the other variable into non-overlapping distributions. If, on the other extreme, the coefficient were zero, it implies that the distribution of the other variable is the same whether rep78 = 2 or rep78 != 2.

                      By the way, I don't think that you should somehow understand a coefficient of correlation between a categorical variable with 5 categories and a continuous variable. Unless that categorical variable is actually ordinal (which rep78 is), such a correlation would be meaningless.
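
                      To see this concretely, you can build the 2-vs-rest indicator by hand and correlate it with price; a minimal sketch with the auto data (the variable name rep2 is just illustrative):

                      Code:
                      sysuse auto, clear
                      generate byte rep2 = rep78 == 2 if !missing(rep78)   // 1 if rep78==2, 0 otherwise
                      correlate price rep2     // point-biserial correlation of price with the indicator
                      ttest price, by(rep2)    // the two group means underlying that correlation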



                      • #12
                        Thank you very much; your answer clarifies the issue and is extremely helpful!
                        Best regards,
                        P. Lewczuk

