
  • Partial and semipartial correlation with categorical variables?

    Dear Statalist,
    When I want to look at partial or semipartial correlations, I use the pcorr command:
    . sysuse auto

    . pcorr price mpg weight foreign
    (obs=74)

    Partial and semipartial correlations of price with

                   Partial   Semipartial      Partial   Semipartial   Significance
       Variable |    Corr.         Corr.      Corr.^2       Corr.^2          Value
    ------------+-----------------------------------------------------------------
            mpg |   0.0352        0.0249       0.0012        0.0006         0.7693
         weight |   0.5488        0.4644       0.3012        0.2157         0.0000
        foreign |   0.5402        0.4541       0.2918        0.2062         0.0000

    However, this is not possible when there are categorical variables, e.g. if I wanted to incorporate age as a category, i.age.
    Is there a way to get partial and semipartial correlations with categorical variables (or to adjust for their effect)?

    The regress command gives the overall R-squared value for the entire model, but not for the individual variables.
    Is there a way to get this with pcorr, or by doing further analysis after the regress command?
    I am using Stata 13.1 on a Windows 7 64-bit PC.
    Kind Regards
    Dennis Nielsen

  • #2
    Age as a category seems like a poor example: ordinarily we have age measured as a (quasi-)continuous variable, and although it is common to see it converted into categories, that is typically bad statistical practice. To go back to the example using auto.dta, rep78 is a categorical variable. The command -pcorr-, as you have observed, does not support factor-variable notation, but all you have to do is expand those indicators yourself. The simplest approach is probably to use -tab- with the -generate- option.

    Code:
    . tab rep78, gen(rep78_)

         Repair |
    Record 1978 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |          2        2.90        2.90
              2 |          8       11.59       14.49
              3 |         30       43.48       57.97
              4 |         18       26.09       84.06
              5 |         11       15.94      100.00
    ------------+-----------------------------------
          Total |         69      100.00

    . pcorr price mpg weight rep78_*
    (obs=69)

    Partial and semipartial correlations of price with

                   Partial   Semipartial      Partial   Semipartial   Significance
       Variable |    Corr.         Corr.      Corr.^2       Corr.^2          Value
    ------------+-----------------------------------------------------------------
            mpg |  -0.0912       -0.0728       0.0083        0.0053         0.4733
         weight |   0.3852        0.3317       0.1484        0.1100         0.0017
        rep78_1 |  (dropped)
        rep78_2 |   0.0498        0.0396       0.0025        0.0016         0.6960
        rep78_3 |   0.0962        0.0768       0.0093        0.0059         0.4494
        rep78_4 |   0.1410        0.1132       0.0199        0.0128         0.2663
        rep78_5 |   0.2202        0.1794       0.0485        0.0322         0.0804



    • #3
      Hi Clyde,

      Thanks so much for your answer. I have become quite fond of the tab, gen() option.

      Kind Regards
      Dennis



      • #4
        I have the exact same question as Dennis Nielsen. I'm trying to find a way to incorporate categorical variables into pcorr. However, when I tried the tab and gen approach and ran the pcorr command, Stata didn't always use my baseline group, and selected its own baseline group (var_4 rather than var_1, for example). Why does this happen, and is there a way round it, please?

        Thank you very much.



        • #5
          You can't control how Stata makes this choice. But you can deprive it of the need to make a choice by simply omitting one of the variables from your call to -pcorr-. To force var_1 to be the reference category:

          Code:
          pcorr [other variables] var_2-var_[number of last category here]
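
          For instance, using the rep78_* indicators created with -tab, gen()- in #2 above, forcing rep78_1 to be the reference category would look like this (a sketch with the auto data):

          Code:
          sysuse auto, clear
          tab rep78, gen(rep78_)
          pcorr price mpg weight rep78_2-rep78_5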



          • #6
            Dear Dr Schechter,

            Thanks so much for your reply.
            When you said var_2-var_4, you don't mean var_2 MINUS var_4?
            Would it be correct to type in pcorr [other variables] var_2 var_3 var_4 ?

            Thanks again!



            • #7
              See help varlist for guidance on referring to sets of variables. Clyde does NOT mean "minus"; he means var_2 through var_4.
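
              In varlist notation, a hyphen means "through": it selects every variable from the first named to the last, in the order they appear in the dataset. Assuming the indicators sit next to each other (as they do when created by -tab, gen()-), the two commands below are equivalent:

              Code:
              pcorr [other variables] var_2-var_4
              pcorr [other variables] var_2 var_3 var_4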



              • #8
                Good afternoon,
                Instead of starting a new topic, I thought I would wake this one up; you may consider it a sign of a thorough search for an answer before asking the same question again.

                (I think that) I understand the meaning of partial (semipartial) correlation coefficients (like those for mpg or weight in the example above), but I would like to know how to interpret the coefficients of the categorical variables, like those of rep78_2 or rep78_3, for example. What does the value 0.0498 tell me? What is the meaning of its square?

                A line of explanation would be great; thanks in advance!

                Regards,
                Piotr Lewczuk

                PS It's great we can use factor variables now: . pcorr price mpg weight i.rep78



                • #9
                  I've never been a great fan of partial and semi-partial correlations. Basically they are estimates of what the correlation between y and x would be if the effects of the other variables on y and x (or, for semi-partial correlation, just on x) were somehow neutralized. (For example, what might be observed in a controlled experiment where other variables were held fixed.)

                  Personally, I don't find the squares of these coefficients very useful. I find correlation coefficients, unsquared, to be a more natural metric of association. In the context of linear regression, the square takes on additional importance because it tells you the proportion of variance explained by the model--but outside of that context I don't usually look at the squares of correlations. I suppose you could say that the squared partial correlation estimates the proportion of y variance that would be accounted for by x if all the other effects were held constant. But I don't find that information useful. YMMV.



                  • #10
                    Thank you, but this does not answer my question. I understand the meaning of a correlation coefficient between two continuous variables; I would somehow understand a coefficient of correlation between a categorical variable with 5 categories (i.rep78) and a continuous variable (price); but I do not understand the meaning of a coefficient of correlation between one category of a categorical variable (2.rep78) and a continuous variable. What kind of association (of what with what) does the correlation coefficient between rep78_2 and price describe?
                    Thanks in advance for commenting.
                    Regards,
                    P. Lewczuk



                    • #11
                      Oh, I see you meant something different from what I thought.

                      So 2.rep78 is not a category of a variable. It is a 0/1 dichotomous variable that distinguishes category 2 of rep78 from all other categories of rep78. So this correlation coefficient means the same thing as any correlation coefficient involving a dichotomous variable. If the other variable were normally distributed, you could think of it as a rather obscure, indirect measure of the standardized mean difference in the outcome between rep78 = 2 and rep78 != 2 subsets of the data. In particular, if the correlation coefficient were 1 (or -1) it would mean that rep78 (considered as a dichotomy, 2 vs all other categories) completely separates the values of the other variable into non-overlapping distributions. If, on the other extreme, the coefficient were zero, it implies that the distribution of the other variable is the same whether rep78 = 2 or rep78 != 2.

                      By the way, I don't think that you should somehow understand a coefficient of correlation between a categorical variable with 5 categories and a continuous variable. Unless that categorical variable is actually ordinal (which rep78 is), such a correlation would be meaningless.
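
                      To see this concretely, you can build the 2-vs-rest indicator by hand and correlate it with price; a minimal sketch with the auto data (the variable name rep2 is just illustrative):

                      Code:
                      sysuse auto, clear
                      generate byte rep2 = rep78 == 2 if !missing(rep78)   // 1 if rep78==2, 0 otherwise
                      correlate price rep2     // point-biserial correlation of price with the indicator
                      ttest price, by(rep2)    // the two group means underlying that correlation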



                      • #12
                        Thank you very much; your answer clarifies the issue and is extremely helpful!
                        Best regards,
                        P. Lewczuk

