  • Using Kendall Tau on panel data to test trends

    Hi everyone, I am looking for some help. My key question is: can I use the ktau command to test for trends in panel data?


    I have government expenditure data for 23 countries over a period of 6 years for 3 different health services. I want to see what the trend is over time for each service area. I decided to use the Mann-Kendall test, based on the Kendall tau coefficient, because it is a non-parametric test, and non-parametric tests are appropriate when the median reflects the distribution better than the mean (there is a huge outlier in this data set). However, I am wondering whether this test can be used on panel data. I was able to run the command just fine, but I am not sure whether the results are meaningful.


    I used this command for each service area:

    ktau year servicearea


    Thanks!


  • #2
    What does servicearea represent? Countries? Health services? If you post a data example, you might get more help.



    • #3
      Hi, apologies for the lack of detail. I have three different service area variables that measure domestic government expenditure per capita on each service area, per country, per year.

      Domestic general expenditure per capita on HIV/AIDS: dgexpcapitahiv/aids
      Domestic general expenditure per capita on family planning: degexpcapitafp
      Domestic general expenditure per capita on maternal conditions: degexcapaitamc

      To test the trend for each service area, I used the ktau command separately for each one. The results give a Kendall's tau coefficient; when it is negative I read it as a negative trend in government expenditure on that service area over the years, and when it is positive I read it as a positive trend.

      ktau year dgexpcapitahiv/aids
      ktau year degexpaitafp
      ktau year degexpaitamc

      Is this an appropriate test to use on panel data?



      • #4
        I'd put the question the other way round: which measure comes closest to your goals? Consider the Grunfeld data. An overall Kendall tau of any flavour mixes companies (panels) with -- typically -- quite different sizes, and the overall tau machinery takes no account whatsoever of the panel structure. Alternatively, you can gauge the individual trends by looping over the panels.


        Code:
        webuse grunfeld, clear
        
        * plot the pooled data: companies of very different sizes are mixed together
        scatter invest year
        
        * Kendall's tau-b computed separately within each company (panel)
        gen tau_b = .
        
        quietly forval c = 1/10 {
             ktau invest year if company == `c'
             replace tau_b = r(tau_b) if company == `c'
        }
        
        tabdisp company, c(tau_b)
        
        ----------------------
          Company |      tau_b
        ----------+-----------
                1 |   .8210526
                2 |   .4696586
                3 |   .6631579
                4 |   .6315789
                5 |   .5473684
                6 |   .8736842
                7 |   .7789474
                8 |   .6210526
                9 |   .6421053
               10 |   .4421053
        ----------------------
        
        ktau invest year
        
          Number of obs =     200
        Kendall's tau-a =       0.2607
        Kendall's tau-b =       0.2668
        Kendall's score =    5187
            SE of score =     944.984   (corrected for ties)
        
        Test of H0: invest and year are independent
             Prob > |z| =       0.0000  (continuity corrected)
        Note how the overall tau_b of 0.27 is not only quite different from the mean or median (say) of the individual tau_b, but is not even within the range of the individual values. They measure quite different things, as contemplation of the scatter plots alone should make clear.

        Researchers vary a great deal in how much emphasis they put on this kind of exercise. Even in a fuller exercise in which I was under orders to use Kendall tau, I would still round to 2 or 3 decimal places.



        • #5
          Hi Nick, thanks so much for taking the time to answer my question. Based on your answer, I think I need to choose a different test for the trends.

          My main goal is to understand the expenditure per capita trends over time (2014-2019) for the three individual service areas (HIV/AIDS, family planning, and maternal conditions): is there a positive or a negative trend across countries over time for each service area, and is that trend statistically significant? I need to figure out an appropriate way to test this. Ideally I would like to use a non-parametric test, but I could perhaps use a parametric test and simply remove the massive outlier country. The test would also need to assume random effects.

          If anyone has suggestions, I would be very thankful. I am having trouble finding an appropriate method.



          • #6
            Originally posted by Florence Abbe View Post
            I have government expenditure data for 23 countries over a period of 6 years for 3 different health services. I want to see what the trend is over time for each service area.
            Why not fit a regression model to the data and examine the first (linear) component of the set of orthogonal polynomial contrasts? That will give you the linear trend over time.

            . . . the median better reflects the distribution of the data than the mean (there is a huge outlier in this data set).
            I'm not sure that a nonparametric test is a panacea for an outlier, but if you want to model the distribution of the data, you can use a generalized linear model with a suitable distribution family and link function.

            You might be better with a model fitted to all three expenditures at once, as well as including a random effect for country. Maybe something along the following lines.
            Code:
            rename dgexpcapitahiv/aids /* ?! */ hiv
            rename degexpaitafp fam
            rename degexpaitamc mat
            rename year tim
            
            // Assumes cid is variable for Country ID
            
            gsem ///
                (hiv <- i.tim M[cid]) ///
                (fam <- i.tim M[cid]) ///
                (mat <- i.tim M[cid]), family(gaussian) link(log) ///
                    nocnsreport nodvheader nolog
            
            // Then examine the first component--labeled "(linear)"--of each of the following:
            contrast p.tim, equation(hiv)
            contrast p.tim, equation(fam)
            contrast p.tim, equation(mat)
            With only 23 countries, you'd need to be cautious about overinterpretation.



            • #7
              I'm not sure what the significance test is supposed to be capturing, but you could consider using a mixed-effects regression to test the relationship between year and service area: https://towardsdatascience.com/using...on-7b7941d249b
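
              A minimal sketch of that kind of model in Stata (not from the linked post; it assumes cid identifies countries and uses hiv for the HIV/AIDS expenditure variable, as in the rename sketch in #6):
              Code:
              * random intercept for country, with year as the fixed-effect trend
              mixed hiv year || cid: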



              • #8
                Originally posted by Joseph Coveney View Post
                You might be better with a model fitted to all three expenditures at once, as well as including a random effect for country. Maybe something along the following lines. [...]

                Thanks a lot for this suggestion, Joseph. Unfortunately, when I run the gsem command, I get error r(430): cannot compute an improvement -- discontinuous region encountered. My panel data is unbalanced; could this be why?



                • #9
                  One country being an outlier is hardly an issue if you look at countries separately — which I guess makes more sense than lumping countries together.
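
                  As a hedged illustration of looking at countries separately (not part of the original reply; it assumes cid identifies countries and hiv is one expenditure variable), per-country linear trends could be collected like this:
                  Code:
                  preserve
                  * one OLS slope on year per country; statsby replaces the data in memory
                  statsby slope=_b[year], by(cid) clear: regress hiv year
                  list cid slope, clean
                  restore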



                  • #10
                    Originally posted by Florence Abbe View Post
                    My panel data is unbalanced; could this be why?
                    Could be. Maybe try adding constraints to reduce the ways in which it can wander off, and simplifying the link function.
                    Code:
                    gsem ///
                        (hiv <- i.tim M[cid]@1) ///
                        (fam <- i.tim M[cid]@1) ///
                        (mat <- i.tim M[cid]@1), family(gaussian) link(identity) ///
                            nocnsreport nodvheader nolog
                    You can also try fitting separate mixed-effects linear regressions to each outcome individually (like what Tom, I think, is suggesting), or stacking them in a single linear mixed model. There are various alternatives to try.
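
                    A minimal sketch of the stacked alternative (hypothetical; it assumes the renames above and that cid identifies countries):
                    Code:
                    * stack the three outcomes into long form, then fit one linear mixed model
                    rename (hiv fam mat) (exp1 exp2 exp3)
                    reshape long exp, i(cid tim) j(service)
                    mixed exp i.service##i.tim || cid: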

                    By the way, when you say "huge outlier", do you mean that the direction of the trend is reversed for one country? I inferred that only the magnitude differed and the trend followed that of the other 22 countries. Depending upon which it is, Nick might be right in that pooling qualitatively disparate circumstances might not make sense.



                    • #11
                      Thank you all for your input. Your responses were extremely helpful! I think I will opt for fitting separate mixed-effects linear regressions to each service area individually.

                      Thanks, Joseph, adding the constraints worked well.

                      Originally posted by Joseph Coveney View Post
                      Could be. Maybe try adding constraints to reduce the ways in which it can wander off, and simplifying the link function. [...]

                      I am, however, still curious how to interpret the gsem results. For example, the results for HIV were as follows. How should the linear coefficient be interpreted?


                      Code:
                      Contrasts of marginal linear predictions
                      
                      Margins      : asbalanced
                      
                      ------------------------------------------------
                                   |         df        chi2     P>chi2
                      -------------+----------------------------------
                      hiv          |
                              year |
                          (linear) |          1        0.70     0.4035
                       (quadratic) |          1        0.12     0.7241
                           (cubic) |          1        0.01     0.9093
                         (quartic) |          1        0.32     0.5700
                         (quintic) |          1        0.12     0.7258
                             Joint |          5        1.12     0.9520
                      ------------------------------------------------
                      
                      --------------------------------------------------------------
                                   |   Contrast   Std. Err.     [95% Conf. Interval]
                      -------------+------------------------------------------------
                      hiv          |
                              year |
                          (linear) |  -2.202085   2.636144     -7.368832    2.964663
                       (quadratic) |   .9034108   2.558936     -4.112011    5.918833
                           (cubic) |   .2903802    2.54868      -4.70494      5.2857
                         (quartic) |  -1.422361   2.503738     -6.329598    3.484875
                         (quintic) |   .8485665   2.419209     -3.892996    5.590128
                      --------------------------------------------------------------

                      Thanks
                      Last edited by Florence Abbe; 20 Feb 2022, 17:06.



                      • #12
                        Originally posted by Florence Abbe View Post
                        I am, however, still curious how to interpret the gsem results. For example, the results for HIV were as follows. How should the linear coefficient be interpreted?
                        The interpretation is that you don't have enough precision (coefficient ± its standard error: -2 ± 3) to draw any conclusion with confidence, not even about its sign inasmuch as both positive and negative values are consistent with the results (95% C.I.: [-7, +3]).
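
                        For instance, the reported interval is just the usual normal approximation: -2.202 ± 1.96 × 2.636 ≈ [-7.37, 2.96], which straddles zero.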

