  • Using Kendall Tau on panel data to test trends

    Hi everyone, I am looking for some help. My key question is: can I use the ktau command to test for trends in panel data?


    I have government expenditure data for 23 countries over a period of 6 years for 3 different health services. I want to see what the trend is over time for each service area. I decided to use the Mann-Kendall test, based on the Kendall tau coefficient, because it is a non-parametric test, and non-parametric tests are appropriate when the median reflects the distribution better than the mean (there is a huge outlier in this data set). However, I am wondering whether this test can be used on panel data. I was able to run the command just fine, but I am not sure whether the results are meaningful.


    I used this command for each service area:

    ktau year servicearea


    Thanks!


  • #2
    What does servicearea represent? Countries? Health services? If you post a data example, you might get more help.



    • #3
      Hi, apologies for the lack of detail. I have three different service area variables that measure domestic government expenditure per capita on each service area, per country, per year.

      Domestic general expenditure per capita on HIV/AIDS: dgexpcapitahiv/aids
      Domestic general expenditure per capita on family planning: degexpcapitafp
      Domestic general expenditure per capita on maternal conditions: degexcapaitamc

      To test the trend for each service area, I used the ktau command separately for each one. The results give a Kendall's tau coefficient; when it is negative I read it as a negative trend in government expenditure on that service area over the years, and when it is positive I read it as a positive trend.

      ktau year dgexpcapitahiv/aids
      ktau year degexpaitafp
      ktau year degexpaitamc

      Is this an appropriate test to use on panel data?



      • #4
        I'd put the question the other way round: which measure comes closest to your goals? Consider the Grunfeld data. An overall Kendall tau of any flavour mixes companies (panels) with -- typically -- quite different sizes, and the overall tau machinery takes no account whatsoever of the panel structure. Alternatively, you can gauge the individual trends by looping over the panels.


        Code:
        webuse grunfeld, clear
        
        * plot the pooled data: companies of very different sizes are mixed together
        scatter invest year
        
        * Kendall's tau-b computed separately within each company (panel)
        gen tau_b = .
        
        quietly forval c = 1/10 {
             ktau invest year if company == `c'
             replace tau_b = r(tau_b) if company == `c'
        }
        
        tabdisp company, c(tau_b)
        
        ----------------------
          Company |      tau_b
        ----------+-----------
                1 |   .8210526
                2 |   .4696586
                3 |   .6631579
                4 |   .6315789
                5 |   .5473684
                6 |   .8736842
                7 |   .7789474
                8 |   .6210526
                9 |   .6421053
               10 |   .4421053
        ----------------------
        
        ktau invest year
        
          Number of obs =     200
        Kendall's tau-a =       0.2607
        Kendall's tau-b =       0.2668
        Kendall's score =    5187
            SE of score =     944.984   (corrected for ties)
        
        Test of H0: invest and year are independent
             Prob > |z| =       0.0000  (continuity corrected)
        Note how the overall tau_b of 0.27 is not only quite different from the mean or median (say) of the individual tau_b, but is not even within the range of the individual values. They measure quite different things, as contemplation of the scatter plots alone should make clear.

        Researchers vary a great deal in how much emphasis they put on this kind of exercise. Even in a fuller exercise in which I was under orders to use Kendall tau, I would still round to 2 or 3 decimal places.



        • #5
          Hi Nick, thanks so much for taking the time to answer my question. Based on your answer, I think I need to choose a different test for the trends.

          My main goal is to understand the expenditure per capita trends over time (2014-2019) for the three individual service areas (HIV/AIDS, family planning, and maternal conditions): is there a positive or a negative trend across countries over time for each service area, and is that trend statistically significant? I need to figure out an appropriate way to test this. Ideally I would like to use a non-parametric test, but I could perhaps use a parametric test and simply remove the massive outlier country. The test would also need to assume random effects.

          If anyone has suggestions, I would be very thankful. I am having trouble finding an appropriate method.



          • #6
            Originally posted by Florence Abbe View Post
            I have government expenditure data for 23 countries over a period of 6 years for 3 different health services. I want to see what the trend is over time for each service area.
            Why not fit a regression model to the data and examine the first (linear) component of the set of orthogonal polynomial contrasts? That will give you the linear trend over time.

            . . . the median better reflects the distribution of the data than the mean (there is a huge outlier in this data set).
            I'm not sure that a nonparametric test is a panacea for an outlier, but if you want to model the distribution of the data, you can use a generalized linear model with a suitable distribution family and link function.

            You might be better with a model fitted to all three expenditures at once, as well as including a random effect for country. Maybe something along the following lines.
            Code:
            rename dgexpcapitahiv/aids /* ?! */ hiv
            rename degexpaitafp fam
            rename degexpaitamc mat
            rename year tim
            
            // Assumes cid is variable for Country ID
            
            gsem ///
                (hiv <- i.tim M[cid]) ///
                (fam <- i.tim M[cid]) ///
                (mat <- i.tim M[cid]), family(gaussian) link(log) ///
                    nocnsreport nodvheader nolog
            
            // Then examine the first component--labeled "(linear)"--of each of the following:
            contrast p.tim, equation(hiv)
            contrast p.tim, equation(fam)
            contrast p.tim, equation(mat)
            With only 23 countries, you'd need to be cautious about overinterpretation.



            • #7
              I'm not sure what the significance test is supposed to be capturing, but you could consider using a mixed-effects regression to test the relationship between year and service area: https://towardsdatascience.com/using...on-7b7941d249b
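
              A minimal sketch of that kind of model in Stata (not from the linked post; it assumes cid identifies countries and uses hiv for the HIV/AIDS expenditure variable, as in the rename sketch in #6):
              Code:
              * random intercept for country, with year as the fixed-effect trend
              mixed hiv year || cid: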



              • #8
                Originally posted by Joseph Coveney View Post
                You might be better with a model fitted to all three expenditures at once, as well as including a random effect for country. Maybe something along the following lines. [...]

                Thanks a lot for this suggestion, Joseph. Unfortunately, when I run the gsem command, I get error r(430): cannot compute an improvement -- discontinuous region encountered. My panel data is unbalanced; could this be why?



                • #9
                  One country being an outlier is hardly an issue if you look at countries separately — which I guess makes more sense than lumping countries together.
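
                  As a hedged illustration of looking at countries separately (not part of the original reply; it assumes cid identifies countries and hiv is one expenditure variable), per-country linear trends could be collected like this:
                  Code:
                  preserve
                  * one OLS slope on year per country; statsby replaces the data in memory
                  statsby slope=_b[year], by(cid) clear: regress hiv year
                  list cid slope, clean
                  restore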



                  • #10
                    Originally posted by Florence Abbe View Post
                    My panel data is unbalanced; could this be why?
                    Could be. Maybe try adding constraints to reduce the ways in which it can wander off, and simplifying the link function.
                    Code:
                    gsem ///
                        (hiv <- i.tim M[cid]@1) ///
                        (fam <- i.tim M[cid]@1) ///
                        (mat <- i.tim M[cid]@1), family(gaussian) link(identity) ///
                            nocnsreport nodvheader nolog
                    You can also try fitting separate mixed-effects linear regressions to each outcome individually (like what Tom, I think, is suggesting), or stacking them in a single linear mixed model. There are various alternatives to try.
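
                    A minimal sketch of the stacked alternative (hypothetical; it assumes the renames above and that cid identifies countries):
                    Code:
                    * stack the three outcomes into long form, then fit one linear mixed model
                    rename (hiv fam mat) (exp1 exp2 exp3)
                    reshape long exp, i(cid tim) j(service)
                    mixed exp i.service##i.tim || cid: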

                    By the way, when you say "huge outlier", do you mean that the direction of the trend is reversed for one country? I inferred that only the magnitude differed and the trend followed that of the other 22 countries. Depending upon which it is, Nick might be right in that pooling qualitatively disparate circumstances might not make sense.



                    • #11
                      Thank you all for your input. Your responses were extremely helpful! I think I will opt for fitting separate mixed-effects linear regressions to each service area individually.

                      Thanks, Joseph, adding the constraints worked well.

                      Originally posted by Joseph Coveney View Post
                      Could be. Maybe try adding constraints to reduce the ways in which it can wander off, and simplifying the link function. [...]

                      I am, however, still curious how to interpret the gsem results. For example, the results for HIV were as follows. How should the linear coefficient be interpreted?


                      Code:
                      Contrasts of marginal linear predictions
                      
                      Margins      : asbalanced
                      
                      ------------------------------------------------
                                   |         df        chi2     P>chi2
                      -------------+----------------------------------
                      hiv          |
                              year |
                          (linear) |          1        0.70     0.4035
                       (quadratic) |          1        0.12     0.7241
                           (cubic) |          1        0.01     0.9093
                         (quartic) |          1        0.32     0.5700
                         (quintic) |          1        0.12     0.7258
                             Joint |          5        1.12     0.9520
                      ------------------------------------------------
                      
                      --------------------------------------------------------------
                                   |   Contrast   Std. Err.     [95% Conf. Interval]
                      -------------+------------------------------------------------
                      hiv          |
                              year |
                          (linear) |  -2.202085   2.636144     -7.368832    2.964663
                       (quadratic) |   .9034108   2.558936     -4.112011    5.918833
                           (cubic) |   .2903802    2.54868      -4.70494      5.2857
                         (quartic) |  -1.422361   2.503738     -6.329598    3.484875
                         (quintic) |   .8485665   2.419209     -3.892996    5.590128
                      --------------------------------------------------------------

                      Thanks
                      Last edited by Florence Abbe; 20 Feb 2022, 17:06.



                      • #12
                        Originally posted by Florence Abbe View Post
                        I am, however, still curious how to interpret the gsem results. For example, the results for HIV were as follows. How should the linear coefficient be interpreted?
                        The interpretation is that you don't have enough precision (coefficient ± its standard error: -2 ± 3) to draw any conclusion with confidence, not even about its sign inasmuch as both positive and negative values are consistent with the results (95% C.I.: [-7, +3]).
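
                        For instance, the reported interval is just the usual normal approximation: -2.202 ± 1.96 × 2.636 ≈ [-7.37, 2.96], which straddles zero.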

