
  • Friedman's test code not working

    Hi,

    I ran the code for a Friedman's test comparing multiple variables: https://www.stata.com/statalist/arch.../msg00691.html

    However, I cannot get it to work (even though I saved my file before running the code). I have a panel data set and want to compare 11 interaction variables (dummy x independent variable). If that is not possible, I would like to use the group variable from which the 11 dummies are created and test for differences between these groups. When I try to run the code on the interaction variables, it says: / invalid name.

    How could I get this code to work? It seems to be exactly what I want, as I want to compare all the groups individually (group 1 with group 2, group 1 with group 3, and so on), but I cannot get it to work.

    Thanks in advance.


  • #2
    An error like "/ invalid name" probably means that your copy of the program from that Statalist post was corrupted when it was copied (extra characters, missing line breaks, and so on).

    I recommend you type
    Code:
    set trace on
    then re-run your code, and post the output here. That will tell you where Stata thinks the invalid name is appearing.
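
    For example, a minimal sketch of the workflow (the do-file name here is just a placeholder for wherever you saved the Statalist code):
    Code:
    set trace on
    do friedman_code.do   // your saved copy of the program from the linked post
    set trace off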



    • #3
      Hello Lisanne. I am always reminded of this blog post by Thom Baguley when I see people using Friedman's test. Perhaps you'll find it interesting.
      --
      Bruce Weaver
      Version: Stata/MP 18.5 (Windows)



      • #4
        If you're looking to perform a straightforward Friedman's test, then the user-written command -emh- (SSC) might be easier to work with.

        Below, I show a worked example for illustration; it matches the results of SAS's PROC FREQ; . . . / cmh;, for which -emh- is a direct analogue.
        Code:
        version 16.0 // substitute yours here
        
        clear *
        
        input byte Subject str8 Emotion double SkinResponse
           1 fear 23.1
           1 joy 22.7
           1 sadness 22.5
           1 calmness 22.6
           2 fear 57.6
           2 joy 53.2
           2 sadness 53.7
           2 calmness 53.1
           3 fear 10.5
           3 joy  9.7
           3 sadness 10.8
           3 calmness  8.3
           4 fear 23.6
           4 joy 19.6
           4 sadness 21.1
           4 calmness 21.6
           5 fear 11.9
           5 joy 13.8
           5 sadness 13.7
           5 calmness 13.3
           6 fear 54.6
           6 joy 47.1
           6 sadness 39.2
           6 calmness 37.0
           7 fear 21.0
           7 joy 13.6
           7 sadness 13.7
           7 calmness 14.8
           8 fear 20.3
           8 joy 23.6
           8 sadness 16.3
           8 calmness 14.8
        end
        
        label define Emotions 1 fear 2 joy 3 sadness 4 calmness
        encode Emotion, generate(emotion) label(Emotions) noextend
        
        emh SkinResponse emotion, anova strata(Subject) transformation(rank)
        
        exit
        Bruce's reference mentions the relatively low power of the test when the number of groups (exposure conditions) is few. In the book reference given in the help file for -emh-, the use of aligned-rank transformation in conjunction with Friedman's test is recommended. (For verification & validation of -emh-, I replicated the examples in that book, including the aligned-rank.)
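
        For illustration, a minimal sketch of that aligned-rank variant on the example data above (my own sketch, not the book's code): remove each subject's mean, rank the aligned values over the whole sample, and analyse those ranks.
        Code:
        * Sketch only: aligned-rank version of the Friedman-type test
        bysort Subject: egen double subject_mean = mean(SkinResponse)
        generate double aligned = SkinResponse - subject_mean
        egen double aligned_rank = rank(aligned)
        * transformation() omitted so that the ranks are analysed as scores
        emh aligned_rank emotion, anova strata(Subject)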

        I want to compare all groups individually (group 1 with group 2, group 1 with 3 etc.)
        You may actually be looking for -signrank-, which is official Stata.
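
        For the pairwise comparisons you describe, a minimal sketch on the example data above (an illustration only; the post-reshape variable names are what -reshape- builds from the string values):
        Code:
        * Sketch only: pairwise Wilcoxon signed-rank tests
        keep Subject Emotion SkinResponse
        reshape wide SkinResponse, i(Subject) j(Emotion) string
        signrank SkinResponsefear = SkinResponsejoy
        signrank SkinResponsefear = SkinResponsesadness
        With 11 groups there would be 55 such pairs, so some allowance for multiple comparisons would be sensible.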



        • #5
          emh is a nice-looking command from Joseph Coveney.

          Changing the subject slightly -- but I hope in a way that is of some interest -- I took the example data used by Joseph, which seem fairly typical of the datasets I see in books on nonparametric statistics of the older kind.

          Here I use stripplot from SSC to see what is going on.

          My go-to choice -- at least for this number of observations and this number of categories -- is a quantile-box plot. That is

          * all the data in order plotted against their ranks, so a quantile plot (compare quantile or the more flexible qplot from the Stata Journal)

          * a conventional box (surprise) showing median and quartiles.

          Thus half the data lie inside each box -- and half outside, often the more important half, at least for deciding where to go next.

          The conventional 1.5 IQR rule for deciding which data points are plotted individually had its origins in Tukey's focus on plots you draw yourself, and want to draw quickly, but now we have computers and don't need to economise in that way.


          Code:
          stripplot Skin, over(Emotion) vertical box refline cumul center xla(, noticks) name(g1)
          [Figure: skin_emotion1.png (quantile-box plot of SkinResponse by Emotion)]



          The refline by default shows the mean. Some people want to add means to boxplots -- as I do here -- but usually want to do that with a point or marker symbol, which does not fit so well with this design.

          If you want to downplay the boxes, just tune the line widths and/or colours.

          There is nothing pathological about these data: it is just that they are markedly skewed. I would reach for logarithms here as a first choice of transformed scale. Now the natural reference to me is the geometric mean, although you need egen function code for the geometric mean for this to work: gmean is one of the functions in egenmore from SSC, and writing your own function is a beginner programmer task.
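
          For illustration, a minimal sketch of such an egen function (my own sketch, not the egenmore code, and the name _ggmean would clash with egenmore's version if both were installed). It takes logs, averages them within groups, and exponentiates back; non-positive values are simply ignored.

          Code:
          *! Sketch only: egen function for the geometric mean, saved as _ggmean.ado
          program define _ggmean
                  version 16.0
                  syntax newvarname =/exp [if] [in] [, BY(varlist)]
                  tempvar touse lnx
                  quietly {
                          mark `touse' `if' `in'
                          // log of each value; missing for zero or negative data
                          generate double `lnx' = ln(`exp') if `touse'
                          sort `touse' `by'
                          // running mean of the logs, exponentiated back
                          by `touse' `by': generate `typlist' `varlist' = exp(sum(`lnx')/sum(`lnx' < .)) if `touse'
                          // keep only the full-group value
                          by `touse' `by': replace `varlist' = `varlist'[_N]
                  }
          end
          With a gmean-style egen function in place (from egenmore or along these lines), the plot on a log scale with geometric means as reference lines is: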

          Code:
          stripplot Skin, over(Emotion) vertical box refline reflevel(gmean) cumul center ysc(log) xla(, noticks) name(g2)
          [Figure: skin_emotion2.png (quantile-box plot of SkinResponse by Emotion on a log scale, geometric-mean reference lines)]



          I feel comfortable on that scale. If this dataset were mine and I cared about doing a full analysis, I would switch to glm, link(log), but I stop there. As it happens, if there is structure in the data here, the sample sizes are too small to say much about it, which is perhaps the main message of nonparametric tests too.
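
          For concreteness, the kind of call I mean (a sketch only; I have not pursued it here):
          Code:
          * Sketch only: model the mean on a log scale, response left in original units
          glm SkinResponse i.emotion, family(gaussian) link(log)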

          The slogans from this post, in essence:

          1. If I hear the word "test", I reach for my graph commands.

          2. Are you sure you need nonparametric tests and are not just a fan of 1950s retro statistics or over-influenced by that course you did?



          • #6
            Hi all, thanks for the input!

            Nick Cox
            I never had a course on these non-parametric tests, which is why I am not entirely sure about Friedman's test. I am an MSc Finance student and have only had courses involving OLS, so it is not completely my cup of tea, but I think OLS is too sensitive to outliers to use in my research (and I am especially interested in the outliers, so I do not want to throw them away). Therefore I am trying to move away from parametric tests. I will have a look at the boxplot code you sent; it seems very useful for my research.

            I found out that a Friedman's test is indeed perhaps not the right way to go, as I get a p-value of 1.000. The cause is the following: I am trying to compare two interaction variables, each consisting of a dummy ext for group i (i = 1, ..., 11) multiplied by the independent variable. Let's say my dependent variable is Y (log(1 + weekly return)) and I have dummy1-dummy11. The dummies represent different groups of changes in news risk, based on both the change in the number of articles and the change in news score (relative to the average per company, to ensure comparability across companies). Dummy1 (ext1) represents the group with the largest absolute changes in news risk (the largest decreases/increases in news risk) and dummy11 the group with the smallest absolute changes in news risk (closest to 0).

            My independent variable (X) is the change in news risk from week t-1 to week t. The dependent variable is taken for week t. Furthermore, I am also regressing the lagged versions of X and the lagged dummies on Y to estimate a causal effect.

            I am performing a quantile panel regression because I am very interested in the outliers as well (they often have the biggest effect on the returns), but I want a method that can deal with these outliers. My panel variable is the company and the time variable is the week. My regression looks something like this:

            Y_t = sum_{i=1..11} beta_i * dummy_{i,t} + sum_{i=1..11} gamma_i * (dummy_{i,t} * X_t) + delta * X_t

            dummy_{i,t} is equal to 1 if the group is equal to i and 0 otherwise. The interaction term dummy_{i,t} * X_t is the one I am very interested in.

            One observation represents a certain group at a certain point in time.

            What I am interested in is whether the differences between the interaction effects are significant, so for example the difference between:

            gamma_1 * dummy1 * X
            gamma_2 * dummy2 * X

            and so on; I want to compare all of these interaction terms to each other for dummy1-dummy11.

            While the interaction effects are often significantly different from zero, my professor wanted me to test whether the differences between them are significant. However, I cannot reach him as he is on holiday. He proposed a t test, but I suspect it cannot be applied to my sample because of the dummies in the interaction (observations do not overlap in time; one observation at a point in time is assigned to only one group). Please correct me if I am wrong! I find it hard to find the right test.

            Both my log weekly returns and the change in news risk are not normally distributed (the log weekly returns are somewhat bimodal but with overlap, and the change in news risk is highly peaked); however, I have ~144,000 observations (approximately 1,050 publicly traded companies across 152 weeks). As you can see in the table below, there is a Simpson's paradox, and the positive interaction effect seems to be larger for groups with a larger absolute change in news risk. However, I would like to test whether this difference is significant (if possible). My question is: is this even possible?
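
            For concreteness, a minimal sketch of the kind of comparison I mean (not my actual code; Y, X, group, and company are placeholder names, and I would use my quantile panel estimator rather than -regress-):
            Code:
            * Sketch only: test whether two interaction coefficients differ,
            * using factor-variable notation after estimation
            regress Y i.group##c.X, vce(cluster company)
            test 1.group#c.X = 2.group#c.X
            lincom 1.group#c.X - 2.group#c.X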

            Ext_i refers to the dummies in the table. Please note that, just for ease of comparison, I removed the dummies without the interaction from this table; I actually included them in the regression as well, but this way the Simpson's paradox is easier to see. Ext11 is omitted because of collinearity.
            [Image: tableregression.JPG (regression results table)]

            Last edited by Lisanne Stolte; 22 Jul 2019, 06:13.

