  • How to test if the mean of a sample is significantly different from zero

    Hi everyone,

    I have a sample of 166 bonds (i.e. n=166). I have broken these down into subsamples based on some characteristics of the bonds, ending up with 8 subsamples (these are not mutually exclusive) based on three dummy variables.

    For each of these subsamples, I want to test whether the dependent variable (Yield) is significantly different from zero. I have run the Shapiro-Wilk test and reject the normality hypothesis for 6 of the 8 subsamples. From my understanding, the Wilcoxon rank-sum/Mann-Whitney U-test is only used to test whether the means of two samples are significantly different from each other. However, I only have one sample at a time (i.e. each subsample), and I want to test whether that subsample's mean is significantly different from zero. How do I go about doing this? For the two subsamples where I don't reject normality, should I run a one-sample t-test, or should I use a non-parametric test for all subsamples?
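
    To make the setup concrete, it looks roughly like this (a rough sketch with placeholder names: yield for the dependent variable and d1, d2, d3 for the three dummies; each subsample is just an -if- condition on these dummies):

    Code:
    * placeholder names (yield, d1, d2); each of the eight overlapping
    * subsamples is selected with an -if- condition on the dummies, e.g.
    swilk yield if d1 == 1
    swilk yield if d1 == 1 & d2 == 1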

    Best,

    Nils

  • #2
    Nils:
    do you mean something along the lines of the following toy-example?
    Code:
    sysuse auto.dta
    . by foreign, sort : ttest price == 0
    
    ---------------------------------------------------------------------------------------------------------------------
    -> foreign = Domestic
    
    One-sample t test
    ------------------------------------------------------------------------------
    Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
       price |      52    6072.423    429.4911    3097.104    5210.184    6934.662
    ------------------------------------------------------------------------------
        mean = mean(price)                                            t =  14.1386
    Ho: mean = 0                                     degrees of freedom =       51
    
        Ha: mean < 0                 Ha: mean != 0                 Ha: mean > 0
     Pr(T < t) = 1.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 0.0000
    
    ---------------------------------------------------------------------------------------------------------------------
    -> foreign = Foreign
    
    One-sample t test
    ------------------------------------------------------------------------------
    Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
       price |      22    6384.682    558.9942    2621.915     5222.19    7547.174
    ------------------------------------------------------------------------------
        mean = mean(price)                                            t =  11.4217
    Ho: mean = 0                                     degrees of freedom =       21
    
        Ha: mean < 0                 Ha: mean != 0                 Ha: mean > 0
     Pr(T < t) = 1.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 0.0000
    Kind regards,
    Carlo
    (StataNow 18.5)

    • #3
      Carlo Lazzaro
      Yes, that is along the lines of the outcome I'm trying to achieve. The value that we're testing the mean against is arbitrary, I suppose; the mean in your toy example obviously differs quite significantly from zero. For my data, the dependent variable is clustered around zero, which is why zero is the value I am testing the mean against.

      I see that you performed a t-test. It is my understanding that the t-test should not be used if the sample is not normally distributed, and, as shown by the Shapiro-Wilk test, 6 of my 8 subsamples appear not to be. What non-parametric test should I use instead if I'm trying to achieve the same result as the test you showed?

      • #4
        Nils:
        1) you're correct about the theoretical foundations of -ttest-. That said, it can usually tolerate (i.e., it is robust to) departures from normality. You can also take a look at the -bootstrap- entry and related examples in the Stata .pdf manual (see the sketch after point 2);
        2) see -help ranksum- for a non-parametric alternative.
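        As a minimal sketch of the -bootstrap- route (yield and d1 are placeholder names for your variable and one subsample dummy), you could bootstrap the subsample mean and read off a percentile confidence interval that does not rely on normality:
        Code:
        * placeholder names (yield, d1); restrict to one subsample so that
        * resampling takes place within that subsample only
        preserve
        keep if d1 == 1
        bootstrap r(mean), reps(1000) seed(2019): summarize yield
        * percentile-based confidence interval for the mean
        estat bootstrap, percentile
        restore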
        Kind regards,
        Carlo
        (StataNow 18.5)

        • #5
          Carlo Lazzaro
          Thank you. I actually read through -help ranksum- before making this post. However, I interpreted it as saying that the Wilcoxon rank-sum test requires two independent samples. As I only have a single sample, is this still the right test? I am not comparing two samples to each other; rather, I understand the test as checking whether the sample mean is indicative of the population mean. The t-test in your example seems to be a one-sample test, whereas the Wilcoxon rank-sum test is not. Am I wrong in thinking so?
          Last edited by Nils Edgren; 03 May 2019, 05:35.

          • #6
            Nils:
            with one sample only, a bit of additional work is required if you want to use -ranksum-.
            What follows may be an approach:
            Code:
            . set obs 20
            number of observations (_N) was 0, now 20
            
            . g A=runiform()*100 in 1/10
            (10 missing values generated)
            
            . replace A=0 if A==.
            (10 real changes made)
            
            . g group=1 in 1/10
            (10 missing values generated)
            
            . replace group=0 in 11/20
            (10 real changes made)
            
            . ttest A, by(group) unequal
            
            Two-sample t test with unequal variances
            ------------------------------------------------------------------------------
               Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
            ---------+--------------------------------------------------------------------
                   0 |      10           0           0           0           0           0
                   1 |      10    38.26316    9.496341    30.03007    16.78094    59.74538
            ---------+--------------------------------------------------------------------
            combined |      20    19.13158    6.373587    28.50355    5.791509    32.47165
            ---------+--------------------------------------------------------------------
                diff |           -38.26316    9.496341               -59.74538   -16.78094
            ------------------------------------------------------------------------------
                diff = mean(0) - mean(1)                                      t =  -4.0293
            Ho: diff = 0                     Satterthwaite's degrees of freedom =        9
            
                Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
             Pr(T < t) = 0.0015         Pr(|T| > |t|) = 0.0030          Pr(T > t) = 0.9985
            
            . ranksum A , by( group )
            
            Two-sample Wilcoxon rank-sum (Mann-Whitney) test
            
                   group |      obs    rank sum    expected
            -------------+---------------------------------
                       0 |       10          55         105
                       1 |       10         155         105
            -------------+---------------------------------
                combined |       20         210         210
            
            unadjusted variance      175.00
            adjustment for ties      -21.71
                                 ----------
            adjusted variance        153.29
            
            Ho: A(group==0) = A(group==1)
                         z =  -4.038
                Prob > |z| =   0.0001
            
            .
            That said, I would go -ttest- with unequal variance.
            Kind regards,
            Carlo
            (StataNow 18.5)

            • #7
              Thank you! Carlo Lazzaro

              • #8
                Carlo Lazzaro
                So I performed the test using the approach you outlined above with the following result:
                Code:
                ttest GREENPREMIUM,by(group) unequal
                1 group found, 2 required
                r(420);
                
                . replace GREENPREMIUM=0 if GREENPREMIUM==.
                (52 real changes made)
                
                . ttest GREENPREMIUM, by(group) unequal
                
                Two-sample t test with unequal variances
                ------------------------------------------------------------------------------
                   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
                ---------+--------------------------------------------------------------------
                       0 |      52           0           0           0           0           0
                       1 |      52   -.0108147    .0121887    .0878939   -.0352845    .0136551
                ---------+--------------------------------------------------------------------
                combined |     104   -.0054074     .006088    .0620862   -.0174816    .0066668
                ---------+--------------------------------------------------------------------
                    diff |            .0108147    .0121887               -.0136551    .0352845
                ------------------------------------------------------------------------------
                    diff = mean(0) - mean(1)                                      t =   0.8873
                Ho: diff = 0                     Satterthwaite's degrees of freedom =       51
                
                    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
                 Pr(T < t) = 0.8105         Pr(|T| > |t|) = 0.3791          Pr(T > t) = 0.1895
                
                . ranksum GREENPREMIUM,by(group)
                
                Two-sample Wilcoxon rank-sum (Mann-Whitney) test
                
                       group |      obs    rank sum    expected
                -------------+---------------------------------
                           0 |       52        3094        2730
                           1 |       52        2366        2730
                -------------+---------------------------------
                    combined |      104        5460        5460
                
                unadjusted variance    23660.00
                adjustment for ties    -2956.68
                                     ----------
                adjusted variance      20703.32
                
                Ho: GREENP~M(group==0) = GREENP~M(group==1)
                             z =   2.530
                    Prob > |z| =   0.0114
                To clarify: the -ranksum- test seems to be telling me that the means of the two groups differ significantly, i.e. that the mean is significantly different from zero, whereas the t-test says we cannot conclude that the mean is significantly different from zero (p = 0.3791). How should I interpret these contrasting results? Also, could you expand on why you would go with the t-test with unequal variance rather than -ranksum-?

                • #9
                  Nils:
                  the issue with -ranksum- is that you should have two samples to make a valid comparison (and the zero-mean sample is just an artifact).
                  Conversely, -ttest- explicitly includes a valid one-sample option.
                  That said, on second thought I would return to the example reported in #2, being confident that the -ttest- machinery is (asymptotically) robust to departures from the normality assumption.
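                  Concretely, that would just mean running the one-sample syntax from #2 within each of your subsamples, along these lines (a sketch; d1 stands in for one of your dummies):
                  Code:
                  * one-sample t-test of the mean against zero within one subsample
                  * (d1 is a placeholder dummy; repeat for each of your eight subsamples)
                  ttest GREENPREMIUM == 0 if d1 == 1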
                  Last edited by Carlo Lazzaro; 03 May 2019, 10:02.
                  Kind regards,
                  Carlo
                  (StataNow 18.5)

                  • #10
                    Carlo Lazzaro
                    Would you then say that running a one-sample -ttest- yields a more "fair" or better result than -ranksum- with the zero-mean sample (I understand that it is just an artefact to allow us to perform the test)? Both versions of the t-test yield the same p-values. I ask this because the -ranksum- test always contradicts the t-tests, i.e. when the ranksum test shows the mean is significantly different from zero, the t-tests do not, and vice versa.
                    Last edited by Nils Edgren; 03 May 2019, 10:19.

                    • #11
                      Nils:
                      yes, I would go with the one-sample -ttest- for the reasons reported in my previous reply.
                      As per its name, -ranksum- ranks the observations, and a fictitious sample with all-zero observations plays a role in the -ranksum- machinery: this may be the reason for the conflicting results between the (asymptotically correct) one-sample -ttest- and -ranksum- (which needs a fictitious sample to work).
                      Kind regards,
                      Carlo
                      (StataNow 18.5)

                      • #12
                        Carlo Lazzaro Thank you again for your help!

                        • #13
                          Hi everyone,

                          I have a question that is also related to this thread topic.

                          I have a dependent variable that measures companies' earnings management practices (Earnings Management, EM). There are various measurement methods for this; the dependent variable EM is sometimes used as an absolute value (whether EM occurs at all, regardless of direction) or as a signed value (e.g. EM that occurs only in a positive direction).

                          Apart from the purely argumentative approach used by some authors, Klein (2002), "Audit committee, board of director characteristics, and earnings management", Journal of Accounting and Economics, p. 383 f. (see line 1 of the attached figure) reports what percentage of observations of the abnormal accruals (AAC) variable are positive (AAC as a measure of earnings management) and uses a t-test to check whether the mean of this measure is significantly different from 0. Because this hypothesis was not rejected (i.e., no significant earnings management in the positive direction was found), absolute values of this measure were used as the dependent variable in the further regression analysis.

                          [Attached figure: t-test Accruals.png]


                          Question:
                          Does anyone know (a) how to display the proportion of positive (or negative) values in summary statistics, as it appears there (I have not found this possibility in e.g. -tabstat- so far), and (b) how to implement the t-test in the way it has been done there?
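
                          For (b), I guess the one-sample syntax from earlier in this thread is what is needed, and for (a) I can compute the share by hand along the lines sketched below (aac is a placeholder name for my abnormal-accruals variable), but I would like it to appear directly in a summary-statistics table:

                          Code:
                          * placeholder name (aac); indicator for positive values, whose mean
                          * is the proportion of positive observations
                          generate byte pos = aac > 0 if !missing(aac)
                          summarize pos
                          * one-sample t-test of the mean against zero (as I understand Klein 2002 did)
                          ttest aac == 0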

                          Many thanks in advance

                          • #14
                            Pete:
                            have you already taken a look at the Advanced table customization section of the -table- entry in the Stata .pdf manual?
                            Kind regards,
                            Carlo
                            (StataNow 18.5)

                            • #15
                              Carlo,

                              thanks for your response and the somewhat obvious hint; web searching sometimes makes you lose sight of the really detailed manual.

                              Somehow I haven't managed it yet. I found various options for percentages of e.g. indicators by groups etc., but so far no option that would display the percentage of only the positive values of a continuous variable (which ranges from about -5 to +5) together with that additional t-statistic.

                              So any further advice is welcome.
