  • t test for matched-pair sample

    Hi all,

    This is probably a very basic question, but I'm still quite new to Stata.
    I am trying to replicate a paper which investigates differences between fraud firms and their non-fraud competitors. I have constructed a sample which consists of fraud firms (denoted with dummy variable 'fraud' taking the value of 1 in case of fraud) and matched those with a non-fraud competitor. For both firms I computed a bunch of variables and I now would like to provide some descriptive statistics and compare their means.
    So I have data which looks something like this:
    Company ID   Pair   Fraud   Diff   Leverage
    1            1      1        .5    2.5
    2            1      0        .8    1.2
    3            2      1        .3    1.8
    4            2      0       -.5    2.9
    Now I want a table of descriptive statistics that gives, for each variable (Diff and Leverage in this case), the mean, median and standard deviation for the fraud firms (Fraud = 1) and the non-fraud firms (Fraud = 0), along with the difference between the means of the two groups.
    I'm not quite sure whether I want a 'matched pair comparison', where the fraud and non-fraud firms are compared within each pair, or just a comparison of the means for all fraud firms and all non-fraud firms. Is it possible to do it both ways? Can someone tell me what code to use for both types of test?

    Kind regards,
    Thomas

  • #2
    Well, on the one hand you say you want to provide descriptive statistics, and on the other you talk about t-tests. T-tests are not descriptive statistics; they are inferential. If you want descriptive statistics, then probably the simplest way to get them from this data would be:

    Code:
    tabstat Diff Leverage, by(Fraud) statistics(N mean sd)
    or something like that. You can also play with the -format()- option to tabstat to get prettier numbers than you might get by default.
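
    Since the question also asks for medians, here is a minimal sketch of that -tabstat- call with the median added and a display format applied. The data are the four example rows from post #1; the variable name CompanyID is an assumption standing in for the "Company ID" column, and the %9.3f format is only an illustrative choice.

    ```stata
    * Toy data copied from post #1 (CompanyID is an assumed variable name)
    clear
    input CompanyID Pair Fraud Diff Leverage
    1 1 1  .5 2.5
    2 1 0  .8 1.2
    3 2 1  .3 1.8
    4 2 0 -.5 2.9
    end

    * N, mean, median and SD of each variable, separately for Fraud = 0 and Fraud = 1
    tabstat Diff Leverage, by(Fraud) statistics(n mean median sd) ///
        columns(statistics) format(%9.3f)
    ```

    The -columns(statistics)- option puts the statistics across the top, which is usually easier to read when there are several variables.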

    If you are interested in testing hypotheses about whether these variables have different distributions in the populations of fraud and non-fraud firms, that is where t-tests come in. If you do t-tests in this data you absolutely must account for the matched pairs: an unpaired Student t-test done on matched pair data is not worth the paper it's printed on. (And if you don't print it out so there's no paper, it's still not worth even the zero value of the absent paper because you wasted your time!) There are two ways of getting at this from where you are.

    Since your data are in long format, the most direct way would be:

    Code:
    regress Diff i.Fraud i.Pair
    regress Leverage i.Fraud i.Pair
    (If you have a lot of such variables, consider using a -foreach- loop instead of writing out a slew of -regress- commands.) The coefficient of 1.Fraud will be the mean difference, the standard error will be the standard error of the paired difference, and the t-test in the 1.Fraud row will be the paired t-test. This approach basically emulates a paired t-test by using regression with the pair indicators included as covariates. It's absolutely equivalent.
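
    A minimal sketch of the -foreach- loop mentioned above, again using the four example rows from post #1 (CompanyID is an assumed variable name). With real data you would skip the -input- block and simply extend the variable list in the loop.

    ```stata
    * Toy data copied from post #1 (CompanyID is an assumed variable name)
    clear
    input CompanyID Pair Fraud Diff Leverage
    1 1 1  .5 2.5
    2 1 0  .8 1.2
    3 2 1  .3 1.8
    4 2 0 -.5 2.9
    end

    * One regression per outcome; the 1.Fraud row is the paired comparison
    foreach v of varlist Diff Leverage {
        display as text _newline "Outcome: `v'"
        regress `v' i.Fraud i.Pair
    }
    ```

    In each regression's output, read the coefficient, standard error and t statistic from the 1.Fraud row; the i.Pair coefficients are only there to absorb the pair-level differences.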

    If you prefer, you can do it this way:
    Code:
    reshape wide Diff Leverage, i(Pair) j(Fraud)
    ttest Leverage1 = Leverage0
    ttest Diff1 = Diff0
    This way of doing it is more rapidly remembered and understood when you come back to look at your results some months from now. But to do it you must reshape your data into -wide- layout, and there is not a lot else you will be able to do with this data after that.
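
    If you want the wide-layout t-tests without giving up the long data, one option is to wrap the reshape in -preserve- and -restore-. The sketch below assumes the firm identifier is stored in a variable called CompanyID (a made-up name for the "Company ID" column). Because that identifier differs between the two firms in a pair, -reshape wide- will refuse to run unless it is dropped or added to the list of variables being reshaped.

    ```stata
    * Toy data copied from post #1 (CompanyID is an assumed variable name)
    clear
    input CompanyID Pair Fraud Diff Leverage
    1 1 1  .5 2.5
    2 1 0  .8 1.2
    3 2 1  .3 1.8
    4 2 0 -.5 2.9
    end

    * Keep a copy of the long data so it can be brought back afterwards
    preserve

    * CompanyID is not constant within Pair, so remove it before reshaping
    drop CompanyID
    reshape wide Diff Leverage, i(Pair) j(Fraud)

    * Paired t-tests: fraud firm (suffix 1) versus its matched non-fraud firm (suffix 0)
    ttest Leverage1 == Leverage0
    ttest Diff1 == Diff0

    * Return to the original long layout
    restore
    ```

    Run this as a do-file, or in one go from the do-file editor, so that -preserve- and -restore- bracket the reshape as intended.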


    • #3
      Thomas:
      as an aside to Clyde's helpful advice, any inferential procedure implies a statistical plan made before performing it. Put differently, in the methods section of your paper you should explicitly report the difference in the effect between fraud vs non-fraud firms, the statistical test(s) you intend to perform, the critical value and the power. Post-hoc comparisons are usually viewed with suspicion by reviewers.
      Kind regards,
      Carlo
      (Stata 19.0)


      • #4

        For both firms I computed a bunch of variables
        Apart from an introductory analysis, it seems a t test wouldn't be the most appropriate solution in this complex scenario.
        Best regards,

        Marcos


        • #5
          Thanks Clyde, that helped. And to the rest, this is meant as an introductory analysis.
