t test for matched-pair sample

Thomas Meurs

Join Date: Jun 2015

Posts: 27
#1

12 Jun 2015, 15:17

Hi all,

This is probably a very basic question, but I'm still quite new to Stata.
I am trying to replicate a paper which investigates differences between fraud firms and their non-fraud competitors. I have constructed a sample which consists of fraud firms (denoted with dummy variable 'fraud' taking the value of 1 in case of fraud) and matched those with a non-fraud competitor. For both firms I computed a bunch of variables and I now would like to provide some descriptive statistics and compare their means.
So I have data which looks something like this:
Company ID   Pair   Fraud   Diff   Leverage
1            1      1        .5    2.5
2            1      0        .8    1.2
3            2      1        .3    1.8
4            2      0       -.5    2.9

Now I want to get a table of descriptive statistics that gives me the variables (Diff and Leverage in this case) with their mean, median and standard deviation for both the fraud firms (indicated with 1) and the non-fraud firms (0), plus the differences between the means of the fraud and non-fraud samples.
I'm not quite sure whether I want a 'matched pair comparison', where the fraud and non-fraud firms are compared within each pair, or just a comparison of the means for all fraud firms and all non-fraud firms. Is it possible to do it both ways? Can someone tell me what code to use for both types of tests?

Kind regards,
Thomas
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#2

12 Jun 2015, 17:38

Well, on the one hand you say you want to provide descriptive statistics, and on the other you talk about t-tests. T-tests are not descriptive statistics; they are inferential. If you want descriptive statistics, then probably the simplest way to get them from this data would be:

Code:

tabstat Diff Leverage, by(Fraud) statistics(N mean sd)

or something like that. You can also play with the -format()- option to tabstat to get prettier numbers than you might get by default.
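
Since the question also asked for the median, a slightly fuller variant might look like the sketch below. It assumes the data are already in memory with the variable names used in this thread (Diff, Leverage, Fraud); the display format %9.3f is only an illustrative choice.

```stata
* Descriptive statistics by fraud status: count, mean, median and SD.
* format() controls the number of decimals shown,
* columns(statistics) puts one statistic per column,
* nototal suppresses the pooled (all firms) block.
tabstat Diff Leverage, by(Fraud) ///
    statistics(n mean median sd) ///
    format(%9.3f) columns(statistics) nototal
```

Dropping -nototal- adds a block for the full sample underneath the two groups.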

If you are interested in testing hypotheses about whether these variables have different distributions in the populations of fraud and non-fraud firms, that is where t-tests come in. If you do t-tests in this data you absolutely must account for the matched pairs: an unpaired Student t-test done on matched pair data is not worth the paper it's printed on. (And if you don't print it out so there's no paper, it's still not worth even the zero value of the absent paper because you wasted your time!) There are two ways of getting at this from where you are.

Since your data are in long format, the most direct way would be:

Code:

regress Diff i.Fraud i.Pair
regress Leverage i.Fraud i.Pair

(If you have a lot of such variables, consider using a -foreach- loop instead of writing out a slew of -regress- commands.) The coefficient of 1.Fraud will be the mean difference, the standard error will be the standard error of the paired difference, and the t-test in the 1.Fraud row will be the paired t-test. This approach basically emulates a paired t-test by using regression with the pair indicators included as covariates. It's absolutely equivalent.
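
A minimal sketch of the -foreach- loop mentioned above, assuming the outcome variables are named Diff and Leverage as in this thread (add further variables to the list as needed). The -display- line is just one possible way to pull out the paired mean difference and its standard error after each regression.

```stata
* Paired comparison for each outcome via regression with pair indicators.
foreach v of varlist Diff Leverage {
    regress `v' i.Fraud i.Pair
    * _b[1.Fraud] is the mean within-pair difference (fraud minus non-fraud);
    * _se[1.Fraud] is its standard error.
    display "`v': mean paired difference = " %9.3f _b[1.Fraud] ///
        "  (SE = " %9.3f _se[1.Fraud] ")"
}
```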

If you prefer, you can do it this way:

Code:

reshape wide Diff Leverage, i(Pair) j(Fraud)
ttest Leverage1 = Leverage0
ttest Diff1 = Diff0

This way of doing it is more rapidly remembered and understood when you come back to look at your results some months from now. But to do it you must reshape your data into -wide- layout, and there is not a lot else you will be able to do with this data after that.
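
One way to keep the long layout available is to wrap the reshape in -preserve- and -restore-. The sketch below is self-contained: it enters the four example rows from post #1, and it assumes the company identifier is stored in a variable called CompanyID (a hypothetical name; the thread does not give the actual variable name). Any variable that differs within a pair, such as that identifier, has to be dropped or listed as a stub before -reshape wide- will run.

```stata
* Toy data: the four example rows from the original question.
clear
input CompanyID Pair Fraud Diff Leverage
1 1 1  .5 2.5
2 1 0  .8 1.2
3 2 1  .3 1.8
4 2 0 -.5 2.9
end

preserve                      // keep a copy of the long data
drop CompanyID                // varies within Pair, so reshape would reject it
reshape wide Diff Leverage, i(Pair) j(Fraud)

* Paired t-tests: suffix 1 = fraud firm, suffix 0 = matched non-fraud firm.
ttest Leverage1 == Leverage0
ttest Diff1 == Diff0

restore                       // back to the original long layout
```

With only two pairs the tests have a single degree of freedom, so this is only a check that the commands run, not a meaningful analysis.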
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17673
#3

13 Jun 2015, 01:30

Thomas:
as an aside to Clyde's helpful advice, any inferential procedure implies a statistical plan made ahead of performing it. Put differently, in the methods section of your paper you should explicitly report the difference in effect between fraud and non-fraud firms, the statistical test(s) you intend to perform, the critical value and the power. Post-hoc comparisons are usually viewed with suspicion by reviewers.

Kind regards,
Carlo
(Stata 19.0)
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#4

13 Jun 2015, 05:56

"For both firms I computed a bunch of variables"

Apart from an introductory analysis, it seems a t test wouldn't be the most appropriate solution in this complex scenario.

Best regards,

Marcos
Thomas Meurs

Join Date: Jun 2015

Posts: 27
#5

14 Jun 2015, 08:38

Thanks Clyde, that helped. And to the rest, this is meant as an introductory analysis.