Split my sample and do t-test

Ruba hamed

Join Date: Jan 2018

Posts: 4
#1


30 Jan 2018, 03:05

Hello,
I need help grouping my data as follows:
I am working with cross-sectional data over the period 2011-2015, using around 21 variables. I am testing a variable called CSR before and after new regulations launched in 2014, so I need to split my sample into two groups using 2014 as the benchmark year. In other words, I want to run a t-test to compare the first sample (the years 2011 and 2012) with the second sample (the years 2014 and 2015).
I know that I need to run the paired t-test, but to do that I first have to split the sample into pre- and post-regulation groups depending on the year.
I ran the following code to split the sample:

"gen post = csr if inrange (year,2014,2015)"
"gen pre = csr if inrange (year,2011,2012)"

but when I tried to run the paired t-test using the following code, Stata said "no observations":
"ttest pre==post"

I am stuck at this point and would appreciate professional advice, please.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17708
#2

30 Jan 2018, 03:27

Ruba:
some comments on your query.
Stata reports no observations because in every row where one of your two variables has a value, the other one is missing.
Please note that Stata omits observations with missing data in any of the variables used.
The simplest approach that springs to my mind is:

Code:

gen pre_post=0 if year<=2012
replace pre_post=1 if year>2012
ttest CSR, by(pre_post) unequal

Warning: code not tested
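A minimal sketch of why -ttest pre==post- in #1 finds no observations, and of a grouping variable that follows the 2011-2012 versus 2014-2015 split described there. The toy data values are invented for illustration only. The lowercase name csr is taken from #1 (Stata variable names are case-sensitive), and leaving 2013 out of both groups is an assumption based on the wording of the question.

```stata
* Toy data, invented for illustration only
clear
input year csr
2011 3.1
2011 2.8
2012 3.4
2012 3.0
2013 3.6
2014 4.2
2014 3.9
2015 4.5
2015 4.1
end

* The two variables from #1: each row has at most one of them non-missing
gen pre  = csr if inrange(year, 2011, 2012)
gen post = csr if inrange(year, 2014, 2015)

* A paired test needs both values in the same row; no row qualifies
count if !missing(pre) & !missing(post)

* One grouping variable instead; 2013 (and any missing year) stays missing
gen byte pre_post = 0 if inrange(year, 2011, 2012)
replace pre_post = 1 if inrange(year, 2014, 2015)

* Two-sample t-test allowing unequal variances
ttest csr, by(pre_post) unequal
```

Using -inrange()- rather than -year>2012- also keeps observations with a missing year out of the post group, since Stata treats missing values as larger than any number.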

Last edited by Carlo Lazzaro; 30 Jan 2018, 03:32.

Kind regards,
Carlo
(Stata 19.0)
Reto Wernli

Join Date: Jan 2018

Posts: 4
#3

30 Jan 2018, 03:56

Thanks Carlo for that comprehensive answer.

I have a similar issue and am not sure whether to use the prtest or ztest command:

Data: Representative sample survey data on firms (almost exclusively binary & indicator variables)
Variables of interest (I have many more, but these two serve as an example):
Industry (I have created a dummy for each of the 6 industries)

Mortgage (0 if a firm has no mortgage, 1 if it has one)

Groups: I split the sample into two groups:
Firm applied for external finance

Firm did not apply for external finance

Now I'm interested in whether firms differ across the 2 groups with regard to the 2 variables of interest. (This is my 1st step; in a 2nd step I will run a heckprobit with the selection variable "needing finance".)

After researching which test is appropriate, I ran both a prtest and a ztest for this issue. I am still unsure which to choose: the confidence bands are much narrower in the prtest (see attachment). Why is that? What am I not understanding?

Thanks for your precious time!
Reto
Ruba hamed

Join Date: Jan 2018

Posts: 4
#4

30 Jan 2018, 09:25

Dear Carlo, thank you so much for the help. The code works perfectly.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#5

30 Jan 2018, 09:41

Re #3: The use of -ztest- is not appropriate for this data. -ztest- should be used to test the equality of means from two samples drawn from populations with known standard deviations, and you don't have that here, or if you do, you have not described that in your question.

If you had those standard deviations, they would have to be specified in options for the -ztest- command. You don't show the command you used, but based on the output, I infer that you specified -sd(1)-, or specified nothing at all and allowed -sd()- to default to 1, which is not even a possible value of the standard deviation for indicator variables. In any case, it is not really a question of specifying the right values for -sd()- here: you don't have known population standard deviations for this data anyway.
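To illustrate the point, here is a sketch with invented toy data; the variable names applied and mortgage are hypothetical stand-ins for the group indicator and the outcome described in #3. For a 0/1 variable with proportion p, the standard deviation is sqrt(p(1-p)), which can never exceed 0.5, so intervals computed under an assumed standard deviation of 1 would be expected to come out wider than those from -prtest-.

```stata
* Toy data, invented for illustration only
* applied:  1 if the firm applied for external finance (hypothetical name)
* mortgage: 1 if the firm has a mortgage (hypothetical name)
clear
input applied mortgage
0 0
0 0
0 1
0 0
0 1
1 1
1 1
1 0
1 1
1 1
end

* Two-sample test of proportions; the standard error is built from
* the observed proportions, so no sd() has to be supplied
prtest mortgage, by(applied)

* For comparison: a t-test on the same indicator variable
ttest mortgage, by(applied)
```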
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17708
#6

30 Jan 2018, 10:09

Reto:
welcome to the list.
Two asides to Clyde's helpful advice:
-why create dummies yourself when, with data in -long- format, you can exploit the wonderful capabilities of -fvvarlist-?
-if you are dealing with survey data, any inferential procedure should be -svy- prefixed (see -help svy-).
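The two asides above could be sketched roughly as follows. This is only an illustration under stated assumptions: the data are invented, the names applied, mortgage, industry and wt are hypothetical, wt is assumed to be a sampling (probability) weight, and the design is assumed to have no strata or clusters. A real survey design would need its own -svyset- declaration.

```stata
* Toy data, invented for illustration only
clear
input applied mortgage industry wt
0 0 1 1.5
0 1 1 2.0
0 0 2 1.0
0 0 2 2.5
0 1 3 1.0
0 0 3 1.5
1 1 1 2.0
1 1 1 1.0
1 0 2 1.5
1 1 2 2.0
1 1 3 1.0
1 0 3 2.5
end

* Declare the (assumed) design: sampling weights only
svyset [pweight = wt]

* Weighted two-way tables with a design-based test of independence;
* the categorical industry variable is used directly, without dummies
svy: tabulate mortgage applied, column
svy: tabulate industry applied, column

* Factor-variable notation with the svy prefix in a regression
svy: regress mortgage i.applied
```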

Kind regards,
Carlo
(Stata 19.0)
Reto Wernli

Join Date: Jan 2018

Posts: 4
#7

31 Jan 2018, 02:53

Thank you very much, Clyde and Carlo, for your immediate and helpful replies.
So prtest is the way to go then? (It seems identical to ttest regarding the p-values.)
Also, assuming the results are identical to those of a ttest, I have just found the convenient package -ttable2- (by Xuan Zhang & Chuntao Li), which allows me to test several variables simultaneously and produces a nice table:

[Attached -ttable2- output not reproduced here]

Carlo wrote in #6:
-why create dummies yourself when, with data in -long- format, you can exploit the wonderful capabilities of -fvvarlist-?
-if you are dealing with survey data, any inferential procedure should be -svy- prefixed (see -help svy-).

These are 2 very good points. For my regressions I am indeed using the i. prefix for my factor variables like industry, and the svy prefix to weight the observations. So I have been scratching my head over how to apply this to my descriptive stats and tests.
With tabstat, -svy- is not allowed, but I am using frequency weights instead: tabstat Ind* Size* [fw=myweight], stat(mean sd) col(stat)

Factor variables are not feasible with tabstat either; this was the reason I created those dummies for industry, size, etc.

I did not find any option to use weights (or factor variables) while testing. The only workaround I've found so far is to run a regression for each individual variable of interest. But I was hoping there would be a more elegant way?

Many Thanks
Reto
Seyed Mahmoud Hosseinniakani

Join Date: Apr 2018

Posts: 59
#8

26 Apr 2018, 06:10

Hi Reto,
I ran the ttable2 code introduced by Xuan Zhang & Chuntao Li, but in my output the significance level is not marked with stars (*). I see that your output has the stars.
Could you please let me know how you obtained the stars or the significance level?
My output is attached.
