Ttest

Eugene Lacoste

Join Date: Jul 2020

Posts: 24
#1

22 Jul 2020, 04:17

Dear all,

I hope you are all doing great.

I am trying to run a t-test of x (total dollar placement) by y (1 = inside the US, 0 = outside the US) over the period 1995 to 2000.
I tried the following code:

Code:

ttest x, by(y)

This gives me the result. However, I need to calculate, for each value of y (1 and 0), the mean of x in the period up to 1998 and in the period after 1998, scaled by the mean of x over the full period. That is, ((mean of x up to 1998) - (mean of x after 1998)) / (mean of x over the full period) for each y. So there are 4 period means in total.

Code:

egen mean_1 = mean(x) if year <= 1998 & y == 1
fillmissing mean_1, with(any)
egen mean_2 = mean(x) if year > 1998 & y == 1
fillmissing mean_2, with(any)
egen mean_3 = mean(x)
gen x_1 = ((mean_1-mean_2)/mean_3)
egen mean_11 = mean(x) if year <= 1998 & y == 0
fillmissing mean_11, with(any)
egen mean_12 = mean(x) if year > 1998 & y == 0
fillmissing mean_12, with(any)
gen x_2 = ((mean_11-mean_12)/mean_3)
gen Pre_Post = .
replace Pre_Post = x_1 if y == 1
replace Pre_Post = x_2 if y == 0
ttest Pre_Post, by(y)

I ran the code above, but the ttest at the end does not work (t-statistic = .) and I find the code too long. Is there any way to fix the ttest and to shorten the code?

Thanks in advance,
Eugene
Tags: ttest
Sven-Kristjan Bormann

Join Date: Jul 2018

Posts: 310
#2

22 Jul 2020, 07:10

Given your setup, you should probably change the t-test command from ttest to ttesti, the immediate version of the command. The immediate version requires the number of observations, the mean, and the standard deviation of each of the two groups.
The code for this version would be a bit shorter than what you posted but not by much.
Please also note that fillmissing is a community-contributed program by Attaullah Shah, which you should have noted in your post because people without this command cannot replicate your code.
The command is also probably not needed for the task at hand.
You should post an example of your dataset generated by running the dataex command so that other people can test their solution attempts and understand your problem better.
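As a minimal sketch of a shorter route to the four period means without fillmissing: egen's mean() accepts an expression, so cond() can restrict the mean to one period while by(y) spreads the result over every row of the group. The toy rows are copied from the small example posted later in this thread; the variable names mean_pre, mean_post and mean_all are my own placeholders, not names from the original code.

```stata
* Toy rows copied from the example posted later in this thread
clear
input float(x y year)
44456 0 1999
31892 0 1999
41720 1 1999
36592 1 1995
40499 1 1995
40499 1 1996
39237 0 1997
40407 0 1996
28013 0 1995
25607 1 1999
10192 1 2000
end

* Period means within each y group, filled in on every row of the group
egen mean_pre  = mean(cond(year <= 1998, x, .)), by(y)
egen mean_post = mean(cond(year >  1998, x, .)), by(y)

* Mean of x over the full period and both groups
egen mean_all = mean(x)

* Scaled pre-post difference: one constant value per group
gen Pre_Post = (mean_pre - mean_post) / mean_all
tabstat Pre_Post, by(y) statistics(mean sd)
```

The tabstat line reports the standard deviation of Pre_Post within each group, which makes it easy to see whether the variable varies at all before passing it to ttest.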

Eugene Lacoste

Join Date: Jul 2020
Posts: 24

22 Jul 2020, 07:45

Dear Sven,

Thank you very much for your answer.
Below is an example of the data produced by the code in my first post:

Code:

input float(x y mean_1 mean_2 mean_3 mean_11 mean_12 x_1 n_2 Pre_Post)
10.026775 0 368.2543 370.1607 296.10037 289.71872 292.70178 -.006438469 -.010074498 -.010074498
 206.7967 0 368.2543 370.1607 296.10037 289.71872 292.70178 -.006438469 -.010074498 -.010074498
 6.487872 0 368.2543 370.1607 296.10037 289.71872 292.70178 -.006438469 -.010074498 -.010074498
137.74028 0 368.2543 370.1607 296.10037 289.71872 292.70178 -.006438469 -.010074498 -.010074498
 4.877652 0 368.2543 370.1607 296.10037 289.71872 292.70178 -.006438469 -.010074498 -.010074498
    32.33 0 368.2543 370.1607 296.10037 289.71872 292.70178 -.006438469 -.010074498 -.010074498
 9.869688 0 368.2543 370.1607 296.10037 289.71872 292.70178 -.006438469 -.010074498 -.010074498
 55.75826 0 368.2543 370.1607 296.10037 289.71872 292.70178 -.006438469 -.010074498 -.010074498
146.87474 0 368.2543 370.1607 296.10037 289.71872 292.70178 -.006438469 -.010074498 -.010074498
 38.81283 0 368.2543 370.1607 296.10037 289.71872 292.70178 -.006438469 -.010074498 -.010074498
  175.912 0 368.2543 370.1607 296.10037 289.71872 292.70178 -.006438469 -.010074498 -.010074498
 151.4229 0 368.2543 370.1607 296.10037 289.71872 292.70178 -.006438469 -.010074498 -.010074498
 48.86275 0 368.2543 370.1607 296.10037 289.71872 292.70178 -.006438469 -.010074498 -.010074498
126.48425 0 368.2543 370.1607 296.10037 289.71872 292.70178 -.006438469 -.010074498 -.010074498
242.65695 0 368.2543 370.1607 296.10037 289.71872 292.70178 -.006438469 -.010074498 -.010074498
 42.28203 0 368.2543 370.1607 296.10037 289.71872 292.70178 -.006438469 -.010074498 -.010074498

I used fillmissing because the mean columns (mean_1, mean_2, mean_11 & mean_12) were almost empty (see below), since each one is only calculated under a condition on y and on the period.

In addition, I need the difference between the means calculated for y = 0 (and for y = 1) when the year is <= 1998 and when it is > 1998, scaled by the mean of the full period 1995-2000. I could not figure out how to do this without fillmissing.

Code:

input float(mean_1 mean_2 mean_11 mean_12)
. . 289.71872         .
. . 289.71872         .
. .         . 292.70178
. . 289.71872         .
. .         . 292.70178
. .         . 292.70178
. . 289.71872         .
. . 289.71872         .
. . 289.71872         .
. .         . 292.70178
. .         . 292.70178
. . 289.71872         .
. . 289.71872         .
. . 289.71872         .
. .         . 292.70178
. . 289.71872         .

Let me know if it is better this time !
Eugene


Sven-Kristjan Bormann

Join Date: Jul 2018

Posts: 310
#4

22 Jul 2020, 10:00

Your data example has no observations for y==1. Besides that, you probably need to reformulate what you want to achieve. At the moment, the t-test cannot work: the standard deviation within each of your two groups is 0 because you create one constant value per group.
If I understand your code correctly then you try to compare the differences in the mean before 1998 and after 1998 for the two groups.

You could create a new dummy variable for the time period, regress x on the interaction of the two dummies, and then test whether the difference is significant. Something like the code below could work, but I could not test it with your data example.

Code:

gen t =( year > 1998)
regress x i.y#i.t
test (_cons - _b[0.y#1.t] = _b[1.y#0.t]- _b[1.y#1.t])

The constant in this regression is equal to the mean of x when t==0 and y==0.
I ignore the scaling with the mean of x because it does not matter for the test statistic.
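A related sketch, in case it helps: with the full factorial interaction (##) the regression includes both main effects, and the coefficient on 1.y#1.t is directly the difference between the two groups in their post-minus-pre change, so a single-coefficient test is enough. This is an alternative parameterization and not the code posted above; the toy rows are the small example posted further down this thread.

```stata
* Toy rows copied from the example posted later in this thread
clear
input float(x y year)
44456 0 1999
31892 0 1999
41720 1 1999
36592 1 1995
40499 1 1995
40499 1 1996
39237 0 1997
40407 0 1996
28013 0 1995
25607 1 1999
10192 1 2000
end

* Period dummy: 1 for years after 1998, 0 otherwise
gen byte t = year > 1998

* Main effects plus interaction
regress x i.y##i.t

* 1.y#1.t = (post - pre for y==1) - (post - pre for y==0)
test 1.y#1.t
```

Because the scaling by the overall mean multiplies both group differences by the same constant, it is left out here, as in the post above.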

Eugene Lacoste

Join Date: Jul 2020
Posts: 24

22 Jul 2020, 10:39

Hello Sven,

Thanks for your answer.
Since the dataset is pretty long, the observations with y = 1 do not show up at the beginning. I have added some rows now so that you can better understand what is going on.

Code:

 
input float(x y mean_1 mean_2 mean_11 mean_12 mean_3 x_1 x_2 Pre_Post)
  44456 0 363218.3 339503.25 284954.94 271713.72 285096.03 .08318283 .04644477 .04644477
  41720 0 363218.3 339503.25 284954.94 271713.72 285096.03 .08318283 .04644477 .04644477
  31892 0 363218.3 339503.25 284954.94 271713.72 285096.03 .08318283 .04644477 .04644477
  36592 0 363218.3 339503.25 284954.94 271713.72 285096.03 .08318283 .04644477 .04644477
  40499 0 363218.3 339503.25 284954.94 271713.72 285096.03 .08318283 .04644477 .04644477
  39237 0 363218.3 339503.25 284954.94 271713.72 285096.03 .08318283 .04644477 .04644477
  40407 0 363218.3 339503.25 284954.94 271713.72 285096.03 .08318283 .04644477 .04644477
  28013 0 363218.3 339503.25 284954.94 271713.72 285096.03 .08318283 .04644477 .04644477
  25607 1 363218.3 339503.25 284954.94 271713.72 285096.03 .08318283 .04644477 .08318283
  18086 0 363218.3 339503.25 284954.94 271713.72 285096.03 .08318283 .04644477 .04644477
  10192 1 363218.3 339503.25 284954.94 271713.72 285096.03 .08318283 .04644477 .08318283
  25279 0 363218.3 339503.25 284954.94 271713.72 285096.03 .08318283 .04644477 .04644477
  11128 1 363218.3 339503.25 284954.94 271713.72 285096.03 .08318283 .04644477 .08318283
  29601 0 363218.3 339503.25 284954.94 271713.72 285096.03 .08318283 .04644477 .04644477
  13351 1 363218.3 339503.25 284954.94 271713.72 285096.03 .08318283 .04644477 .08318283

I will try to summarize step by step: I want to compare the differences in the mean before 1998 and after 1998 for two groups (y = 0 & y = 1).

To calculate mean_1, I did the following:

Code:

 egen mean_1 = mean(x) if year <= 1998 & y == 1

Since this leaves missing values in all rows where the condition does not hold (for example y == 0 or year > 1998), I used fillmissing. Otherwise I would not have been able to combine the means, because they would not be on the same rows.

I followed the same reasoning for mean_2, mean_11 & mean_12.

Code:

egen mean_2 = mean(x) if year > 1998 & y == 1
fillmissing mean_2, with(any)
egen mean_11 = mean(x) if year <= 1998 & y == 0
fillmissing mean_11, with(any)
egen mean_12 = mean(x) if year > 1998 & y == 0
fillmissing mean_12, with(any)

Now, I would like to divide the differences between the mean before 1998 and after 1998 by the mean of the total period.

Code:

 egen mean_3 = mean(x)

I create a new variable called x_1 for the differences between the means for y ==1 before 1998 and after 1998 which is then divided by mean_3:

Code:

gen x_1 = ((mean_1-mean_2)/mean_3)

I do the same for y == 0 with x_2: the difference between the means before 1998 and after 1998 divided by mean_3:

Code:

   
 gen x_2 = ((mean_11-mean_12)/mean_3)

With x_1 and x_2 I am trying to do a t-test and I only came up with this code where I gather together the result I got for x_1 & x_2:

Code:

  
gen Pre_Post = .
replace Pre_Post = x_1 if y == 1
replace Pre_Post = x_2 if y == 0
ttest Pre_Post, by(y)

What I want to show is the mean of y == 1 & y == 0 (as per the calculation above) and the difference between the two means with the t-statistics.
Unfortunately, I got this result:

Code:

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |  12,477    .0472881           0           0    .0472881    .0472881
       1 |     911    .0668536           0           0    .0668536    .0668536
---------+--------------------------------------------------------------------
combined |  13,388    .0486195    .0000426    .0049273     .048536    .0487029
---------+--------------------------------------------------------------------
    diff |           -.0195655           0               -.0195655   -.0195655
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                      t =        .
Ho: diff = 0                                     degrees of freedom =    13386

I hope this time it is better !!


Sven-Kristjan Bormann

Join Date: Jul 2018

Posts: 310
#6

22 Jul 2020, 16:01

Like I said before, I understand what your code does. Your data example also has no year variable, so I still cannot run your code.
I suggested a way which should give you the expected outcome without using your code. Unfortunately, I cannot modify your code to get the desired result with the ttest command.
To make your code work, you need to calculate the standard deviation of the variables x_1 and x_2. Only then can you run the immediate version of the t-test, the ttesti command. However, x_1 and x_2 are constants by construction, so their standard deviation is 0.
The output of the ttest command shows this issue: the standard error of the difference is 0, so the t-statistic is undefined and is displayed as a missing value.

Instead of using the egen-command to calculate the means you can calculate them with the summarize-command.
See the example code below

Code:

forvalues i=0/1{
    summarize x if year <= 1998 & y == `i', meanonly
    scalar mean_1`i' = r(mean)
    scalar var_1`i' = r(Var)
    summarize x if year > 1998 & y == `i',meanonly
    scalar mean_2`i' = r(mean)
    scalar var_2`i' = r(Var)
}
summarize x ,meanonly
scalar mean_3 = r(mean)
scalar x_1 = (mean_10-mean_20)/mean_3
scalar x_1sd = (1/mean_3)^2*( (var_10 + var_20)^0.5)
scalar x_2 = (mean_11-mean_21)/mean_3
scalar x_2sd=(1/mean_3)^2*( (var_11 + var_21)^0.5)
count if y==0
scalar n0 = r(N)
count if y==1
scalar n1 = r(N)
ttesti n0 x_1 x_1sd n1 x_2 x_2sd, unequal

The code is not tested, but it hopefully uses the correct formula to calculate the standard deviations for your problem. So the code hopefully points in the right direction.
Eugene Lacoste

Join Date: Jul 2020

Posts: 24
#7

23 Jul 2020, 01:48

Hello Sven,

Thank you very much for the explanations: now I understand why the t-test could not be displayed correctly.

I just tried your code and had the following error:

Code:

. ttesti `=n0' `=x_1' `=x_1sd' `=n1' `=x_2' `=x_2sd', unequal
mean of first sample is missing
r(416)

Do you mind explaining what this means? I thought the means were defined in the forvalues loop. Here is a data example that includes the year variable:

Code:

input float(x y year)
44456 0 1999
31892 0 1999
41720 1 1999
36592 1 1995
40499 1 1995
40499 1 1996
39237 0 1997
40407 0 1996
28013 0 1995
25607 1 1999
10192 1 2000
end

Thank you very much for explaining everything again.

Sven-Kristjan Bormann

Join Date: Jul 2018
Posts: 310

23 Jul 2020, 09:15

I do not know why the mean of the first sample is missing. You can check if the scalar exists by running

Code:

scalar dir

My previously posted code has some errors. Below is the corrected version which worked with your last posted data example.

Code:

forvalues i=0/1{
    summarize x if year <= 1998 & y == `i'
    scalar mean_1`i' = r(mean)
    scalar var_1`i' = r(Var)
    summarize x if year > 1998 & y == `i'
    scalar mean_2`i' = r(mean)
    scalar var_2`i' = r(Var)
}
summarize x ,meanonly
scalar mean_3 = r(mean)
scalar x_1 = (mean_10-mean_20)/mean_3
scalar x_1sd = (1/mean_3)^2*( (var_10 + var_20)^0.5)
scalar x_2 = (mean_11-mean_21)/mean_3
scalar x_2sd=(1/mean_3)^2*( (var_11 + var_21)^0.5)
count if y==0
scalar n0 = r(N)
count if y==1
scalar n1 = r(N)
ttesti `=n0' `=x_1' `=x_1sd' `=n1' `=x_2' `=x_2sd', unequal
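One visible difference between the two versions is that the meanonly option was dropped inside the loop. As far as I know, summarize with meanonly skips the variance calculation and does not store r(Var), so a scalar built from it would be missing. The sketch below checks this on the auto dataset that ships with Stata; it is only an illustration and does not use the thread's data.

```stata
* Example dataset shipped with Stata
sysuse auto, clear

* With meanonly: r(mean) is stored, r(Var) is not
summarize price, meanonly
return list
display "Var with meanonly:    " r(Var)

* Without meanonly: r(Var) and r(sd) are stored as well
summarize price
return list
display "Var without meanonly: " r(Var)
```

Running return list after each summarize is a quick way to see which results are available before copying them into scalars.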


Eugene Lacoste

Join Date: Jul 2020

Posts: 24
#9

24 Jul 2020, 04:46

Dear Sven,

I would like to thank you again for your help.
The code is working !

Thanks again for your helpful explanations !
Eugene
