Estimating means from the same study population (but with or without missing observations)

Rinko Kinoshita

Join Date: Aug 2021

Posts: 19
#1

06 Jan 2022, 06:16

Dear members,

I am struggling to test the null hypothesis that two means from the same study population are equal. One mean comes from the sample excluding respondents with missing observations; the other comes from the sample that also includes respondents with partially missing observations. I cannot use the usual t-test because the two samples are neither independent nor paired.

The variable of interest is GST, a composite score constructed from 7 variables (questions). Respondents who answered all 7 questions have the value 0 on the miss_gst variable tabulated below.

- For the sample without missing observations (excluding anyone who did not answer all 7 questions), n=529 (miss_gst==0).
- The other sample, "with miss", includes everyone who answered at least one of the 7 questions. Its size is n=565 (569 - 4), because 4 respondents had missing data on all 7 variables and are excluded. For those who answered at least one of the 7 questions, the mean score is calculated by giving a score of 0 to the missing items.
- Each question is answered on a scale of 1-5. Each individual's mean score is calculated as [the sum of variables 1-7 / number of variables answered].
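The scoring rule in the bullets above could be sketched in Stata roughly as follows. This is only an illustration: the item names q1 to q7 and the new variable names are hypothetical, since the post does not show the underlying variables, and the varlist q1-q7 assumes the seven items sit next to each other in the dataset.

```stata
* Hypothetical item names q1-q7; the thread does not show the real ones.
* rowtotal() treats a missing item as 0, matching "score 0 for missing data"
egen item_sum = rowtotal(q1-q7)

* Number of items each respondent actually answered (0 to 7)
egen n_answered = rownonmiss(q1-q7)

* Sum of items / number answered; division by 0 returns missing in Stata,
* so respondents with all 7 items missing get no score
generate score_withmiss = item_sum / n_answered

* Complete-case version: defined only when all 7 items were answered
generate score_complete = score_withmiss if n_answered == 7
```

Under these assumptions, egen's rowmean(q1-q7) should give the same values as score_withmiss, because it also averages over the non-missing items only.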

. tab miss_gst

   miss_gst |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        529       92.97       92.97
          1 |         11        1.93       94.90
          2 |          7        1.23       96.13
          3 |          9        1.58       97.72
          4 |          3        0.53       98.24
          5 |          4        0.70       98.95
          6 |          2        0.35       99.30
          7 |          4        0.70      100.00
------------+-----------------------------------
      Total |        569      100.00

Below are the detailed statistics for the mean score in 1) the sample with missing observations (n=565) and 2) the sample without missing observations (n=529).

I need to test whether the means for sample 1 and sample 2 are the same or statistically different.

I tried

ttest meanscore_gstwithmiss == 3.990818

but this does not take into account the SD and other distributional features of the two samples (which come from the same study population).

I also created a dummy variable for miss_gst==0 versus miss_gst!=0, but this compares the mean of the sample without missing data (529) with the mean of those who have miss_gst = 1-6 (those not in miss_gst==0), which is only 36 people. So this still does not answer my question.
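For reference, the indicator-variable comparison described above might look like the sketch below. The name complete is hypothetical; miss_gst and meanscore_gstwithmiss are taken from the post. It contrasts two disjoint groups (complete responders versus partial responders), which is a different comparison from the two overlapping samples.

```stata
* complete is a hypothetical name: 1 = answered all 7 items, 0 = answered 1 to 6.
* The condition leaves it missing for respondents with no score at all.
generate byte complete = (miss_gst == 0) if meanscore_gstwithmiss < .

* Two-group t test with unequal variances: complete vs partial responders
ttest meanscore_gstwithmiss, by(complete) unequal
```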

Could you please advise me on how to test the difference between these two means (t-test)?

Thanks
best, Rinko

. summarize meanscore_gstwithmiss, detail

       Gender Stereotypical Traits (Mean +/- SD) with
                        missing obs
-------------------------------------------------------------
      Percentiles      Smallest
 1%          1.2              1
 5%     2.285714              1
10%     2.714286              1       Obs                 565
25%     3.428571              1       Sum of wgt.         565

50%     4.142857                      Mean           3.950135
                        Largest       Std. dev.      .8604686
75%     4.571429              5
90%            5              5       Variance       .7404062
95%            5              5       Skewness      -.9709162
99%            5              5       Kurtosis       3.657444

. summarize meanscore_gst, detail

          Gender Stereotypical Traits (Mean +/- SD)
-------------------------------------------------------------
      Percentiles      Smallest
 1%     1.714286              1
 5%     2.428571       1.285714
10%     2.714286       1.571429       Obs                 529
25%     3.571429       1.571429       Sum of wgt.         529

50%     4.142857                      Mean           3.990818
                        Largest       Std. dev.      .8215419
75%     4.571429              5
90%            5              5       Variance        .674931
95%            5              5       Skewness      -.8749316
99%            5              5       Kurtosis       3.251743
George Ford

Join Date: Aug 2014

Posts: 3149
#2

06 Jan 2022, 13:01

Code:

ttest x == y, unequal unpaired
Rinko Kinoshita

Join Date: Aug 2021

Posts: 19
#3

06 Jan 2022, 15:48

Dear George, thanks so much for the code. It works perfectly, and I got the output below.

Many, many thanks for your help!
Rinko

. ttest meanscore_gst== meanscore_gstwithmiss, unequal unpaired

Two-sample t test with unequal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
---------+--------------------------------------------------------------------
meansc~t |     529    3.990818    .0357192    .8215419    3.920649    4.060987
meansc.. |     565    3.950135    .0362002    .8604686    3.879031    4.021238
---------+--------------------------------------------------------------------
Combined |   1,094    3.969807    .0254487    .8417322    3.919873    4.019741
---------+--------------------------------------------------------------------
    diff |            .0406834    .0508558               -.0591028    .1404696
------------------------------------------------------------------------------
    diff = mean(meanscore_gst) - mean(meanscore_gstw~s)           t =   0.8000
H0: diff = 0                     Satterthwaite's degrees of freedom =  1091.58

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.7881         Pr(|T| > |t|) = 0.4239          Pr(T > t) = 0.2119
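
As a cross-check, the same result can probably be reproduced from the summary statistics alone with the immediate form of the command, ttesti, which takes the number of observations, mean, and SD of each sample. The figures below are copied from the output in this thread; like the unpaired option, this treats the two sets of scores as two independent samples. The display line recomputes the t statistic by hand from the two standard errors and should come out at approximately 0.80.

```stata
* Immediate t test: n, mean, SD for sample 1, then n, mean, SD for sample 2
ttesti 529 3.990818 .8215419 565 3.950135 .8604686, unequal

* Hand check of the t statistic: difference in means / sqrt(se1^2 + se2^2)
display (3.990818 - 3.950135) / sqrt(.0357192^2 + .0362002^2)
```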