Dear members,
I am struggling to test the null hypothesis that two means from the same study population are equal: one is computed from the sample excluding observations with missing items, and the other from the sample that includes partially missing observations. I cannot use the usual t-test because the two samples are neither independent nor paired.
The variable is GST, a composite score constructed from 7 variables (questions). Respondents who answered all 7 questions have miss_gst equal to 0 in the table below.
- This means that my sample without missing observations (excluding anyone who did not answer all 7 questions) has n=529 (miss_gst==0).
- The other sample, "with miss", includes everyone who answered at least one of the 7 questions for this score. Its sample size is n=565 (569-4), because 4 respondents had missing data on all 7 variables and were therefore excluded. For those who answered at least one of the 7 questions, missing items contribute 0 to the sum.
- Each question uses a scale of 1-5. Each respondent's mean score is calculated as [the sum of variables 1-7 / number of questions answered].
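For reference, a minimal sketch of how the two scores described above could be built in Stata. The item names q1-q7 and the new variable names are hypothetical placeholders, not the names in my dataset. rowmean() averages over the non-missing items only, which is the same as summing with missing items counted as 0 and dividing by the number of questions answered, and it returns missing when all 7 items are missing.

```stata
* Hypothetical item names q1-q7; replace with the actual variable names.
egen byte n_missing = rowmiss(q1-q7)          // number of unanswered items (0 to 7)

* Mean over answered items; missing if no item was answered
egen double score_withmiss = rowmean(q1-q7)

* Same score, kept only for respondents who answered all 7 items
gen double score_complete = score_withmiss if n_missing == 0

* Check the two sample sizes
count if !missing(score_withmiss)
count if !missing(score_complete)
```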
. tab miss_gst

   miss_gst |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        529       92.97       92.97
          1 |         11        1.93       94.90
          2 |          7        1.23       96.13
          3 |          9        1.58       97.72
          4 |          3        0.53       98.24
          5 |          4        0.70       98.95
          6 |          2        0.35       99.30
          7 |          4        0.70      100.00
------------+-----------------------------------
      Total |        569      100.00
Below are the detailed statistics for the mean score for 1) the sample with partially missing observations (n=565) and 2) the sample without missing observations (n=529).
I need to test whether the means of sample 1 and sample 2 are the same or statistically different.
I tried
ttest meanscore_gstwithmiss == 3.990818
but this treats 3.990818 as a fixed value, so it does not take into account the SD and distribution of both samples (which come from the same study population).
I also created a dummy variable for miss_gst==0 versus miss_gst!=0, but this compares the mean of the sample without missing items (n=529) with the mean of those with miss_gst between 1 and 6 (those who are not in miss_gst==0), which is only 36 people. So this still does not answer my question.
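The dummy-variable comparison I tried looks roughly like the sketch below. The name any_missing is a hypothetical placeholder; miss_gst and meanscore_gstwithmiss are the variables shown in this post. Note that it compares complete responders with partial responders, not the n=565 sample with the n=529 sample.

```stata
* 0 = answered all 7 items, 1 = answered 1 to 6 items;
* respondents with all 7 items missing (miss_gst == 7) are left out
gen byte any_missing = (miss_gst > 0) if miss_gst < 7

* Two-group t-test of the score by missingness status
ttest meanscore_gstwithmiss, by(any_missing)
```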
Could you please guide me on how to test the statistical difference (t-test) between these two means?
Thanks
best, Rinko
. summarize meanscore_gstwithmiss, detail

       Gender Stereotypical Traits (Mean +/- SD) with
                        missing obs
-------------------------------------------------------------
      Percentiles      Smallest
 1%          1.2              1
 5%     2.285714              1
10%     2.714286              1       Obs                 565
25%     3.428571              1       Sum of wgt.         565

50%     4.142857                      Mean           3.950135
                        Largest       Std. dev.      .8604686
75%     4.571429              5
90%            5              5       Variance       .7404062
95%            5              5       Skewness      -.9709162
99%            5              5       Kurtosis       3.657444
. summarize meanscore_gst, detail

          Gender Stereotypical Traits (Mean +/- SD)
-------------------------------------------------------------
      Percentiles      Smallest
 1%     1.714286              1
 5%     2.428571       1.285714
10%     2.714286       1.571429       Obs                 529
25%     3.571429       1.571429       Sum of wgt.         529

50%     4.142857                      Mean           3.990818
                        Largest       Std. dev.      .8215419
75%     4.571429              5
90%            5              5       Variance        .674931
95%            5              5       Skewness      -.8749316
99%            5              5       Kurtosis       3.251743