Dear All,
Upon checking the randomization of our sample in each country, we observed systematic differences between the control and treatment groups for some of our variables.
These discrepancies could invalidate our t-test results for the dependent variables on the treatment (the example data below include one of them, y). To address this issue, I thought of running t-tests repeatedly on subsamples, each time replacing the oversampled observations with others from the same "group" so that the selection stays random. The original randomization should have been done by gender and age group in each of the two countries, but I guess it was not.
To ensure a proper random re-selection in these subsamples, I need to balance the treatment and control groups by age group and gender. Here is some information on the whole dataset:
Country 1
age_gr | blind | no blind | total |
18-24 | 56 | 92 | 148 |
25-54 | 392 | 460 | 852 |
55-75 | 352 | 248 | 600 |
total | 800 | 800 | 1600 |
Country 2
age_gr | blind | no blind | total |
18-24 | 128 | 136 | 264 |
25-54 | 364 | 432 | 796 |
55-75 | 308 | 232 | 540 |
total | 800 | 800 | 1600 |
Because the 18-24 age group is very underrepresented, I am also considering dropping it.
If anyone has an idea of how we could address this issue, it would be very much appreciated.
I copy below a subsample of the original sample.
Thank you in advance,
Federica
[CODE]
clear
input int id byte(country treatment) float y byte age float age_gr byte(gender education)
1 2 1 6 59 3 2 3
2 2 1 9 61 3 2 2
3 2 1 8 62 3 1 4
4 2 1 3 54 3 1 2
5 2 1 6 50 2 1 2
6 2 1 2 40 2 1 4
7 2 1 8 26 2 2 3
8 2 1 6 67 3 1 4
9 2 1 7 45 2 1 3
10 2 1 6 47 2 1 3
11 2 1 8 44 2 1 4
12 2 1 5 70 3 1 4
13 2 1 6 60 3 1 3
14 2 1 8 57 3 1 2
15 2 1 8 42 2 1 3
16 2 1 9 42 2 1 3
17 2 1 2 50 2 1 3
18 2 1 8 44 2 2 3
19 2 1 5 63 3 2 3
20 2 1 5 20 1 2 3
21 2 1 5 74 3 1 2
22 2 1 7 64 3 1 3
23 2 1 5 67 3 1 1
24 2 1 6 54 3 2 4
25 2 1 3 56 3 2 3
26 2 1 5 25 2 2 3
27 2 1 10 32 2 2 3
28 2 1 5 21 1 1 4
29 2 1 3 52 2 2 3
30 2 1 9 19 1 1 3
31 2 1 2 66 3 1 2
32 2 1 7 46 2 2 2
33 2 1 8 52 2 2 2
34 2 1 8 24 1 2 4
35 2 1 9 69 3 2 3
36 2 1 7 60 3 2 2
37 2 1 6 62 3 1 4
38 2 1 7 45 2 2 4
39 2 1 3 65 3 2 1
40 2 1 10 52 2 2 4
41 2 1 5 57 3 2 3
42 2 1 8 51 2 1 3
43 2 1 5 58 3 2 2
44 2 1 0 72 3 2 2
45 2 1 6 42 2 2 3
46 2 1 10 45 2 1 4
47 2 1 8 42 2 1 2
48 2 1 5 40 2 1 4
49 2 1 7 48 2 1 3
50 2 1 4 42 2 1 3
201 2 2 4 57 3 2 3
202 2 2 5 40 2 1 2
203 2 2 5 72 3 1 3
204 2 2 5 72 3 1 3
205 2 2 7 55 3 1 2
206 2 2 5 68 3 2 4
207 2 2 8 65 3 1 3
208 2 2 2 64 3 1 3
209 2 2 10 63 3 2 3
210 2 2 9 30 2 2 2
211 2 2 5 20 1 1 4
212 2 2 7 54 3 2 3
213 2 2 6 64 3 1 2
214 2 2 8 69 3 2 2
215 2 2 5 65 3 2 2
216 2 2 9 42 2 1 2
217 2 2 6 52 2 1 2
218 2 2 7 40 2 1 2
219 2 2 6 61 3 2 3
220 2 2 8 28 2 1 3
221 2 2 4 21 1 2 4
222 2 2 2 31 2 1 4
223 2 2 6 55 3 2 2
224 2 2 10 50 2 2 4
225 2 2 4 22 1 1 3
226 2 2 9 51 2 2 2
227 2 2 6 63 3 2 3
228 2 2 8 70 3 1 4
229 2 2 8 47 2 2 2
230 2 2 3 41 2 2 3
231 2 2 6 20 1 1 2
232 2 2 7 22 1 1 2
233 2 2 8 34 2 1 4
234 2 2 6 74 3 1 4
235 2 2 4 75 3 1 4
236 2 2 6 66 3 2 3
237 2 2 6 59 3 1 4
238 2 2 6 64 3 2 2
239 2 2 6 41 2 2 4
240 2 2 8 75 3 1 4
241 2 2 2 52 2 1 3
242 2 2 5 53 3 1 4
243 2 2 6 23 1 1 3
244 2 2 9 25 2 2 4
245 2 2 6 21 1 2 3
246 2 2 8 23 1 1 4
247 2 2 9 66 3 2 2
248 2 2 4 68 3 2 2
249 2 2 7 56 3 2 2
250 2 2 7 70 3 2 3
151 3 1 5 25 2 2 2
152 3 1 7 22 1 2 2
153 3 1 6 65 3 2 3
154 3 1 9 30 2 1 2
155 3 1 8 61 3 2 2
156 3 1 8 52 2 1 2
157 3 1 8 56 3 2 3
158 3 1 3 47 2 2 2
159 3 1 4 21 1 2 3
160 3 1 7 65 3 2 3
161 3 1 3 44 2 1 3
162 3 1 9 64 3 2 3
163 3 1 8 28 2 2 3
164 3 1 6 18 1 1 2
165 3 1 9 28 2 2 3
166 3 1 6 40 2 2 2
167 3 1 9 50 2 1 2
168 3 1 7 18 1 1 2
169 3 1 8 39 2 2 3
170 3 1 6 54 3 1 3
171 3 1 8 42 2 1 2
172 3 1 5 75 3 1 2
173 3 1 10 42 2 1 2
174 3 1 9 41 2 1 2
175 3 1 8 42 2 1 2
176 3 1 7 21 1 1 2
177 3 1 5 45 2 2 3
178 3 1 5 69 3 1 4
179 3 1 5 61 3 2 2
180 3 1 8 50 2 2 2
181 3 1 5 67 3 2 2
182 3 1 8 69 3 2 2
183 3 1 9 26 2 1 3
184 3 1 10 26 2 1 3
185 3 1 10 52 2 2 3
186 3 1 8 52 2 2 2
187 3 1 4 51 2 2 3
188 3 1 5 70 3 1 3
189 3 1 5 19 1 1 2
190 3 1 7 53 3 1 2
191 3 1 7 66 3 2 2
192 3 1 7 38 2 1 3
193 3 1 3 37 2 1 3
194 3 1 8 19 1 2 2
195 3 1 3 44 2 1 2
196 3 1 10 21 1 1 3
197 3 1 4 44 2 2 4
198 3 1 6 66 3 1 3
199 3 1 4 70 3 2 2
200 3 1 8 66 3 2 3
201 3 2 5 44 2 1 3
202 3 2 7 44 2 2 3
203 3 2 6 44 2 1 3
204 3 2 6 45 2 1 3
205 3 2 0 43 2 1 2
206 3 2 3 44 2 1 3
207 3 2 7 30 2 1 3
208 3 2 4 29 2 1 3
209 3 2 6 29 2 2 3
210 3 2 8 63 3 2 2
211 3 2 6 61 3 1 2
212 3 2 5 31 2 2 3
213 3 2 7 31 2 2 3
214 3 2 6 29 2 1 3
215 3 2 10 25 2 2 3
216 3 2 7 19 1 1 2
217 3 2 5 75 3 1 2
218 3 2 9 25 2 1 3
219 3 2 9 29 2 1 2
220 3 2 4 26 2 1 3
221 3 2 8 25 2 1 3
222 3 2 4 36 2 2 3
223 3 2 5 36 2 1 3
224 3 2 7 35 2 2 3
225 3 2 1 39 2 1 3
226 3 2 7 57 3 2 2
227 3 2 9 36 2 1 3
228 3 2 4 60 3 2 2
229 3 2 4 40 2 2 3
230 3 2 5 56 3 2 3
231 3 2 4 61 3 1 3
232 3 2 6 33 2 2 3
233 3 2 7 61 3 1 3
234 3 2 9 57 3 2 2
235 3 2 6 65 3 1 4
236 3 2 4 73 3 1 2
237 3 2 7 69 3 1 2
238 3 2 9 69 3 2 2
239 3 2 5 68 3 2 3
240 3 2 5 61 3 2 2
241 3 2 5 63 3 1 3
242 3 2 9 53 3 2 3
243 3 2 8 52 2 1 3
244 3 2 8 51 2 1 3
245 3 2 5 65 3 1 3
246 3 2 6 57 3 2 3
247 3 2 8 51 2 2 2
248 3 2 7 30 2 1 3
249 3 2 8 31 2 1 3
250 3 2 6 42 2 1 2
end
label values country co
label def co 2 "Country 1", modify
label def co 3 "Country 2", modify
label values treatment group
label def group 1 "Blind", modify
label def group 2 "No Blind", modify
label values age_gr age_gr
label def age_gr 1 "18-24", modify
label def age_gr 2 "25-54", modify
label def age_gr 3 "55-75", modify
label values gender sex
label def sex 1 "Male", modify
label def sex 2 "Female", modify
label values education v16
label def v16 1 "Primary Education", modify
label def v16 2 "Secondary Education", modify
label def v16 3 "Tertiary Education", modify
label def v16 4 "University and higher", modify
[/CODE]
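The subsampling idea described above could be sketched roughly as follows. This only illustrates the mechanics and is not a validated procedure: summarizing p-values over repeated draws is not itself a formal test. The sketch assumes the example data above are in memory and that it is run as a do-file in one go, because it uses temporary names. The variable names cell, n_arm, n_min, both_arms, u and keep_obs, the seed, the 200 repetitions, and the restriction to country code 2 (labelled "Country 1") are all illustrative choices, not part of the original analysis.

```stata
* 1. Balance check: age group by treatment arm, separately by country
bysort country: tabulate age_gr treatment, chi2

* 2. Set up cells (country x age group x gender) and the size of the
*    smaller arm in each cell
set seed 12345
egen cell = group(country age_gr gender)
bysort cell treatment: gen n_arm = _N
bysort cell: egen n_min = min(n_arm)
* flag cells in which both arms are present; others cannot be balanced
bysort cell (treatment): gen byte both_arms = treatment[1] != treatment[_N]
gen double u = .
gen byte keep_obs = 0

* 3. Repeat: draw a balanced subsample at random, run the t-test,
*    and store the difference in means and the two-sided p-value
tempname results
tempfile draws
postfile `results' rep diff p using `draws'
forvalues r = 1/200 {
    quietly {
        replace u = runiform()
        * within each cell and arm, keep the n_min observations
        * with the smallest random numbers
        bysort cell treatment (u): replace keep_obs = (_n <= n_min) & both_arms
        ttest y if keep_obs & country == 2, by(treatment)
        post `results' (`r') (r(mu_1) - r(mu_2)) (r(p))
    }
}
postclose `results'

* 4. Inspect the distribution over draws (this replaces the data in memory)
use `draws', clear
summarize diff p
```

Each draw keeps the same number of observations from both arms within every cell, so the retained subsample is balanced on age group and gender by construction. Dropping the 18-24 group would amount to adding `& age_gr != 1` to the `if` condition of `ttest`.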