How to analyze a subsample or should I use interactions?

Michal Onah

Join Date: Apr 2016
Posts: 39

01 Mar 2018, 20:57

Hi All,

I am using Stata 14 to run regressions on gender differences in the decision to seek care and in the utilisation of different health services. I have a sample of 411 households. For the outcome variables, I have a binary variable called "decision to seek care" (seek care when sick/not seek care when sick) and utilisation of health facility (formal healthcare/informal healthcare). I also have several independent variables, including sex of household head (male/female), household decision making (solely by household head/jointly by head and spouse/spouse only/others), income earning power (head earns more than spouse/spouse earns more than head/spouse and head earn approximately the same/spouse earns no income), marital status (single/married/divorced/widowed), gender of sick member (male/female), etc.

I am interested in examining how the gender of the sick member, gender differences in household earning and decision making, and other variables determine the two outcomes: seeking care when sick, and using formal versus informal care. The decision-making variables are only meaningful when the household is headed by a married/live-in individual. How should I perform the regression analyses, given that only a subsample of households are married? The tables below show the distribution of these variables across marital status. To avoid looking at only married/divorced households (n=223), how else can I run the regression analyses? Is there a way to use interactions between marital status and hh_earnpower and hh_desmaker? I am not sure, because I believe many of the interaction terms would not provide useful information.

Your advice would be most appreciated!

The code I have used is

Code:

*regression for facility type
logistic facility_type gender_hhead i.hh_earnpower i.hh_desmaker gender_sick cost_care no_adltmale no_adultfmle hhead_emplystat i.head_edulvl i.marital_stat

*regression for sick and no care
logistic sick_nocare gender_hhead i.hh_earnpower i.hh_desmaker gender_sick cost_care no_adltmale no_adultfmle hhead_emplystat i.head_edulvl i.marital_stat

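Code:

                   |   des maker in hhold on monetary expenditure
    marital status |  respondent   husband/w     jointly      others |     Total
-------------------+-------------------------------------------------+----------
     never married |          26           1           0           7 |        34
living with spouse |          75          51          89           0 |       215
           widowed |         148           0           4           2 |       154
divorced/separated |           8           0           0           0 |         8
-------------------+-------------------------------------------------+----------
             Total |         257          52          93           9 |       411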

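Code:

                   |   HH head earning power
    marital status |  More than  Less than  About the same  Spouse earns no income  Don’t know |   Total
-------------------+---------------------------------------------------------------------------+--------
     never married |          0          0               0                       0          34 |      34
living with spouse |        109         64              37                       3           2 |     215
           widowed |          0          0               0                       0         154 |     154
divorced/separated |          1          0               0                       2           5 |       8
-------------------+---------------------------------------------------------------------------+--------
             Total |        110         64              37                       5         195 |     411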

Tags: categorical, data, interaction, regression, syntax

Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

02 Mar 2018, 09:11

If you have decided in advance that the decision making and earning power variables are only relevant in married households, but are (or you are investigating whether they are) relevant in married households, then it makes no sense to analyze the sample as a whole. You would simply do separate regressions for married households and other households, and there would be no role for interaction terms, or for these variables at all, in the non-married households model.

This seems to be what you are saying, and, in fact, I have the sense that your data are set up in such a way that your decision making and earning power variables aren't even defined for non-married households, though you do not say that in so many words. If I have that right, then these variables are just coded as missing values in any observations on non-married households. (Or they should be.) In that case, if you set up a model that includes these variables and also interacts them with sex (or anything else for that matter), when Stata assembles the estimation sample, it will include only married households, because any observation that contains a missing value on any regression variable is excluded from estimation in all Stata regression commands.
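
A minimal sketch of that last point, using simulated data rather than the data in this thread. The variable names mirror the ones above, but the 0/1 indicator married and every value generated here are assumptions made only for illustration. The earning power variable is set to missing for non-married households, and the regression is run with no -if- qualifier.

```stata
* Toy data only: every value below is simulated, not taken from the thread
clear
set seed 12345
set obs 411
* Hypothetical 0/1 indicator for married households
generate byte married     = runiform() < 0.5
generate byte sick_nocare = runiform() < 0.5
generate byte gender_sick = runiform() < 0.5

* Earning power is defined only for married households, missing otherwise
generate byte hh_earnpower = cond(married, ceil(3*runiform()), .)

* No -if- qualifier, yet observations with missing hh_earnpower are dropped
logistic sick_nocare i.hh_earnpower gender_sick

* Inspect who ended up in the estimation sample
tabulate married if e(sample)

* The number of observations used equals the number of married households
count if married
assert r(N) == e(N)
```

If the assertion passes, the estimation sample is exactly the married households, even though the command never mentioned marital status.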
Michal Onah

Join Date: Apr 2016

Posts: 39
#3

02 Mar 2018, 09:30

Thanks Clyde!

You are very correct in your assertions. The household decision making and earning power variables are only relevant for households that are married.

Would you suggest I build two models, one for married households and one for “other” households? That way, I can include only the relevant variables in each model. Or alternatively, should I fit one model for all households and let Stata exclude the observations with missing information? Although the research question seeks to look at the effects of household-level dynamics on the decision to seek care, I would go with the former. Gender of sick member and other gender variables might be important to examine for all the households.

Thanks again!

Mich
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#4

02 Mar 2018, 10:10

I think you need two separate models here. In fact, if your decision making and earning power variables are coded as missing values in the non-married household observations, it won't be possible to extend an analysis that mentions them to include non-married households. So I think your hand is forced.
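
If those variables are not yet coded as missing for non-married households (the cross-tabulations in #1 show "Don't know" for nearly all such households), one possible way to recode them is sketched below. This is only an illustration on simulated data: the numeric codes (2 for living with spouse, 5 for don't know) and the label name earn_lbl are assumptions, not the actual coding of the dataset. It uses one of Stata's extended missing values (.a) so that "not applicable" stays distinguishable from ordinary missing data.

```stata
* Simulated stand-in data: all values and numeric codes are assumptions
clear
set seed 2018
set obs 411
* Assumed coding: marital_stat 2 = living with spouse
generate byte marital_stat = ceil(4*runiform())
* Assumed coding: hh_earnpower 5 = don't know, used here for non-married rows
generate byte hh_earnpower = cond(marital_stat == 2, ceil(4*runiform()), 5)

* Recode to an extended missing value where the question does not apply
replace hh_earnpower = .a if marital_stat != 2
label define earn_lbl .a "not applicable"
label values hh_earnpower earn_lbl

* Verify: non-married rows should now appear only in the .a column
tabulate marital_stat hh_earnpower, missing
count if marital_stat != 2 & !missing(hh_earnpower)
assert r(N) == 0
```

After a recode like this, any regression that includes hh_earnpower will be estimated on married households only.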
Michal Onah

Join Date: Apr 2016

Posts: 39
#5

02 Mar 2018, 10:20

Thanks again for the swift response.

If I build two separate models, do I need to worry about the statistical power of the subsample, that is, the magnitude of odds ratios it can detect, and about whether the subsample is large enough to make inferences? If yes, how would you suggest I do this?

Apologies if the questions are rudimentary.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#6

02 Mar 2018, 12:36

Yes, the subsamples will result in lower statistical power than the full sample. And, because the models are different, you cannot compare the coefficient (or odds ratio) of X in one model with the coefficient of X in the other. If this is a concern to you, you might consider two models:

Model 1: Includes earning power and decision making variables and is estimated only on married households.
Model 2: Excludes earning power and decision making variables and is estimated on the full data sample.

Again, you still can't do cross-comparisons between the effects of X in Model 1 and the effects of X in Model 2. But Model 2 will have the full power of your sample, and Model 1 will have the maximum power achievable with those variables.
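
The two models could be set up along the following lines. This is a sketch on simulated data with a shortened covariate list; the numeric code 2 for living with spouse and the stored-estimate names married_only and full_sample are assumptions for illustration.

```stata
* Simulated stand-in data: all values and numeric codes are assumptions
clear
set seed 411
set obs 411
* Assumed coding: marital_stat 2 = living with spouse
generate byte marital_stat = ceil(4*runiform())
generate byte sick_nocare  = runiform() < 0.5
generate byte gender_sick  = runiform() < 0.5
* Household-dynamics variables exist only for married households
generate byte hh_earnpower = cond(marital_stat == 2, ceil(3*runiform()), .)
generate byte hh_desmaker  = cond(marital_stat == 2, ceil(3*runiform()), .)

* Model 1: household-dynamics variables, married households only
logistic sick_nocare i.hh_earnpower i.hh_desmaker gender_sick if marital_stat == 2
estimates store married_only

* Model 2: no household-dynamics variables, full sample
logistic sick_nocare gender_sick i.marital_stat
estimates store full_sample

* Odds ratios and N for each model, shown side by side
estimates table married_only full_sample, eform stats(N)
```

The table only places the two sets of odds ratios next to each other along with each model's N. It is a display convenience and does not make the estimates comparable across models.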
Michal Onah

Join Date: Apr 2016

Posts: 39
#7

02 Mar 2018, 14:09

Thanks Clyde. Your comments have been most useful!

Michal Onah

Join Date: Apr 2016
Posts: 39

07 Mar 2018, 12:52

Hi Clyde and anyone out there

I appended data from multiple datasets with similar variables and want to plot graphs of some variables over a few others. I also have a weight variable and a string identifier (ID) for the country each dataset comes from. All the variables are binary (0/1) except for the weight variable, the identifier, and the expenditure quintile (1 to 5). The data come from 20 different sources representing 20 countries.

For one graph, I would like to plot oop_drugs oop_IP oop_OP over cata_nf_40 and cata_tot_10. For the other graph, I want to plot cata_nf_40 cata_tot_10 over hh_nexpcap_quintile and hh_urban. I hope to show each indicator for every country in the dataset, with countries distinguished by the string identifier (ID). For instance, Uganda is "UGA", Cambodia is "KHM", etc. I also want to apply the population weights [popweight] and label the outputs.

How can I do this in Stata? Any ideas would be most appreciated!

Please find below a sample of my dataset.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float popweight byte hh_nexpcap_quintile float(oop_drugs oop_OP oop_IP cata_nf_40 cata_tot_10 hh_urban) str3 ID
 8377 5 0 0 1 . 1 0 "UGA"
 8377 5 0 0 0 . 1 0 "UGA"
 8377 3 1 0 0 . 1 0 "UGA"
 9373 5 0 0 1 . 0 1 "UGA"
 3749 5 1 0 0 . 0 1 "UGA"
 1874 5 1 0 0 . 0 1 "UGA"
 7499 4 1 0 0 . 0 1 "UGA"
11248 5 1 0 0 . 0 1 "UGA"
11248 5 1 0 0 . 0 1 "UGA"
11248 4 1 0 0 . 0 1 "UGA"
 7499 5 0 0 1 . 0 1 "UGA"
11248 5 1 0 0 . 0 1 "UGA"
16873 5 0 1 0 . 0 1 "UGA"
16873 5 1 0 0 . 0 1 "UGA"
16873 5 0 0 1 . 0 1 "UGA"
 1874 5 0 1 0 . 0 1 "UGA"
 1874 5 1 0 0 . 0 1 "UGA"
 1874 5 0 0 1 . 0 1 "UGA"
 9373 5 1 0 0 . 0 1 "UGA"
 2499 4 0 0 1 . 0 1 "UGA"
 3124 5 1 0 0 . 0 1 "UGA"
 7499 5 1 0 0 . 1 1 "UGA"
11248 4 1 0 0 . 1 1 "UGA"
11248 4 0 0 1 . 1 1 "UGA"
 1874 5 1 0 0 . 1 1 "UGA"
 5624 5 0 1 0 . 1 1 "UGA"
 5624 5 1 0 0 . 1 1 "UGA"
11248 5 1 0 0 . 1 1 "UGA"
11248 5 0 0 1 . 1 1 "UGA"
 9373 4 1 0 0 . 0 1 "UGA"
11248 5 1 0 0 . 0 1 "UGA"
 4686 4 1 0 0 . 0 1 "UGA"
13123 5 1 0 0 . 0 1 "UGA"
 9373 5 1 0 0 . 0 1 "UGA"
 9373 5 0 0 1 . 0 1 "UGA"
 7499 5 1 0 0 . 0 1 "UGA"
 7499 5 0 0 1 . 0 1 "UGA"
20622 4 1 0 0 . 0 1 "UGA"
11248 4 1 0 0 . 0 1 "UGA"
11248 5 1 0 0 . 0 1 "UGA"
 3749 5 1 0 0 . 0 0 "UGA"
 7499 5 0 1 0 . 0 1 "UGA"
 7499 5 1 0 0 . 0 1 "UGA"
 7499 5 0 0 1 . 0 1 "UGA"
 7499 5 1 0 0 . 0 1 "UGA"
 1874 5 1 0 0 . 0 0 "UGA"
20622 5 0 1 0 . 0 1 "UGA"
20622 5 1 0 0 . 0 1 "UGA"
 2812 5 1 0 0 . 0 1 "UGA"
 7499 5 1 0 0 . 0 1 "UGA"
 2812 5 1 0 0 . 0 1 "UGA"
 5624 5 1 0 0 . 0 1 "UGA"
13123 4 0 1 0 . 0 1 "UGA"
13123 4 1 0 0 . 0 1 "UGA"
 3749 5 1 0 0 . 0 1 "UGA"
13123 5 1 0 0 . 1 1 "UGA"
13123 5 0 0 1 . 1 1 "UGA"
 5624 5 1 0 0 . 0 1 "UGA"
 9373 4 1 0 0 . 0 0 "UGA"
 9373 5 1 0 0 . 0 1 "UGA"
10206 4 1 0 0 . 0 0 "UGA"
25517 2 0 0 1 . 1 0 "UGA"
10206 5 1 0 0 . 0 0 "UGA"
 7655 3 1 0 0 . 0 0 "UGA"
 4252 4 1 0 0 . 1 0 "UGA"
10206 3 0 0 1 . 0 0 "UGA"
17861 4 0 0 1 . 0 0 "UGA"
15310 4 1 0 0 . 0 0 "UGA"
15310 4 0 0 0 . 0 0 "UGA"
10206 5 1 0 0 . 1 0 "UGA"
10206 5 0 0 0 . 1 0 "UGA"
15310 4 0 0 1 . 0 0 "UGA"
 8930 4 1 0 0 . 0 0 "UGA"
 1275 5 1 0 0 . 0 1 "UGA"
10206 3 1 0 0 . 1 0 "UGA"
 2551 3 1 0 0 . 1 0 "UGA"
 2551 3 0 0 1 . 0 0 "UGA"
15310 5 1 0 0 . 0 1 "UGA"
12758 3 0 0 1 . 0 1 "UGA"
15310 4 1 0 0 . 0 1 "UGA"
10206 4 1 0 0 . 0 0 "UGA"
17861 4 1 0 0 . 0 0 "UGA"
17861 4 0 0 1 . 0 0 "UGA"
10206 5 1 0 0 . 0 0 "UGA"
12758 4 0 0 1 . 0 0 "UGA"
11482 4 1 0 0 . 0 0 "UGA"
28068 5 0 0 1 . 0 0 "UGA"
15310 4 1 0 0 . 1 0 "UGA"
 7655 3 0 0 1 . 0 0 "UGA"
 1701 5 1 0 0 . 0 1 "UGA"
 7655 4 0 0 1 . 0 1 "UGA"
20413 1 1 0 0 . 0 0 "UGA"
 5103 4 1 0 0 . 0 0 "UGA"
 2551 4 0 0 1 . 1 0 "UGA"
 2551 4 0 0 0 . 1 0 "UGA"
 1275 5 1 0 0 . 0 0 "UGA"
 7655 5 1 0 0 . 0 1 "UGA"
30620 3 1 0 0 . 0 0 "UGA"
10206 5 0 0 1 . 0 1 "UGA"
10206 5 0 0 0 . 0 1 "UGA"
end
label values hh_nexpcap_quintile hh_nexpcap_quintile
label def hh_nexpcap_quintile 1 "poorest", modify
label def hh_nexpcap_quintile 2 "poorer", modify
label def hh_nexpcap_quintile 3 "middle", modify
label def hh_nexpcap_quintile 4 "richer", modify
label def hh_nexpcap_quintile 5 "richest", modify

Thanks again!

Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#9

07 Mar 2018, 14:20

While it is easy to think of the threads on this Forum as dialogs with one or a few people who respond, in fact there is a whole audience out there that reads along to learn about Stata and statistics. These threads are also an archive that people come here and search to find answers that may have already been uncovered for their problems. So that this remains a useful resource for both of those groups of people, it is important that threads remain on topic.

This question has no real relationship to the original topic of this thread. Please repost as a New Topic. Thank you.
Michal Onah

Join Date: Apr 2016

Posts: 39
#10

07 Mar 2018, 14:23

Hi Clyde,

Thanks for your response. I actually tried to post this as a new thread but for some reason, I cannot post a new thread.

I will continue to try and hopefully it will upload.

Thanks again

Mich
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#11

07 Mar 2018, 14:32

If you continue to have difficulties opening a new thread, click on Contact Us (lower right corner of this page) and send a message to the system administrator describing the problem you are encountering.
Michal Onah

Join Date: Apr 2016

Posts: 39
#12

07 Mar 2018, 14:36

Thank you,

I have emailed the system administrator