
  • How to analyze a subsample or should I use interactions?

    Hi All,

    I am using Stata 14 to run a regression on gender differences in the decision to seek care and the utilisation of different health services. I have a sample of 411 households. There are two binary outcome variables: "decision to seek care" (seek care when sick/not seek care when sick) and utilisation of health facility (formal healthcare/informal healthcare). The independent variables include sex of household head (male/female), household decision making (sole by household head/joint by head and spouse/spouse only/others), income earning power (head earns more than spouse/spouse earns more than head/spouse and head earn approx. the same/spouse earns no income), marital status (single/married/divorced/widowed), gender of sick member (male/female), etc.


    I am interested in examining how the gender of the sick member, gender differences in household earning and decision making, and other variables determine the two outcomes: seeking care when sick, and using formal versus informal care. The decision-making variables are only important when households are headed by a married/live-in individual. How do I perform the regression analyses when only a subsample of households is married? The tables below show the distribution of these variables across marital status. To avoid looking at only married/divorced households (n=223), how else can I run the regression analyses? Is there a way to use interactions between marital status and hh_earnpower and hh_desmaker? I am not sure, since I believe not all of the interactions would provide useful information.

    Your advice would be most appreciated!

    The code I have used is
    Code:
    *regression for facility type
    logistic facility_type gender_hhead i.hh_earnpower i.hh_desmaker gender_sick cost_care no_adltmale no_adultfmle hhead_emplystat i.head_edulvl i.marital_stat
    
    *regression for sick and no care
    logistic sick_nocare gender_hhead i.hh_earnpower i.hh_desmaker gender_sick cost_care no_adltmale no_adultfmle hhead_emplystat i.head_edulvl i.marital_stat
    Code:
     
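                       | des maker in hhold on monetary expenditure  |
    marital status     | respondent  husband/w    jointly     others |  Total
    -------------------+---------------------------------------------+-------
    never married      |         26          1          0          7 |     34
    living with spouse |         75         51         89          0 |    215
    widowed            |        148          0          4          2 |    154
    divorced/separated |          8          0          0          0 |      8
    -------------------+---------------------------------------------+-------
    Total              |        257         52         93          9 |    411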
    Code:
     
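                       | HH head earning power                                                     |
    marital status     |  More than  Less than  About the same  Spouse earns no income  Don’t know |  Total
    -------------------+---------------------------------------------------------------------------+-------
    never married      |          0          0               0                       0          34 |     34
    living with spouse |        109         64              37                       3           2 |    215
    widowed            |          0          0               0                       0         154 |    154
    divorced/separated |          1          0               0                       2           5 |      8
    -------------------+---------------------------------------------------------------------------+-------
    Total              |        110         64              37                       5         195 |    411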

  • #2
    If you have decided in advance that the decision making and earning power variables are only relevant in married households, but are (or you are investigating whether they are) relevant in married households, then it makes no sense to analyze the sample as a whole. You would simply do separate regressions for married households and other households, and there would be no role for interaction terms, or for these variables at all, in the non-married households model.

    This seems to be what you are saying, and, in fact, I have the sense that your data are set up in such a way that your decision making and earning power variables aren't even defined for non-married households, though you do not say that in so many words. If I have that right, then these variables are just coded as missing values in any observations on non-married households. (Or they should be.) In that case, if you set up a model that includes these variables and also interacts them with sex (or anything else for that matter), when Stata assembles the estimation sample, it will include only married households, because any observation that contains a missing value on any regression variable is excluded from estimation in all Stata regression commands.
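
    A minimal sketch of how the estimation sample could be checked, reusing the variable names from the first post. The coding marital_stat == 2 for "living with spouse" is an assumption made only for illustration; the real codes can be checked with -tabulate marital_stat, nolabel-. The explicit -if- restriction gives the same married-only sample even where the variables are not actually stored as missing for other households.

    ```stata
    * Assumption: marital_stat == 2 codes "living with spouse" (hypothetical; verify first)
    tabulate marital_stat, nolabel

    * Restrict the model to married households explicitly
    logistic sick_nocare gender_hhead i.hh_earnpower i.hh_desmaker gender_sick cost_care ///
        if marital_stat == 2

    * e(sample) flags the observations Stata actually used in the last estimation
    count if e(sample)
    tabulate marital_stat if e(sample)
    ```

    Comparing the -count- result with the 215 households living with a spouse shows how many observations were additionally dropped because of missing values on other regressors.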



    • #3
      Thanks Clyde!

      You are very correct in your assertions. The household decision making and earning power variables are only relevant for households that are married.

      Would you suggest I build two models, one for married households and one for “other” households, so that each includes only the variables relevant to it? Or alternatively, should I fit one model for all households and let Stata exclude the observations with missing information? Because the research question seeks to look at the effects of household-level dynamics on the decision to seek care, I would go with the former, although gender of the sick member and other gender variables might be important to examine for all the households.


      Thanks again!

      Mich



      • #4
        I think you need two separate models here. In fact, if your decision making and earning power variables are coded as missing values in the non-married household observations, it won't be possible to extend an analysis that mentions them to include non-married households. So I think your hand is forced.



        • #5
          Thanks again for the swift response.

          If I build two separate models, do I need to worry about statistical power, that is, about the magnitude of odds ratios detectable for the independent variables in each subsample, and about whether the subsample is large enough to make inferences? If yes, how would you suggest I do this?

          Apologies if the questions are rudimentary.



          • #6
            Yes, the subsamples will result in lower statistical power than the full sample. And, because the models are different, you cannot compare the coefficient (or odds ratio) of X in one model with the coefficient of X in the other. If this is a concern to you, you might consider two models:

            Model 1: Includes earning power and decision making variables and is estimated only on married households.
            Model 2: Excludes earning power and decision making variables and is estimated on the full data sample.

            Again, you still can't do cross-comparisons between the effects of X in Model 1 and the effects of X in Model 2. But Model 2 will have the full power of your sample. And Model 1 will have the maximum power achievable with those variables.
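
            The two models above might be set up roughly as follows, using the variable names from the first post. This is only a sketch: marital_stat == 2 as the code for "living with spouse" is an assumption, and the stored-estimate names married and full are arbitrary.

            ```stata
            * Model 1: married households only, with earning power and decision making
            * Assumption: marital_stat == 2 codes "living with spouse" (hypothetical)
            logistic sick_nocare gender_hhead i.hh_earnpower i.hh_desmaker gender_sick ///
                cost_care no_adltmale no_adultfmle hhead_emplystat i.head_edulvl ///
                if marital_stat == 2
            estimates store married

            * Model 2: full sample, without earning power and decision making
            logistic sick_nocare gender_hhead gender_sick cost_care no_adltmale ///
                no_adultfmle hhead_emplystat i.head_edulvl i.marital_stat
            estimates store full

            * Odds ratios and sample sizes side by side, for reporting only
            estimates table married full, eform b(%9.3f) stats(N)
            ```

            The side-by-side table is a reporting convenience; as noted above, the odds ratio for a given variable in one column is not comparable with the one in the other column.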



            • #7
              Thanks Clyde. Your comments have been most useful!



              • #8
                Hi Clyde and anyone out there

                I appended data from multiple datasets with similar variables and want to plot some variables over a few others. I also have a weight variable and a unique identifier for each variable contained in the dataset. All the variables are binary (0/1) except for the weight variable and the unique identifier. The data come from 20 different sources representing 20 countries.

                For one graph, I would like to plot oop_drugs, oop_IP and oop_OP over cata_nf_40 and cata_tot_10. For the other graph, I want to plot cata_nf_40 and cata_tot_10 over hh_nexpcap_quintile and hh_urban. I hope to have each indicator shown for all the countries in the dataset, with each country represented by an identifier (ID), which is a string variable. For instance, Uganda will be "UGA", Cambodia will be "KHM", etc. I also want to apply the population weights [popweight] and label the outputs.

                How can I do this in Stata? Any ideas will be most appreciated!

                Please find below a sample of my dataset.

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input float popweight byte hh_nexpcap_quintile float(oop_drugs oop_OP oop_IP cata_nf_40 cata_tot_10 hh_urban) str3 ID
                 8377 5 0 0 1 . 1 0 "UGA"
                 8377 5 0 0 0 . 1 0 "UGA"
                 8377 3 1 0 0 . 1 0 "UGA"
                 9373 5 0 0 1 . 0 1 "UGA"
                 3749 5 1 0 0 . 0 1 "UGA"
                 1874 5 1 0 0 . 0 1 "UGA"
                 7499 4 1 0 0 . 0 1 "UGA"
                11248 5 1 0 0 . 0 1 "UGA"
                11248 5 1 0 0 . 0 1 "UGA"
                11248 4 1 0 0 . 0 1 "UGA"
                 7499 5 0 0 1 . 0 1 "UGA"
                11248 5 1 0 0 . 0 1 "UGA"
                16873 5 0 1 0 . 0 1 "UGA"
                16873 5 1 0 0 . 0 1 "UGA"
                16873 5 0 0 1 . 0 1 "UGA"
                 1874 5 0 1 0 . 0 1 "UGA"
                 1874 5 1 0 0 . 0 1 "UGA"
                 1874 5 0 0 1 . 0 1 "UGA"
                 9373 5 1 0 0 . 0 1 "UGA"
                 2499 4 0 0 1 . 0 1 "UGA"
                 3124 5 1 0 0 . 0 1 "UGA"
                 7499 5 1 0 0 . 1 1 "UGA"
                11248 4 1 0 0 . 1 1 "UGA"
                11248 4 0 0 1 . 1 1 "UGA"
                 1874 5 1 0 0 . 1 1 "UGA"
                 5624 5 0 1 0 . 1 1 "UGA"
                 5624 5 1 0 0 . 1 1 "UGA"
                11248 5 1 0 0 . 1 1 "UGA"
                11248 5 0 0 1 . 1 1 "UGA"
                 9373 4 1 0 0 . 0 1 "UGA"
                11248 5 1 0 0 . 0 1 "UGA"
                 4686 4 1 0 0 . 0 1 "UGA"
                13123 5 1 0 0 . 0 1 "UGA"
                 9373 5 1 0 0 . 0 1 "UGA"
                 9373 5 0 0 1 . 0 1 "UGA"
                 7499 5 1 0 0 . 0 1 "UGA"
                 7499 5 0 0 1 . 0 1 "UGA"
                20622 4 1 0 0 . 0 1 "UGA"
                11248 4 1 0 0 . 0 1 "UGA"
                11248 5 1 0 0 . 0 1 "UGA"
                 3749 5 1 0 0 . 0 0 "UGA"
                 7499 5 0 1 0 . 0 1 "UGA"
                 7499 5 1 0 0 . 0 1 "UGA"
                 7499 5 0 0 1 . 0 1 "UGA"
                 7499 5 1 0 0 . 0 1 "UGA"
                 1874 5 1 0 0 . 0 0 "UGA"
                20622 5 0 1 0 . 0 1 "UGA"
                20622 5 1 0 0 . 0 1 "UGA"
                 2812 5 1 0 0 . 0 1 "UGA"
                 7499 5 1 0 0 . 0 1 "UGA"
                 2812 5 1 0 0 . 0 1 "UGA"
                 5624 5 1 0 0 . 0 1 "UGA"
                13123 4 0 1 0 . 0 1 "UGA"
                13123 4 1 0 0 . 0 1 "UGA"
                 3749 5 1 0 0 . 0 1 "UGA"
                13123 5 1 0 0 . 1 1 "UGA"
                13123 5 0 0 1 . 1 1 "UGA"
                 5624 5 1 0 0 . 0 1 "UGA"
                 9373 4 1 0 0 . 0 0 "UGA"
                 9373 5 1 0 0 . 0 1 "UGA"
                10206 4 1 0 0 . 0 0 "UGA"
                25517 2 0 0 1 . 1 0 "UGA"
                10206 5 1 0 0 . 0 0 "UGA"
                 7655 3 1 0 0 . 0 0 "UGA"
                 4252 4 1 0 0 . 1 0 "UGA"
                10206 3 0 0 1 . 0 0 "UGA"
                17861 4 0 0 1 . 0 0 "UGA"
                15310 4 1 0 0 . 0 0 "UGA"
                15310 4 0 0 0 . 0 0 "UGA"
                10206 5 1 0 0 . 1 0 "UGA"
                10206 5 0 0 0 . 1 0 "UGA"
                15310 4 0 0 1 . 0 0 "UGA"
                 8930 4 1 0 0 . 0 0 "UGA"
                 1275 5 1 0 0 . 0 1 "UGA"
                10206 3 1 0 0 . 1 0 "UGA"
                 2551 3 1 0 0 . 1 0 "UGA"
                 2551 3 0 0 1 . 0 0 "UGA"
                15310 5 1 0 0 . 0 1 "UGA"
                12758 3 0 0 1 . 0 1 "UGA"
                15310 4 1 0 0 . 0 1 "UGA"
                10206 4 1 0 0 . 0 0 "UGA"
                17861 4 1 0 0 . 0 0 "UGA"
                17861 4 0 0 1 . 0 0 "UGA"
                10206 5 1 0 0 . 0 0 "UGA"
                12758 4 0 0 1 . 0 0 "UGA"
                11482 4 1 0 0 . 0 0 "UGA"
                28068 5 0 0 1 . 0 0 "UGA"
                15310 4 1 0 0 . 1 0 "UGA"
                 7655 3 0 0 1 . 0 0 "UGA"
                 1701 5 1 0 0 . 0 1 "UGA"
                 7655 4 0 0 1 . 0 1 "UGA"
                20413 1 1 0 0 . 0 0 "UGA"
                 5103 4 1 0 0 . 0 0 "UGA"
                 2551 4 0 0 1 . 1 0 "UGA"
                 2551 4 0 0 0 . 1 0 "UGA"
                 1275 5 1 0 0 . 0 0 "UGA"
                 7655 5 1 0 0 . 0 1 "UGA"
                30620 3 1 0 0 . 0 0 "UGA"
                10206 5 0 0 1 . 0 1 "UGA"
                10206 5 0 0 0 . 0 1 "UGA"
                end
                label values hh_nexpcap_quintile hh_nexpcap_quintile
                label def hh_nexpcap_quintile 1 "poorest", modify
                label def hh_nexpcap_quintile 2 "poorer", modify
                label def hh_nexpcap_quintile 3 "middle", modify
                label def hh_nexpcap_quintile 4 "richer", modify
                label def hh_nexpcap_quintile 5 "richest", modify

                Thanks again!



                • #9
                  While it is easy to think of the threads on this Forum as dialogs with one or a few people who respond, in fact there is a whole audience out there that reads along to learn about Stata and statistics. These threads are also an archive that people come here and search to find answers that may have already been uncovered for their problems. So that this remains a useful resource for both of those groups of people, it is important that threads remain on topic.

                  This question has no real relationship to the original topic of this thread. Please repost as a New Topic. Thank you.



                  • #10
                    Hi Clyde,

                    Thanks for your response. I actually tried to post this as a new thread, but for some reason I cannot post a new thread.

                    I will continue to try, and hopefully it will upload.

                    Thanks again

                    Mich



                    • #11
                      If you continue to have difficulties opening a new thread, click on Contact Us (lower right corner of this page) and send a message to the system administrator describing the problem you are encountering.



                      • #12
                        Thank you,

                        I have emailed the system administrator
