
  • inteff command using survey data

    Hello,

    I am using survey data to run a logit regression, and I now have some interaction terms, all between binary variables. I read Norton and Ai's paper on using -inteff- to correct the coefficients and standard errors of interaction terms in nonlinear models. However, I don't think -inteff- works under the svy prefix, and I am not sure whether it works for more than two interactions.

    My problem is that I am using survey data, so I have the survey setup:

    svyset psu [pweight=total_wt], strata(stratagrp3)

    Then I run a logit regression on the female subsample.
    The dependent variable is eversti3, a binary indicator of whether a person has ever been diagnosed with an STI.

    independent variables
    1) learnmostschool, a binary variable, =1 if a person says their primary source of sexual knowledge is school
    2) learnmostpar, another binary variable, =1 if a person's primary source of sexual knowledge is their parents (mutually exclusive with learnmostschool; learnmostother is the base category, and everyone selects exactly one of these options)
    3) bothparents, whether a person grew up with both parents
    4) 5) interaction terms: learnmostschool*bothparents and learnmostpar*bothparents
    6+) other social factors $x

    I want to see how growing up with both parents affects the impact of sex education received from school and from parents.

    so my regression is
    svy, subpop(if female): logit eversti3 learnmostschool learnmostpar bothparents learnmostschool*bothparents learnmostpar*bothparents $x

    Then I want to use -inteff- next:
    svy, subpop(if female ): inteff eversti3 learnmostschool learnmostpar bothparents learnmostschool*bothparents learnmostpar*bothparents $x


    Then I get this error:

    inteff is not supported by svy with vce(linearized); see help svy estimation for a list of Stata estimation commands that are supported by svy
    r(322);


    Any suggestions on how I can run interactions with survey data?
    Any suggestion on how I can adjust my model to deal with heterogeneity would also be much appreciated.

    PS: I am only a master's student trying to get my degree... please don't tell me to code a new command...

    Many thanks,
    Michelle



  • #2
    Use factor-variable notation in your regression command, and then use -margins-.

    First you have to get rid of your three learnmost* indicator variables and start with a new variable, let's call it source_of_learning, coded 1 for other, 2 for parents, and 3 for school. And you have to get rid of your hand-coded interaction variables. Then you run this:

    Code:
    // REGRESSION
    svy, subpop(if female ): logit eversti3 i.(source_of_learning)##i.bothparents  $x
    
    // ADJUSTED PREDICTED VALUES OF eversti3 DISAGGREGATED
    // BY SOURCE OF  MOST LEARNING AND BOTH PARENTS
    margins source_of_learning#bothparents 
    
    // ADJUSTED EFFECTS OF HAVING BOTH PARENTS
    // DISAGGREGATED BY SOURCE OF MOST LEARNING
    margins source_of_learning, dydx(bothparents)
    
    // ADJUSTED EFFECTS OF SOURCE OF MOST LEARNING
    // DISAGGREGATED BY HAVING BOTH PARENTS
    margins bothparents, dydx(source_of_learning)

    Comment


    • #3
      Thanks so much for your reply, Clyde. After a whole day, I think I finally understand your suggested commands. One last question:

      when I use this command

      margins r.source_of_learning, over(r.bothparents)

      to compare, for a given family structure (bothparents = 1 or 0), the marginal effect between source 1 and source 2 and see whether the difference is significant, i.e., whether a school sex-ed program is more effective for someone who grew up in a single-parent house than for someone who grew up with both parents. The Stata-produced F test has taken the nonlinearity of the model into account, correct? So I can just read the Stata output to draw a conclusion?

      Michelle

      Comment


      • #4
        Yes.

        Comment


        • #5
          ohhhh, thank you thank you!

          Comment


          • #6
            Clyde, can I please add one more question on this note?

            I thought Ai and Norton's paper is about the significance of the interaction term with respect to the actual binary outcome, whereas the Stata-produced t ratio for the interaction term in

            svy, subpop(if female): logit eversti3 i.(source_of_learning)##i.bothparents $x

            is the significance test on the latent eversti3*. Does that mean the t ratio is still meaningful to me? If I find that the interaction term between source_of_learning == 2 and bothparents is negative and significant, can I still draw the conclusion that school sex education is more effective for someone who grew up with both parents than for someone who grew up in a single-parent house?

            And what if margins r.source_of_learning, over(r.bothparents) produces a significant result when the logit interaction is insignificant? How can I interpret that? I am really struggling with this and afraid of writing a wrong interpretation... I would really appreciate your help.

            Many thanks,
            Michelle

            Comment


            • #7
              Well, it's complicated and you need to delve fairly deeply into the data to answer this.

              The complication arises from the fact that logit() is a non-linear function, and also from the fact that you are hung up on statistical significance. The logistic regression output gives you an estimate of the log odds. So you could properly interpret the interaction term in the way you describe in #6 if you buy into hypothesis testing here. The marginal effects, however, work in the probability metric. Those metrics behave differently in some circumstances.

              But, for clarity, let's take an extreme, admittedly unrealistic example. Suppose the probability of eversti3 is 0.99. And suppose that the odds ratio for 2.source_of_learning is 3, that for bothparents is 5, and that the interaction odds ratio is 0.1 (a negative interaction coefficient, OR < 1). Notice that these are huge odds ratios, larger than one can typically hope to obtain in a real-world study. Now let's look at how this plays out in predicted probabilities. If you calculate the base odds from p = 0.99, multiply by the corresponding odds ratios, and then back-transform to probabilities, you get:

              P(eversti3 | 2.source_of_learning = 0, bothparents = 0) = 0.99
              P(eversti3 | 2.source_of_learning = 1, bothparents = 0) = 0.997
              P(eversti3 | 2.source_of_learning = 0, bothparents = 1) = 0.998
              P(eversti3 | 2.source_of_learning = 1, bothparents = 1) = 0.993

              So the marginal effects here are all very small, and they might (or might not, depending on sample size) be statistically significant.

              By contrast, if the base value of P(eversti3) were 0.5 instead of 0.99, the other three probabilities would be 0.75, 0.83, and 0.6, which represent very large marginal effects even though the odds ratios are exactly the same.
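              The back-transformation arithmetic above can be checked in a few lines. This is shown in Python purely for convenience; `apply_ors` is just a helper written for this illustration, not a Stata command:

```python
# Back-transform check: convert the base probability to odds,
# apply the odds ratios, and convert back to a probability.
def apply_ors(p_base, *odds_ratios):
    odds = p_base / (1 - p_base)
    for r in odds_ratios:
        odds *= r
    return odds / (1 + odds)

# Base probability 0.99; ORs 3 (2.source_of_learning),
# 5 (bothparents), and 0.1 (interaction)
print(round(apply_ors(0.99, 3), 3))          # 0.997
print(round(apply_ors(0.99, 5), 3))          # 0.998
print(round(apply_ors(0.99, 3, 5, 0.1), 3))  # 0.993

# The same odds ratios applied from a base probability of 0.5:
# the probability differences are now large, though the ORs
# have not changed.
print(round(apply_ors(0.5, 3), 2))           # 0.75
print(round(apply_ors(0.5, 5), 2))           # 0.83
print(round(apply_ors(0.5, 3, 5, 0.1), 1))   # 0.6
```

              This is the whole point of the example: the same odds ratios translate into tiny probability differences near p = 0.99 and into large ones near p = 0.5.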

              So, first of all, statistical significance is useless here. Statistical significance is used to test null hypotheses. In your setting, the null hypothesis is a straw man and rejecting it is uninformative. Nobody in his or her right mind would believe that any of the effects in your study are really zero. They could be small, even very small. But zero is more or less out of the question. A significant result therefore tells us only what we already knew. A non-significant result just tells us that our study was too small to give us useful estimates of how large the effects are.

              So the question then is what do our results really tell us about the effects. As you can see, we can get very different impressions depending on whether we look at it in the odds ratio measure of effect metric or the marginal effects metric. It really all depends on that base probability, and it also depends on what your underlying goals and purposes are. I think in most practical situations, small marginal effects would lead one to conclude that an intervention associated with them is not useful. But your study does not appear to be a test of an intervention. Rather, I assume, you have observational data and are trying to gain an understanding of what exposures influence your outcome. In that situation, the odds ratio provides a theoretical perspective that might be worth taking seriously, even if the associated marginal effects are small.

              My overall advice would be to look at the output of -margins source_of_learning#bothparents- and get a sense of just how different these four probabilities are. If they are large enough to matter practically, and your odds ratio is appreciable, then the two findings support each other and I would report them. If your odds ratio is negligibly small and so are the differences in the four probabilities, then, similarly, the findings support each other and you have a study that shows that the influences of these exposures on eversti3 are small. If they disagree substantially in magnitude, one might end up with a conclusion that the influences are theoretically there, but that their actual impact on outcomes is negligible. That would be a reasonable conclusion under those circumstances. And again, I wouldn't even bother to look at the p-values. I would present the ORs and the predicted probabilities with their confidence intervals so that we have a sense of the basic estimates of these parameters and an interval quantifying the uncertainty in those estimates. The p-values just aren't helpful.

              Comment


              • #8
                Thank you for your very detailed explanation. I am trying hard to understand it...

                Could you please, one last time, explain to me the difference in meaning between these two:

                margins bothparents, dydx(2.source_of_learning) post
                margins 2.source_of_learning#bothparents

                How do I interpret the numbers 0.243 and -0.140?

                And if I want to see whether the predicted probability of STI differs between someone who chose 2.source_of_learning with bothparents = 1 and someone who chose 2.source_of_learning with bothparents = 0, which one should I use?

                Thank you so much in advance. This is way beyond the scope of my research plan, but now that I am here, I have to figure it out...


                Michelle


                [Attachment: margins for stata.png -- screenshot of the -margins- output discussed in #9]

                Comment


                • #9
                  The 0.243 is the model-predicted probability of hetb416a when source_of_learning == 2 and bothparents == 0.

                  The -0.140 is the model-predicted difference between the probability of hetb416a when source_of_learning == 2 and that probability when source_of_learning != 2, if bothparents == 0. (If bothparents == 1, then that difference is -0.086.)

                  All of these estimates are adjusted to the joint distribution of all other variables included in the model.

                  Comment


                  • #10
                    If

                    margins bothparents, dydx(2.source_of_learning) post

                    estimates the difference between source_of_learning == 2 and source_of_learning != 2, not the difference versus source_of_learning == 1 as I had thought, then what about the original logit regression? If the coefficient on 2.source_of_learning is negative, does that mean source_of_learning == 2 has a lower probability of the outcome Y = 1 compared with source_of_learning == 1, or compared with all source_of_learning != 2?


                    Comment


                    • #11
                      In the original logit regression, if the coefficient on 2.source_of_learning is negative, does that mean source_of_learning == 2 has a lower probability of the outcome Y = 1 compared with source_of_learning == 1, or compared with all source_of_learning != 2?
                      No, not necessarily. Because you have the interaction term in the model, the coefficient of 2.source_of_learning is no longer an estimate of the effect of source_of_learning == 2; instead it estimates the effect of source_of_learning == 2 conditional on bothparents == 0.

                      This is why I recommend working off of the -margins- results rather than the regression output in interaction models. In the regression output things are not what they seem to be. You can make the appropriate calculations from them to get the effects you are interested in, but it is confusing and error-prone. Better to let -margins- do it.
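                      One way to see the algebra behind this is to write out the linear predictor. The sketch below uses made-up coefficients purely for illustration; none of these numbers come from the fitted model:

```python
# In logit(p) = b0 + b1*(source==2) + b2*bothparents
#             + b3*(source==2)*bothparents,
# the change in log odds from source_of_learning == 2 is
# b1 + b3*bothparents, so the reported coefficient b1 is that
# effect ONLY at bothparents == 0.
b1 = -0.5   # hypothetical coefficient on 2.source_of_learning
b3 = 1.25   # hypothetical coefficient on the interaction term

def log_odds_effect_of_source2(bothparents):
    """Change in log odds from source_of_learning == 2 at a
    fixed value of bothparents."""
    return b1 + b3 * bothparents

print(log_odds_effect_of_source2(0))  # -0.5: just b1
print(log_odds_effect_of_source2(1))  # 0.75: b1 + b3, sign flips
```

                      With these hypothetical numbers the effect is negative at bothparents == 0 but positive at bothparents == 1, which is exactly why reading b1 alone as "the effect of source_of_learning == 2" is misleading in an interacted model.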

                      Comment


                      • #12
                        Hi Clyde, thanks very much for your help. I have decided to use your proposed method, though I am still not sure whether I might get it wrong... So I run the logit model with the interaction, use the -margins- command on source_of_learning#bothparents, and contrast the four predicted probabilities. However, do you know why I cannot test them? Is there a paper I can cite to show that I cannot use a formal test?

                        Last push on my dissertation. I hugely appreciate your help.

                        Comment


                        • #13
                          You can test them if you want. My point is that it's pointless and useless to do so: the p-values don't tell you what you want to know, whereas effect estimates and confidence intervals do. There is no highly specific reference on this. But, when you have time, I recommend you read the American Statistical Association's position paper on the misuse and overuse of p-values. You can find it, and a number of supporting papers, at http://dx.doi.org/10.1080/00031305.2016.1154108.

                          If you want to specifically run tests contrasting the marginal effects in the different categories, you can add the -pwcompare(effects)- option to your -margins- commands to get them. In my opinion it's a waste of time and pixels, but I know that in some circles a paper just doesn't feel complete without p-values.

                          Comment
