  • Using racial/ethnic demographics as the independent variables in multivariate linear regression?

    Hello! I am conducting multivariate linear regression using -regress- of Google Trends interest (a 100-point scale) as a function of the racial/ethnic breakdown of a state. I'm wondering whether this is actually an appropriate approach to analyzing these data. The IVs are the proportions of White, Black, Asian, mixed race, Native American/Pacific Islander, etc. residents in a state. The DV is -interest100-.

    Obviously, the proportions depend on one another, which raises concerns about multicollinearity. In the multivariate model, I excluded the White proportion to try to mitigate this; I also used robust standard errors to try to account for heteroscedasticity and for some outliers that I could not justify removing. However, is it appropriate to do multivariate regression at all? Some of the IVs with very small values have extremely large coefficients in the multivariate model; should I leave them out?


    Thank you so much! Data below:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float interest100 double(white black asian americanindian hispaniclatino mixedrace nativehawiianpacificislander)
    .86 .691 .268  .015 .007 .046 .018 .001
    .74 .653 .037  .065 .156 .073 .075 .014
    .93 .826 .052  .037 .053 .317 .029 .003
    .79  .79 .157  .017  .01 .078 .022 .004
    .91 .719 .065  .155 .016 .394  .04 .005
    .72 .869 .046  .035 .016 .218 .031 .002
    .85 .797 .122   .05 .006 .169 .025 .001
     .9 .692 .232  .041 .007 .096 .027 .001
    .82  .46  .46  .045 .006 .113 .029 .001
    .94 .773 .169   .03 .005 .264 .022 .001
    .95 .602 .326  .044 .005 .099 .022 .001
    .73 .255 .022  .376 .004 .107 .242 .101
    .66 .867 .022  .049 .018 .134  .04 .005
    .84 .768 .146  .059 .006 .175 .021 .001
    .81 .848 .099  .026 .004 .073 .022 .001
     .7 .906 .041  .027 .005 .063  .02 .002
    .69 .863 .061  .032 .012 .122 .031 .001
    .77 .875 .085  .016 .003 .039  .02 .001
     .8 .628 .328  .018 .008 .053 .018 .001
    .63 .944 .017  .013 .007 .018 .018    .
      1 .585 .311  .067 .006 .106 .029 .001
    .87 .806  .09  .072 .005 .124 .026 .001
    .86 .792 .141  .034 .007 .053 .025    .
     .7 .838  .07  .052 .014 .056 .026 .001
     .9 .591 .378  .011 .006 .034 .013 .001
    .75 .829 .118  .022 .006 .044 .024 .002
    .72 .889 .006  .009 .067 .041 .028 .001
    .69 .881 .052  .027 .015 .114 .023 .001
    .89 .739 .103  .087 .017 .292 .046 .008
     .7 .931 .018   .03 .003  .04 .018    .
    .96 .719 .151    .1 .006 .209 .023 .001
    .87 .819 .026  .018  .11 .493 .026 .002
    .96 .696 .176   .09  .01 .193 .027 .001
    .82 .706 .222  .032 .016 .098 .023 .001
     .8 .869 .034  .017 .056 .041 .023 .001
    .72 .817 .131 .0255 .003  .04 .024 .001
    .72  .74 .078  .024 .094 .111 .063 .002
    .68 .867 .022  .049 .018 .134  .04 .005
    .84 .816  .12  .038 .004 .078 .021 .001
    .78 .836 .085  .037 .011 .163 .029 .002
    .88 .686  .27  .018 .005  .06  .02 .001
     .7 .846 .023  .015  .09 .042 .025 .001
    .79 .784 .171   .02 .005 .057  .02 .001
    .94 .787 .129  .052  .01 .397 .021 .001
    .75 .906 .015  .027 .016 .144 .026 .011
    .69 .942 .014  .019 .004  .02  .02    .
    .91 .694 .199  .069 .005 .098 .032 .001
    .77 .785 .044  .096 .019  .13 .049 .008
    .71 .935 .036  .008 .003 .017 .018    .
    .66  .87 .067   .03 .012 .071  .02 .001
    .76 .925 .013  .011 .027 .101 .022 .001
    end
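
A minimal sketch of the model as described in the post, assuming the -dataex- example above has been run so the data are in memory. The choice of regressors follows the description (white left out, robust standard errors); the exact command the poster used is not shown in the thread, so this is a reconstruction, not their code.

```stata
* Assumes the -dataex- block above has been run first.
* Model as described in the post: white omitted, robust standard errors.
regress interest100 black asian americanindian hispaniclatino ///
    mixedrace nativehawiianpacificislander, vce(robust)

* Variance inflation factors depend only on the predictors, so they are
* computed here after a plain refit with the same regressors.
quietly regress interest100 black asian americanindian hispaniclatino ///
    mixedrace nativehawiianpacificislander
estat vif
```

Rows with a missing value for nativehawiianpacificislander are dropped from both fits, because -regress- uses complete cases only.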

  • #2
    Those are proportions, so you would expect very large coefficients: a one-unit change means going from a state with 0% americanindian to a state with 100% americanindian. That is a huge change, so you would expect a huge effect. You can multiply all the proportions by 100 to get the effect of a one-percentage-point change.
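
The rescaling could be sketched as follows, assuming the -dataex- example from the first post is in memory. The pc_ prefix is a made-up naming choice for this illustration, and leaving out white mirrors the original post.

```stata
* Hypothetical pc_* copies of the proportions, expressed in percentage points.
foreach v of varlist white black asian americanindian hispaniclatino ///
        mixedrace nativehawiianpacificislander {
    generate double pc_`v' = 100 * `v'
}

* Same model as before, now per one-percentage-point change.
regress interest100 pc_black pc_asian pc_americanindian pc_hispaniclatino ///
    pc_mixedrace pc_nativehawiianpacificislander, vce(robust)
```

Each coefficient and standard error is 1/100 of its value in the proportion-scale model; the t statistics and p-values do not change.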

    You can include those variables as long as you exclude one of them. These variables add up to 1. So with 8 proportions, if you know 7 proportions, you also know the 8th, and one proportion is redundant. The interpretation is a bit more complicated. You can look at the last section of http://www.maartenbuis.nl/publications/proportions4.pdf
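
Why the interpretation changes can be sketched with a short derivation. The symbols are introduced here for illustration: $x_1,\dots,x_K$ are shares that sum to 1, with $x_1$ the omitted reference (white in the original post). In the posted rows the six race columns appear to sum to roughly 1, with hispaniclatino measured separately; that reading of the data is an assumption.

```latex
% Shares sum to one, so the reference share is determined by the others.
\sum_{k=1}^{K} x_k = 1
\quad\Longrightarrow\quad
x_1 = 1 - \sum_{k=2}^{K} x_k

% Start from a model with every share, then substitute for x_1.
y = \alpha + \sum_{k=1}^{K} \gamma_k x_k + \varepsilon
  = (\alpha + \gamma_1) + \sum_{k=2}^{K} (\gamma_k - \gamma_1)\, x_k + \varepsilon

% The coefficients that are actually estimated:
\beta_0 = \alpha + \gamma_1, \qquad \beta_k = \gamma_k - \gamma_1 \quad (k = 2,\dots,K)
```

Each estimated $\beta_k$ is therefore a contrast with the reference group: the expected change in $y$ when share moves from group 1 to group $k$ while the other included shares stay fixed.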
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      Detail: The term multivariate regression is best reserved for a model with two or more outcome variables, as handled by, say, mvreg.

      What you have is better described just as regression, or at most multiple regression. The qualifier "multiple" is slowly fading from use: it is harmless but pointless, even when there are several predictors.
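
The distinction can be illustrated with the auto dataset that ships with Stata. The dataset and the variable choices are for illustration only and have nothing to do with the data in this thread.

```stata
sysuse auto, clear

* Multivariate regression: several outcomes, one set of predictors.
mvreg headroom trunk turn = price mpg displacement gear_ratio length weight

* (Multiple) regression: one outcome, several predictors.
regress price mpg weight length
```

Note that -sysuse auto, clear- replaces whatever data are currently in memory.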



      • #4
        Thanks so much, Maarten! I'll multiply the IVs by 100 as you suggested. Thank you for that correction, Nick!

        One of my co-investigators is very concerned that the sign of the coefficient for nativehawiianpacificislander changes from negative in the unadjusted model to positive in the fully adjusted model. He understands confounding, but is having a hard time wrapping his head around how confounding would work for these particular IVs. He's actually asking me if we can just present the results of the unadjusted models, but I think the adjusted model is still useful to include. Any tips on how to explain this to him?
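
One way to put the two estimates side by side is sketched below, assuming the -dataex- example from the first post is in memory. The stored-estimate names unadjusted and adjusted are made up for this illustration.

```stata
* Unadjusted: only the share of interest.
regress interest100 nativehawiianpacificislander, vce(robust)
estimates store unadjusted

* Adjusted: all shares except the reference category (white).
regress interest100 black asian americanindian hispaniclatino ///
    mixedrace nativehawiianpacificislander, vce(robust)
estimates store adjusted

* Coefficient and standard error for the same variable in both models.
estimates table unadjusted adjusted, ///
    keep(nativehawiianpacificislander) b(%9.3f) se(%9.3f)

* How the shares move together across states.
correlate black asian americanindian hispaniclatino ///
    mixedrace nativehawiianpacificislander
```

The two coefficients answer different questions. The unadjusted one compares states with a larger and a smaller nativehawiianpacificislander share, letting every other share differ as it happens to. The adjusted one compares states whose other included shares are equal, so a larger nativehawiianpacificislander share has to come at the expense of the omitted white share. If that share is correlated with other shares that are themselves related to interest100, the two coefficients can differ in sign.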



        • #5
          Sorry - Maarten, it looks like the article you linked answers this, so I'll refer to that. Thanks so much!
