
  • Insignificant results: What can I do?

    Dear Statalistas,

    In my regression, growth per capita as a percentage of GDP in 2014 is the dependent variable. I have quite a few control variables (including log GDP of the previous year, fertility rate, tertiary education, life expectancy, urbanization rate, inflation, population aged <15, population aged +15, the ratio of foreign investment to GDP, and the ratio of government spending to GDP), but my results are not significant. What can I do? I do not have the option of including more observations.

    Best
    Leo

    Code:
    . reg Growth Log_GDP_pC_2013 Inflation Foreign_Investments_ofGDP GovernmentSpending_ofGDP LifeExpectancy Education_Tertiary Urbanizationrate Population65 FertilityRate

          Source |       SS           df       MS      Number of obs   =        41
    -------------+----------------------------------   F(9, 31)        =      3.06
           Model |  76.4974593         9  8.4997177   Prob > F        =    0.0097
        Residual |  86.1361336        31  2.77858496   R-squared       =    0.4704
    -------------+----------------------------------   Adj R-squared   =    0.3166
           Total |  162.633593        40  4.06583982   Root MSE        =    1.6669

    -------------------------------------------------------------------------------------------
                       Growth |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    --------------------------+----------------------------------------------------------------
              Log_GDP_pC_2013 |   .0394968   1.673602     0.02   0.981    -3.373837    3.452831
                    Inflation |  -.1872666    .129307    -1.45   0.158    -.4509899    .0764567
    Foreign_Investments_ofGDP |   .0412949   .0406959     1.01   0.318     -.041705    .1242949
     GovernmentSpending_ofGDP |   -.122471   .0886844    -1.38   0.177     -.303344    .0584019
               LifeExpectancy |   .0001113   .0912691     0.00   0.999    -.1860332    .1862558
           Education_Tertiary |    -.02221      .0276    -0.80   0.427    -.0785007    .0340807
             Urbanizationrate |   -.048897   .0289234    -1.69   0.101    -.1078867    .0100926
                 Population65 |  -.0565843   .0557908    -1.01   0.318    -.1703704    .0572018
                FertilityRate |  -.1037417   .7621697    -0.14   0.893    -1.658197    1.450714
                        _cons |   9.167969    6.21429     1.48   0.150     -3.50616     21.8421
    -------------------------------------------------------------------------------------------



  • #2
    Much as I dislike focusing on statistical significance at all, you have overlooked the line in the output header that says Prob > F = 0.0097. So your overall model is "significant" and in fact accounts for 47% of the variance, which is quite hefty. You are disappointed, I imagine, because none of your individual predictors turns out to be "significant" in its own right. This kind of situation can arise because of high correlations among your predictor variables. So they are all competing with each other to explain outcome variance, and they are all sufficiently good competitors that the variance is being spread pretty evenly among them, and none stands out as "significant."

    So take a look at the correlation matrix for your predictors, or do it graphically with -graph matrix-. Then eliminate some of the redundant variables. I'm not an economist, so I'm going out on a limb here, but, for example, I would imagine that your government spending variable and the log GDP from 2013 are largely redundant with each other. I would also expect that tertiary education and urbanization are pretty strongly correlated, and that both of those are rather negatively correlated with fertility rate. But let the data guide you in picking out a model with fewer highly correlated predictors and you will likely find some "significant" results.
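
    For example, a sketch using the variable names from #1 (-estat vif-, run after the -regress- command, is another quick check, since it reports variance inflation factors):

    Code:
    . correlate Log_GDP_pC_2013 Inflation Foreign_Investments_ofGDP GovernmentSpending_ofGDP LifeExpectancy Education_Tertiary Urbanizationrate Population65 FertilityRate
    . graph matrix Log_GDP_pC_2013 Inflation Foreign_Investments_ofGDP GovernmentSpending_ofGDP LifeExpectancy Education_Tertiary Urbanizationrate Population65 FertilityRate
    . estat vif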

    That said, I hope you realize that trying models out until you finally find a "significant" result is not science: it's just mining Type I errors, and if done knowingly, some consider it scientific misconduct. You really should go into these analyses with a pre-specified hypothesis. And if it doesn't pan out, then it doesn't pan out.



    • #3
      Thank you for the answer.
      I do know that I cannot keep altering variables in order to find a significant result. However, all of these are meant as control variables, because what I actually want to measure is the effect of religion, which is not yet included. That is why I first need a general model that explains growth well, so that I can then address my actual research question.



      • #4
        Understood. Good luck. If the variables shown in #1 are just there to be adjusted for, then it really doesn't matter whether they are "significant" or not. The correlations among them in no way diminish the adjustment (control). If those variables are not the actual effects of interest in your research, then you should just ignore their p-values anyway.



        • #5
          Leo,

          Beyond the issue of multicollinearity, which, as Clyde advised, is not a problem for control variables, it appears you have a problem with statistical power.

          In your example, you have only 41 cases with 9 control variables plus the religion variable you intend to add as an independent variable.

          You can run a power analysis to determine the sample size you would need by using Stata's -power rsquared- command as shown below. This assumes that:
          • the effect of adding religion to the model will be to increase R-squared by at least 10% beyond the effects of the controls
          • you have 9 control variables plus your added religion independent variable
          • you want to have 80% power, and
          • your Type I error criterion is 5%.
          Code:
          . power rsquared 0.0, ncontrol(9) ntested(1) diff(.10)
          
          Performing iteration ...
          
          Estimated sample size for multiple linear regression
          F test for R2 testing subset of coefficients
          Ho: R2_F = R2_R  versus  Ha: R2_F != R2_R
          
          Study parameters:
          
                  alpha =    0.0500
                  power =    0.8000
                  delta =    0.1111
                   R2_R =    0.0000
                   R2_F =    0.1000
                R2_diff =    0.1000
               ncontrol =         9
                ntested =         1
          
          Estimated sample size:
          
                      N =        73

          If you cannot collect additional data, then you can calculate the power of the analysis with n = 41 as follows:
          Code:
          . power rsquared 0.0, n(41) ncontrol(9) ntested(1) diff(.10)
          
          Estimated power for multiple linear regression
          F test for R2 testing subset of coefficients
          Ho: R2_F = R2_R  versus  Ha: R2_F != R2_R
          
          Study parameters:
          
                  alpha =    0.0500
                      N =        41
                  delta =    0.1111
                   R2_R =    0.0000
                   R2_F =    0.1000
                R2_diff =    0.1000
               ncontrol =         9
                ntested =         1
          
          Estimated power:
          
                  power =    0.5422

          If you expect the effect of religion to be a change in R-squared of 5% or less, then the estimated power to find the effect at a statistically significant level would drop to less than 30%.
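
          For reference, that lower-bound scenario is the same command with diff(.05) (a sketch; output omitted here):

          Code:
          . power rsquared 0.0, n(41) ncontrol(9) ntested(1) diff(.05)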

          Hope that helps as you work on your model.

          Good luck.

          Red Owl




