
  • Standardisation of variables and differing standardised coefficients

    Hi all,

    For my bachelor's thesis I want to identify individual factors that promote (regular) participation in direct democratic votes in Switzerland between 1995 and 2015. For my analysis I use the Swiss national election studies, cumulated file 1971-2015 dataset and Stata version 15.1. I have four hypotheses that I want to test independently of each other.

    To approach my research question, I want to run a bivariate regression for the year.
    My dependent variable prdd is the participation rate in direct democratic votes (a scale from 0 to 1 in steps of 0.1: 0, 0.1, 0.2, ..., 1).

    In hypothesis 1 I treat overall trust in institutions of the representative system (trustindex) as the explanatory variable.

    I combine three survey variables (trust in parliament, trust in the federal council, trust in parties - each on a scale from 0 = no trust to 10 = full trust) into one overall trust variable by using the command

    egen trustindex = rowmean(tparty tparl tcoun)

    and finally recode the many resulting values of trustindex to a scale from 0 to 1 with 11 values (equivalent to the scale of the dependent variable).
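
    The rescaling step looks roughly like this (just a sketch; this line is a stand-in for my actual recode command, which I have not pasted here):

    replace trustindex = round(trustindex)/10    // stand-in: collapses the 0-10 averages to 11 values from 0 to 1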

    I control for:
    - age (18-97)
    - education (scale from 1-9, increasing educational degrees: from 1 = primary education up to 9 = university)
    - year (1995, 1999, 2003, 2007, 2011 - with 1995 as my base level)


    Now, when trying to run the regression, I have some problems and questions:

    Since my independent variables are on different scales than my dependent variable (apart from trustindex), I thought it might be necessary to standardise them.

    Is it correct that I do not need to standardise trustindex, since it already has the same scale as my dependent variable?
    Do I need to standardise my control variable "year" if I treat it as a categorical variable and if I just want to see the differences in participation compared to my base level 1995?
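
    For context, this is how I enter year as a categorical control; the version below just makes the base level explicit (it should be equivalent to i.year here, since 1995 is the lowest category):

    reg prdd trustindex age educ ib1995.year    // ib1995. sets 1995 as the base level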


    I found two ways to standardise variables, which are supposed to lead to the same results:
    a) running the regression with the beta option, or
    b) pre-standardising each variable with the command egen newvarname = std(varname) (sketched below).
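
    Concretely, for b) I create the standardised versions of the controls (these are the agestd and educstd used in the output further down):

    egen agestd  = std(age)      // (age - mean of age) / sd of age
    egen educstd = std(educ)     // (educ - mean of educ) / sd of educ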

    After running several different regressions with the beta option or with pre-standardised variables, I come to the conclusion that the beta coefficients just do not match my coefficients in regressions with pre-standardised variables:


    Using the beta option
    reg prdd trustindex age educ i.year, beta


          Source |       SS           df       MS      Number of obs   =    20,431
    -------------+----------------------------------   F(7, 20423)     =    355.58
           Model |  193.601785         7  27.6573979   Prob > F        =    0.0000
        Residual |  1588.51339    20,423   .07778061   R-squared       =    0.1086
    -------------+----------------------------------   Adj R-squared   =    0.1083
           Total |  1782.11518    20,430  .087230307   Root MSE        =    .27889

    ------------------------------------------------------------------------------
            prdd |      Coef.   Std. Err.      t    P>|t|                     Beta
    -------------+----------------------------------------------------------------
      trustindex |   .2983321   .0107972    27.63   0.000                 .1848966
             age |   .0034016   .0001179    28.85   0.000                  .194637
            educ |   .0232571   .0009122    25.50   0.000                 .1708613
                 |
            year |
           1999  |   -.049819   .0062391    -7.99   0.000                -.0586392
           2003  |   -.031216   .0052029    -6.00   0.000                 -.046019
           2007  |  -.0382453   .0056663    -6.75   0.000                -.0509464
           2011  |   .0332743    .007754     4.29   0.000                 .0306206
                 |
           _cons |   .2985779   .0103352    28.89   0.000                        .




    With pre-standardised variables
    reg prdd trustindex agestd educstd i.year


          Source |       SS           df       MS      Number of obs   =    20,431
    -------------+----------------------------------   F(7, 20423)     =    355.58
           Model |  193.601786         7  27.6573979   Prob > F        =    0.0000
        Residual |  1588.51339    20,423   .07778061   R-squared       =    0.1086
    -------------+----------------------------------   Adj R-squared   =    0.1083
           Total |  1782.11518    20,430  .087230307   Root MSE        =    .27889

    ------------------------------------------------------------------------------
            prdd |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
      trustindex |   .2983321   .0107972    27.63   0.000     .2771688    .3194954
          agestd |   .0592009   .0020523    28.85   0.000     .0551782    .0632237
         educstd |    .052088    .002043    25.50   0.000     .0480836    .0560923
                 |
            year |
           1999  |   -.049819   .0062391    -7.99   0.000    -.0620481     -.03759
           2003  |   -.031216   .0052029    -6.00   0.000    -.0414142   -.0210178
           2007  |  -.0382453   .0056663    -6.75   0.000    -.0493518   -.0271388
           2011  |   .0332743    .007754     4.29   0.000     .0180758    .0484728
                 |
           _cons |   .5886924   .0076708    76.74   0.000      .573657    .6037279



    And even if I standardise the trustindex, which I thought might be the reason for the differences, the standardised coefficients still differ:
    reg prdd trustindexstd agestd educstd i.year


          Source |       SS           df       MS      Number of obs   =    20,431
    -------------+----------------------------------   F(7, 20423)     =    355.58
           Model |  193.601786         7   27.657398   Prob > F        =    0.0000
        Residual |  1588.51339    20,423   .07778061   R-squared       =    0.1086
    -------------+----------------------------------   Adj R-squared   =    0.1083
           Total |  1782.11518    20,430  .087230307   Root MSE        =    .27889

    -------------------------------------------------------------------------------
             prdd |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    --------------+----------------------------------------------------------------
    trustindexstd |   .0546088   .0019764    27.63   0.000     .0507349    .0584827
           agestd |   .0592009   .0020523    28.85   0.000     .0551782    .0632237
          educstd |    .052088    .002043    25.50   0.000     .0480836    .0560923
                  |
             year |
            1999  |   -.049819   .0062391    -7.99   0.000    -.0620481     -.03759
            2003  |   -.031216   .0052029    -6.00   0.000    -.0414142   -.0210178
            2007  |  -.0382453   .0056663    -6.75   0.000    -.0493518   -.0271388
            2011  |   .0332743    .007754     4.29   0.000     .0180758    .0484728
                  |
            _cons |   .7672332   .0034341   223.42   0.000     .7605021    .7739643



    How is it possible that my standardised beta coefficient of age is .194637, while it is .0592009 when pre-standardising the variable? Which results are more precise and reliable? Or what did I do wrong?

    If anything remains unclear, please ask. I hope someone can help. Thanks in advance,
    Rebecca



  • #2
    Standardization changes where the zero point is (which influences coefficients for interactions) and the variance of your variables, which influences the parameters themselves. However, it should not change the significance of the parameters, as your results show - the p-values are the same standardized and unstandardized. When you standardize, what had been a change of 5 in the original data becomes something different in the standardized data. So the coefficient on that change of 5 will be different in the original and standardized data. The same thing happens any time you rescale a variable - if I divide x by 10, my parameters become 10 times as large.
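
    As a quick illustration of that last point, using one of Stata's example datasets rather than your data:

    sysuse auto, clear
    gen weight10 = weight/10     // rescale the predictor by a factor of 10
    regress price weight         // coefficient b on weight
    regress price weight10       // coefficient is 10*b; t statistic and p-value are unchanged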

    Neither result is more precise or reliable. You've just changed the variance of the variables which changes the parameter.

    The choice of whether to standardize has been extensively discussed on this listserve. Some here argue that standardizing is not a good idea if the original variables are meaningful: I know what an additional dollar is, but once you standardize, I don't know what a standard deviation in dollars is.
