Dear Statalist,
As control variables, I use the device, the operating system, gender, and whether the participant clicked on the instructions.
After consulting my colleagues and the internet, I am still unsure how many observations I need within each category of these variables. I have 238 participants and roughly 850 observations after data cleaning (each participant takes part in four rounds of decision-making).
Question 1: Can I keep Tablet (N = 10 observations) in the regression below and simply not comment on its significance (because it has fewer than 30 observations), or must I drop it from the regression because I do not interpret its coefficient? The same applies to gender: I have 12 observations with no stated gender, and compared to men they are 43 percentage points more likely to show the behavior measured by my dependent variable. However, I am more interested in the difference between women and men (-13 pp, not significant).
Do I need to recode female and not stated into a single "not male" category, or is it fine to leave the variable as is?
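To make the recode I am asking about concrete, here is a minimal sketch of what I have in mind. It assumes gender is a labelled numeric variable whose value label for men reads "male" (as in the tabulation in this post); gender_str, notmale, and notmale_lbl are hypothetical names I made up for illustration, and I have not settled on this approach.

```stata
* Sketch only: collapse "female" and "not stated" into one "not male" group.
* Assumption: gender carries a value label, and men are labelled "male".
decode gender, generate(gender_str)
generate byte notmale = (gender_str != "male") if !missing(gender)
label define notmale_lbl 0 "male" 1 "not male"
label values notmale notmale_lbl

* Check that the recode did what was intended
tabulate gender notmale, missing

* Same model as in this post, with the binary indicator instead of i.gender
* (the /// line continuation works in a do-file, not in the Command window)
regress behavior i.T_C i.exp i.indep i.year i.confidence i.round ///
    i.notmale i.device i.system i.instruclick, vce(cluster ID)
```

Leaving the three categories as they are would keep the female vs. male contrast separate from the 12 "not stated" observations, which is the comparison I care about most.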
Question 2: The Apple category of system (relative to the base category 0, an unknown operating system) is omitted because of collinearity. Does this mean I need to drop system as a control variable?
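One check that might show where the collinearity comes from is to cross-tabulate device and system within the estimation sample. This is only a sketch: it has to be run directly after the regress command from this post so that e(sample) is defined, and I am assuming (not asserting) that the overlap is between these two variables.

```stata
* Run immediately after the regress command so that e(sample) exists.
* If every Apple observation in the sample is also, say, a Tablet
* observation, the two indicators are identical and one must be omitted.
tabulate device system if e(sample), missing

* For comparison: the same cross-tabulation in the full data set
tabulate device system, missing
```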
My rationale for including the controls was: 1) device and operating system, because they may change participants' behavior; some literature finds increased risk-taking (i.e., changed behavior) when retail investors trade on smartphones rather than desktop computers; 2) gender, because I want to analyze whether behavior differs between genders; and 3) instructions, because more diligent people (defined as those who open the instructions during the experiment) may behave differently from those who do not click on the instructions.
Question 3: Regressions without controls (including only T_C, exp, indep, year, confidence, and round) sometimes have a different N and number of clusters than regressions with controls (as some variables get dropped, as above with Apple). Can I still compare them? That is, without controls these factors have a statistically significant influence; including the controls changes the coefficients slightly (2-3 pp, with significance maintained), and consulting the instructions is now highly significant.
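In case it is relevant, this is a sketch of how I could force both specifications onto the same observations. As far as I understand, N differs because of missing values in the controls (for example, instruclick has 836 rather than 847 non-missing observations) rather than because a category is omitted, but I may be wrong. The names insample, with_controls, and no_controls are hypothetical.

```stata
* See which variables have missing values
misstable summarize behavior T_C exp indep year confidence round ///
    gender device system instruclick

* Fit the full model first and flag its estimation sample
regress behavior i.T_C i.exp i.indep i.year i.confidence i.round ///
    i.gender i.device i.system i.instruclick, vce(cluster ID)
generate byte insample = e(sample)
estimates store with_controls

* Refit the model without controls on exactly the same observations
regress behavior i.T_C i.exp i.indep i.year i.confidence i.round ///
    if insample, vce(cluster ID)
estimates store no_controls

* Side-by-side coefficients and standard errors; N should now match
estimates table no_controls with_controls, b(%9.4f) se stats(N)
```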
Thank you very much in advance!
I very much appreciate your help and comments!
Code:
. tab system

   DE01_PRV |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        825       97.40       97.40
    Android |         12        1.42       98.82
      Apple |         10        1.18      100.00
------------+-----------------------------------
      Total |        847      100.00

. tab device

   DE01_FmF |      Freq.     Percent        Cum.
------------+-----------------------------------
   Computer |        776       91.62       91.62
     Tablet |         10        1.18       92.80
 Smartphone |         61        7.20      100.00
------------+-----------------------------------
      Total |        847      100.00

. tab instruclick

instruclick |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        618       73.92       73.92
          1 |        218       26.08      100.00
------------+-----------------------------------
      Total |        836      100.00

. tab gender

       GE05 |      Freq.     Percent        Cum.
------------+-----------------------------------
       male |        714       84.30       84.30
     female |        121       14.29       98.58
 not stated |         12        1.42      100.00
------------+-----------------------------------
      Total |        847      100.00
Code:
. regress behavior i.T_C i.exp i.indep i.year i.confidence i.round i.gender i.device i.system i.instr
> uclick, vce(cluster ID)
note: 2.system omitted because of collinearity.

Linear regression                               Number of obs     =        586
                                                F(27, 230)        =      11.86
                                                Prob > F          =     0.0000
                                                R-squared         =     0.1511
                                                Root MSE          =     .47169

                                   (Std. err. adjusted for 231 clusters in ID)
-------------------------------------------------------------------------------
              |               Robust
     behavior | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
--------------+----------------------------------------------------------------
          T_C |
           2  |  -.0002946   .0590163    -0.00   0.996    -.1165764    .1159872
              |
          exp |
           2  |   .2807108    .200137     1.40   0.162    -.1136255    .6750471
           3  |   .2531873   .1738806     1.46   0.147    -.0894152    .5957898
           4  |    .291963   .1566806     1.86   0.064    -.0167498    .6006758
              |
      2.indep |  -.0663562   .0961313    -0.69   0.491    -.2557668    .1230543
              |
         year |
        2021  |  -.0300914     .07163    -0.42   0.675    -.1712263    .1110436
        2020  |   -.104248   .0978701    -1.07   0.288    -.2970845    .0885885
        2019  |   .0638098   .0715169     0.89   0.373    -.0771022    .2047218
        2018  |  -.0986539   .0727007    -1.36   0.176    -.2418984    .0445906
        2017  |  -.0526477   .0759911    -0.69   0.489    -.2023755      .09708
        2016  |   -.370949   .0932669    -3.98   0.000    -.5547158   -.1871821
        2015  |  -.1115331   .0842104    -1.32   0.187    -.2774556    .0543894
        2014  |   .0579834   .1116252     0.52   0.604    -.1619552    .2779221
        2013  |  -.0026666   .0713856    -0.04   0.970    -.1433199    .1379867
              |
   confidence |
           2  |  -.0342457   .0417133    -0.82   0.413    -.1164348    .0479433
           3  |   .0463232   .0512147     0.90   0.367    -.0545867    .1472332
           4  |  -.1103279   .0489368    -2.25   0.025    -.2067498   -.0139061
              |
        round |
           2  |   .0050149   .0445307     0.11   0.910    -.0827254    .0927553
           3  |  -.0003592   .0496134    -0.01   0.994     -.098114    .0973957
           4  |   .0531307   .0476901     1.11   0.266    -.0408345     .147096
              |
       gender |
      female  |  -.1317427   .0815155    -1.62   0.107    -.2923552    .0288698
  not stated  |   .4370247   .0508592     8.59   0.000     .3368151    .5372343
              |
       device |
      Tablet  |   .5374121   .0783172     6.86   0.000     .3831012     .691723
  Smartphone  |  -.2212038   .1358939    -1.63   0.105    -.4889598    .0465522
              |
       system |
     Android  |  -.0551672   .2526156    -0.22   0.827    -.5529037    .4425693
       Apple  |          0  (omitted)
              |
1.instruclick |     .15917   .0563434     2.82   0.005     .0481548    .2701853
        _cons |   .2354242   .2071774     1.14   0.257     -.172784    .6436324
-------------------------------------------------------------------------------