I am doing research with a sample size of more than 4,000.
Trying to assess normality:
As the note under the swilk output below states, the normal approximation is only valid for 4<=n<=2000, so with my sample size of 4,323 the normality assessed here may not represent true normality. Similarly with sktest.
Secondly, I would like to ask a technical question. I am sorry to ask, as I know that Stata is only software and the statistical interpretation is up to the researcher. Given the table of diabetes by coronary disease at the end of this post, how do I know that the p-value of 0.000 truly represents the relationship between the variables, given that the groups with diabetes (109 people) and without diabetes (4,129 people) are so imbalanced?
Code:
swilk currentsmoker cigsperday prevalentstroke totchol bmi glucose

                   Shapiro-Wilk W test for normal data

    Variable |        Obs       W           V         z       Prob>z
-------------+------------------------------------------------------
currentsmo~r |      4,238    0.99997      0.063    -7.221    1.00000
  cigsperday |      4,209    0.95455    105.598    12.160    0.00000
prevalents~e |      4,238    0.92485    175.659    13.491    0.00000
     totchol |      4,188    0.96867     72.464    11.176    0.00000
         bmi |      4,219    0.95759     98.739    11.986    0.00000
     glucose |      3,850    0.56337    936.450    17.804    0.00000

Note: The normal approximation to the sampling distribution of W'
      is valid for 4<=n<=2000.
Code:
sktest
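One possible way around the n<=2000 note is sketched below. It is an assumption on my part, not output I have produced: it restricts attention to the continuous variables (currentsmoker and prevalentstroke are 0/1 indicators, so a normality test says little about them) and uses sfrancia, which as far as I know the Stata documentation describes as usable for samples of up to 5,000 observations, together with graphical checks.

```stata
* Shapiro-Francia test on the continuous variables only
sfrancia cigsperday totchol bmi glucose

* Skewness and kurtosis values, to see how far each variable departs from normal
summarize cigsperday totchol bmi glucose, detail

* Graphical checks for one variable at a time (glucose shown as an example)
histogram glucose, normal
qnorm glucose
```

With several thousand observations, any formal test will tend to reject normality for even small departures, so the plots may be more informative than the p-values.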
                    | Coronary Disease |         |
Variable            |   Yes |       No | p-value |  OR
Have diabetes       |    40 |       69 |   0.000 | 3.3
Don't have diabetes |   604 |    3,525 |         |
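The odds ratio and the test in this table can be reproduced from the four cell counts alone. The following is a minimal sketch using Stata's immediate commands, assuming coronary disease is treated as the outcome and diabetes as the exposure (that framing is my assumption for illustration):

```stata
* 2x2 table entered by rows: diabetes yes (40, 69), diabetes no (604, 3525)
tabi 40 69 \ 604 3525, chi2 exact

* cci expects: exposed cases, unexposed cases, exposed controls, unexposed controls
cci 40 604 69 3525
```

By hand, the cross-product ratio is (40 x 3525) / (69 x 604) = 141000 / 41676, which is approximately 3.38 and agrees with the OR of 3.3 shown in the table.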
Thank you very much