Hello Clyde Schechter, Bruce Weaver, George Ford, lorenabarberia, and Noor Sethi,
I sincerely appreciate your support throughout the data analysis for my PhD project. I would be grateful if any of you could help me with the following issue:
For my project I intend to show the regional variability of mental health among Canadians by immigration status. Here, mental health is measured by the K6 mental distress scale, and immigration status is categorized as Canadian-born, recent immigrant, and long-residing immigrant. Finally, region is categorized as AC (Atlantic Canada), QC (Quebec), ON (Ontario), Prairies, and BC (British Columbia).
For the regression analysis I am running an OLS regression with mental health as the dependent variable and an interaction between immigration status and region.
The problem I am encountering is how to show tri-variate results for mental health, immigration status, and region. If I categorize the mental health variable into three categories, Low (one SD below the mean), High (one SD above the mean), and Moderate (in between), and run a tri-variate table, table (region immigration status) (mental health), some of the cell counts are less than 5. That prevents me from getting the results vetted out of the RDC (Research Data Centre), because according to the rules every cell count must be 5 or above.
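To make the problem concrete, here is a minimal sketch of the categorization and the table. It assumes the variables are named mental_health, region, and immigrant; mh_cat and the label name are hypothetical names used only for illustration, and treating scores exactly at the cutoffs as Moderate is my assumption.

```stata
* Sketch only: mental_health, region, immigrant are assumed variable names
summarize mental_health
local lo = r(mean) - r(sd)    // cutoff one SD below the mean
local hi = r(mean) + r(sd)    // cutoff one SD above the mean

* 1 = Low, 2 = Moderate, 3 = High; missing scores stay missing
generate byte mh_cat = 2 if !missing(mental_health)
replace mh_cat = 1 if mental_health < `lo'
replace mh_cat = 3 if mental_health > `hi' & !missing(mental_health)
label define mh_cat 1 "Low" 2 "Moderate" 3 "High"
label values mh_cat mh_cat

* Tri-variate frequency table (Stata 17+ table syntax)
table (region immigrant) (mh_cat)

* List the cells that fall below the disclosure threshold of 5
preserve
contract region immigrant mh_cat, freq(n) zero nomiss
list if n < 5
restore
```

The contract step is only a convenience for spotting the small cells, including empty ones, before requesting release.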
Given this problem, I am thinking of showing the mean of mental health by region and immigration status instead. I would first run sort region immigrant followed by by region immigrant: summarize mental_health, to get the mean and SD for each immigration status within each region. I would then run mean mental_health, over(region immigrant) followed by marginsplot, to plot the mean of mental health by region and immigration status. This would let me avoid the cell-count issue of the tri-variate tabulation.
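A rough sketch of this alternative, again assuming the variable names mental_health, region, and immigrant. As far as I understand, marginsplot needs results from margins, so the plot here goes through regress and margins rather than mean; with the full interaction in the model, the margins for region#immigrant are the cell means.

```stata
* Sketch only: mental_health, region, immigrant are assumed variable names

* Mean, SD, and number of observations for each region-by-status cell
bysort region immigrant: summarize mental_health

* The same statistics in a single table (Stata 17+ table syntax)
table (region) (immigrant), statistic(mean mental_health) ///
    statistic(sd mental_health) statistic(count mental_health)

* Cell means with confidence intervals from the interaction model
regress mental_health i.region##i.immigrant
margins region#immigrant

* Plot the means: region on the x axis, one line per immigration status
marginsplot, xdimension(region)
```

The count statistic is included so that the size of each cell behind a mean is visible alongside the mean and SD.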
Do you think my approach of showing means and SDs with a mean plot, instead of a tri-variate frequency table, is OK for my project?
Thank you once again for your support.
Iqbal