Hi all,
I was hoping someone could help me with code to simulate some data. I am a professor and I want to simulate data to teach students how to analyze epidemiological data from a cross-sectional study. I want to simulate a binary dependent variable (disease: Y/N), a binary main-effect independent variable (exposure: Y/N), and 2-5 other covariates (e.g., gender, age, severity). I would like at least one covariate to meet the criteria for a confounder (i.e., a risk factor for the dependent variable that is also associated with the independent variable), so that the student would need to include it in the final model to adjust for it. Once I have the dataset, the student would analyze it with Stata's logistic regression.
When simulating the data, I would want to specify the effect size (i.e., odds ratio) for the independent variable (which happens to be carrying a particular gene variant). For example, I would want the OR for the independent variable to be 4.0 for one student and 3.0 for another. They will not know the value until they analyze the data.
Here are the variables I am considering:
- Dependent variable (disease/no disease): binary
- Independent variable (gene/no gene): binary (I would want to specify the sample size for each group, e.g., 100 with the gene and 110 without)
- Gender (male/female): binary (may or may not be a confounder)
- Covariate1: binary (some other variable that is unrelated to the dependent variable)
- Covariate2: ordinal [1-3: Mild (1), Medium (2), Severe (3)] (may or may not be a confounder)
- Covariate3: continuous, normally distributed (e.g., age, ranging from 7 to 60)
logit[P(Y=1)] = B0 + B1X1 + B2X2 + B3X3 + B4X4 + B5X5, where each OR = exp(B)
I could then indicate which potential covariates are confounders (and which are not).
I would also want to set the number of people with and without the independent variable (sample size: 100 = Yes and 110 = No).
B1 = Independent variable (e.g., OR=4.0 for one student and 3.0 for another)
B2 = Gender (e.g., OR=1.1): not a confounder in the final model (i.e., not associated with the outcome or with the main effect, B1)
B3 = Covariate1 (e.g., OR=0.95): not a confounder in the final model (not associated with the outcome or with the main effect, B1)
B4 = Covariate2 (e.g., OR=2.0 and related to the B1 main effect): confounder
B5 = Covariate3 (e.g., OR=1.2 for each unit increase): not a confounder
So, I would want to simulate the variables so that for one student the results would be:
logit[P(Y=1)] = B0 + B1X1 + B4X4 (where the OR for B1 = 4.0 and the OR for B4 = 2.0)
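A minimal sketch of the data-generating process described above, written in Python for illustration only (the same steps would need to be translated to Stata). Several pieces are assumptions rather than part of the request: the function name `simulate_dataset`, the intercept of -2.5, the 50/50 split for gender and Covariate1, the severity distributions by gene status (which are what make Covariate2 a confounder), and the age mean of 33 with SD 10. Odds ratios for the non-confounders default to 1.0, matching the reduced model for one student.

```python
import math
import random


def simulate_dataset(or_gene=4.0, or_cov2=2.0, or_gender=1.0, or_cov1=1.0,
                     or_age=1.0, n_gene=100, n_no_gene=110,
                     intercept=-2.5, seed=1):
    """Simulate one student's cross-sectional dataset as a list of dict rows."""
    rng = random.Random(seed)  # seeded so each student's file is reproducible

    # Odds ratios become logistic-regression coefficients via the natural log.
    b_gene, b_cov2 = math.log(or_gene), math.log(or_cov2)
    b_gender, b_cov1, b_age = (math.log(or_gender), math.log(or_cov1),
                               math.log(or_age))

    # ASSUMPTION: probabilities of Mild/Medium/Severe given gene status.
    # Carriers skew severe, so Covariate2 is associated with the exposure.
    severity_weights = {1: [0.2, 0.3, 0.5], 0: [0.5, 0.3, 0.2]}

    rows = []
    # Fixed group sizes: n_gene exposed subjects, then n_no_gene unexposed.
    for gene in [1] * n_gene + [0] * n_no_gene:
        gender = int(rng.random() < 0.5)   # independent of gene and disease
        cov1 = int(rng.random() < 0.5)     # independent of gene and disease
        cov2 = rng.choices([1, 2, 3], weights=severity_weights[gene])[0]
        # ASSUMPTION: age ~ Normal(33, 10), clamped to the 7-60 range
        # (a rough stand-in for a truncated normal).
        age = min(60.0, max(7.0, rng.gauss(33.0, 10.0)))

        # Linear predictor on the log-odds scale, then the inverse logit.
        xb = (intercept + b_gene * gene + b_gender * gender
              + b_cov1 * cov1 + b_cov2 * cov2 + b_age * age)
        p_disease = 1.0 / (1.0 + math.exp(-xb))
        disease = int(rng.random() < p_disease)

        rows.append({"disease": disease, "gene": gene, "gender": gender,
                     "cov1": cov1, "cov2": cov2, "age": round(age, 1)})
    return rows
```

Calling the function with a different `or_gene` and `seed` for each student would give each one a different true odds ratio. With only 210 subjects, the estimate a student gets from logistic regression will vary around the true value rather than match it exactly, and the unadjusted estimate for the gene should differ from the one adjusted for Covariate2.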
I hope that is clear.
Thank you in advance.
Gary Heiman