  • Simulating data for students

    Hi all,

    I was hoping someone could help me with code to simulate some data. I am a professor and I want to simulate data from a cross-sectional study to teach students how to analyze epidemiological data. I want to simulate a binary dependent variable (Disease: Y/N), a binary main-effect independent variable (Exposure: Y/N), and 2-5 other covariates (e.g., gender, age, severity). I would love for at least one covariate to meet the criteria for a confounder (i.e., a risk factor for the dependent variable that is also associated with the independent variable), so that the student would need to include it in the final model to adjust for it. Once I have the dataset, the student would analyze it using Stata's logistic regression.

    When simulating the data, I would want to specify the effect (i.e., odds ratio) for the independent variable (which happens to be having a particular gene variant). For example, for one student I would set the OR for the independent variable to 4.0; for another, 3.0. They will not know the value until they analyze the data.

    Here are the variables I am considering:
    1. Dependent variable (disease/no disease): binary
    2. Independent variable (gene/no gene): binary (I would want to specify the sample size for each group, e.g., 100 with the gene and 110 without)
    3. Gender (male/female): binary (may or may not be a confounder)
    4. Covariate1: binary (some other variable that is unrelated to the dependent variable)
    5. Covariate2: ordinal [1-3: Mild (1), Medium (2), Severe (3)]; may or may not be a confounder
    6. Covariate3: continuous, normally distributed (e.g., age, ranging from 7 to 60)
    For a particular student's simulation, I would provide the odds ratio (or beta) for each variable:
    Y= B0+ B1X1+B2X2+B3X3+B4X4+B5X5
    I could then indicate which potential covariate is a confounder (and which are not)
    I would also want to give the number of people with and without independent variable (sample size: 100=Yes and 110=No)

    B1=Independent variable (e.g., OR=4.0 for one student and 3.0 for another)

    B2=Gender (e.g., OR=1.1): not a confounder for the final model (i.e., not associated with the outcome or the main effect, B1)

    B3=Covariate1 (e.g., OR=0.95): not a confounder for the final model (not associated with the outcome or the main effect, B1)

    B4=Covariate2 (e.g., OR=2.0 and is related to the main effect, B1: confounder)

    B5=Covariate3 (e.g., OR=1.2 for each unit increase): not a confounder.

    So, I would want to simulate the variables so that for one student the results would be:
    Y= B0+ B1X1+B4X4 (where OR for B1=4 and OR for B4=2.0)

    I hope that is clear.

    Thank you in advance.

    Gary Heiman

  • #2
    You're asking for a lot, but you can easily simulate data in Stata. First, set the number of observations using set obs. Then generate the variables using generate. Look at the functions available (help functions) to pick the random-number functions you want. If you need a 0/1 variable, generate a continuous variable and then dichotomize it at a cutoff. You then build your equations with generate again, using the variables you've already generated.
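    A minimal sketch of those steps, applied to the design in the first post, might look like the following. Everything specific here is an assumption made for illustration, not something given in the thread: the variable names (gene, female, cov1, severity, age, disease), the seed, the intercept of -2, the severity distributions, the mean and standard deviation for age, and treating the OR of 2.0 as applying per level of severity. Gender, Covariate1, and age are given no effect on the outcome so that the true model is the one with B1 and B4 only.

```stata
* Sketch only: names, seed, intercept, and distributions are assumptions
clear
set seed 12345
set obs 210

* Per-student settings (hypothetical values: OR 4.0 for gene, 2.0 per severity level)
scalar or_gene = 4
scalar or_sev  = 2
scalar b0      = -2

* Exposure with fixed group sizes: first 100 rows carry the gene, other 110 do not
generate byte gene = _n <= 100

* Non-confounders: drawn independently of exposure, no effect on the outcome
generate byte female = runiform() < 0.5
generate byte cov1   = runiform() < 0.4

* Confounder: severity 1-3, more often severe among carriers
* carriers: P(1,2,3) = .2, .3, .5   non-carriers: P(1,2,3) = .5, .3, .2
generate double u = runiform()
generate byte severity = 1 + (u > cond(gene, 0.2, 0.5)) + (u > cond(gene, 0.5, 0.8))

* Age: normal draw, clamped to the 7-60 range (clamping rarely binds here)
generate double age = min(max(rnormal(33.5, 8), 7), 60)

* Linear predictor on the log-odds scale, then a 0/1 outcome
generate double xb = scalar(b0) + ln(scalar(or_gene))*gene ///
    + ln(scalar(or_sev))*(severity - 1)
generate byte disease = runiform() < invlogit(xb)
drop u

* Crude versus adjusted odds ratio for the gene
logistic disease gene
logistic disease gene severity
```

    Because severity is tied to both gene and disease, the crude and adjusted odds ratios for gene should tend to differ, which is the point of the exercise. With only 210 observations the estimates will be noisy, so any single dataset may land some distance from the odds ratios you set; changing the seed and or_gene gives each student a different dataset.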


    • #3
      In 2008 I gave a talk at the North American Stata Users' meeting where I discussed various tricks for doing simulations. The material is here: http://maartenbuis.nl/presentations/chicago08.html
      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------


      • #4
        Fantastic stuff Maarten and very useful website - I am glad to have found it and thank you for sharing your knowledge.
        With regard to simulating data to test methods or for didactic purposes, if you had more examples of simulated data you might be willing to share, I would be most grateful - especially survival analysis.
        Thanks again.
        Stata BE ver 17
        MacOS Ventura
