
  • Simulation of binomial distribution: giving more structure to the simulated data

    Hi,

    I have data on students in schools over several years. I can identify the peers of non-Western origin, and I would like to estimate the effect of exposure to a higher share of non-Western peers on native students' outcomes.

    I would like to compare two kernel density plots of the within-school, between-year standard deviation of the share of non-Western peers: one based on the actual data and one based on simulated data.

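    For the simulation, I draw a non-Western indicator for each peer from a binomial distribution with one trial, using the school's overall share of non-Western students as the probability. I then calculate the share of non-Western peers by school and year from the simulated indicators, and take its standard deviation across years within each school. I repeat this 1000 times.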

    When I plot the kernel densities of the standard deviations of the actual and the simulated share of non-Western peers, the right tail of the simulated density is much longer than that of the actual data. In other words, the standard deviation of the share of non-Western peers has more extreme values in the simulated data than in the actual data.

    Now my question is: how can I constrain the simulation so that the simulated standard deviations do not exceed the maximum standard deviation of the actual share of peers?

    My code is as follows:
    Code:
    clear all
    
    cd "use"
    
    clear
    gen share_sd=.
    save mc_sd_empty.dta, replace
    
    use mc_data, clear
    
    bys school year: egen share_real=mean(non_western)
    bys school: egen overall_share=mean(non_western)
    
    *Drop schools with no non-Western students (no variation in the share)
    drop if overall_share==0
    drop overall_share
    
    *Calculating the sd from actual data, for natives
    
    keep if native==3
    bys school year: keep if _n==1
    
    *Gen sd in schools to later draw the kernel for
    bys school: egen share_sd=sd(share_real)
    
    keep share_sd
    
    save actual_sd.dta, replace
    
    *Prepare simulation data
    
    use mc_data, clear
    
    *Drop schools with no non-Western students (no variation in the share)
    bys school: egen overall_share=mean(non_western)
    drop if overall_share==0
    drop overall_share
    
    *Share of non-Westerns in each school, used as the probability in the simulation
    bys school: egen p_nonwestern=mean(non_western)
    
    
    save data_ready, replace
    
    **Program
    
    capture program drop mc
    program define mc, rclass
    
    use data_ready, clear
    
    *Randomly assign the peers an immigrant status based on binomial
    bys school: gen rand_cohort=rbinomial(1, p_nonwestern) if native!=3
    
    bys school year: egen share_simulated=mean(rand_cohort)
    
    keep if native==3
    bys school year: keep if _n==1
    bys school: egen share_sd=sd(share_simulated)
    
    keep share_sd
    append using mc_sd.dta
    save mc_sd.dta, replace
    end
    
    *Start each run from an empty results file (-copy- needs the full file names)
    copy mc_sd_empty.dta mc_sd.dta, replace
    set seed 1234
    simulate share_sd, reps(1000): mc
    
    
    use actual_sd, clear
    append using mc_sd, gen(simulated)
    
    twoway (kdensity share_sd if simulated==0) (kdensity share_sd if simulated==1)


    My data looks like this:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(id year school native non_western)
     1 2001 100 1 0
     1 2002 100 1 0
     2 2001 100 0 1
     3 2004 101 1 0
     3 2005 101 1 0
     4 2001 100 1 0
     4 2002 100 1 0
     4 2003 100 1 0
     5 2004 101 0 1
     6 2005 101 1 0
     6 2006 101 1 0
     6 2007 101 1 0
     7 2002 100 1 0
     7 2003 100 1 0
     7 2004 100 1 0
     8 2002 100 0 1
     8 2003 100 0 1
     9 2005 101 0 0
    10 2005 101 1 0
    10 2006 101 1 0
    10 2007 101 1 0
    end
    Last edited by Neg Kha; 13 Nov 2024, 03:14.

  • #2
    You might try shuffle_var.
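
    To illustrate the idea behind shuffling, here is a minimal sketch in base Stata that permutes the observed non_western indicator across rows within each school instead of drawing it from a binomial. It does not use shuffle_var itself, whose syntax is not assumed here. The variables n_before, n_after, orig, u, draw, nw_perm and share_perm are hypothetical names introduced for this example, and the rows are the -dataex- excerpt from the first post. Like the binomial draw in the original program, it works row by row, so a student's simulated status can differ between years.

```stata
* Sketch: permute the observed indicator within school, base Stata only.
* Assumption: this imitates a shuffle-type command; it is NOT shuffle_var itself.
clear
set seed 1234
input float(id year school native non_western)
 1 2001 100 1 0
 1 2002 100 1 0
 2 2001 100 0 1
 3 2004 101 1 0
 3 2005 101 1 0
 4 2001 100 1 0
 4 2002 100 1 0
 4 2003 100 1 0
 5 2004 101 0 1
 6 2005 101 1 0
 6 2006 101 1 0
 6 2007 101 1 0
 7 2002 100 1 0
 7 2003 100 1 0
 7 2004 100 1 0
 8 2002 100 0 1
 8 2003 100 0 1
 9 2005 101 0 0
10 2005 101 1 0
10 2006 101 1 0
10 2007 101 1 0
end

* Number of non-Western rows per school before shuffling
bysort school: egen n_before = total(non_western)

* Remember the current row order, then give every row a random rank within its school
gen long orig = _n
gen double u = runiform()
sort school u
by school: gen long draw = _n

* Back in the remembered order, each row takes the value held by row number draw
* of the same school (under by, explicit subscripts are relative to the by-group)
sort school orig
by school: gen byte nw_perm = non_western[draw]

* Sanity check: shuffling leaves each school's count of non-Westerns unchanged
by school: egen n_after = total(nw_perm)
assert n_before == n_after

* Simulated share by school and year, as in the original program
bysort school year: egen share_perm = mean(nw_perm)
```

    Because a permutation keeps each school's total number of non-Western students fixed, the simulated shares can only differ in how those students are spread over years. That should usually give less dispersion than independent binomial draws, where the school total itself varies from one replication to the next.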
