Simulation of binomial distribution- Giving more structure to the simulated data

Neg Kha

Join Date: Jun 2022
Posts: 68

13 Nov 2024, 03:10

Hi,

I have data on students in schools over several years. I can identify peers of non-Western origin, and I would like to estimate the effect of exposure to a higher share of non-Western peers on native students' outcomes.

I would like to compare kernel density plots of the within-school, between-year standard deviation of the share of non-Western peers, once using the actual data and once using simulated data.

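For the simulation, I draw a non-Western indicator for each peer from a binomial distribution with one trial, using the school's overall share of non-Western students as the success probability. I then calculate the simulated share of non-Western peers for each school and year, and the standard deviation of that share across years within each school. I repeat this 1000 times.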

When I plot the kernel densities of the standard deviations of the actual and simulated shares of non-Western peers, the right tail of the simulated density is much longer than that of the actual data. In other words, the simulated standard deviations include more extreme values than the standard deviations computed from the actual data.

My question is: how can I constrain the simulation so that it does not produce standard deviations higher than the maximum standard deviation of the actual share of peers?

My code is as follows:

Code:

clear all

cd "use"

clear
gen share_sd=.
save mc_sd_empty.dta, replace

use mc_data, clear

bys school year: egen share_real=mean(non_western)
bys school: egen overall_share=mean(non_western)

*Drop schools whose overall share of non-Westerns is zero
drop if overall_share==0
drop overall_share

*Calculating the sd from actual data, for natives

keep if native==3
bys school year: keep if _n==1

*SD of the yearly share within each school, used later for the kernel density
bys school: egen share_sd=sd(share_real)

keep share_sd

save actual_sd.dta, replace

*Prepare simulation data

use mc_data, clear

*Drop schools whose overall share of non-Westerns is zero
bys school: egen overall_share=mean(non_western)
drop if overall_share==0
drop overall_share

*Mean of non-Westerns in each school, used as the probability in the simulation
bys school: egen p_nonwestern=mean(non_western)


save data_ready, replace

**Program

capture program drop mc
program define mc, rclass

use data_ready, clear

*Randomly assign the peers an immigrant status based on binomial
bys school: gen rand_cohort=rbinomial(1, p_nonwestern) if native!=3

bys school year: egen share_simulated=mean(rand_cohort)

keep if native==3
bys school year: keep if _n==1
bys school: egen share_sd=sd(share_simulated)

keep share_sd
append using mc_sd.dta
save mc_sd.dta, replace
end

*copy needs the file extensions; the seed is set with -set seed-
copy mc_sd_empty.dta mc_sd.dta, replace
set seed 1234
simulate share_sd, reps(1000): mc


use actual_sd, clear
append using mc_sd, gen(simulated)

twoway (kdensity share_sd if simulated==0) (kdensity share_sd if simulated==1)

My data looks like this:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(id year school native non_western)
 1 2001 100 1 0
 1 2002 100 1 0
 2 2001 100 0 1
 3 2004 101 1 0
 3 2005 101 1 0
 4 2001 100 1 0
 4 2002 100 1 0
 4 2003 100 1 0
 5 2004 101 0 1
 6 2005 101 1 0
 6 2006 101 1 0
 6 2007 101 1 0
 7 2002 100 1 0
 7 2003 100 1 0
 7 2004 100 1 0
 8 2002 100 0 1
 8 2003 100 0 1
 9 2005 101 0 0
10 2005 101 1 0
10 2006 101 1 0
10 2007 101 1 0
end

Last edited by Neg Kha; 13 Nov 2024, 03:14.

George Ford

Join Date: Aug 2014

Posts: 3025
#2

14 Nov 2024, 18:05

You might try shuffle_var.
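
The idea behind shuffling is to permute the observed non_western values within each school rather than draw new ones with rbinomial(). A permutation keeps each school's total number of non-Western observations fixed at its actual value, which independent binomial draws do not, so it may give the simulated data more of the structure of the real data. The lines below are only a rough sketch of a hand-rolled shuffle, not the syntax of shuffle_var itself. The names u1, u2, r, and shuffled_nw are made up for illustration, and data_ready.dta and the native==3 condition are taken as given from post #1.

Code:

```stata
* Sketch only: within-school permutation in place of rbinomial().
* u1, u2, r, shuffled_nw are hypothetical names, not from the original code.
use data_ready, clear
set seed 1234

* Two independent random sort keys
gen double u1 = runiform()
gen double u2 = runiform()

* r is a random rank 1.._N within each school
bysort school (u1): gen long r = _n

* Re-sort on the second key; under by:, non_western[r] refers to the
* r-th observation within the school. Because r is a permutation of
* 1.._N, each observed value is reused exactly once per school.
bysort school (u2): gen byte shuffled_nw = non_western[r]

* As in the original program, only peers carry a simulated status
replace shuffled_nw = . if native == 3

* Simulated share of non-Western peers by school and year
bysort school year: egen share_simulated = mean(shuffled_nw)
```

These lines would stand in for the rbinomial() and share_simulated lines inside the mc program, with the rest of the program left unchanged and the set seed line kept outside the program so that each replication gets a different shuffle. Whether this shortens the right tail enough is something to check against the actual density.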