Help with randomly scrambling the districts

Roger More

Join Date: Jul 2017

Posts: 59
#1

04 Jul 2019, 04:31

Dear all,

I hope you are doing well. I wanted to run a placebo test where I rerun my main estimation by randomly scrambling the identity of my districts. A sample of my data is as follows:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str5 bench int yeardecision byte StateWins double NewJudges_TotalJudges
"abohc" 2016 0 .8888888888888888
"abohc" 2003 0                 0
"abohc" 2011 0 .5555555555555556
"abohc" 2011 1 .5555555555555556
"abohc" 2011 0 .5555555555555556
"abohc" 2012 0 .5555555555555556
"abohc" 2009 1                 0
"banhc" 1994 0                 0
"banhc" 1994 0                 0
"banhc" 2009 0                 0
"banhc" 2003 0                 0
"banhc" 2009 1                 0
"banhc" 1994 0                 0
"banhc" 1996 1                 0
"dikhc" 1991 0                 0
"dikhc" 2005 1                 0
"hydhc" 1995 0                 0
end

Essentially, I want to assess whether my main explanatory variable, which varies by district-year, is correlated with the dependent variable when the districts are swapped/scrambled, e.g. district (bench) "abohc" becomes "banhc". I would want to estimate:

Code:

regress StateWins NewJudges_TotalJudges_Scrambled i.yeardecision i.bench

I am doing this as a placebo test, to ascertain whether I am picking up some trends in the respective districts in my baseline specification:

regress StateWins NewJudges_TotalJudges i.yeardecision i.bench, vce(cluster bench)

I am not sure exactly how I would go about constructing a new NewJudges_TotalJudges variable where the bench is swapped along with the explanatory variable and then regressed on the originally ordered StateWins variable.

I thought about using a uniform distribution to randomly tag districts, but I am not sure how I would go about first creating a swapped bench (district) variable and then estimating the effect of NewJudges_TotalJudges on StateWins.

Your help in this regard will really be appreciated.

Last edited by Roger More; 04 Jul 2019, 04:34.
Mike Lacy

Join Date: Apr 2014

Posts: 2416
#2

04 Jul 2019, 10:24

The built-in procedure -permute- will repeatedly shuffle the values of a variable across observations, run an estimation procedure, and enable you to investigate the variation in a parameter estimate across those repetitions. It also reports a permutation test p-value as its main purpose. Would this suit your purposes? If not, explain further. If you go into further explanation, it would help if you told us what your explanatory variable is, and what you mean by "the bench is swapped along with the explanatory variable." Note further that "i.bench" won't work because bench is a string variable, not compatible with factor variable notation.
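
As a minimal sketch of what such a call could look like: -permute- takes the variable to shuffle, then the statistic to collect, then (after the colon) the estimation command. The name district_bench, the seed, and the number of repetitions are assumptions chosen only for illustration.

Code:

* encode the string variable so it can be used with factor-variable notation
* (skip this line if a numeric version of bench already exists)
encode bench, generate(district_bench)

* shuffle NewJudges_TotalJudges across observations 500 times and collect its coefficient
permute NewJudges_TotalJudges _b[NewJudges_TotalJudges], reps(500) seed(12345): ///
    regress StateWins NewJudges_TotalJudges i.yeardecision i.district_bench

Note that this shuffles the variable across all observations on each repetition; it does not move whole blocks of values from one bench to another.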
Roger More

Join Date: Jul 2017

Posts: 59
#3

05 Jul 2019, 12:36

Dear Mike,

Thank you for your reply. Apologies for the confusion. Let me explain it with a data example. Consider a sample of 6 observations from my data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str5 bench int yeardecision byte StateWins double NewJudges_TotalJudges
"abohc" 2011 0 .5555555555555556
"abohc" 2010 0 .5555555555555556
"abohc" 2009 1                 0
"banhc" 2011 0                 0
"banhc" 2010 0                 0
"banhc" 2009 0                 0
end

What I mean by I want to switch the benches (aka districts) is the following:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str5 bench int yeardecision byte StateWins double NewJudges_TotalJudges
"banhc" 2011 0                 0
"banhc" 2010 0                 0
"banhc" 2009 1                 0
"abohc" 2011 0 .5555555555555556
"abohc" 2010 0 .5555555555555556
"abohc" 2009 0                 0
end

My explanatory variable is NewJudges_TotalJudges and I would like to create a new variable NewJudges_TotalJudges_Scrambled and estimate the following two equations:

Code:

encode bench, generate(district_bench)
regress StateWins NewJudges_TotalJudges i.yeardecision i.district_bench // this of course I can already estimate
regress StateWins NewJudges_TotalJudges_Scrambled i.yeardecision i.district_bench

Essentially, I wish to swap the bench variable and let the benches have the same independent variable values (i.e. the same NewJudges_TotalJudges) but retain the dependent variable values, as in the example above.

How would I be able to do this in a large sample, where the number of observations might not be equal across district_bench? -permute- seems to scramble all the values.

Your help here will really be appreciated and hope my explanations have made the problem clearer.

Cheers!

Last edited by Roger More; 05 Jul 2019, 12:42.
Mike Lacy

Join Date: Apr 2014

Posts: 2416
#4

05 Jul 2019, 13:51

I take the meaning of "scrambled" here to be that the entire set of variables for a given observation be kept intact *except* for bench, which would be shuffled (randomly reordered) within the variable bench. Based on that, I facilitated a comparison of your two data sets with -list- after -sort yeardecision StateWins NewJudges_TotalJudges-, and display them as follows:

Code:

     Original                                         "Scrambled"
     +-----------------------------------------+      +-----------------------------------------+
     | bench   yearde~n   StateW~s   NewJudg~s |      | bench   yearde~n   StateW~s   NewJudg~s |
  1. | banhc       2009          0           0 |   1. | abohc       2009          0           0 |
  2. | abohc       2009          1           0 |   2. | banhc       2009          1           0 |
  3. | banhc       2010          0           0 |   3. | banhc       2010          0           0 |
  4. | abohc       2010          0   .55555556 |   4. | abohc       2010          0   .55555556 |
  5. | banhc       2011          0           0 |   5. | banhc       2011          0           0 |
  6. | abohc       2011          0   .55555556 |   6. | abohc       2011          0   .55555556 |
     +-----------------------------------------+      +-----------------------------------------+

From this, I see that the first and second observations of the "scrambled" listing are the same as the original except that they have different values of bench, while the other observations appear identical on all variables. This is one possible occurrence that could result from "scramble" per the way I've defined it above. And, it would represent one particular permutation of bench, as would be produced by one repetition of -permute-; that is, keeping the whole list of variables of each observation together, but just shuffling one of them, is precisely what -permute- does.

I'm confused by what you want, since you say you want to switch bench, but then you talk as if your NewJudge* variable is being scrambled. (NewJudge is being shuffled with respect to bench, but not with respect to the other variables in any of your "scrambled" observations.) Perhaps you want to shuffle NewJudge* and leave the rest of the variables together for a given observation? Perhaps you want to shuffle more than one variable? Maybe someone else will discover something different? I don't happen to be familiar with the denotative use of the terminology "placebo test" (apparently developed in econometrics), so a precise definition of it for me and others might help you get some help. There are all sorts of ways to shuffle variable values, built in and do-it-yourself, so that's not a problem.

Last edited by Mike Lacy; 05 Jul 2019, 13:55.
Roger More

Join Date: Jul 2017

Posts: 59
#5

06 Jul 2019, 05:56

Dear Mike,

Again apologies for the confusion and thanks for persevering with me here.

I will try to explain further. I think your understanding is precisely right and what I want to do here is "shuffle NewJudge* and leave the rest of the variables together for a given observation". Here I want to reshuffle the NewJudge* variable only, by randomly shuffling the bench names it is attached to.

So, say bench A has NewJudge observations 1, 1, 2, and bench B has NewJudge observations 3, 3, 3.

I would like to keep everything the same except make bench A have observations 3, 3, 3 while bench B would have observations 1, 1, 2. So, all variables will have the same values/ordering as before except the NewJudge variable.

That is what I mean when I say I want to create a new variable NewJudges_TotalJudges_Scrambled. I hope this clarifies things.

I am not sure how -permute- will do it, since the syntax requires specifying an expression list. I unsuccessfully tried the following:

Code:

permute NewJudges_TotalJudges : regress StateWins NewJudges_TotalJudges i.yeardecision i.district_bench

Cheers and thank you again!

Last edited by Roger More; 06 Jul 2019, 06:02.

Mike Lacy

Join Date: Apr 2014
Posts: 2416

06 Jul 2019, 07:26

To create a shuffled version of one variable, a call to Mata is probably the most efficient way to do this:

Code:

gen NewJudges_TotalJudges_Scrambled = .
mata: st_store(., "NewJudges_TotalJudges_Scrambled", jumble(st_data(., "NewJudges_TotalJudges")))

One pure Stata way to do this, which is not very time efficient, is:

Code:

set seed 36356
gen tempid = _n
preserve
// Create a shuffled version of your variable
keep tempid NewJudges_TotalJudges
gen double rand = runiform()
sort rand
replace tempid = _n
drop rand
rename NewJudges_TotalJudges NewJudges_TotalJudges_Scrambled
tempfile temp
save `temp'
restore
// Merge shuffled variable onto original
merge 1:1 tempid using `temp'
drop tempid
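
If the goal is instead the block-wise swap described in #5, where each bench receives another bench's whole set of NewJudges_TotalJudges values, one possible sketch is to permute the bench labels and merge the donor bench's bench-year values back on. This assumes that NewJudges_TotalJudges is constant within bench-year (so the mean by bench-year just recovers that value) and that district_bench is the encoded bench variable from #3. The names donor_bench and slot, the tempfile names, and the seed are hypothetical, chosen only for illustration.

Code:

set seed 12345

// Step 1: build a random one-to-one map from each bench to a donor bench
preserve
contract district_bench              // one row per bench
drop _freq
gen long slot = _n
tempfile recipients
save `recipients'
rename district_bench donor_bench
gen double u = runiform()
sort u                               // random order of the donor benches
replace slot = _n
drop u
merge 1:1 slot using `recipients', nogenerate
drop slot
tempfile map
save `map'
restore

// Step 2: one value of the explanatory variable per (donor) bench-year
preserve
collapse (mean) NewJudges_TotalJudges_Scrambled = NewJudges_TotalJudges, ///
    by(district_bench yeardecision)
rename district_bench donor_bench
tempfile donorvals
save `donorvals'
restore

// Step 3: attach each bench's donor, then the donor's value for the same year
merge m:1 district_bench using `map', nogenerate
merge m:1 donor_bench yeardecision using `donorvals', keep(master match) nogenerate

regress StateWins NewJudges_TotalJudges_Scrambled i.yeardecision i.district_bench

Because benches need not appear in the same years, observations whose donor bench has no data for that year end up with a missing scrambled value and drop out of the regression. A bench can also draw itself as its donor, which is more likely when there are few benches.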


Roger More

Join Date: Jul 2017

Posts: 59
#7

06 Jul 2019, 10:59

Thanks a lot, this is very helpful!

Cheers!
paulvonhippel

Join Date: Apr 2014

Posts: 502
#8

26 Feb 2022, 15:42

Roger More, Mike Lacy:

Will the command shufflevar, contributed by @Gabriel Rossman, do what you want?
https://ideas.repec.org/c/boc/bocode/s457116.html

Last edited by paulvonhippel; 26 Feb 2022, 15:45.