  • Help with randomly scrambling the districts

    Dear all,

    I hope you are doing well. I wanted to run a placebo test where I rerun my main estimation by randomly scrambling the identity of my districts. A sample of my data is as follows:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str5 bench int yeardecision byte StateWins double NewJudges_TotalJudges
    "abohc" 2016 0 .8888888888888888
    "abohc" 2003 0                 0
    "abohc" 2011 0 .5555555555555556
    "abohc" 2011 1 .5555555555555556
    "abohc" 2011 0 .5555555555555556
    "abohc" 2012 0 .5555555555555556
    "abohc" 2009 1                 0
    "banhc" 1994 0                 0
    "banhc" 1994 0                 0
    "banhc" 2009 0                 0
    "banhc" 2003 0                 0
    "banhc" 2009 1                 0
    "banhc" 1994 0                 0
    "banhc" 1996 1                 0
    "dikhc" 1991 0                 0
    "dikhc" 2005 1                 0
    "hydhc" 1995 0                 0
    end

    Essentially, I want to assess whether my main explanatory variable, which varies by district-year, is still correlated with the dependent variable when the districts are swapped/scrambled, e.g. district (bench) "abohc" becomes "banhc". I would want to estimate:

    Code:
     regress StateWins NewJudges_TotalJudges_Scrambled i.yeardecision i.bench
    I am doing this as a placebo test to ascertain whether I am picking up district-specific trends in my baseline specification:

    Code:
     regress StateWins NewJudges_TotalJudges i.yeardecision i.bench, vce(cluster bench)

    I am not sure exactly how to go about constructing a new NewJudges_TotalJudges variable in which the bench is swapped along with the explanatory variable, so that the originally ordered StateWins variable can then be regressed on it.

    I thought about using a uniform distribution to tag districts randomly, but I am not sure how to go about first creating a swapped bench (district) variable and then estimating the effect of NewJudges_TotalJudges on StateWins.

    Your help in this regard will really be appreciated.
    Last edited by Roger More; 04 Jul 2019, 05:34.

  • #2
    The built-in procedure -permute- will repeatedly shuffle the values of a variable across observations, run an estimation procedure, and enable you to investigate the variation in a parameter estimate across those repetitions. It also reports a permutation-test p-value, which is its main purpose. Would this suit your purposes? If not, explain further. If you do, it would help if you told us what your explanatory variable is, and what you mean by "the bench is swapped along with the explanatory variable." Note further that "i.bench" won't work because bench is a string variable, which is not compatible with factor-variable notation.
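A minimal sketch of that suggestion, assuming the example data from #1 are in memory. -permute- takes the variable to shuffle followed by an expression list naming the statistic to collect, here the coefficient on NewJudges_TotalJudges; leaving that list out, as in the attempt later in this thread, appears to be why that call failed. district_bench is a numeric copy of bench made with -encode-, because factor-variable notation needs a numeric variable. The seed and reps(500) are arbitrary choices.

```stata
* Sketch only: observation-level permutation test of the baseline coefficient
* (seed and reps() are arbitrary; district_bench is a numeric copy of bench)
encode bench, generate(district_bench)        // i.bench fails because bench is a string
set seed 12345
permute NewJudges_TotalJudges _b[NewJudges_TotalJudges], reps(500): ///
    regress StateWins NewJudges_TotalJudges i.yeardecision i.district_bench
```

This shuffles NewJudges_TotalJudges across all observations (or within groups, with -permute-'s strata() option); it does not move each bench's values as a block.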



    • #3
      Dear Mike,

      Thank you for your reply. Apologies for the confusion. Let me explain it with a data example. Consider a sample of 6 observations from my data:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str5 bench int yeardecision byte StateWins double NewJudges_TotalJudges
      "abohc" 2011 0 .5555555555555556
      "abohc" 2010 0 .5555555555555556
      "abohc" 2009 1                 0
      "banhc" 2011 0                 0
      "banhc" 2010 0                 0
      "banhc" 2009 0                 0
      end


      What I mean by switching the benches (aka districts) is the following:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str5 bench int yeardecision byte StateWins double NewJudges_TotalJudges
      "banhc" 2011 0                 0
      "banhc" 2010 0                 0
      "banhc" 2009 1                 0
      "abohc" 2011 0                 .5555555555555556
      "abohc" 2010 0                 .5555555555555556
      "abohc" 2009 0                 0
      end

      My explanatory variable is NewJudges_TotalJudges and I would like to create a new variable NewJudges_TotalJudges_Scrambled and estimate the following two equations:

      Code:
      encode  bench, generate(district_bench)
      regress StateWins NewJudges_TotalJudges i.yeardecision i.district_bench //this of course I can already estimate
      regress StateWins NewJudges_TotalJudges_Scrambled i.yeardecision i.district_bench
      Essentially, I want to swap the bench variable, let the swapped benches carry their independent variable values (i.e. the same NewJudges_TotalJudges values), but retain the dependent variable values, as in the example above.

      How would I be able to do this in a large sample, where the number of observations might not be equal across district_bench? -permute- seems to scramble all the values.

      Your help here will really be appreciated, and I hope my explanation has made the problem clearer.

      Cheers!
      Last edited by Roger More; 05 Jul 2019, 13:42.



      • #4
        I take the meaning of "scrambled" here to be that the entire set of variables for a given observation is kept intact *except* for bench, which would be shuffled (randomly reordered) within the variable bench. Based on that, to facilitate a comparison of your two data sets, I used -list- after -sort yeardecision StateWins NewJudges_TotalJudges-, and display them as follows:
        Code:
         
              Original                                            "Scrambled"    
             +-----------------------------------------+        +-----------------------------------------+    
             | bench   yearde~n   StateW~s   NewJudg~s |        | bench   yearde~n   StateW~s   NewJudg~s |
          1. | banhc       2009          0           0 |     1. | abohc       2009          0           0 |
          2. | abohc       2009          1           0 |     2. | banhc       2009          1           0 |
          3. | banhc       2010          0           0 |     3. | banhc       2010          0           0 |
          4. | abohc       2010          0   .55555556 |     4. | abohc       2010          0   .55555556 |
          5. | banhc       2011          0           0 |     5. | banhc       2011          0           0 |
          6. | abohc       2011          0   .55555556 |     6. | abohc       2011          0   .55555556 |
             +-----------------------------------------+        +-----------------------------------------+
        From this, I see that the first and second observations of the "scrambled" listing are the same as the original except that they have different values of bench, while the other observations appear identical on all variables. This is one possible outcome of a "scramble" as I've defined it above. It would also represent one particular permutation of bench, as would be produced by one repetition of -permute-; that is, keeping the whole list of variables of each observation together, but shuffling just one of them, is precisely what -permute- does.

        I'm confused by what you want, since you say you want to switch bench, but then you talk as if your NewJudge* variable is being scrambled. (NewJudge is being shuffled with respect to bench, but not with respect to the other variables in any of your "scrambled" observations.) Perhaps you want to shuffle NewJudge* and leave the rest of the variables together for a given observation? Perhaps you want to shuffle more than one variable? Maybe someone else will see something different? I don't happen to be familiar with the denotative use of the term "placebo test" (apparently developed in econometrics), so a precise definition of it for me and others might help you get an answer. There are all sorts of ways to shuffle variable values, built-in and do-it-yourself, so that's not a problem.
        Last edited by Mike Lacy; 05 Jul 2019, 14:55.



        • #5
          Dear Mike,

          Again, apologies for the confusion, and thanks for persevering with me here.

          I will try to explain further. I think your understanding is precisely right: what I want to do here is "shuffle NewJudge* and leave the rest of the variables together for a given observation". Specifically, I want to reshuffle only the NewJudge* variable, by randomly shuffling the bench names its values belong to.

          So, say bench A has NewJudge observations 1, 1, 2, and bench B has NewJudge observations 3, 3, 3.

          I would like to keep everything the same except that bench A would have observations 3, 3, 3 while bench B would have observations 1, 1, 2. So, all variables will have the same values/ordering as before except the NewJudge variable.

          That is what I mean when I say I want to create a new variable NewJudges_TotalJudges_Scrambled. I hope this clarifies things.

          I am not sure how -permute- would do this, since its syntax requires an expression list; I unsuccessfully tried the following:

          Code:
           permute NewJudges_TotalJudges : regress StateWins NewJudges_TotalJudges i.yeardecision i.district_bench
          Cheers and thank you again!
          Last edited by Roger More; 06 Jul 2019, 07:02.
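A minimal sketch of the bench-level scramble described in #5, matching values by year as in the example in #3. It assumes NewJudges_TotalJudges is constant within each bench-year (the code asserts this) and creates district_bench with -encode-, as in #3, if it does not already exist. The names source_bench, slot, u, and the tempfiles are illustrative, not from the thread. Each bench receives the year-by-year values of one randomly drawn bench, while StateWins, yeardecision, and bench itself stay put. Where the drawn bench has no observation in a given year, the scrambled value is left missing, so unequal panel lengths simply drop those bench-years from the placebo regression.

```stata
* Sketch only: bench-level placebo scramble of NewJudges_TotalJudges
* (assumes the variable is constant within bench-year; helper names are illustrative)
set seed 20190706                                   // arbitrary seed
capture drop NewJudges_TotalJudges_Scrambled source_bench

* Guard: the explanatory variable must not vary within a bench-year
bysort bench yeardecision (NewJudges_TotalJudges): ///
    assert NewJudges_TotalJudges[1] == NewJudges_TotalJudges[_N]

preserve
* (1) One record per bench-year holding the value to be reassigned
keep bench yeardecision NewJudges_TotalJudges
duplicates drop bench yeardecision, force
rename bench source_bench
rename NewJudges_TotalJudges NewJudges_TotalJudges_Scrambled
tempfile values
save `values'

* (2) Random one-to-one mapping bench -> source_bench, drawn with runiform()
keep source_bench
duplicates drop
gen double u = runiform()
sort u
gen long slot = _n                                  // random rank of each source bench
tempfile shuffled
save `shuffled'
sort source_bench
replace slot = _n                                   // alphabetical rank of each bench
rename source_bench bench
keep bench slot
merge 1:1 slot using `shuffled', keepusing(source_bench) nogenerate
keep bench source_bench
tempfile map
save `map'
restore

* (3) Give each observation the value of its source bench in the same year;
*     bench-years with no counterpart in the source bench stay missing
merge m:1 bench using `map', nogenerate
merge m:1 source_bench yeardecision using `values', keep(master match) nogenerate

* (4) Placebo regression on the original StateWins
capture confirm variable district_bench
if _rc encode bench, generate(district_bench)
regress StateWins NewJudges_TotalJudges_Scrambled i.yeardecision i.district_bench, vce(cluster district_bench)
```

A random permutation can map a bench to itself; if every bench must receive another bench's values, one option is to redraw the mapping until no bench maps to itself.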



          • #6
            To create a shuffled version of one variable, a call to Mata is probably the most efficient way to do this:

            Code:
            gen double NewJudges_TotalJudges_Scrambled = .   // double keeps full precision
            mata: st_store(., "NewJudges_TotalJudges_Scrambled", jumble(st_data(., "NewJudges_TotalJudges")))
            One pure Stata way to do this, which is not very time efficient, is:
            Code:
            set seed 36356
            gen long tempid = _n   // long stores large observation numbers exactly
            preserve
            // Create a shuffled version of your variable
            keep tempid NewJudges_TotalJudges
            gen double rand = runiform()
            sort rand
            replace tempid = _n
            drop rand
            rename NewJudges_TotalJudges NewJudges_TotalJudges_Scrambled
            tempfile temp
            save `temp'
            restore
            // Merge shuffled variable onto original
            merge 1:1 tempid using `temp', nogenerate
            drop tempid
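To turn a single shuffle into a placebo distribution, one could repeat it many times, store the coefficient from each run, and see where the baseline coefficient falls. A minimal sketch using the Mata one-liner above together with -postfile-, assuming the variables from #1 are in memory. Like the code above, it shuffles across observations rather than moving whole benches; the 500 repetitions, the seed, and the names post and placebo are arbitrary choices.

```stata
* Sketch only: repeat the observation-level shuffle and collect placebo coefficients
capture confirm variable district_bench
if _rc encode bench, generate(district_bench)
set seed 36356
local reps 500

* Baseline coefficient for comparison
quietly regress StateWins NewJudges_TotalJudges i.yeardecision i.district_bench
local b_obs = _b[NewJudges_TotalJudges]

tempname post
tempfile placebo
postfile `post' double b_scrambled using `placebo'
forvalues r = 1/`reps' {
    capture drop NewJudges_TotalJudges_Scrambled
    gen double NewJudges_TotalJudges_Scrambled = .
    mata: st_store(., "NewJudges_TotalJudges_Scrambled", jumble(st_data(., "NewJudges_TotalJudges")))
    quietly regress StateWins NewJudges_TotalJudges_Scrambled i.yeardecision i.district_bench
    post `post' (_b[NewJudges_TotalJudges_Scrambled])
}
postclose `post'

* Share of placebo coefficients at least as large in absolute value as the baseline
preserve
use `placebo', clear
summarize b_scrambled
count if abs(b_scrambled) >= abs(`b_obs')
display "share with |b| >= |baseline b|: " r(N)/`reps'
restore
```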



            • #7
              Thanks a lot, this is very helpful!

              Cheers!



              • #8
                Roger More, Mike Lacy:

                Will the command shufflevar, contributed by @Gabriel Rossman, do what you want?
                https://ideas.repec.org/c/boc/bocode/s457116.html
                Last edited by paulvonhippel; 26 Feb 2022, 16:45.
