
  • Random sample of variables for use in function

    I have a dataset with 120 variables (and tens of thousands of observations), and I would like to draw twelve random samples of ten VARIABLES each, without replacement. For each distinct set of ten I'll generate a new variable (so twelve in all) using a function of those ten variables across all observations. I can perform most of these tasks without assistance, but I can't work out how, in Stata, to draw a random sample of variables without replacement that I can then reference in an expression. Thanks in advance for your help.

  • #2
    https://blog.stata.com/2012/08/03/us...t-replacement/
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      Thanks, Maarten, but the blog post you link to appears to concern drawing a sample of observations, using random numbers to sort the observations according to whether sampling is with or without replacement. In my case, I want to use ALL observations but sample randomly from the VARIABLES, in order to generate new variables as functions of the randomly sampled variables.



      • #4
        We can apply the technique in the blog post to your problem. The example code below assumes you have 10 variables and want to draw 3 random samples of 5 variables per sample. It assumes you are using Stata 16 or later so the frames command is available.
        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input float(id x101 x102 x103 x104 x105 x106 x107 x108 x109 x110)
        1 42 42 42 42 42 42 42 42 42 42
        end
        
        unab varsall : x101-x110
        local  nvars : word count `varsall'
        local  nvsel   5
        local  ndraw   3
        
        frame create seq
        frame change seq
        
        set obs `nvars'
        generate str32 varname = ""   // Stata variable names can be up to 32 characters
        local i = 0
        foreach vn of local varsall {
            quietly replace varname = "`vn'" in `++i'
        }
        
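        set seed 666
        forvalues l=1/`ndraw' {
            * restore a fixed order first so the result depends only on the seed
            sort varname
            * shuffle: sort the variable names on a fresh uniform random number
            generate double u = runiform()
            sort u
            drop u
            * the first nvsel names after the shuffle form this draw's sample
            local vars
            forvalues i=1/`nvsel' {
                local vars `vars' `=varname[`i']'
            }
            local varlist_`l' `vars'
        }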
        
        frame change default
        
        display "`varlist_1'"
        display "`varlist_2'"
        display "`varlist_3'"
        Code:
        . display "`varlist_1'"
        x104 x101 x107 x103 x108
        
        . display "`varlist_2'"
        x105 x104 x102 x101 x110
        
        . display "`varlist_3'"
        x108 x105 x109 x110 x104
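
The stored lists can then be used anywhere a varlist is accepted. A minimal sketch, assuming the new variable should be the row mean of each sampled set: the names newvar_1 to newvar_3 and the choice of rowmean() are illustrative assumptions, and the three lists are copied by hand from the output above so that the block runs on its own.

```stata
* Toy data matching the example above: one observation, ten variables.
clear
input float(id x101 x102 x103 x104 x105 x106 x107 x108 x109 x110)
1 42 42 42 42 42 42 42 42 42 42
end

* Variable lists as displayed in the output above (hard-coded for this sketch;
* in practice they would still be defined from the sampling loop).
local varlist_1 x104 x101 x107 x103 x108
local varlist_2 x105 x104 x102 x101 x110
local varlist_3 x108 x105 x109 x110 x104

* One new variable per sampled list. rowmean() is only a placeholder for
* whatever function of the sampled variables is actually wanted.
forvalues l = 1/3 {
    egen double newvar_`l' = rowmean(`varlist_`l'')
}

list id newvar_1 newvar_2 newvar_3
```

Because locals vanish when a do-file ends, the sampling loop and any code that uses the lists need to run in the same do-file (or the lists can be stored in globals instead).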
        Last edited by William Lisowski; 29 Mar 2022, 14:48.
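
Each draw above is independent of the others, so a variable can appear in more than one list (x104 is in all three). If the sets are instead meant to be disjoint, as twelve sets of ten from 120 variables would suggest, one option is to shuffle the names once and cut the shuffled list into consecutive blocks. A minimal sketch under that assumption: the frame name shuffle and the locals set_1, set_2, ... are hypothetical names, and the number of variables is assumed to be an exact multiple of the set size.

```stata
* Toy data matching the example above.
clear
input float(id x101 x102 x103 x104 x105 x106 x107 x108 x109 x110)
1 42 42 42 42 42 42 42 42 42 42
end

unab varsall : x101-x110
local nvars : word count `varsall'
local nvsel 5
local nsets = `nvars' / `nvsel'    // assumes nvars is a multiple of nvsel

* Put the variable names into a scratch frame, one name per observation.
frame create shuffle
frame change shuffle
set obs `nvars'
generate str32 varname = ""
local i = 0
foreach vn of local varsall {
    quietly replace varname = "`vn'" in `++i'
}

* Shuffle once: sort the names on a uniform random number.
set seed 666
sort varname
generate double u = runiform()
sort u

* Cut the shuffled list into consecutive, non-overlapping blocks.
forvalues s = 1/`nsets' {
    local set_`s'
    forvalues j = 1/`nvsel' {
        local k = (`s' - 1) * `nvsel' + `j'
        local set_`s' `set_`s'' `=varname[`k']'
    }
    display "set `s': `set_`s''"
}

frame change default
frame drop shuffle
```

With 120 variables and nvsel set to 10, the same loop would produce twelve lists that together use every variable exactly once.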



        • #5
          Many thanks, William. In the interim, I exported my file as a .csv and handled the randomization in another program before reimporting into Stata. However, the code you've provided will be very helpful for future analyses. Thanks again.

