  • Bootstrapping Confidence Interval

    I generate confidence intervals for the mean INCOME for each COUNTY separately for WHITES and BLACKS as shown here:

    Code:
    bysort county: ci mean income if race == 1, level(95)
    bysort county: ci mean income if race == 2, level(95)
    However, it is my understanding that I could generate bias-corrected and accelerated (BCa) confidence intervals as shown here:

    Code:
    set seed 12345    // any fixed integer makes the resampling reproducible
    bootstrap _b, reps(1000) bca: mean income
    estat bootstrap

    How could I replicate what I showed in my first block of code? That is, how could I generate bootstrapped confidence intervals for mean income in each county for whites and blacks separately?

  • #2
    Code:
    clear*
    
    // SHORT PROGRAM TO CALCULATE BCA CI's
    capture program drop myprogram
    program define myprogram
        bootstrap _b, reps(1000) bca: mean income
        estat bootstrap
        tempname matrix
        matrix `matrix' = e(ci_bca)
        gen ll = `matrix'[1, 1]
        gen ul = `matrix'[2, 1]
        exit
    end
    
    //    CREATE DEMONSTRATION DATA SET
    //    OF 10 COUNTIES, EACH HAVING 24 OF RACE 1 AND 76 OF RACE 0
    set seed 1234
    set obs 10
    gen int county = _n
    expand 100
    by county, sort: gen race = _n < 25
    gen income = rgamma(10, 5000)
    
    //    CALCULATE THE CONFIDENCE INTERVALS FOR EACH COUNTY & RACE
    runby myprogram, by(county race) status
    Note: You will need to install the new -runby- command, by Robert Picard and me, available from SSC.
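
    If -runby- is not yet installed, a one-time installation along these lines should work, assuming the machine can reach SSC:

    Code:
    ssc install runby    // downloads and installs -runby- from SSC
    help runby           // confirms the install and shows the available options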

    The logic of this code is that the bias-corrected and accelerated (BCa) confidence intervals are stored in the matrix e(ci_bca), so it is just necessary to extract those. You will find them stored as variables ll and ul in the data set after this runs. This is going to be pretty time consuming because 1000 reps of -mean- take appreciable time even on a relatively small dataset. By using -runby- instead of some other construct to loop over county and race, you at least reduce the looping overhead.
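
    To see what is being extracted from e(ci_bca), here is a minimal sketch on a single sample. It assumes the auto dataset that ships with Stata (chosen only for illustration, it is not the data in this thread) and uses fewer reps to keep it quick:

    Code:
    sysuse auto, clear
    set seed 12345
    bootstrap _b, reps(200) bca: mean price
    matrix list e(ci_bca)     // 2 x 1 matrix: row 1 = lower limit, row 2 = upper limit
    estat bootstrap, bca      // displays the BCa interval, for comparison with the matrix

    With one variable in -mean- the matrix has a single column, which is why the program reads elements [1, 1] and [2, 1].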

    I suggest that you first try this out on a small sample of your data, and add the -verbose- option to the -runby- command. That way you will see if there are any problems that I did not foresee when writing this code. Then, if you find none, or after fixing those you do find, run the whole thing without the -verbose- option.
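
    A trial run of that kind might look like the following sketch. The cutoff of two counties is arbitrary, and it assumes county is numeric, as in the demonstration data, and that myprogram has already been defined:

    Code:
    preserve
    keep if county <= 2                        // small subset for a quick test
    runby myprogram, by(county race) verbose   // -verbose- shows the output from each by-group
    summarize ll ul                            // the limits should be present and nonmissing
    restore                                    // bring back the full data set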



    • #3
      This is a great contribution. Is there a way to specify for example:
      Code:
       
       set seed a bysourt county bootstrap _b, reps(1000) bca: mean income if race == 1 estat bootstrap



      • #4
        I don't understand what you are asking. The code you show is run together on a single line. And, in any case, what is your question about it?



        • #5
          Is it possible to code this without writing a program?



          • #6
            Yes. You can do it this way:

            Code:
            levelsof county, local(counties)
            levelsof race, local(races)
            tempname matrix
            gen ll = .
            gen ul = .
            foreach c of local counties {
                foreach r of local races {
                    bootstrap _b, reps(1000) bca: mean income if county == `c' & race == `r'
                    estat bootstrap
                    matrix `matrix' = e(ci_bca)
                    replace ll = `matrix'[1, 1] if county == `c' & race == `r'
                    replace ul = `matrix'[2, 1] if county == `c' & race == `r'
                }
            }
            As far as I can see, this offers no advantages over #2 and it has several drawbacks:

            1. The code is longer; more opportunities to make a mistake.
            2. The execution time will be drastically longer if your data set is large.
            3. It will only work as written if county and race are numeric variables. If either is a string variable, you have to add quotes in a bunch of places--another opportunity to make mistakes. By contrast the code in #2 is indifferent to whether these variables are string or numeric.
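
            To illustrate point 3, here is a sketch of how the loop would change if county were a string variable. That is a hypothetical case (the demonstration data uses numeric codes), and it assumes the county names contain no embedded double quotes:

            Code:
            levelsof county, local(counties)
            levelsof race, local(races)
            tempname matrix
            gen ll = .
            gen ul = .
            foreach c of local counties {
                foreach r of local races {
                    // string county must be quoted; numeric race is not
                    bootstrap _b, reps(1000) bca: mean income if county == "`c'" & race == `r'
                    matrix `matrix' = e(ci_bca)
                    replace ll = `matrix'[1, 1] if county == "`c'" & race == `r'
                    replace ul = `matrix'[2, 1] if county == "`c'" & race == `r'
                }
            }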

            Why do you want to avoid writing a program? Writing a program is both shorter and simpler than doing it this way with multiple loops.
