
  • Resampling, storing probit estimates and applying in another subsample in a dataset

    Hello dear Stata users,

    I have a somewhat complicated task for my thesis, and I am struggling to write logically correct code for it in Stata.
    Namely, I need the following sequence (I have a household dataset covering two countries, a base country and a comparison country):
    1. Generate a sample of size N from base country
    2. Run probit regression and save coefficients
    3. Generate another sample of size N as well from comparison country
    4. Apply stored probit coefficients for comparison country and predict y_hat (in my case homeownership rate)
    5. Compute the difference: actual homeownership rate in the base country (from the dataset) minus y_hat from step 4

    This process should be repeated around 100 to 200 times to obtain a 95% confidence interval for the differences. I am sure there should be looping involved, but I have no idea how to complete step 4. I hope to figure out the loop myself; I don't think it will cause much trouble.

    Thanks,
    Farogat.
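A minimal sketch of steps 1 to 5 as one loop, resampling both countries in every replication. It assumes Stata 14 or later and uses the auto data that ship with Stata as a stand-in: domestic cars play the base country, foreign cars the comparison country, and the hypothetical outcome expensive (price above 5000) plays homeownership. Results are collected with -postfile-, which works without frames, and the 95% interval is taken as the 2.5th and 97.5th percentiles of the differences (the percentile method, one common choice among several).

```stata
* Sketch only. Stand-in data: sysuse auto. foreign == 0 plays the base country,
* foreign == 1 the comparison country; "expensive" is a hypothetical outcome.
version 14
clear all
set seed 13
sysuse auto
generate byte expensive = price > 5000

* Split into two files, mimicking separate country datasets
tempfile basefile compfile results
preserve
keep if foreign == 0
save `basefile'
restore
keep if foreign == 1
save `compfile'

* Actual outcome rate in the full base sample (used in step 5)
use `basefile', clear
summarize expensive, meanonly
local base_rate = r(mean)
local Nbase = r(N)
use `compfile', clear
local Ncomp = _N

tempname handle
postfile `handle' double diff using `results'
forvalues i = 1/200 {
    * Step 1: bootstrap sample from the base country
    use `basefile', clear
    bsample `Nbase'
    * Step 2: probit on the resample; capture skips a rare failed fit
    capture quietly probit expensive mpg weight
    if _rc continue
    estimates store base_fit
    * Step 3: bootstrap sample from the comparison country
    use `compfile', clear
    bsample `Ncomp'
    * Step 4: apply the stored base-country coefficients
    estimates restore base_fit
    predict double yhat, pr
    summarize yhat, meanonly
    * Step 5: actual base rate minus average predicted rate
    post `handle' (`base_rate' - r(mean))
}
postclose `handle'

* 95% percentile interval of the differences
use `results', clear
_pctile diff, percentiles(2.5 97.5)
display "95% percentile interval: [" r(r1) ", " r(r2) "]"
```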

  • #2
    I am sure there should be looping involved, but I have no idea how to complete step 4.
    If both samples are in the same dataset, use the -if- qualifier:

    Code:
    probit ... if base
    * STEP 3
    predict yhat if comparison
    where base and comparison are 0/1 indicator variables. If the samples are in different datasets:

    Code:
    use comparison, clear
    estimates restore base
    predict yhat
    For this to work, the variables in the comparison dataset must have exactly the same names as in the base dataset, and all of them must exist.
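A small check of both routes above, assuming Stata 14 or later and using the auto data as a stand-in (foreign == 0 as the base group, foreign == 1 as the comparison group, hypothetical outcome expensive). Predicting with -if- in one dataset, or storing the fit and restoring it after switching datasets, should give identical predictions, since the same coefficients are applied to the same covariates.

```stata
* Sketch only. Stand-in data: sysuse auto; "expensive" is hypothetical.
version 14
sysuse auto, clear
generate byte expensive = price > 5000

* Route 1, same dataset: fit on the base group, predict for the comparison group
quietly probit expensive mpg weight if foreign == 0
predict double yhat_if if foreign == 1, pr

* Route 2, separate datasets: store the fit, switch data, restore, predict
estimates store base
tempfile comp
keep if foreign == 1
save `comp'
use `comp', clear
estimates restore base
predict double yhat_sep, pr
```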



    • #3
      Thanks Andrew!
      Yes, the datasets are separate. Can you clarify "estimates restore base"?


      Code:
      use comparison, clear
      estimates restore base
      predict yhat

      This is what I did (Uzb is the base country and Kaz the comparison country; Kaz.dta is the separate dataset from which the comparison sample comes):

      bsample 1477 if Uzb
      probit owns_dwelling male age_group high_ed married divorced separated widowed log_monspending hh_size capital


      quietly des using Kaz.dta
      local N = r(N)

      clear
      set obs 1477
      set seed 13

      * Kaz.dta must contain obs_id = 1, ..., `N'; draw ids from that range
      gen obs_id = runiformint(1, `N')
      merge m:1 obs_id using Kaz.dta, keep(match) nogenerate

      estimates restore Uzb
      predict yhat

      But I get this error:

      . estimates restore Uzb
      estimation result Uzb not found
      r(111);

      end of do-file

      r(111);




      • #4
        You have to store estimates from the probit model.

        Code:
        probit owns_dwelling male age_group high_ed married divorced separated widowed log_monspending hh_size capital
        estimates store Uzb
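A sketch of why the r(111) error in #3 appears and why loops that -clear- the data can still restore the fit: -estimates store- keeps the results in memory, plain -clear- drops the data (and value labels) but not stored estimates, and -estimates clear- drops the stored estimates. It uses the auto data as a stand-in and reuses the name Uzb from the thread; "expensive" is a hypothetical outcome.

```stata
* Sketch only. Stand-in data: sysuse auto; "expensive" is hypothetical.
version 14
sysuse auto, clear
generate byte expensive = price > 5000
quietly probit expensive mpg weight
estimates store Uzb             // without this, -estimates restore Uzb- fails with r(111)

clear                           // drops the data; stored estimates remain
sysuse auto                     // stands in for loading another dataset
estimates dir                   // lists Uzb
capture estimates restore Uzb
local rc_after_clear = _rc      // 0: restore succeeded

estimates clear                 // drops all stored estimates
capture estimates restore Uzb
local rc_after_estclear = _rc   // 111: estimation result Uzb not found
display "after clear: " `rc_after_clear' "   after estimates clear: " `rc_after_estclear'
```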



        • #5
          Thank you, it worked well!

          May I ask another thing? I have a fixed number, say, the ownership rate from macro data, and I want to subtract yhat from this fixed rate. As I said earlier, I need to do this 200 times, which means I have to store the difference every time and, at the end, estimate a 95% confidence interval from the 200 differences.

          Within the loop, after the probit estimation, how do I store the difference so that all 200 differences are kept and not lost? This is what I have so far; it is not in the loop yet. I will put it in the loop once I figure out the difference issue (for one replication):

          bsample 1477 if Uzb
          probit owns_dwelling male age_group high_ed married divorced separated widowed log_monspending hh_size capital
          estimates store Uzba

          quietly des using Kaz.dta
          local N = r(N)

          clear
          set obs 1477
          *set seed 13

          * Kaz.dta must contain obs_id = 1, ..., `N'
          gen obs_id = runiformint(1, `N')
          merge m:1 obs_id using Kaz.dta, keep(match) nogenerate

          estimates restore Uzba
          predict yhat
          di yhat



          • #6
            Is the rate a constant value? When you predict the outcome, you have a prediction for each observation. Do you wish to subtract the prediction from this rate for each observation or the average of the predictions across the observations?

            Code:
            webuse lbw, clear
            probit low age
            predict lowhat
            *prediction for each observation
            l lowhat in 1/10, sep(0)
            *averaged prediction
            sum lowhat
            Res.:

            Code:
            .
            .*prediction for each observation
            
            . l lowhat in 1/10, sep(0)
            
                 +----------+
                 |   lowhat |
                 |----------|
              1. | .3581072 |
              2. | .2103524 |
              3. | .3463953 |
              4. | .3348283 |
              5. | .3699543 |
              6. | .3348283 |
              7. |  .323416 |
              8. | .3819261 |
              9. | .2485671 |
             10. | .2794887 |
                 +----------+
            
            .
            . *averaged prediction
            
            .
            . sum lowhat
            
                Variable |        Obs        Mean    Std. Dev.       Min        Max
            -------------+---------------------------------------------------------
                  lowhat |        189    .3120393    .0567287   .1182503    .418481
            
            .
            Last edited by Andrew Musau; 30 Jan 2022, 08:34.
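For the "rate minus average prediction" discussed next, a sketch assuming the auto data as a stand-in and the placeholder rate 0.3 used later in the thread: -summarize, meanonly- leaves the average prediction in r(mean), which should be copied right away because the next r-class command replaces it.

```stata
* Sketch only. Stand-in data: sysuse auto; 0.3 is a placeholder rate.
version 14
sysuse auto, clear
quietly probit foreign mpg weight
predict double phat, pr          // one prediction per observation
summarize phat, meanonly         // r(mean) = average prediction
scalar diff = 0.3 - r(mean)      // copy now: the next r-class command overwrites r()
display "rate - mean(phat) = " scalar(diff)
```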



            • #7
              Yes, the rate is constant. I want to subtract the average of the yhats from this constant, so if I repeat this 200 times, I'll have 200 differences.



              • #8
                I mean, I need: rate - average(yhat across observations)



                • #9
                  Make sure that the rate is a proportion so that the subtraction makes sense. Below, I set it to 0.3; replace it with the relevant value.

                  Code:
                  bsample 1477 if Uzb
                  probit owns_dwelling male age_group high_ed married divorced separated widowed log_monspending hh_size capital
                  estimates store Uzba
                  
                  quietly des using Kaz.dta
                  local N = r(N)
                  
                  frame create results
                  frame results{
                      set obs 200
                      gen results=.
                  }
                  local rate 0.3
                  forval i=1/200{
                      clear
                      set obs 1477
                      * Kaz.dta must contain obs_id = 1, ..., `N'
                      gen obs_id = runiformint(1, `N')
                      merge m:1 obs_id using Kaz.dta, keep(match) nogenerate
                      estimates restore Uzba
                      predict yhat
                      qui sum yhat
                      frame results: replace results= `rate'- r(mean) in `i'
                  }
                  frame results: list in 1/200, sep(0)
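Once the results frame holds the 200 differences, the 95% interval can be read off as their 2.5th and 97.5th percentiles (the percentile method; other bootstrap intervals exist). A sketch assuming Stata 16 or later; to keep it self-contained, the frame is filled with placeholder rnormal() draws rather than real replications.

```stata
* Sketch only. Requires Stata 16+ (frames). Placeholder data, not results.
version 16
clear all
set seed 13
frame create results
frame change results
set obs 200
* Placeholder values: in the loop above, each row holds one difference
generate double results = rnormal(0, 0.05)
* 95% percentile interval: 2.5th and 97.5th percentiles of the differences
_pctile results, percentiles(2.5 97.5)
display "95% percentile interval: [" r(r1) ", " r(r2) "]"
local lower = r(r1)
local upper = r(r2)
frame change default
```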



                  • #10
                    Thank you very much, Andrew!

                    Sorry for so many questions; this procedure is a bit unusual for my thesis.

                    I wonder why Stata does not recognize -frame-. I tried -ssc install- and -help-, but unlike for other commands, I could not find a package for it.



                    • #11
                      On a subsequent topic you started, I suggested that you review the Statalist FAQ. There you will find that, unless we are told otherwise, we assume you are running the most recent release of Stata, which is currently 17.0.

                      Apparently you are running a much older version, since the frame command was introduced in Stata 16.



                      • #12
                        Yes, I'm using version 14.



                        • #13
                          In Stata 14, you can use a matrix instead:

                          Code:
                          bsample 1477 if Uzb
                          probit owns_dwelling male age_group high_ed married divorced separated widowed log_monspending hh_size capital
                          estimates store Uzba
                          
                          quietly des using Kaz.dta
                          local N = r(N)
                          
                          * drop existing matrices so that nullmat(result) starts empty
                          clear matrix
                          local rate 0.3
                          forval i=1/200{
                              clear
                              set obs 1477
                              gen obs_id = runiformint(1, `N')
                              merge m:1 obs_id using Kaz.dta, keep(match) nogenerate
                              estimates restore Uzba
                              predict yhat
                              qui sum yhat
                              mat res= `rate'- r(mean)
                              mat result= nullmat(result)\res
                          }
                          clear
                          svmat result
                          l in 1/200, sep(0)
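Another option that works in Stata 14 is -simulate-, which calls an r-class program once per replication and collects its returned scalar into a dataset. This is a swapped-in technique, not the approach posted above. The program name myboot, the auto stand-in data, the outcome expensive and the rate 0.3 are hypothetical; unlike the loop above, each replication resamples both groups.

```stata
* Sketch only. -simulate- runs myboot 200 times and keeps r(diff) each time.
* Stand-in data: sysuse auto; base = domestic cars, comparison = foreign cars.
* myboot, "expensive" and the 0.3 rate are hypothetical placeholders.
version 14
clear all
capture program drop myboot
program define myboot, rclass
    sysuse auto, clear
    generate byte expensive = price > 5000
    * Steps 1-2: resample the base group and fit the probit
    preserve
    keep if foreign == 0
    bsample
    quietly probit expensive mpg weight
    estimates store base_fit
    restore
    * Steps 3-4: resample the comparison group and apply the stored fit
    keep if foreign == 1
    bsample
    estimates restore base_fit
    predict double yhat, pr
    summarize yhat, meanonly
    * Step 5: placeholder rate minus average predicted probability
    return scalar diff = 0.3 - r(mean)
end

simulate diff = r(diff), reps(200) seed(13) nodots: myboot
_pctile diff, percentiles(2.5 97.5)
display "95% percentile interval: [" r(r1) ", " r(r2) "]"
```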



                          • #14
                            Thank you tons!



                            • #15
                              Hi Andrew, I was able to access Stata 17 on the university's virtual desktop.

                              May I ask what "in `i'" does in this loop?

                              forval i=1/200{
                              clear
                              set obs 1477
                              gen obs_id = runiformint(1, 1477)
                              merge m:1 obs_id using Kaz.dta, keep(match) nogenerate
                              estimates restore Uzb`i'
                              predict yhat
                              qui sum yhat
                              frame results: replace results= `rate'- r(mean) in `i'
                              }

                              I did this:
                              local compar_rate 0.898

                              forval i=1/200{
                              clear
                              set obs 1477
                              gen obs_id = runiformint(1, 1477)
                              merge m:1 obs_id using Kaz.dta, keep(match) nogenerate
                              estimates restore Uzb`i'
                              predict yhat
                              qui sum yhat
                              frame results1: replace results1= r(mean) in `i' - `compar_rate' }

                              But I got an "invalid syntax" error message. PS: I created the frame results1 as before.
                              Last edited by Farogat WIUT; 01 Feb 2022, 17:34.
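On the question above: -in `i'- restricts -replace- to observation i of the results frame, so replication i writes its difference into row i and earlier rows are kept. The range has to come after the whole expression; in the posted line, `in `i' - `compar_rate'` puts the subtraction inside the range, which is the likely source of "invalid syntax". Stata also requires the closing brace of -forvalues- to sit on a line by itself. A sketch assuming Stata 16 or later, with the auto data as a stand-in for the predictions:

```stata
* Sketch only. Requires Stata 16+ (frames). sysuse auto stands in for yhat.
version 16
clear all
local compar_rate 0.898
frame create results1
frame results1: set obs 3
frame results1: generate double results1 = .
sysuse auto
forvalues i = 1/3 {
    * Stand-in for -summarize yhat-: any command that leaves r(mean)
    summarize foreign if _n <= 20 * `i', meanonly
    * Expression first, -in `i'- last: write into row i of results1
    frame results1: replace results1 = r(mean) - `compar_rate' in `i'
}
frame results1: list, sep(0)
```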

