Resampling, storing probit estimates and applying in another subsample in a dataset

Farogat WIUT

Join Date: Feb 2018

Posts: 37
#1

29 Jan 2022, 18:09

Hello dear Stata users,

I have a somewhat complicated task for my thesis and I am struggling to come up with logically correct code in Stata.
Namely, I need the following sequence (I have a household dataset containing 2 countries - base and comparison):
1. Generate a sample of size N from the base country
2. Run a probit regression and save the coefficients
3. Generate another sample, also of size N, from the comparison country
4. Apply the stored probit coefficients to the comparison country and predict y_hat (in my case the homeownership rate)
5. Compute the difference: actual homeownership rate in the base country (from the dataset) minus y_hat from (4)

This process should be repeated around 100-200 times in order to find the 95% confidence interval of the differences. I am sure there should be looping involved, but I have no idea how to complete step 4. I hope to figure out the loop myself; I don't think it will cause much trouble.

Thanks,
Farogat.
Andrew Musau

Join Date: Oct 2014

Posts: 10195
#2

30 Jan 2022, 02:37

I am sure there should be looping involved, but I have no idea how to complete step 4.

If both samples are within the same dataset, use the -if- qualifier

Code:

probit ... if base
* STEP 3
predict yhat if comparison

where base and comparison are indicator variables. If the samples are in different datasets:

Code:

use comparison, clear
estimates restore base
predict yhat

For this to work, the variables in the comparison dataset must be named exactly as in the base dataset, and they should all exist.
Farogat WIUT

Join Date: Feb 2018

Posts: 37
#3

30 Jan 2022, 04:48

Thanks Andrew!
Yes, the datasets are separate. Can you clarify "estimates restore base"?

Code:

use comparison, clear
estimates restore base
predict yhat

This is what I did (Uzb is base, Kaz is comparison country. Kaz.dta is a separate dataset from which comparison comes):

bsample 1477 if Uzb
probit owns_dwelling male age_group high_ed married divorced separated widowed log_monspending hh_size capital

quietly des using Kaz.dta
local N =971

clear
set obs 1477
set seed 13

gen obs_id = runiformint(1, 1477)
merge m:1 obs_id using Kaz.dta, keep(match) nogenerate

estimates restore Uzb
predict yhat

But I get this error:

. estimates restore Uzb
estimation result Uzb not found
r(111);

end of do-file

r(111);
Andrew Musau

Join Date: Oct 2014

Posts: 10195
#4

30 Jan 2022, 05:05

You have to store estimates from the probit model.

Code:

probit owns_dwelling male age_group high_ed married divorced separated widowed log_monspending hh_size capital
estimates store Uzb
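
For readers following along, the store-then-restore pattern can be sketched end to end with a dataset that ships with Stata. This is only an illustration under assumptions: the auto data and the variables foreign, mpg and weight stand in for the real samples, and the name base is arbitrary.

```stata
* Sketch only: auto.dta stands in for both the base and comparison data
sysuse auto, clear
probit foreign mpg weight
estimates store base          // keep the fitted model under the name "base"

* Load the second dataset; it must contain the same covariate names.
* Here auto is simply reloaded as a placeholder for the comparison data.
sysuse auto, clear
estimates restore base        // make "base" the active estimation results
predict yhat                  // predicted probability (the probit default)
summarize yhat
```

Stored estimates survive loading a new dataset with use ..., clear, which is why the model can be fitted in one file and applied in another.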
Farogat WIUT

Join Date: Feb 2018

Posts: 37
#5

30 Jan 2022, 06:38

Thank you, it worked well!

May I ask another thing? I have a fixed number, say the ownership rate from macro data, and I want to subtract yhat from this fixed rate. As I said earlier, I need to do this 200 times, which means I have to store the difference each time and at the end estimate the 95% confidence interval for the 200 differences.

Within the loop, after the probit estimates, how do I store the difference so that all 200 differences are kept and not lost? This is what I have now; it is not yet in a loop. I will put it in the loop once I figure out the difference issue (for one set):

}*/

bsample 1477 if Uzb
probit owns_dwelling male age_group high_ed married divorced separated widowed log_monspending hh_size capital
estimates store Uzba

quietly des using Kaz.dta
local N =971

clear
set obs 1477
*set seed 13

gen obs_id = runiformint(1, 1477)
merge m:1 obs_id using Kaz.dta, keep(match) nogenerate

estimates restore Uzba
predict yhat
di yhat

Andrew Musau

Join Date: Oct 2014
Posts: 10195
#6

30 Jan 2022, 07:31

Is the rate a constant value? When you predict the outcome, you have a prediction for each observation. Do you wish to subtract the prediction from this rate for each observation or the average of the predictions across the observations?

Code:

webuse lbw, clear
probit low age
predict lowhat
*prediction for each observation
l lowhat in 1/10, sep(0)
*averaged prediction
sum lowhat

Res.:

Code:

.
.*prediction for each observation

. l lowhat in 1/10, sep(0)

     +----------+
     |   lowhat |
     |----------|
  1. | .3581072 |
  2. | .2103524 |
  3. | .3463953 |
  4. | .3348283 |
  5. | .3699543 |
  6. | .3348283 |
  7. |  .323416 |
  8. | .3819261 |
  9. | .2485671 |
 10. | .2794887 |
     +----------+

.
. *averaged prediction

.
. sum lowhat

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      lowhat |        189    .3120393    .0567287   .1182503    .418481

.

Last edited by Andrew Musau; 30 Jan 2022, 07:34.


Farogat WIUT

Join Date: Feb 2018

Posts: 37
#7

30 Jan 2022, 08:58

Yes, the rate is constant. I want to subtract averaged yhats from this constant. So, if I repeat that 200 times, I'll have 200 differences.
Farogat WIUT

Join Date: Feb 2018

Posts: 37
#8

30 Jan 2022, 09:09

I mean, I need: rate - average(yhat across observations)

Andrew Musau

Join Date: Oct 2014
Posts: 10195
#9

30 Jan 2022, 09:37

Make sure that the rate is a proportion so that the subtraction makes sense. Below, I set it at 0.3 - replace it with the relevant value.

Code:

bsample 1477 if Uzb
probit owns_dwelling male age_group high_ed married divorced separated widowed log_monspending hh_size capital
estimates store Uzba

quietly des using Kaz.dta
local N =971

frame create results
frame results{
    set obs 200
    gen results=.
}
local rate 0.3
forval i=1/200{
    clear
    set obs 1477
    gen obs_id = runiformint(1, 1477)
    merge m:1 obs_id using Kaz.dta, keep(match) nogenerate
    estimates restore Uzba
    predict yhat
    qui sum yhat
    frame results: replace results= `rate'- r(mean) in `i'
}
frame results: list in 1/200, sep(0)


Farogat WIUT

Join Date: Feb 2018

Posts: 37
#10

30 Jan 2022, 11:24

Thank you a lot, Andrew!

Sorry for so many questions; this is a somewhat unusual procedure for my thesis.

I wonder why frame is not recognized by Stata. I searched with ssc install and help, but unlike other commands, I could not find a package for this command.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#11

30 Jan 2022, 11:38

On a subsequent topic you started, I suggested that you review the Statalist FAQ and there you will find that unless we are told otherwise, we assume you are running the most recent release of Stata, which currently is 17.0.

Apparently you are running a version much older, since the frame command was introduced in version 16.
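
A quick way to confirm which release is running, and to branch on it in a do-file, is sketched below. The messages are illustrative only; the one fact relied on is the statement above that frames arrived in version 16.

```stata
* Report the release of the running Stata executable
display c(stata_version)

* Branch on it: the frame commands need Stata 16 or later
if c(stata_version) >= 16 {
    display "frames are available"
}
else {
    display "frames are not available; collect results in a matrix instead"
}
```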
Farogat WIUT

Join Date: Feb 2018

Posts: 37
#12

30 Jan 2022, 12:35

Yes, I'm using version 14

Andrew Musau

Join Date: Oct 2014
Posts: 10195

#13

30 Jan 2022, 12:50

You can use a matrix:

Code:

bsample 1477 if Uzb
probit owns_dwelling male age_group high_ed married divorced separated widowed log_monspending hh_size capital
estimates store Uzba

quietly des using Kaz.dta
local N =971

clear matrix
local rate 0.3
forval i=1/200{
    clear
    set obs 1477
    gen obs_id = runiformint(1, 1477)
    merge m:1 obs_id using Kaz.dta, keep(match) nogenerate
    estimates restore Uzba
    predict yhat
    qui sum yhat
    mat res= `rate'- r(mean)
    mat result= nullmat(result)\res
}
clear
svmat result
l in 1/200, sep(0)
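
Once the 200 differences are in a variable, one common way to get the 95% interval asked about at the start of the thread is the percentile method, sketched here. The variable diff and its simulated values are assumptions for illustration only; after svmat result the differences would sit in result1 instead. Whether a percentile interval or a normal-approximation interval suits the thesis is a separate judgment.

```stata
* Sketch only: diff is simulated to stand in for the 200 stored differences
clear
set seed 12345
set obs 200
generate diff = rnormal(0.05, 0.02)

* 2.5th and 97.5th percentiles are returned in r(r1) and r(r2)
_pctile diff, percentiles(2.5 97.5)
display "95% percentile interval: [" r(r1) ", " r(r2) "]"
```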


Farogat WIUT

Join Date: Feb 2018

Posts: 37
#14

30 Jan 2022, 13:14

Thank you tons!
Farogat WIUT

Join Date: Feb 2018

Posts: 37
#15

01 Feb 2022, 16:31

Hi Andrew, I was able to access the university's virtual desktop, which has Stata 17.

May I ask what "in `i'" does in this loop?

forval i=1/200{
clear
set obs 1477
gen obs_id = runiformint(1, 1477)
merge m:1 obs_id using Kaz.dta, keep(match) nogenerate
estimates restore Uzb`i'
predict yhat
qui sum yhat
frame results: replace results= `rate'- r(mean) in `i'
}

I did this:
local compar_rate 0.898

forval i=1/200{
clear
set obs 1477
gen obs_id = runiformint(1, 1477)
merge m:1 obs_id using Kaz.dta, keep(match) nogenerate
estimates restore Uzb`i'
predict yhat
qui sum yhat
frame results1: replace results1= r(mean) in `i' - `compar_rate' }

But I got an invalid syntax error message. PS: I created the frame results1 as before.
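
A likely reading of the error, offered as a sketch rather than a confirmed diagnosis: in Stata's syntax the in qualifier comes after the complete =exp, so it cannot sit in the middle of the subtraction, and the closing brace of forvalues must be on a line by itself. The in `i' part restricts replace to observation number i, so each pass of the loop writes its result into its own row of the results frame. The example below uses the auto data and only 3 replications purely as placeholders; it needs Stata 16 or later for frames.

```stata
* Sketch only: auto.dta and 3 replications are placeholders
capture frame drop results1
frame create results1
frame results1 {
    set obs 3
    generate results1 = .
}

local compar_rate 0.898
set seed 13
forvalues i = 1/3 {
    sysuse auto, clear
    bsample                       // resample observations with replacement
    quietly summarize foreign
    * full expression first, then the in qualifier picks row `i'
    frame results1: replace results1 = r(mean) - `compar_rate' in `i'
}
frame results1: list
```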

Last edited by Farogat WIUT; 01 Feb 2022, 16:34.
