imputing values for a variable

Zariab Hossain

Join Date: Oct 2020

Posts: 48
#1

imputing values for a variable

07 Mar 2024, 13:06

Dear Stata users

I am working with a dataset like below:

Code:

* Example generated by -dataex-. For more info, type help dataex clear input int(referal_CWid referal_officeid) float referal_month double(nos_off nos_js) 4595 1209 569 1 1 4595 1209 570 1 2 4595 1209 570 1 2 4595 1209 576 1 1 4595 1209 577 1 1 4595 1209 578 1 1 4595 1202 590 1 1 4595 1202 594 1 2 4595 1202 594 1 2 4595 1227 598 1 1 4595 1227 599 1 1 4595 1227 604 1 1 4595 1227 605 1 1 4595 1227 606 1 1 4595 1227 611 1 1 4595 1227 613 1 1 4595 1227 618 2 1 4595 1229 618 2 1 4595 1228 619 3 1 4595 1229 619 3 1 4595 1227 619 3 1 4595 1227 621 1 1 4595 1227 623 1 1 4595 1227 624 1 1 4595 1227 625 1 1 4595 1227 626 1 1 4595 1227 642 1 1 4595 1227 650 1 1 end format %tm referal_month

referal_CWid is person id; referal_officeid is office id; referal_month is a time (monthly) variable; nos_off denotes the number of branches of an office a person worked in a specific month and finally nos_js is the number of client of the person at a specific office in a specific month. As you can probably detect, in some months the person works at multiple office branches. I want to remove duplicates month to have unique id time panel. In order to remove duplicate months, (i) I want to keep the office with highest number of clients. (ii) But there are cases where number of clients (nos_js) is a tie. In this particular case, I want to keep the office of the previous month when nos_off==1.

I tried the following:

bys referal_CWid referal_month (nos_js): egen major_off=max(nos_js)
keep if nos_js==major_off

It only works if I do not have case (ii). But it has flaws in the presence of case(ii).

Any suggestions on this would be highly appreciated.

PS: The above dataset is at the Client level and hence there are some duplicate rows.

Zariab Hossain
Uppsala University

Last edited by Zariab Hossain; 07 Mar 2024, 13:39.
Tags: None

Clyde Schechter

Join Date: Apr 2014
Posts: 29795

07 Mar 2024, 14:03

I'm not sure I understand what you want to do, but it might be this:

Code:

preserve
keep if nos_off == 1
keep referal_CWid referal_officeid referal_month
tempfile single_offices
save `single_offices'
restore

assert !missing(nos_js)
by referal_CWid referal_month (nos_js), sort: keep if nos_js == nos_js[_N]
duplicates tag referal_CWid referal_month, gen(flag)
preserve
keep if !flag
drop flag
tempfile results
save `results'
restore
keep if flag
drop flag
rangejoin referal_month . -1 using `single_offices', by(referal_CWid)
by referal_CWid referal_month (referal_month_U), sort: keep if _n == _N
replace referal_officeid = referal_officeid_U
drop *_U

append using `results'
isid referal_CWid referal_month, sort

-rangejoin- is written by Robert Picard and is available from SSC. To use it, you must also install -rangestat-, by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.

Note: this code will fail if for some referal_CWid in some referal_month there is no earlier month in which nos_off == 1. Of course, in that case, there is also no solution that meets your criteria.

Comment

Zariab Hossain

Join Date: Oct 2020

Posts: 48
#3

27 Mar 2024, 06:43

Many thanks for your kind help.
Comment

Announcement

imputing values for a variable

Comment

Comment