Dear Statalist Members,
I am working with the following dataset:
quietly{
clear
input str9 gvkey year str10 fqenddt bhr a
31 2005 "2002/3/31" 0.3565015 0.02116004
31 2005 "2002/6/30" 0.4100328 0.02116004
31 2005 "2002/9/30" -0.2517873 0.02116004
31 2005 "2002/12/31" -0.173169 0.02116004
31 2005 "2003/3/31" 0.0910931 0.02116004
31 2005 "2003/6/30" 0.011535 0.02116004
31 2005 "2003/9/30" -0.0306233 0.02116004
31 2005 "2003/12/31" -0.1218563 0.02116004
608 2005 "2002/6/30" -0.0563066 0.034307506
608 2005 "2002/9/30" -0.1087845 0.034307506
608 2005 "2002/12/31" -0.1365316 0.034307506
608 2005 "2003/3/31" 0.0479617 0.034307506
608 2005 "2003/6/30" -0.082426 0.034307506
608 2005 "2003/9/30" -0.1919279 0.034307506
608 2005 "2003/12/31" -0.0780903 0.034307506
end
gen fqenddt_date = date(fqenddt, "YMD")
format %tdCCYY-NN-DD fqenddt_date
drop fqenddt
rename(fqenddt_date )(fqenddt)
save "test.dta", replace
}
Using the following code, I expand and match the data based on fqenddt:
use "test.dta", clear
tempfile raw
save `raw'
rename (*) (*2)
* keep the merging vars same
rename (fqenddt2)(fqenddt)
joinby fqenddt using `raw'
drop if gvkey==gvkey2
sort gvkey gvkey2 fqenddt
drop year2
order gvkey year fqenddt bhr a gvkey2 bhr2 a2
This produces the desired output, but the result excludes some rows from the original dataset where there are no matches:
gvkey year fqenddt bhr a gvkey2 bhr2 a2
31 2005 2002/6/30 0.4100328 0.02116 608 -0.0563066 0.0343075
31 2005 2002/9/30 -0.2517873 0.02116 608 -0.1087845 0.0343075
31 2005 2002/12/31 -0.173169 0.02116 608 -0.1365316 0.0343075
31 2005 2003/3/31 0.0910931 0.02116 608 0.0479617 0.0343075
31 2005 2003/6/30 0.011535 0.02116 608 -0.082426 0.0343075
31 2005 2003/9/30 -0.0306233 0.02116 608 -0.1919279 0.0343075
31 2005 2003/12/31 -0.1218563 0.02116 608 -0.0780903 0.0343075
608 2005 2002/6/30 -0.0563066 0.0343075 31 0.4100328 0.02116
608 2005 2002/9/30 -0.1087845 0.0343075 31 -0.2517873 0.02116
608 2005 2002/12/31 -0.1365316 0.0343075 31 -0.173169 0.02116
608 2005 2003/3/31 0.0479617 0.0343075 31 0.0910931 0.02116
608 2005 2003/6/30 -0.082426 0.0343075 31 0.011535 0.02116
608 2005 2003/9/30 -0.1919279 0.0343075 31 -0.0306233 0.02116
608 2005 2003/12/31 -0.0780903 0.0343075 31 -0.1218563 0.02116
The following are the desired output I aim to achieve:
gvkey year fqenddt bhr a gvkey2 bhr2 a2
31 2005 2002/3/31 0.3565015 0.02116004 608 0.0343075
31 2005 2002/6/30 0.4100328 0.02116 608 -0.0563066 0.0343075
31 2005 2002/9/30 -0.2517873 0.02116 608 -0.1087845 0.0343075
31 2005 2002/12/31 -0.173169 0.02116 608 -0.1365316 0.0343075
31 2005 2003/3/31 0.0910931 0.02116 608 0.0479617 0.0343075
31 2005 2003/6/30 0.011535 0.02116 608 -0.082426 0.0343075
31 2005 2003/9/30 -0.0306233 0.02116 608 -0.1919279 0.0343075
31 2005 2003/12/31 -0.1218563 0.02116 608 -0.0780903 0.0343075
608 2005 2002/6/30 -0.0563066 0.0343075 31 0.4100328 0.02116
608 2005 2002/9/30 -0.1087845 0.0343075 31 -0.2517873 0.02116
608 2005 2002/12/31 -0.1365316 0.0343075 31 -0.173169 0.02116
608 2005 2003/3/31 0.0479617 0.0343075 31 0.0910931 0.02116
608 2005 2003/6/30 -0.082426 0.0343075 31 0.011535 0.02116
608 2005 2003/9/30 -0.1919279 0.0343075 31 -0.0306233 0.02116
608 2005 2003/12/31 -0.0780903 0.0343075 31 -0.1218563 0.02116
Question:
Is there a way to modify the approach so that all rows from the original dataset are retained, even if they don’t find matches during the expansion process? I’d like to achieve this without altering the original dataset structure.
Thank you in advance for your help!
I am working with the following dataset:
quietly{
clear
input str9 gvkey year str10 fqenddt bhr a
31 2005 "2002/3/31" 0.3565015 0.02116004
31 2005 "2002/6/30" 0.4100328 0.02116004
31 2005 "2002/9/30" -0.2517873 0.02116004
31 2005 "2002/12/31" -0.173169 0.02116004
31 2005 "2003/3/31" 0.0910931 0.02116004
31 2005 "2003/6/30" 0.011535 0.02116004
31 2005 "2003/9/30" -0.0306233 0.02116004
31 2005 "2003/12/31" -0.1218563 0.02116004
608 2005 "2002/6/30" -0.0563066 0.034307506
608 2005 "2002/9/30" -0.1087845 0.034307506
608 2005 "2002/12/31" -0.1365316 0.034307506
608 2005 "2003/3/31" 0.0479617 0.034307506
608 2005 "2003/6/30" -0.082426 0.034307506
608 2005 "2003/9/30" -0.1919279 0.034307506
608 2005 "2003/12/31" -0.0780903 0.034307506
end
gen fqenddt_date = date(fqenddt, "YMD")
format %tdCCYY-NN-DD fqenddt_date
drop fqenddt
rename(fqenddt_date )(fqenddt)
save "test.dta", replace
}
Using the following code, I expand and match the data based on fqenddt:
use "test.dta", clear
tempfile raw
save `raw'
rename (*) (*2)
* keep the merging vars same
rename (fqenddt2)(fqenddt)
joinby fqenddt using `raw'
drop if gvkey==gvkey2
sort gvkey gvkey2 fqenddt
drop year2
order gvkey year fqenddt bhr a gvkey2 bhr2 a2
This produces the desired output, but the result excludes some rows from the original dataset where there are no matches:
gvkey year fqenddt bhr a gvkey2 bhr2 a2
31 2005 2002/6/30 0.4100328 0.02116 608 -0.0563066 0.0343075
31 2005 2002/9/30 -0.2517873 0.02116 608 -0.1087845 0.0343075
31 2005 2002/12/31 -0.173169 0.02116 608 -0.1365316 0.0343075
31 2005 2003/3/31 0.0910931 0.02116 608 0.0479617 0.0343075
31 2005 2003/6/30 0.011535 0.02116 608 -0.082426 0.0343075
31 2005 2003/9/30 -0.0306233 0.02116 608 -0.1919279 0.0343075
31 2005 2003/12/31 -0.1218563 0.02116 608 -0.0780903 0.0343075
608 2005 2002/6/30 -0.0563066 0.0343075 31 0.4100328 0.02116
608 2005 2002/9/30 -0.1087845 0.0343075 31 -0.2517873 0.02116
608 2005 2002/12/31 -0.1365316 0.0343075 31 -0.173169 0.02116
608 2005 2003/3/31 0.0479617 0.0343075 31 0.0910931 0.02116
608 2005 2003/6/30 -0.082426 0.0343075 31 0.011535 0.02116
608 2005 2003/9/30 -0.1919279 0.0343075 31 -0.0306233 0.02116
608 2005 2003/12/31 -0.0780903 0.0343075 31 -0.1218563 0.02116
The following are the desired output I aim to achieve:
gvkey year fqenddt bhr a gvkey2 bhr2 a2
31 2005 2002/3/31 0.3565015 0.02116004 608 0.0343075
31 2005 2002/6/30 0.4100328 0.02116 608 -0.0563066 0.0343075
31 2005 2002/9/30 -0.2517873 0.02116 608 -0.1087845 0.0343075
31 2005 2002/12/31 -0.173169 0.02116 608 -0.1365316 0.0343075
31 2005 2003/3/31 0.0910931 0.02116 608 0.0479617 0.0343075
31 2005 2003/6/30 0.011535 0.02116 608 -0.082426 0.0343075
31 2005 2003/9/30 -0.0306233 0.02116 608 -0.1919279 0.0343075
31 2005 2003/12/31 -0.1218563 0.02116 608 -0.0780903 0.0343075
608 2005 2002/6/30 -0.0563066 0.0343075 31 0.4100328 0.02116
608 2005 2002/9/30 -0.1087845 0.0343075 31 -0.2517873 0.02116
608 2005 2002/12/31 -0.1365316 0.0343075 31 -0.173169 0.02116
608 2005 2003/3/31 0.0479617 0.0343075 31 0.0910931 0.02116
608 2005 2003/6/30 -0.082426 0.0343075 31 0.011535 0.02116
608 2005 2003/9/30 -0.1919279 0.0343075 31 -0.0306233 0.02116
608 2005 2003/12/31 -0.0780903 0.0343075 31 -0.1218563 0.02116
Question:
Is there a way to modify the approach so that all rows from the original dataset are retained, even if they don’t find matches during the expansion process? I’d like to achieve this without altering the original dataset structure.
Thank you in advance for your help!
Comment