I have an excel file with several worksheets in it. Each worksheet is for a particular state of India, and then its other details are given like Zone, district, sub-district (called tehsil in this data). My aim is to read all of this data into Stata.
Step 1. So first I create a loop that imports each worksheet one by one in Stata and saves it as Dta file.
Step 2. Then I create another loop that appends all these Dta files.
There is no issue in either reading the excel files or appending them.
The issue is that in Step 1 when Stata is saving each worksheet as a separate data file, it reads the data incorrectly. For example, in the original excel file the data looks like the following:
But when Stata saves it, it looks like :
state zone districtname tehsilcode tehsilname
Tamil Nadu 1 KANCHEEPURAM 3 CHENGALPATTU
Tamil Nadu 1 KANCHEEPURAM 4 CHEYYUR
Tamil Nadu 1 KANCHEEPURAM 2 KANCHEEPURAM
Tamil Nadu 1 TIRUVALLUR 1 PONNERI
Tamil Nadu 2 CUDDALORE 17 KATTUMANNARKOIL
Tamil Nadu 2 CUDDALORE 16 TITTAKUDI
Tamil Nadu 2 CUDDALORE 12 CUDDALORE
Tamil Nadu 2 CUDDALORE 13 PANRUTI
Tamil Nadu 2 CUDDALORE 15 CHITHAMBARAM -I
Tamil Nadu 2 CUDDALORE 14 CHIDAMBARAM - I
As you can see, it changes the entries in tehsilcode and tehsilname for districtsname.
This is happening for other worksheets (states) as well.
How can I ensure that Stata is reading the data correctly?
Thanks,
Shweta
Step 1. So first I create a loop that imports each worksheet one by one in Stata and saves it as Dta file.
Step 2. Then I create another loop that appends all these Dta files.
There is no issue in either reading the excel files or appending them.
The issue is that in Step 1 when Stata is saving each worksheet as a separate data file, it reads the data incorrectly. For example, in the original excel file the data looks like the following:
state | zone | districtname | tehsilcode | tehsilname |
Tamil Nadu | 1 | KANCHEEPURAM | 1 | PONNERI |
Tamil Nadu | 1 | KANCHEEPURAM | 2 | KANCHEEPURAM |
Tamil Nadu | 1 | KANCHEEPURAM | 3 | CHENGALPATTU |
Tamil Nadu | 1 | TIRUVALLUR | 4 | CHEYYUR |
Tamil Nadu | 2 | CUDDALORE | 5 | VELLORE |
Tamil Nadu | 2 | CUDDALORE | 6 | POLUR |
Tamil Nadu | 2 | CUDDALORE | 7 | THIRUVANNAMALAI |
Tamil Nadu | 2 | CUDDALORE | 8 | TINDIVANAM - I |
Tamil Nadu | 2 | CUDDALORE | 9 | TINDIVANAM II |
Tamil Nadu | 2 | CUDDALORE | 10 | TIRUKOILUR |
state zone districtname tehsilcode tehsilname
Tamil Nadu 1 KANCHEEPURAM 3 CHENGALPATTU
Tamil Nadu 1 KANCHEEPURAM 4 CHEYYUR
Tamil Nadu 1 KANCHEEPURAM 2 KANCHEEPURAM
Tamil Nadu 1 TIRUVALLUR 1 PONNERI
Tamil Nadu 2 CUDDALORE 17 KATTUMANNARKOIL
Tamil Nadu 2 CUDDALORE 16 TITTAKUDI
Tamil Nadu 2 CUDDALORE 12 CUDDALORE
Tamil Nadu 2 CUDDALORE 13 PANRUTI
Tamil Nadu 2 CUDDALORE 15 CHITHAMBARAM -I
Tamil Nadu 2 CUDDALORE 14 CHIDAMBARAM - I
As you can see, it changes the entries in tehsilcode and tehsilname for districtsname.
This is happening for other worksheets (states) as well.
How can I ensure that Stata is reading the data correctly?
Thanks,
Shweta
Comment