Hi Everyone,
Thank you very much for your help from all the statalist members. I am having diffculty making a stata dataset from excel file which was poorly prepared by data collectors. Significant effort has gone into the data colection and cannot be repeated due to financial and time constraints. I was hoping to see if any of the stata gurus can help me.
I am uploading the problem_dataset(deidentified and truncated) and the desired dataset formats for clarification pupose. The problem_dataset contains variables DXCCS1 DXCCS2 ..DXCCS9 and their counts as COUNT1...COUNT9
The goal is to create one column DXCCS, which would include all the values of each variable DXCCS1...DXCCS9. Some values of one variable may not be present in other .
EG-1:DXCCS9 doesnt have value 31 but present in DXCCS1-8
Eg-2:DXCCS1 and DXCCS2 has value 56 but not in DXCCS3-9.
My code so far that has been working for me is as follows, however I cannot run it in a loop as the cell ranges differ from sheet to sheet and I get a error. There are nearly 200 sheets and it would take really long time to run it manually for each excel workbook.
Can any of the members please help me. I thank you very much for your time and effort. Thanks again.
Sincerely
Anwar
Thank you very much for your help from all the statalist members. I am having diffculty making a stata dataset from excel file which was poorly prepared by data collectors. Significant effort has gone into the data colection and cannot be repeated due to financial and time constraints. I was hoping to see if any of the stata gurus can help me.
I am uploading the problem_dataset(deidentified and truncated) and the desired dataset formats for clarification pupose. The problem_dataset contains variables DXCCS1 DXCCS2 ..DXCCS9 and their counts as COUNT1...COUNT9
The goal is to create one column DXCCS, which would include all the values of each variable DXCCS1...DXCCS9. Some values of one variable may not be present in other .
EG-1:DXCCS9 doesnt have value 31 but present in DXCCS1-8
Eg-2:DXCCS1 and DXCCS2 has value 56 but not in DXCCS3-9.
My code so far that has been working for me is as follows, however I cannot run it in a loop as the cell ranges differ from sheet to sheet and I get a error. There are nearly 200 sheets and it would take really long time to run it manually for each excel workbook.
Can any of the members please help me. I thank you very much for your time and effort. Thanks again.
Sincerely
Anwar
Code:
import excel "C:\Users\Anwar\Dropbox\Temporary\problemdataset.xlsx", sheet("1") cellrange(A1:B247) firstrow allstring clear drop if DXCCS2=="" rename DXCCS2 DXCCS3 tempfile temp save "`temp'" import excel "C:\Users\Anwar\Dropbox\Temporary\problemdataset.xlsx", sheet("1") cellrange(C1:D247) firstrow allstring clear drop if DXCCS3=="" merge 1:1 DXCCS3 using "`temp'" drop _merge rename DXCCS3 DXCCS4 tempfile temp save "`temp'" import excel "C:\Users\Anwar\Dropbox\Temporary\problemdataset.xlsx", sheet("1") cellrange(E1:F247) firstrow allstring clear drop if DXCCS4=="" merge 1:1 DXCCS4 using "`temp'" drop _merge rename DXCCS4 DXCCS5 tempfile temp save "`temp'" import excel "C:\Users\Anwar\Dropbox\Temporary\problemdataset.xlsx", sheet("1") cellrange(G1:H247) firstrow allstring clear drop if DXCCS5=="" merge 1:1 DXCCS5 using "`temp'" drop _merge rename DXCCS5 DXCCS6 tempfile temp save "`temp'" import excel "C:\Users\Anwar\Dropbox\Temporary\problemdataset.xlsx", sheet("1") cellrange(I1:J247) firstrow allstring clear drop if DXCCS6=="" merge 1:1 DXCCS6 using "`temp'" drop _merge rename DXCCS6 DXCCS7 tempfile temp save "`temp'" import excel "C:\Users\Anwar\Dropbox\Temporary\problemdataset.xlsx", sheet("1") cellrange(K1:L247) firstrow allstring clear drop if DXCCS7=="" merge 1:1 DXCCS7 using "`temp'" drop _merge rename DXCCS7 DXCCS8 tempfile temp save "`temp'" import excel "C:\Users\Anwar\Dropbox\Temporary\problemdataset.xlsx", sheet("1") cellrange(M1:N247) firstrow allstring clear drop if DXCCS8=="" merge 1:1 DXCCS8 using "`temp'" drop _merge rename DXCCS8 DXCCS9 tempfile temp save "`temp'" import excel "C:\Users\Anwar\Dropbox\Temporary\problemdataset.xlsx", sheet("1") cellrange(O1:P247) firstrow allstring clear drop if DXCCS9=="" merge 1:1 DXCCS9 using "`temp'" drop _merge rename DXCCS9 DXCCS2
Comment