My program is for working with published tables from American Community Survey, accessible via American Fact Finder. One of the first steps of the program is to append multiple years (each from its own file) from a single table ID. The files may be, for example:
(The program creates the .dta files from downloaded .csv files.)
I would like to combine different tables such as the following. The file names will be different at various times the program is run.
(Note the different table numbers of B06009 and B07009.)
The following code works but creates extra rows in the data with corresponding missing values:
I have altered the variable names before this step to avoid having the same variable names in B06009 data and B07009 data, making the use of append possible. The identifying variable, geoid2, has not been changed and could be used as a merge variable.
The question is, how can I put the filenames into groups to first be appended to each other? The appended files would then be merged. In the above set of files, I would append all of the B06009's together, likewise the B07009's, then merge the two resultant files.
Luckily, the filenames have the same format, though the length of the table name may change. I already have the existing code to get at the table name:
Then, what I would like to do is use the resulting `table_name' to group the files. I'm imagining some sort of matrix with each file name and a group number. Or perhaps I could store the `table_name' in a data characteristic or global macro. Note that I am using version 11.2.
Thanks
Code:
acs_10_1yr_b06009_with_ann.csv.dta acs_11_1yr_b06009_with_ann.csv.dta acs_12_1yr_b06009_with_ann.csv.dta
I would like to combine different tables such as the following. The file names will be different at various times the program is run.
Code:
acs_10_1yr_b06009_with_ann.csv.dta acs_11_1yr_b06009_with_ann.csv.dta acs_12_1yr_b06009_with_ann.csv.dta acs_10_1yr_b07009_with_ann.csv.dta acs_11_1yr_b07009_with_ann.csv.dta acs_12_1yr_b07009_with_ann.csv.dta
The following code works but creates extra rows in the data with corresponding missing values:
Code:
local datafiles : dir . files "*.dta" append using `datafiles', force
The question is, how can I put the filenames into groups to first be appended to each other? The appended files would then be merged. In the above set of files, I would append all of the B06009's together, likewise the B07009's, then merge the two resultant files.
Luckily, the filenames have the same format, though the length of the table name may change. I already have the existing code to get at the table name:
Code:
local datafiles : dir . files "*.csv.dta" foreach file in `datafiles' {local filename_truncated = subinstr(`"`file'"', `"_with_ann.csv"', "", .) local table_name = substr("`filename_truncated'", 12, .)}
Thanks
Comment