Hello All,
I am new to posting on Statalist. I hope my question is clear, and I am happy to clarify anything that is confusing.
I am working with county-level data and trying to create a year identifier variable for each year of observations collected. About the dataset: it compiles information that enables aggregation of county-level data on health topics over multiple years.
I want to use this dataset to study a policy effect (the effect of a health clinic on community outcomes) using diff-in-diff. I want to create the time variable (the pre_post variable), but the dataset has no separate time variable. I am not sure whether it is possible to have a time variable with aggregate data. Ideally, the dataset would include a time variable identifying the year of each observation.
Right now, the year of each observation is indicated only by the name of the variable. For example,
the number of doctors in LA County, California from 2010 to 2012 is stored in three separate variables:
count of doctors in 2010 = md2010,
count of doctors in 2011 = md2011,
and so forth.
My independent variable of interest, the number of health clinics (fqhc) in each county, is likewise labeled with the year at the end of the variable name:
fqhc10 = health clinics in 2010,
fqhc11 = health clinics in 2011,
fqhc12 = health clinics in 2012,
and so forth
My data looks like this
As you can see in my data, county and state are string variables, and the count of clinics in each county is stored in a separate variable for each year. I imagine that to do a diff-in-diff with a pre-post analysis, I would need to create a time variable covering each year of health clinic data. I am also unsure whether this kind of dataset (county-level/aggregate data) is appropriate for this analysis.
Code:
* Example generated by -dataex-. To install: ssc install dataex
* dataex statename countyname fqhc15 fqhc14 fqhc13 fqhc12 fqhc11 fqhc10
clear
input str20 statename str25 countyname int(fqhc15 fqhc14 fqhc13 fqhc12 fqhc11 fqhc10)
"South Carolina" "Abbeville" 1 1 1 1 1 1
"Louisiana" "Acadia" 2 2 2 2 2 2
"Virginia" "Accomack" 3 3 3 3 3 4
"Idaho" "Ada" 10 10 6 6 4 4
"Kentucky" "Adair" 2 2 2 2 2 2
"Missouri" "Adair" 3 3 3 3 3 4
"Iowa" "Adair" 0 0 0 0 0 0
"Oklahoma" "Adair" 3 3 3 3 2 2
"Pennsylvania" "Adams" 1 1 1 1 1 1
"Wisconsin" "Adams" 0 0 0 0 0 0
"Illinois" "Adams" 3 1 1 1 1 1
"Mississippi" "Adams" 2 2 2 2 2 1
"Ohio" "Adams" 3 3 2 2 2 2
"Iowa" "Adams" 0 0 0 0 0 0
"Colorado" "Adams" 11 10 10 7 6 6
"North Dakota" "Adams" 0 0 0 0 0 0
"Washington" "Adams" 3 3 3 3 3 3
"Nebraska" "Adams" 0 0 0 0 0 0
"Idaho" "Adams" 1 1 1 1 1 1
"Indiana" "Adams" 0 0 0 0 0 0
"Vermont" "Addison" 2 2 2 1 0 0
"Puerto Rico" "Adjuntas" 0 0 0 0 0 0
"Puerto Rico" "Aguada" 0 0 0 0 0 0
"Puerto Rico" "Aguadilla" 0 0 0 0 0 0
"Puerto Rico" "Aguas Buenas" 1 0 0 0 0 0
"Puerto Rico" "Aibonito" 0 0 0 0 0 0
"South Carolina" "Aiken" 4 4 5 5 1 1
"Minnesota" "Aitkin" 1 1 1 2 2 2
"Florida" "Alachua" 7 6 4 3 3 2
"North Carolina" "Alamance" 4 4 4 4 3 2
"California" "Alameda" 36 33 32 32 31 28
"Colorado" "Alamosa" 4 4 4 4 4 4
"New York" "Albany" 1 1 1 1 1 1
"Wyoming" "Albany" 0 0 0 0 0 0
"Virginia" "Albemarle" 1 1 1 1 1 1
"Michigan" "Alcona" 4 4 4 4 3 3
"Mississippi" "Alcorn" 1 1 1 1 1 1
"Alaska" "Aleutians East (B)" 0 0 0 0 0 0
"Alaska" "Aleutians West (CA)" 1 1 1 1 1 1
"North Carolina" "Alexander" 0 0 0 0 0 0
"Illinois" "Alexander" 3 3 3 2 2 2
"Virginia" "Alexandria City" 2 3 3 3 3 2
"Oklahoma" "Alfalfa" 1 1 1 1 1 1
"Michigan" "Alger" 1 1 1 1 1 1
"Iowa" "Allamakee" 0 0 0 0 0 0
"Michigan" "Allegan" 1 1 1 1 1 1
"New York" "Allegany" 2 2 2 2 2 2
"Maryland" "Allegany" 2 2 2 2 2 2
"North Carolina" "Alleghany" 0 0 0 0 0 0
"Virginia" "Alleghany" 0 0 0 0 0 0
"Pennsylvania" "Allegheny" 22 24 24 24 24 24
"Kansas" "Allen" 1 1 1 0 0 0
"Kentucky" "Allen" 0 0 0 0 0 0
"Ohio" "Allen" 3 2 1 1 1 1
"Louisiana" "Allen" 2 2 1 1 1 1
"Indiana" "Allen" 3 3 3 2 2 0
"South Carolina" "Allendale" 1 1 1 1 1 1
"Michigan" "Alpena" 3 3 3 3 3 3
"California" "Alpine" 0 0 0 0 0 0
"California" "Amador" 0 0 0 0 0 0
"Virginia" "Amelia" 2 3 3 3 2 2
"Virginia" "Amherst" 1 1 1 1 0 0
"Mississippi" "Amite" 1 1 1 1 1 1
"Puerto Rico" "Anasco" 0 0 0 0 0 0
"Alaska" "Anchorage (B)" 2 2 2 2 3 3
"Texas" "Anderson" 2 2 2 3 3 3
"Tennessee" "Anderson" 1 1 0 0 0 0
"Kansas" "Anderson" 0 0 0 0 0 0
"South Carolina" "Anderson" 0 0 0 0 0 0
"Kentucky" "Anderson" 0 0 0 0 0 0
"Missouri" "Andrew" 1 1 1 1 2 1
"Texas" "Andrews" 0 0 0 0 0 0
"Maine" "Androscoggin" 12 13 14 13 13 3
"Texas" "Angelina" 0 0 0 0 0 0
"Maryland" "Anne Arundel" 3 2 4 4 5 4
"Minnesota" "Anoka" 0 0 0 0 0 0
"North Carolina" "Anson" 1 1 1 2 2 2
"Nebraska" "Antelope" 0 0 0 0 0 0
"Michigan" "Antrim" 2 2 2 2 2 2
"Arizona" "Apache" 3 3 2 2 2 2
"Iowa" "Appanoose" 2 2 2 2 2 2
"Georgia" "Appling" 1 1 0 0 0 0
"Virginia" "Appomattox" 0 0 0 0 0 0
"Texas" "Aransas" 0 0 0 0 0 0
"Colorado" "Arapahoe" 8 8 8 5 5 5
"Texas" "Archer" 0 0 0 0 0 0
"Colorado" "Archuleta" 0 0 0 0 0 0
"Puerto Rico" "Arecibo" 0 0 0 0 0 0
"Michigan" "Arenac" 1 1 1 1 1 1
"Arkansas" "Arkansas" 0 0 0 0 0 0
"Virginia" "Arlington" 2 2 1 1 1 1
"Pennsylvania" "Armstrong" 0 0 0 0 0 0
"Texas" "Armstrong" 0 0 0 0 0 0
"Maine" "Aroostook" 17 17 16 15 15 15
"Puerto Rico" "Arroyo" 1 1 1 1 1 1
"Nebraska" "Arthur" 0 0 0 0 0 0
"Louisiana" "Ascension" 1 1 1 0 0 0
"North Carolina" "Ashe" 0 0 0 0 0 0
"Ohio" "Ashland" 0 0 0 0 0 0
"Wisconsin" "Ashland" 1 1 1 1 1 1
end
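My current guess is that I need to go from this wide layout to one row per county per year, and that something like the sketch below might do it, but I am not sure it is right. It uses three rows copied from the example above. The names county_id, yy, year and post are ones I made up, and the 2013 cutoff is only a placeholder, since the real first post-policy year depends on the clinic policy being studied. It also assumes that state plus county name uniquely identifies a county.

```stata
* Three rows copied from the dataex example above
clear
input str20 statename str25 countyname int(fqhc15 fqhc14 fqhc13 fqhc12 fqhc11 fqhc10)
"Idaho" "Ada" 10 10 6 6 4 4
"Idaho" "Adams" 1 1 1 1 1 1
"Vermont" "Addison" 2 2 2 1 0 0
end

* Assumption: state + county name identifies a county; isid errors out if not
isid statename countyname

* Numeric county identifier (made-up name), needed later for xtset
egen county_id = group(statename countyname)

* fqhc10 ... fqhc15 collapse into one variable fqhc; the suffix is stored in yy
reshape long fqhc, i(county_id) j(yy)

* Turn the two-digit suffix into a four-digit calendar year
generate int year = 2000 + yy
drop yy

* Placeholder cutoff: 2013 is NOT from my data, only an example value
generate byte post = (year >= 2013)

* Declare the county-year panel
xtset county_id year
list statename countyname year fqhc post, sepby(county_id)
```

If this is the right idea, each county would contribute six rows (2010 to 2015), and year and post would be the time variables I am missing.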
I did not include any other variables from the dataset, but all of the other health variables are likewise named with the year at the end of the variable name.
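One complication is that the suffixes are not consistent: the doctor counts use four digits (md2010) while the clinic counts use two (fqhc10). A possible way to handle both at once, as far as I understand reshape, is to rename the two-digit variables first and then reshape all stubs together. In the sketch below the fqhc values are copied from my example, but the md values are placeholders I invented for illustration, not real counts, and county_id and year are made-up names. The wildcard rename syntax needs Stata 12 or later, and it assumes every variable starting with fqhc is one of the year-coded ones.

```stata
* Toy data: fqhc values from the example above; md values are invented placeholders
clear
input str20 statename str25 countyname int(fqhc10 fqhc11 md2010 md2011)
"Idaho" "Ada" 4 4 100 110
"Vermont" "Addison" 0 0 20 21
end

* Make the suffixes consistent: fqhc10 -> fqhc2010, fqhc11 -> fqhc2011
rename fqhc* fqhc20*

* Numeric county identifier (made-up name)
egen county_id = group(statename countyname)

* Reshape both stubs in one step; the four-digit suffix becomes year directly
reshape long fqhc md, i(county_id) j(year)

list statename countyname year fqhc md, sepby(county_id)
```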
Thank you so much
Stata 12 on Mac OS (but I have access to Stata 15 on Windows)