  • Problem with generating a binary variable with individual's state and year specifications

    Hi all.

    I am currently writing my dissertation, which follows the methodology of Goldin and Katz's (2002) "Power of the Pill" paper.

    I am trying to generate a binary variable P that equals 1 if individual i's state of birth had a nonrestrictive birth control law for minors at the time i was 18 years old, and 0 otherwise. I am using the IPUMS 1980 1% Census of Population sample for the USA, which includes information on i's state of birth and year of birth. Each state of birth has a unique numerical ID, but the IDs are not consecutive (e.g. the number 3 is not assigned to any state).

    I also have data for when each state's law changed: In 1969, only 9 states had the nonrestrictive birth control law, thus affecting cohorts born in 1951 (and presumably after?). In 1971, there were 30 states, affecting cohorts born in 1953. By 1974, all but 2 of the 50 states had it, affecting cohorts born in 1956.

    1. My first problem is that I do not know how to set the binary variable to 1 for the right combinations of state of birth and year of birth. I tried doing this for the cohort born in 1951 in the 9 states that had the nonrestrictive laws in 1969:
    gen P = 0
    replace P = 1 if birthyr == 1951 & bpl == 5 & bpl == 6 & bpl == 13 & bpl == 16 & bpl == 21 & bpl == 28 & bpl == 32 & bpl == 40 & bpl == 49

    However, I think there is a more efficient way of generating such a binary variable?
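
    As written, the condition in that attempt can never be true, because bpl cannot equal 5 and 6 and 13 at the same time. A minimal sketch of the same one-cohort attempt using inlist(), which is true when bpl matches any of the listed values (the variable names and state codes are taken from the post above; whether later cohorts should also be included is left open, as in the question):

    ```stata
    * Sketch only: same cohort and state codes as the attempt above.
    * inlist(bpl, ...) is true if bpl equals ANY of the listed values.
    gen byte P = 0
    replace P = 1 if birthyr == 1951 & inlist(bpl, 5, 6, 13, 16, 21, 28, 32, 40, 49)
    ```

    This still hard-codes a single year and a single list of states, so it does not scale to state-specific adoption years.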

    2. Because the state law changes started in 1969, then 1971, then 1974 -- do I just take into consideration the final state laws in 1974 when generating the binary variable?

    3. By 1974, all but 2 states have the nonrestrictive birth control law; is there a more precise and efficient way for me to generate the binary variable without doing what I did in question (1)?

    Hope I have included enough information and that my questions come across clear.




  • #2
    You definitely should use the actual year of the law change in each state, not just use 1974 across the board. Even then, you will have some substantial, and perhaps differential, misclassification error: being born in a state is no guarantee that a person still lives there at age 18. Most do, but a substantial number will have moved. But let's leave that aside and assume that you cannot get actual location data on people at age 18.

    So what you need to do is make a new dataset with 50 observations (1 for each state) and 2 variables. The first variable is the state, using the exact same numerical coding as in the IPUMS, and the second variable is the year in which they first adopted a nonrestrictive law (let's call that variable lc_year). For the two states that never did, just put a missing value. You may be able to create this by importing it from whatever form the data is currently in, or if all you have is maybe a table in a PDF, then you might have to just create the data set by typing the information into the Stata Editor. Anyway, create that Stata data set and save it. Let's say you name it law_change_years.dta. Then from there you can do this:
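
    If the data has to be typed in, one possible way to build that lookup file is with -input- in a do-file rather than the Editor. This is only a sketch: the three state codes are borrowed from the question, the years are placeholders to be replaced with each state's actual adoption year, and the row with code 99 is a made-up placeholder showing how a never-adopting state would be entered.

    ```stata
    * Sketch of the state-level lookup file; all values below are placeholders.
    * One row per state: IPUMS state code, then first year with a nonrestrictive law.
    * A never-adopting state gets a missing year (.); code 99 is hypothetical.
    clear
    input int state int lc_year
    5  1969
    6  1969
    13 1969
    99 .
    end
    * Stop with an error if any state code appears more than once.
    isid state
    save law_change_years, replace
    ```

    The -isid- check matters because -merge m:1- requires the state code to identify each row of the lookup file uniquely.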

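    Code:
    * Variable names are inferred; the question calls state of birth bpl and birth year birthyr.
    use IPUMS_dataset, clear
    * Attach each person's state-level adoption year (many people per state).
    merge m:1 state using law_change_years
    * P = 1 if the law was in place by the year the person turned 18; a missing lc_year yields 0.
    gen byte P = (birth_year + 18 >= lc_year)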
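
    Two details of that code may be worth checking. Stata treats a numeric missing value as larger than any nonmissing number, so a missing lc_year gives P = 0 as intended, but a missing birth_year would give P = 1. A hedged variant, using the same assumed variable and file names, that inspects the merge and leaves P missing when the birth year is unknown:

    ```stata
    * Variant sketch with the same assumed names as the code above.
    use IPUMS_dataset, clear
    * Keep all census records; drop lookup rows that match nobody.
    merge m:1 state using law_change_years, keep(master match)
    * Records with _merge == 1 have a state code absent from the lookup file.
    tabulate _merge
    * Leave P missing when birth_year is missing instead of coding it as 1.
    gen byte P = (birth_year + 18 >= lc_year) if !missing(birth_year)
    ```
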
    In the future, when asking for help with code, it is always best to show example data. In this case, I was able to infer enough about your data set to write some code that, perhaps with some changes of names, will probably work. But writing code for imaginary data is a dubious enterprise most of the time. So always help those who want to help you by showing an example of your data, and the helpful way to do that is by using the -dataex- command.

    If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
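
    For this thread, a data example could have been produced along these lines. The variable names bpl and birthyr come from the question; the choice of the first 20 observations is arbitrary.

    ```stata
    * Only needed if dataex is not already installed (older Stata versions).
    ssc install dataex
    * Print a copy-and-paste data example of the two relevant variables.
    dataex bpl birthyr in 1/20
    ```

    The output is pasted into the forum post between code delimiters so that others can recreate the example exactly.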



    • #3
      Clyde Schechter Thank you! This has worked for me. Also I couldn't figure out how to use dataex but I do now. I will definitely keep that in mind for future reference.
