Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Findings events within a date range

    Hi there, I'm using Stata 18 on Windows 11 Pro. I'm trying to do two things: 1) create a MM-YYYY or YYYY-MM (order doesn't matter) from two variables (ppbeg_mnth, created from a month variable, dobmm) and (ppbeg_yr, created from a year variable, dobyy) and 2) from this, write a command that enables me to identify whether a given event falls within a YEAR in the combined variable of #1.

    Any advice or guidance would be greatly appreciated!

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double(dobyy dobmm) float(ppbeg_mnth ppbeg_year)
    1962 11 11 1962
    1959 10 10 1959
    1992  6  6 1992
    1985 10 10 1985
    2012  3  3 2012
    2011  1  1 2011
    1988  1  1 1988
    1990  1  1 1990
    2014 10 10 2014
    1970 11 11 1970
    1966  8  8 1966
    2003  1  1 2003
    2005  7  7 2005
    1954  1  1 1954
    1961 11 11 1961
    1988 10 10 1988
    1983  9  9 1983
    2010  5  5 2010
    1988 10 10 1988
    1962  8  8 1962
    1955  5  5 1955
    1979  8  8 1979
    1977  4  4 1977
    2009  1  1 2009
    2011  2  2 2011
    1997  1  1 1997
    1999 11 11 1999
    1970 12 12 1970
    1995  5  5 1995
    1963  1  1 1963
    1945  6  6 1945
    2007  3  3 2007
    2003  5  5 2003
    1977  4  4 1977
    2014  6  6 2014
    1982  2  2 1982
    1976  7  7 1976
    2001  5  5 2001
    2007  6  6 2007
    1940  8  8 1940
    1949  7  7 1949
    1961  2  2 1961
    1969  1  1 1969
    1974  5  5 1974
    1991  5  5 1991
    2013  3  3 2013
    2012  6  6 2012
    2009 10 10 2009
    2010 11 11 2010
    2015  9  9 2015
    1960  3  3 1960
    2005  2  2 2005
    1980  7  7 1980
    2003 10 10 2003
    2010  8  8 2010
    1974  6  6 1974
    1935  3  3 1935
    1933  9  9 1933
    1948  6  6 1948
    1966  5  5 1966
    1930 12 12 1930
    1949  3  3 1949
    1930  5  5 1930
    1932  2  2 1932
    1933 10 10 1933
    1961  4  4 1961
    1989  6  6 1989
    1952  8  8 1952
    1976  2  2 1976
    1963  4  4 1963
    2012  1  1 2012
    1985  7  7 1985
    1988  3  3 1988
    2012  1  1 2012
    2009  1  1 2009
    1977  4  4 1977
    1978  8  8 1978
    1961  2  2 1961
    2001 12 12 2001
    1997  9  9 1997
    1959  5  5 1959
    1970 10 10 1970
    1965  8  8 1965
    1997  7  7 1997
    1968  8  8 1968
    1968  2  2 1968
    1998  1  1 1998
    2003  6  6 2003
    1944  7  7 1944
    2000 10 10 2000
    1945  7  7 1945
    1985  8  8 1985
    1989 10 10 1989
    1990 12 12 1990
    2007  7  7 2007
    1970  9  9 1970
    1986 11 11 1986
    2009 12 12 2009
    2014  9  9 2014
    1964 12 12 1964
    end
    label values dobyy H1810223X
    label values dobmm H1810222X
    label def H1810222X 1 "1 JANUARY", modify
    label def H1810222X 2 "2 FEBRUARY", modify
    label def H1810222X 3 "3 MARCH", modify
    label def H1810222X 4 "4 APRIL", modify
    label def H1810222X 5 "5 MAY", modify
    label def H1810222X 6 "6 JUNE", modify
    label def H1810222X 7 "7 JULY", modify
    label def H1810222X 8 "8 AUGUST", modify
    label def H1810222X 9 "9 SEPTEMBER", modify
    label def H1810222X 10 "10 OCTOBER", modify
    label def H1810222X 11 "11 NOVEMBER", modify
    label def H1810222X 12 "12 DECEMBER", modify

  • #2
    So, you can create Stata monthly date variables for both dobn and ppbeg as follows:

    Code:
    gen month_of_birth = ym(dobyy, dobmm)
    assert missing(month_of_birth) == missing(dobyy, dobmm)
    
    gen month_of_ppbeg = ym(ppbeg_year, ppbeg_mnth)
    assert missing(month_of_ppbeg) == missing(ppbeg_year, ppbeg_mnth)
    
    format month_of* %tmNN-CCYY
    That said, there is something odd about your data because in all cases ppbeg_mnth == dobmm, and ppbeg_year == dobyy. Why do you have two copies of the same information? This is either a data error or very inefficient data organization.

    AlsoI do not understand your second question. First of all, what "event" are you talking about? Show example data that contains it. Also, given that ppbeg or birth are only given accurate to a month, it is not possible to say if some other date falls within a year of it. The best you can do is decide whether the months differ by at most 12.

    Comment


    • #3
      Hi Clyde, thank you so much for your help and support. You're completely right -- the duplicate copies of those variables in my -dataex- was a perfect example of inefficient data organization. I accidentally included variables used for internal validity checks.

      Regarding my second question, I'm merging data that looks like this (where every row is a person uniquely identified by dupersid with a one-year timeframe associated with them - documented by ppbeg and ppend):

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str8 dupersid float(ppbeg ppend)
      "60029101" 668 680
      "60113102" 670 682
      "60229102" 662 674
      "60242101" 665 677
      "60299101" 667 679
      "60634104" 667 679
      "60698103" 661 673
      "60701102" 663 675
      "60821101" 670 682
      "60827102" 669 681
      "60918101" 663 675
      "60964101" 664 676
      "60977101" 665 677
      "61123102" 667 679
      "61219101" 662 674
      "61383101" 669 681
      "61527101" 666 678
      "61534103" 661 673
      "61541103" 665 677
      "61553101" 671 683
      "61760102" 665 677
      "61791102" 667 679
      "61800103" 665 677
      "61961104" 667 679
      "61976104" 664 676
      "61985102" 661 673
      "62079102" 662 674
      "62083102" 671 683
      "62100102" 661 673
      "62130102" 665 677
      "62344102" 667 679
      "62444102" 667 679
      "62454201" 662 674
      "62577102" 663 675
      "62598102" 662 674
      "62600101" 666 678
      "62716101" 662 674
      "62813104" 666 678
      "62943102" 670 682
      "63008102" 666 678
      "63113102" 671 683
      "63125104" 667 679
      "63153203" 663 675
      "63167101" 660 672
      "63307102" 669 681
      "63314101" 663 675
      "63323101" 662 674
      "63354101" 664 676
      "63372102" 664 676
      "63451201" 665 677
      "63493104" 664 676
      "63517106" 662 674
      "63805105" 662 674
      "63829103" 671 683
      "63854101" 660 672
      "63866102" 665 677
      "63881103" 662 674
      "63950105" 664 676
      "64018102" 668 680
      "64110101" 667 679
      "64114102" 661 673
      "64122101" 661 673
      "64168102" 671 683
      "64221103" 666 678
      "64233102" 660 672
      "64277101" 662 674
      "64314101" 670 682
      "64428102" 665 677
      "64469102" 661 673
      "64691102" 669 681
      "64817102" 669 681
      "64840103" 662 674
      "64851102" 666 678
      "64884102" 670 682
      "64899102" 665 677
      "64913102" 669 681
      "64991102" 667 679
      "65044106" 660 672
      "65075103" 665 677
      "65132102" 664 676
      "65163102" 667 679
      "65167104" 669 681
      "65291104" 665 677
      "65337202" 667 679
      "65347101" 660 672
      "65368102" 665 677
      "65371101" 663 675
      "65408104" 664 676
      "65497102" 665 677
      "65525106" 665 677
      "65607101" 660 672
      "65617101" 660 672
      "65626104" 662 674
      "65676102" 664 676
      "65780102" 671 683
      "65836101" 661 673
      "65840101" 667 679
      "65851102" 671 683
      "65996102" 665 677
      "66014102" 663 675
      end
      format %tmNN-CCYY ppbeg
      format %tmNN-CCYY ppend
      I'll be doing a 1:m merge by dupersid to a stacked event file, which I've formatted to be like this (evntidx is an event-level unique ID):

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str8 dupersid str12 evntidx double(eventdateyr eventdatemm) str2 data
      "60001105" "600011050021" 2015  6 "IP"
      "60010106" "600101060051" 2015 11 "IP"
      "60015101" "600151010551" 2015 12 "IP"
      "60020101" "600201010031" 2015 12 "IP"
      "60029101" "600291010101" 2015  9 "IP"
      "60050101" "600501010011" 2015  2 "IP"
      "60056106" "600561060031" 2015  6 "IP"
      "60056106" "600561060081" 2015 12 "IP"
      "60082101" "600821010071" 2015  5 "IP"
      "60086102" "600861020021" 2015  1 "IP"
      "60086102" "600861020061" 2015  6 "IP"
      "60102102" "601021020121" 2015  2 "IP"
      "60102102" "601021020221" 2015  3 "IP"
      "60102102" "601021020281" 2015  8 "IP"
      "60113102" "601131020231" 2015 11 "IP"
      "60114101" "601141010131" 2015 12 "IP"
      "60117105" "601171050211" 2015  3 "IP"
      "60136101" "601361010151" 2015  3 "IP"
      "60140101" "601401010491" 2015  8 "IP"
      "60141101" "601411011371" 2015  2 "IP"
      "60141101" "601411011381" 2015  5 "IP"
      "60154101" "601541010091" 2015 11 "IP"
      "60161101" "601611010361" 2015 11 "IP"
      "60166102" "601661020211" 2015  2 "IP"
      "60166102" "601661020221" 2015  3 "IP"
      "60166102" "601661020241" 2015  2 "IP"
      "60172101" "601721010371" 2015 10 "IP"
      "60191105" "601911050051" 2015  4 "IP"
      "60193102" "601931020341" 2015  2 "IP"
      "60193102" "601931020621" 2015  5 "IP"
      "60197101" "601971010041" 2015  1 "IP"
      "60197101" "601971010071" 2015  2 "IP"
      "60197102" "601971020121" 2015  2 "IP"
      "60197102" "601971020131" 2015  2 "IP"
      "60210102" "602101020961" 2015  7 "IP"
      "60214101" "602141010321" 2015  7 "IP"
      "60217101" "602171010361" 2015  8 "IP"
      "60224102" "602241020411" 2015  9 "IP"
      "60229102" "602291020031" 2015  3 "IP"
      "60242101" "602421010211" 2015  6 "IP"
      "60242103" "602421030041" 2015 12 "IP"
      "60246102" "602461020432" 2015  3 "IP"
      "60246102" "602461020702" 2015  8 "IP"
      "60252101" "602521013271" 2015  8 "IP"
      "60264101" "602641010551" 2015 11 "IP"
      "60268101" "602681010641" 2015  4 "IP"
      "60279102" "602791020081" 2015  4 "IP"
      "60280102" "602801020041" 2015 11 "IP"
      "60299101" "602991010341" 2015  8 "IP"
      "60301102" "603011020191" 2015  9 "IP"
      "60301102" "603011020201" 2015 10 "IP"
      "60308101" "603081010221" 2015  3 "IP"
      "60310101" "603101010451" 2015  1 "IP"
      "60310101" "603101010741" 2015  4 "IP"
      "60339101" "603391010021" 2015 11 "IP"
      "60355102" "603551021441" 2015 10 "IP"
      "60357101" "603571010551" 2015 11 "IP"
      "60373101" "603731010021" 2015 11 "IP"
      "60384101" "603841010201" 2015 10 "IP"
      "60388101" "603881010961" 2015 10 "IP"
      "60415101" "604151010441" 2015 10 "IP"
      "60415101" "604151010451" 2015 10 "IP"
      "60415101" "604151010461" 2015 11 "IP"
      "60415101" "604151010711" 2015  9 "IP"
      "60415101" "604151011081" 2015 11 "IP"
      "60449103" "604491030471" 2015  8 "IP"
      "60449103" "604491030481" 2015  8 "IP"
      "60449103" "604491030491" 2015  7 "IP"
      "60449103" "604491030501" 2015  5 "IP"
      "60459101" "604591010341" 2015  9 "IP"
      "60480101" "604801010301" 2015  9 "IP"
      "60491102" "604911020301" 2015 10 "IP"
      "60495102" "604951020321" 2015  4 "IP"
      "60495102" "604951020521" 2015  8 "IP"
      "60500102" "605001020101" 2015  1 "IP"
      "60500102" "605001020111" 2015  1 "IP"
      "60500102" "605001020121" 2015  2 "IP"
      "60500102" "605001020131" 2015  3 "IP"
      "60510103" "605101030281" 2015 11 "IP"
      "60532101" "605321010571" 2015  6 "IP"
      "60532101" "605321010581" 2015  8 "IP"
      "60532101" "605321010641" 2015 11 "IP"
      "60550101" "605501011071" 2015  9 "IP"
      "60564101" "605641010351" 2015  5 "IP"
      "60569103" "605691030261" 2015  4 "IP"
      "60595101" "605951010181" 2015 12 "IP"
      "60613101" "606131010211" 2015  6 "IP"
      "60613102" "606131020321" 2015  6 "IP"
      "60613102" "606131020331" 2015  6 "IP"
      "60634104" "606341040102" 2015  8 "IP"
      "60643103" "606431030291" 2015  3 "IP"
      "60654102" "606541020111" 2015  9 "IP"
      "60662102" "606621020491" 2015 11 "IP"
      "60698103" "606981030191" 2015  2 "IP"
      "60701102" "607011020211" 2015  4 "IP"
      "60702101" "607021010401" 2015  6 "IP"
      "60702101" "607021010731" 2015  8 "IP"
      "60702102" "607021020821" 2015  4 "IP"
      "60702102" "607021021171" 2015 12 "IP"
      "60703101" "607031010501" 2015 10 "IP"
      end
      label values eventdateyr H178D0009X
      label def H178D0009X 2015 "2015", modify
      label values eventdatemm H178D0008X
      After that merge, I'd like to create an indicator variable that gives me a "1" if the eventdatemm + eventdateyr falls within the date range of ppbeg ppend from the first dataset. Unsure how to do this----any help would be appreciated!

      Comment


      • #4
        Code:
        use pp_datafile, clear
        merge 1:m dupersid using event_file
        gen event_mdate = ym(eventdateyr, eventdatemm)
        assert missing(event_mdate) == missing(eventdateyr, eventdatemm)
        format event_mdate %tmNN-CCYY
        
        gen byte wanted = inrange(event_mdate, ppbeg, ppend) if !missing(event_mdate, ppbeg, ppend)
        Evidently, replace the italicized parts by the actual file names.

        The - if !missing(event_mdate, ppbeg, ppend)- term tacked on to the end of the final command is needed because, at least in your example, there are many unmatched dupersid values in both data sets. For these, it is not meaningful to ask whether event_mdate falls between ppbeg and ppend, because either the first or the last two of these will not exist. Now, it may be that in your real, complete data sets, there will be no unmatched observations in the -merge-. If that is the case, you can remove this part of the code. (You can also leave it there and it will have no effect, but if your data set is very large it will noticeably slow down execution.)

        Comment

        Working...
        X