Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging two datasets with specific dates of start and end of period within a year

    I am running on panel project where I have to merge and match two dataset . My first dataset is a set of years and specific dates of start and end of period within a year and the second yearly country observations. Observations are yearly. In need to match the yearly observations of my second dataset to the longest duration in days starting from January 1st of each year i my first dataset . Below I paste some parts of the datasets in order to help you understand better. You can note that in some cases more than one period occurs. Fot those periods I need to mach to the longest duration as already said from the start of each year separately. Imagine that I need to do that for fifty years for around 45 countries and you get an idea how difficult and time consuming is doing that in excel in really Any ideas?

    Dataset 1

    Country OECD ts dm dm1 start of period End of period
    Finland oecd member eu member 2006 24/6/03 19/4/07 Finland
    Finland oecd member eu member 2007 24/6/03 19/4/07 Finland
    Finland oecd member eu member 2007 39159 200703 19/4/07 22/6/10 Finland
    Finland oecd member eu member 2008 19/4/07 22/6/10 Finland
    Finland oecd member eu member 2009 19/4/07 22/6/10 Finland
    Finland oecd member eu member 2010 19/4/07 22/6/10 Finland
    Finland oecd member eu member 2010 22/6/10 22/6/11 Finland
    Finland oecd member eu member 2011 22/6/10 22/6/11 Finland
    Finland oecd member eu member 2011 40650 201104 22/6/11 25/3/14 Finland
    Finland oecd member eu member 2012 22/6/11 25/3/14 Finland
    Finland oecd member eu member 2013 22/6/11 25/3/14 Finland
    Finland oecd member eu member 2014 22/6/11 25/3/14 Finland
    Finland oecd member eu member 2014 25/3/14 24/6/14 Finland
    Finland oecd member eu member 2014 24/6/14 26/9/14 Finland
    Finland oecd member eu member 2014 97 26/9/14 29/5/15 Finland
    Finland oecd member eu member 2015 42113 201504 29/5/15 31/12/18 Finland
    Finland oecd member eu member 2015 29/5/15 31/12/18 Finland
    Finland oecd member eu member 2016 29/5/15 31/12/18 Finland

    Dataset 2

    Country Country symbol year year code var1 var2 var3
    Finland FIN 2006 YR2006 0,623874 2,002701 8,384
    Finland FIN 2007 YR2007 1,566664 2,006751 7,719
    Finland FIN 2008 YR2008 2,510666 2,010769 6,854
    Finland FIN 2009 YR2009 4,065954 2,017309 6,369
    Finland FIN 2010 YR2010 -9,2E-07 2 8,249
    Finland FIN 2011 YR2011 1,184135 2,005112 8,394
    Finland FIN 2012 YR2012 3,416808 2,014591 7,781
    Finland FIN 2013 YR2013 2,808336 2,012028 7,689
    Finland FIN 2014 YR2014 1,478286 2,006373 8,193
    Finland FIN 2015 YR2015 1,041196 2,004498 8,663

  • #2
    Ok some people told that can be done by creating a panel data set from my first dataset using expand and by creating a new time indicator variable. Then make a m:1 merge.
    Anyone one on this And what's the command?

    Comment


    • #3
      The tableaus you have shown to illustrate your data do not come from Stata data sets: they can't because variable names cannot contain blank spaces.

      If you have not yet imported your data into Stata, then it is premature to ask for help with code (unless you need help with code for the importation itself). If you do have Stata data sets, then, as the FAQ (which everyone is asked to read before pasting) say, this kind of data tableau is not particularly helpful to those who might try to work out some code for you. Instead, you should use the -dataex- command to post your example data.
      In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

      When asking for help with code, always show example data. When showing example data, always use -dataex-.

      That said, I also do not understand what you mean by

      If you post back with usable examples of your data "
      match the yearly observations of my second dataset to the longest duration in days starting from January 1st of each year i my first dataset
      ." When you post back, point out examples of some observations that do match and some observations that don't, as well as trying to explain more clearly what you need.

      Comment


      • #4
        Originally posted by Clyde Schechter View Post
        The tableaus you have shown to illustrate your data do not come from Stata data sets: they can't because variable names cannot contain blank spaces.

        If you have not yet imported your data into Stata, then it is premature to ask for help with code (unless you need help with code for the importation itself). If you do have Stata data sets, then, as the FAQ (which everyone is asked to read before pasting) say, this kind of data tableau is not particularly helpful to those who might try to work out some code for you. Instead, you should use the -dataex- command to post your example data.
        In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

        When asking for help with code, always show example data. When showing example data, always use -dataex-.

        That said, I also do not understand what you mean by

        If you post back with usable examples of your data "
        match the yearly observations of my second dataset to the longest duration in days starting from January 1st of each year i my first dataset
        ." When you post back, point out examples of some observations that do match and some observations that don't, as well as trying to explain more clearly what you need.
        Thanks for the reply.I am new to the forums.

        Below I paste some part of the data .I have to merge these two dataset.As you may noticed there are multiple dates for some years in the first dataset. I need to match the yearly observations of my second dataset to the longest duration in days starting from January 1st of each year in my first dataset .For example take the year 1990 . You will notice that in that year there are two entries of 1990. The longest duration in days in that year is the second entry 4/4/1990 to 27/12/1991 ( or in my case of interest to 31/12/1990, end of year). I would like to merge the observation of my second dataset for 1990 to that date for 1990 in my first dataset(the longest duration)

        Consider that this is panel for fifty years for around 45 countries and you get an idea .

        Some people suggested me to create a panel date set from my first dataset using expand and by creating a new time indicator variable. Than make a m:1 merge. but did not get how

        I hope now is clear and appreciate any help you can provide.

        Daset1
        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str9 ccode int ts long(edate date start end) str9 country int startyear byte(startmonth startday tap1 tap2) long(pr gr)
        "Australia" 1990     .      . 3 6 "Australia" 1987  7 22 1 1 3 3
        "Australia" 1990     .      . 6 5 "Australia" 1990  4  4 1 1 2 2
        "Australia" 1991     .      . 6 5 "Australia" 1990  4  4 1 1 2 2
        "Australia" 1991     .      . 5 3 "Australia" 1991 12 27 1 1 2 2
        "Australia" 1992     .      . 5 3 "Australia" 1991 12 27 1 1 2 2
        "Australia" 1993     .      . 5 3 "Australia" 1991 12 27 1 1 2 2
        "Australia" 1993 34041 199303 4 1 "Australia" 1993  3 24 1 1 1 1
        "Australia" 1994     .      . 4 1 "Australia" 1993  3 24 1 1 1 1
        "Australia" 1995     .      . 4 1 "Australia" 1993  3 24 1 1 1 1
        "Australia" 1996 35126 199603 4 1 "Australia" 1993  3 24 1 1 1 1
        "Australia" 1996     .      . 1 2 "Australia" 1996  3 11 3 3 4 4
        "Australia" 1997     .      . 1 2 "Australia" 1996  3 11 3 3 4 4
        "Australia" 1998 36071 199810 1 2 "Australia" 1996  3 11 3 3 4 4
        "Australia" 1998     .      . 2 4 "Australia" 1998 10 21 2 2 5 5
        "Australia" 1999     .      . 2 4 "Australia" 1998 10 21 2 2 5 5
        "Australia" 2000     .      . 2 4 "Australia" 1998 10 21 2 2 5 5
        end
        label values start start
        label def start 1 "11/3/96", modify
        label def start 2 "21/10/98", modify
        label def start 3 "22/7/87", modify
        label def start 4 "24/3/93", modify
        label def start 5 "27/12/91", modify
        label def start 6 "4/4/90", modify
        label values end end
        label def end 1 "11/3/96", modify
        label def end 2 "21/10/98", modify
        label def end 3 "24/3/93", modify
        label def end 4 "26/11/01", modify
        label def end 5 "27/12/91", modify
        label def end 6 "4/4/90", modify
        label values pr pr
        label def pr 1 "-0,165", modify
        label def pr 2 "-14,9", modify
        label def pr 3 "-4,5", modify
        label def pr 4 "22,593", modify
        label def pr 5 "48,458", modify
        label values gr gr
        label def gr 1 "-0,165", modify
        label def gr 2 "-14,9", modify
        label def gr 3 "-4,5", modify
        label def gr 4 "22,593", modify
        label def gr 5 "48,458", modify

        Dataset 2
        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str9 Country str3 Symbol int year str6 YearSymbol long(rate1 rate2)
        "Australia" "AUS" 1990 "YR1990" 11  5
        "Australia" "AUS" 1991 "YR1991"  8 10
        "Australia" "AUS" 1992 "YR1992"  3  1
        "Australia" "AUS" 1993 "YR1993"  5  2
        "Australia" "AUS" 1994 "YR1994"  6 11
        "Australia" "AUS" 1995 "YR1995" 10  8
        "Australia" "AUS" 1996 "YR1996"  7  9
        "Australia" "AUS" 1997 "YR1997"  1  7
        "Australia" "AUS" 1998 "YR1998"  2  6
        "Australia" "AUS" 1999 "YR1999"  4  4
        "Australia" "AUS" 2000 "YR2000"  9  3
        end
        label values rate1 rate1
        label def rate1 1 "0,224887556", modify
        label def rate1 2 "0,86013463", modify
        label def rate1 3 "1,012231126", modify
        label def rate1 4 "1,483129403", modify
        label def rate1 5 "1,753653445", modify
        label def rate1 6 "1,969634797", modify
        label def rate1 7 "2,615384615", modify
        label def rate1 8 "3,17667537", modify
        label def rate1 9 "4,457435148", modify
        label def rate1 10 "4,6277666", modify
        label def rate1 11 "7,333021952", modify
        label values rate2 rate2
        label def rate2 1 "10,72879982", modify
        label def rate2 2 "10,87380028", modify
        label def rate2 3 "6,282599926", modify
        label def rate2 4 "6,872300148", modify
        label def rate2 5 "6,926000118", modify
        label def rate2 6 "7,675600052", modify
        label def rate2 7 "8,362199783", modify
        label def rate2 8 "8,46930027", modify
        label def rate2 9 "8,506199837", modify
        label def rate2 10 "9,579199791", modify
        label def rate2 11 "9,718700409", modify

        Comment


        • #5
          Alright, this is somewhat much, but we still have a way to go.

          Before we can even approach the matching you want, the date variables in the first data set and the rate variables in the second are useless as they are. What you have there are value-labeled integers masqsuerading as dates and rates. This is a sure recipe for disaster when you try to do any calculations with them. So before we can even think about how these datasets might match up, we have to change the dates to real date variables, and the rates to real numbers.

          Here is the code to do that much:

          Code:
          // FOR DATASET 1
          clear*
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str9 ccode int ts long(edate date start end) str9 country int startyear byte(startmonth startday tap1 tap2) long(pr gr)
          "Australia" 1990     .      . 3 6 "Australia" 1987  7 22 1 1 3 3
          "Australia" 1990     .      . 6 5 "Australia" 1990  4  4 1 1 2 2
          "Australia" 1991     .      . 6 5 "Australia" 1990  4  4 1 1 2 2
          "Australia" 1991     .      . 5 3 "Australia" 1991 12 27 1 1 2 2
          "Australia" 1992     .      . 5 3 "Australia" 1991 12 27 1 1 2 2
          "Australia" 1993     .      . 5 3 "Australia" 1991 12 27 1 1 2 2
          "Australia" 1993 34041 199303 4 1 "Australia" 1993  3 24 1 1 1 1
          "Australia" 1994     .      . 4 1 "Australia" 1993  3 24 1 1 1 1
          "Australia" 1995     .      . 4 1 "Australia" 1993  3 24 1 1 1 1
          "Australia" 1996 35126 199603 4 1 "Australia" 1993  3 24 1 1 1 1
          "Australia" 1996     .      . 1 2 "Australia" 1996  3 11 3 3 4 4
          "Australia" 1997     .      . 1 2 "Australia" 1996  3 11 3 3 4 4
          "Australia" 1998 36071 199810 1 2 "Australia" 1996  3 11 3 3 4 4
          "Australia" 1998     .      . 2 4 "Australia" 1998 10 21 2 2 5 5
          "Australia" 1999     .      . 2 4 "Australia" 1998 10 21 2 2 5 5
          "Australia" 2000     .      . 2 4 "Australia" 1998 10 21 2 2 5 5
          end
          label values start start
          label def start 1 "11/3/96", modify
          label def start 2 "21/10/98", modify
          label def start 3 "22/7/87", modify
          label def start 4 "24/3/93", modify
          label def start 5 "27/12/91", modify
          label def start 6 "4/4/90", modify
          label values end end
          label def end 1 "11/3/96", modify
          label def end 2 "21/10/98", modify
          label def end 3 "24/3/93", modify
          label def end 4 "26/11/01", modify
          label def end 5 "27/12/91", modify
          label def end 6 "4/4/90", modify
          label values pr pr
          label def pr 1 "-0,165", modify
          label def pr 2 "-14,9", modify
          label def pr 3 "-4,5", modify
          label def pr 4 "22,593", modify
          label def pr 5 "48,458", modify
          label values gr gr
          label def gr 1 "-0,165", modify
          label def gr 2 "-14,9", modify
          label def gr 3 "-4,5", modify
          label def gr 4 "22,593", modify
          label def gr 5 "48,458", modify
          frame rename default dataset1
          
          //  CREATE USABLE DATE VARIABLES
          drop start
          gen start = mdy(startmonth, startday, startyear)
          assert missing(start) == missing(startmonth, startday, startyear)
          format start %td
          decode end, gen(_end)
          drop end
          gen end = daily(_end, "DMY", 2020)
          assert missing(end) == missing(_end)
          format end %td
          drop _end startmonth startday startyear // NO LONGER NEEDED
          gen duration = end - start
          Code:
          //  FOR DATASET 2
          
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str9 Country str3 Symbol int year str6 YearSymbol long(rate1 rate2)
          "Australia" "AUS" 1990 "YR1990" 11  5
          "Australia" "AUS" 1991 "YR1991"  8 10
          "Australia" "AUS" 1992 "YR1992"  3  1
          "Australia" "AUS" 1993 "YR1993"  5  2
          "Australia" "AUS" 1994 "YR1994"  6 11
          "Australia" "AUS" 1995 "YR1995" 10  8
          "Australia" "AUS" 1996 "YR1996"  7  9
          "Australia" "AUS" 1997 "YR1997"  1  7
          "Australia" "AUS" 1998 "YR1998"  2  6
          "Australia" "AUS" 1999 "YR1999"  4  4
          "Australia" "AUS" 2000 "YR2000"  9  3
          end
          label values rate1 rate1
          label def rate1 1 "0,224887556", modify
          label def rate1 2 "0,86013463", modify
          label def rate1 3 "1,012231126", modify
          label def rate1 4 "1,483129403", modify
          label def rate1 5 "1,753653445", modify
          label def rate1 6 "1,969634797", modify
          label def rate1 7 "2,615384615", modify
          label def rate1 8 "3,17667537", modify
          label def rate1 9 "4,457435148", modify
          label def rate1 10 "4,6277666", modify
          label def rate1 11 "7,333021952", modify
          label values rate2 rate2
          label def rate2 1 "10,72879982", modify
          label def rate2 2 "10,87380028", modify
          label def rate2 3 "6,282599926", modify
          label def rate2 4 "6,872300148", modify
          label def rate2 5 "6,926000118", modify
          label def rate2 6 "7,675600052", modify
          label def rate2 7 "8,362199783", modify
          label def rate2 8 "8,46930027", modify
          label def rate2 9 "8,506199837", modify
          label def rate2 10 "9,579199791", modify
          label def rate2 11 "9,718700409", modify
          
          //  CONVERT THE RATE VARIABLES TO ACTUAL NUMERIC RATES, NOT VALUE LABELED INTEGERS
          foreach v of varlist rate1 rate2 {
              decode `v', gen(_`v')
              drop `v'
              rename _`v' `v'
              destring `v', dpcomma replace
          }
          But from that point on, I'm still confused about what you want. You say:

          I need to match the yearly observations of my second dataset to the longest duration in days starting from January 1st of each year in my first dataset .For example take the year 1990 . You will notice that in that year there are two entries of 1990. The longest duration in days in that year is the second entry 4/4/1990 to 27/12/1991 ( or in my case of interest to 31/12/1990, end of year). I would like to merge the observation of my second dataset for 1990 to that date for 1990 in my first dataset(the longest duration)

          But the data are different from what you describe. There are indeed two observations for year 1990 (I assume here by year you are referring to the variable called ts, not the one called startyear. If you mean startyear, it's still not what you say it is.) But eh duration of the first observation is 987 days, whereas the second is only 632, so it should be the first observation you want to merge with. You also refer to your "interest in 31/12/1990" but that date doesn't appear anywhere in the data at all. What does 31/12/1990 have to do with your desired merge process? You also refer to starting from January 1, but, again, I don't understand where that comes in at all.

          I also have another question that you don't address: does the country (or ccode) play any role in the merge? Do you want to match up only observations referring to the same country? Or can an observation from Australia be acceptably matched with an observation from Tanzania provided the dates meet your criterion?

          As an aside, you also have variables called edate and date. When I apply date formats to see what they are, they are all in the very remote future. So I think there is something wrong with those values.


          Comment


          • #6
            Originally posted by Clyde Schechter View Post
            Alright, this is somewhat much, but we still have a way to go.

            Before we can even approach the matching you want, the date variables in the first data set and the rate variables in the second are useless as they are. What you have there are value-labeled integers masqsuerading as dates and rates. This is a sure recipe for disaster when you try to do any calculations with them. So before we can even think about how these datasets might match up, we have to change the dates to real date variables, and the rates to real numbers.

            Here is the code to do that much:

            Code:
            // FOR DATASET 1
            clear*
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input str9 ccode int ts long(edate date start end) str9 country int startyear byte(startmonth startday tap1 tap2) long(pr gr)
            "Australia" 1990 . . 3 6 "Australia" 1987 7 22 1 1 3 3
            "Australia" 1990 . . 6 5 "Australia" 1990 4 4 1 1 2 2
            "Australia" 1991 . . 6 5 "Australia" 1990 4 4 1 1 2 2
            "Australia" 1991 . . 5 3 "Australia" 1991 12 27 1 1 2 2
            "Australia" 1992 . . 5 3 "Australia" 1991 12 27 1 1 2 2
            "Australia" 1993 . . 5 3 "Australia" 1991 12 27 1 1 2 2
            "Australia" 1993 34041 199303 4 1 "Australia" 1993 3 24 1 1 1 1
            "Australia" 1994 . . 4 1 "Australia" 1993 3 24 1 1 1 1
            "Australia" 1995 . . 4 1 "Australia" 1993 3 24 1 1 1 1
            "Australia" 1996 35126 199603 4 1 "Australia" 1993 3 24 1 1 1 1
            "Australia" 1996 . . 1 2 "Australia" 1996 3 11 3 3 4 4
            "Australia" 1997 . . 1 2 "Australia" 1996 3 11 3 3 4 4
            "Australia" 1998 36071 199810 1 2 "Australia" 1996 3 11 3 3 4 4
            "Australia" 1998 . . 2 4 "Australia" 1998 10 21 2 2 5 5
            "Australia" 1999 . . 2 4 "Australia" 1998 10 21 2 2 5 5
            "Australia" 2000 . . 2 4 "Australia" 1998 10 21 2 2 5 5
            end
            label values start start
            label def start 1 "11/3/96", modify
            label def start 2 "21/10/98", modify
            label def start 3 "22/7/87", modify
            label def start 4 "24/3/93", modify
            label def start 5 "27/12/91", modify
            label def start 6 "4/4/90", modify
            label values end end
            label def end 1 "11/3/96", modify
            label def end 2 "21/10/98", modify
            label def end 3 "24/3/93", modify
            label def end 4 "26/11/01", modify
            label def end 5 "27/12/91", modify
            label def end 6 "4/4/90", modify
            label values pr pr
            label def pr 1 "-0,165", modify
            label def pr 2 "-14,9", modify
            label def pr 3 "-4,5", modify
            label def pr 4 "22,593", modify
            label def pr 5 "48,458", modify
            label values gr gr
            label def gr 1 "-0,165", modify
            label def gr 2 "-14,9", modify
            label def gr 3 "-4,5", modify
            label def gr 4 "22,593", modify
            label def gr 5 "48,458", modify
            frame rename default dataset1
            
            // CREATE USABLE DATE VARIABLES
            drop start
            gen start = mdy(startmonth, startday, startyear)
            assert missing(start) == missing(startmonth, startday, startyear)
            format start %td
            decode end, gen(_end)
            drop end
            gen end = daily(_end, "DMY", 2020)
            assert missing(end) == missing(_end)
            format end %td
            drop _end startmonth startday startyear // NO LONGER NEEDED
            gen duration = end - start
            Code:
            // FOR DATASET 2
            
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input str9 Country str3 Symbol int year str6 YearSymbol long(rate1 rate2)
            "Australia" "AUS" 1990 "YR1990" 11 5
            "Australia" "AUS" 1991 "YR1991" 8 10
            "Australia" "AUS" 1992 "YR1992" 3 1
            "Australia" "AUS" 1993 "YR1993" 5 2
            "Australia" "AUS" 1994 "YR1994" 6 11
            "Australia" "AUS" 1995 "YR1995" 10 8
            "Australia" "AUS" 1996 "YR1996" 7 9
            "Australia" "AUS" 1997 "YR1997" 1 7
            "Australia" "AUS" 1998 "YR1998" 2 6
            "Australia" "AUS" 1999 "YR1999" 4 4
            "Australia" "AUS" 2000 "YR2000" 9 3
            end
            label values rate1 rate1
            label def rate1 1 "0,224887556", modify
            label def rate1 2 "0,86013463", modify
            label def rate1 3 "1,012231126", modify
            label def rate1 4 "1,483129403", modify
            label def rate1 5 "1,753653445", modify
            label def rate1 6 "1,969634797", modify
            label def rate1 7 "2,615384615", modify
            label def rate1 8 "3,17667537", modify
            label def rate1 9 "4,457435148", modify
            label def rate1 10 "4,6277666", modify
            label def rate1 11 "7,333021952", modify
            label values rate2 rate2
            label def rate2 1 "10,72879982", modify
            label def rate2 2 "10,87380028", modify
            label def rate2 3 "6,282599926", modify
            label def rate2 4 "6,872300148", modify
            label def rate2 5 "6,926000118", modify
            label def rate2 6 "7,675600052", modify
            label def rate2 7 "8,362199783", modify
            label def rate2 8 "8,46930027", modify
            label def rate2 9 "8,506199837", modify
            label def rate2 10 "9,579199791", modify
            label def rate2 11 "9,718700409", modify
            
            // CONVERT THE RATE VARIABLES TO ACTUAL NUMERIC RATES, NOT VALUE LABELED INTEGERS
            foreach v of varlist rate1 rate2 {
            decode `v', gen(_`v')
            drop `v'
            rename _`v' `v'
            destring `v', dpcomma replace
            }
            But from that point on, I'm still confused about what you want. You say:


            But the data are different from what you describe. There are indeed two observations for year 1990 (I assume here by year you are referring to the variable called ts, not the one called startyear. If you mean startyear, it's still not what you say it is.) But eh duration of the first observation is 987 days, whereas the second is only 632, so it should be the first observation you want to merge with. You also refer to your "interest in 31/12/1990" but that date doesn't appear anywhere in the data at all. What does 31/12/1990 have to do with your desired merge process? You also refer to starting from January 1, but, again, I don't understand where that comes in at all.

            I also have another question that you don't address: does the country (or ccode) play any role in the merge? Do you want to match up only observations referring to the same country? Or can an observation from Australia be acceptably matched with an observation from Tanzania provided the dates meet your criterion?

            As an aside, you also have variables called edate and date. When I apply date formats to see what they are, they are all in the very remote future. So I think there is something wrong with those values.[/FONT][/COLOR][/LEFT]

            Thanks for the input . By ts and Year I refer to time (year). The references are from their original sources where they were downloaded from.

            The dataset 1 refers to the start day and end day of the actual dates of governments in office while dataset2 refers to some sort of indexes(I provide only 2 but there are around 16 more of them).Consider that these data are only a small part of a panel for multiple years for multiple countries. I consider each year and country separately according to the actual duration in days of each incumbent in office for that particular year (365 or 366 days in a year depending on the leap year). I used in the explanation above the January to December 31 in order to make you better understand. They do not appear directly in the data. .I hope you get the point.

            I want to match up only observations referring to the same country for that year (1990 to 1990 etc) without mixing other counties (Tanzania in your example) to the longest duration of a government in office in that year. In some years more than one governments occurred .There are indeed two observations for year 1990 or for the years with more than one observation since there was a change in government and was coded as such. And that's a key issue The duration in office in previous years is unrelated.

            The ccode is crucial as I want to match up only observations referring to the same country.

            date stands for election date (YYYYMM) and edate for election date. It should be edate 13mar1993 and date 199303 for example for election taken place in on 13mar1993 . I guess something went wrong in excel. (Probably they were transformed to excel calendar form). I will probably need to create a dummy later on for the analysis based on those dates.

            I hope now is clear .Any assistance is appreciated

            Comment


            • #7
              OK, now I think I understand it. If I have it right, the following will do it:

              Code:
              // FOR DATASET 1
              clear*
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input str9 ccode int ts long(edate date start end) str9 country int startyear byte(startmonth startday tap1 tap2) long(pr gr)
              "Australia" 1990     .      . 3 6 "Australia" 1987  7 22 1 1 3 3
              "Australia" 1990     .      . 6 5 "Australia" 1990  4  4 1 1 2 2
              "Australia" 1991     .      . 6 5 "Australia" 1990  4  4 1 1 2 2
              "Australia" 1991     .      . 5 3 "Australia" 1991 12 27 1 1 2 2
              "Australia" 1992     .      . 5 3 "Australia" 1991 12 27 1 1 2 2
              "Australia" 1993     .      . 5 3 "Australia" 1991 12 27 1 1 2 2
              "Australia" 1993 34041 199303 4 1 "Australia" 1993  3 24 1 1 1 1
              "Australia" 1994     .      . 4 1 "Australia" 1993  3 24 1 1 1 1
              "Australia" 1995     .      . 4 1 "Australia" 1993  3 24 1 1 1 1
              "Australia" 1996 35126 199603 4 1 "Australia" 1993  3 24 1 1 1 1
              "Australia" 1996     .      . 1 2 "Australia" 1996  3 11 3 3 4 4
              "Australia" 1997     .      . 1 2 "Australia" 1996  3 11 3 3 4 4
              "Australia" 1998 36071 199810 1 2 "Australia" 1996  3 11 3 3 4 4
              "Australia" 1998     .      . 2 4 "Australia" 1998 10 21 2 2 5 5
              "Australia" 1999     .      . 2 4 "Australia" 1998 10 21 2 2 5 5
              "Australia" 2000     .      . 2 4 "Australia" 1998 10 21 2 2 5 5
              end
              label values start start
              label def start 1 "11/3/96", modify
              label def start 2 "21/10/98", modify
              label def start 3 "22/7/87", modify
              label def start 4 "24/3/93", modify
              label def start 5 "27/12/91", modify
              label def start 6 "4/4/90", modify
              label values end end
              label def end 1 "11/3/96", modify
              label def end 2 "21/10/98", modify
              label def end 3 "24/3/93", modify
              label def end 4 "26/11/01", modify
              label def end 5 "27/12/91", modify
              label def end 6 "4/4/90", modify
              label values pr pr
              label def pr 1 "-0,165", modify
              label def pr 2 "-14,9", modify
              label def pr 3 "-4,5", modify
              label def pr 4 "22,593", modify
              label def pr 5 "48,458", modify
              label values gr gr
              label def gr 1 "-0,165", modify
              label def gr 2 "-14,9", modify
              label def gr 3 "-4,5", modify
              label def gr 4 "22,593", modify
              label def gr 5 "48,458", modify
              frame rename default dataset1
              
              //  CREATE USABLE DATE VARIABLES
              drop start
              gen start = mdy(startmonth, startday, startyear)
              assert missing(start) == missing(startmonth, startday, startyear)
              format start %td
              decode end, gen(_end)
              drop end
              gen end = daily(_end, "DMY", 2020)
              assert missing(end) == missing(_end)
              format end %td
              drop _end startmonth startday startyear // NO LONGER NEEDED
              gen in_year_duration = min(end, mdy(12, 31, ts)) - max(start, mdy(1, 1, ts))
              assert !missing(in_year_duration)
              
              //  KEEP THE OBSERVATION WITH LONGEST IN YEAR DURATION FOR EACH
              //  COUNTRY AND YEAR
              by country ts (in_year_duration), sort: keep if _n == _N
              
              frame create dataset2
              frame change dataset2
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input str9 Country str3 Symbol int year str6 YearSymbol long(rate1 rate2)
              "Australia" "AUS" 1990 "YR1990" 11  5
              "Australia" "AUS" 1991 "YR1991"  8 10
              "Australia" "AUS" 1992 "YR1992"  3  1
              "Australia" "AUS" 1993 "YR1993"  5  2
              "Australia" "AUS" 1994 "YR1994"  6 11
              "Australia" "AUS" 1995 "YR1995" 10  8
              "Australia" "AUS" 1996 "YR1996"  7  9
              "Australia" "AUS" 1997 "YR1997"  1  7
              "Australia" "AUS" 1998 "YR1998"  2  6
              "Australia" "AUS" 1999 "YR1999"  4  4
              "Australia" "AUS" 2000 "YR2000"  9  3
              end
              label values rate1 rate1
              label def rate1 1 "0,224887556", modify
              label def rate1 2 "0,86013463", modify
              label def rate1 3 "1,012231126", modify
              label def rate1 4 "1,483129403", modify
              label def rate1 5 "1,753653445", modify
              label def rate1 6 "1,969634797", modify
              label def rate1 7 "2,615384615", modify
              label def rate1 8 "3,17667537", modify
              label def rate1 9 "4,457435148", modify
              label def rate1 10 "4,6277666", modify
              label def rate1 11 "7,333021952", modify
              label values rate2 rate2
              label def rate2 1 "10,72879982", modify
              label def rate2 2 "10,87380028", modify
              label def rate2 3 "6,282599926", modify
              label def rate2 4 "6,872300148", modify
              label def rate2 5 "6,926000118", modify
              label def rate2 6 "7,675600052", modify
              label def rate2 7 "8,362199783", modify
              label def rate2 8 "8,46930027", modify
              label def rate2 9 "8,506199837", modify
              label def rate2 10 "9,579199791", modify
              label def rate2 11 "9,718700409", modify
              
              //  CONVERT THE RATE VARIABLES TO ACTUAL NUMERIC RATES, NOT VALUE LABELED INTEGERS
              foreach v of varlist rate1 rate2 {
                  decode `v', gen(_`v')
                  drop `v'
                  rename _`v' `v'
                  destring `v', dpcomma replace
              }
              isid Country year
              
              //    LINK THE TWO DATA SETS AND BRING THE DATASET 1 VARIABLES OVER
              frlink 1:1 Country year, frame(dataset1 country ts)
              frget _all, from(dataset1)
              drop dataset1
              At the end of this code, the data in frame dataset2 is the merged result you asked for.

              There is still one point of unclarity in your request: you do not say what you want to do if there are two (or more) governments that are tied for longest in year duration. You did not indicate which one among those should be selected to receive the match. The code above makes the selection at random (and that means that if this situation occurs in your data, it may be handled differently if you re-run the code). Perhaps this situation never arises. If it does, and you have some rule for breaking the tie, post back and that can be built into the code.

              Turning to the variables date and edate, I see what date is now. This tyhpe of dat3e coding is not useful in Stata. You can sort things into chronological order with it, but you cannot compare it to real Stata date variables, nor do any calculations with it. The easiest way to make it a real Stata date variable would be:

              Code:
              gen mdate = ym(int(date/100), mod(date, 100))
              assert missing(mdate) == missing(date)
              format mdate %tm
              order mdate, before(date)
              drop date
              rename mdate date
              I don't know what to say about edate. I have no sense of what those numbers are. If I treat them as Stata internal format dates, they are in the year 2053, which can't be right. There is no obvious way to interpret the numbers 34041 and 36071 as human-readable dates. So I'll give up on that, but if you need to do anything with this variable, you need to pursue it. You won't get far at all with edate as it stands.

              Comment


              • #8
                Originally posted by Clyde Schechter View Post
                OK, now I think I understand it. If I have it right, the following will do it:

                Code:
                // FOR DATASET 1
                clear*
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input str9 ccode int ts long(edate date start end) str9 country int startyear byte(startmonth startday tap1 tap2) long(pr gr)
                "Australia" 1990 . . 3 6 "Australia" 1987 7 22 1 1 3 3
                "Australia" 1990 . . 6 5 "Australia" 1990 4 4 1 1 2 2
                "Australia" 1991 . . 6 5 "Australia" 1990 4 4 1 1 2 2
                "Australia" 1991 . . 5 3 "Australia" 1991 12 27 1 1 2 2
                "Australia" 1992 . . 5 3 "Australia" 1991 12 27 1 1 2 2
                "Australia" 1993 . . 5 3 "Australia" 1991 12 27 1 1 2 2
                "Australia" 1993 34041 199303 4 1 "Australia" 1993 3 24 1 1 1 1
                "Australia" 1994 . . 4 1 "Australia" 1993 3 24 1 1 1 1
                "Australia" 1995 . . 4 1 "Australia" 1993 3 24 1 1 1 1
                "Australia" 1996 35126 199603 4 1 "Australia" 1993 3 24 1 1 1 1
                "Australia" 1996 . . 1 2 "Australia" 1996 3 11 3 3 4 4
                "Australia" 1997 . . 1 2 "Australia" 1996 3 11 3 3 4 4
                "Australia" 1998 36071 199810 1 2 "Australia" 1996 3 11 3 3 4 4
                "Australia" 1998 . . 2 4 "Australia" 1998 10 21 2 2 5 5
                "Australia" 1999 . . 2 4 "Australia" 1998 10 21 2 2 5 5
                "Australia" 2000 . . 2 4 "Australia" 1998 10 21 2 2 5 5
                end
                label values start start
                label def start 1 "11/3/96", modify
                label def start 2 "21/10/98", modify
                label def start 3 "22/7/87", modify
                label def start 4 "24/3/93", modify
                label def start 5 "27/12/91", modify
                label def start 6 "4/4/90", modify
                label values end end
                label def end 1 "11/3/96", modify
                label def end 2 "21/10/98", modify
                label def end 3 "24/3/93", modify
                label def end 4 "26/11/01", modify
                label def end 5 "27/12/91", modify
                label def end 6 "4/4/90", modify
                label values pr pr
                label def pr 1 "-0,165", modify
                label def pr 2 "-14,9", modify
                label def pr 3 "-4,5", modify
                label def pr 4 "22,593", modify
                label def pr 5 "48,458", modify
                label values gr gr
                label def gr 1 "-0,165", modify
                label def gr 2 "-14,9", modify
                label def gr 3 "-4,5", modify
                label def gr 4 "22,593", modify
                label def gr 5 "48,458", modify
                frame rename default dataset1
                
                // CREATE USABLE DATE VARIABLES
                drop start
                gen start = mdy(startmonth, startday, startyear)
                assert missing(start) == missing(startmonth, startday, startyear)
                format start %td
                decode end, gen(_end)
                drop end
                gen end = daily(_end, "DMY", 2020)
                assert missing(end) == missing(_end)
                format end %td
                drop _end startmonth startday startyear // NO LONGER NEEDED
                gen in_year_duration = min(end, mdy(12, 31, ts)) - max(start, mdy(1, 1, ts))
                assert !missing(in_year_duration)
                
                // KEEP THE OBSERVATION WITH LONGEST IN YEAR DURATION FOR EACH
                // COUNTRY AND YEAR
                by country ts (in_year_duration), sort: keep if _n == _N
                
                frame create dataset2
                frame change dataset2
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input str9 Country str3 Symbol int year str6 YearSymbol long(rate1 rate2)
                "Australia" "AUS" 1990 "YR1990" 11 5
                "Australia" "AUS" 1991 "YR1991" 8 10
                "Australia" "AUS" 1992 "YR1992" 3 1
                "Australia" "AUS" 1993 "YR1993" 5 2
                "Australia" "AUS" 1994 "YR1994" 6 11
                "Australia" "AUS" 1995 "YR1995" 10 8
                "Australia" "AUS" 1996 "YR1996" 7 9
                "Australia" "AUS" 1997 "YR1997" 1 7
                "Australia" "AUS" 1998 "YR1998" 2 6
                "Australia" "AUS" 1999 "YR1999" 4 4
                "Australia" "AUS" 2000 "YR2000" 9 3
                end
                label values rate1 rate1
                label def rate1 1 "0,224887556", modify
                label def rate1 2 "0,86013463", modify
                label def rate1 3 "1,012231126", modify
                label def rate1 4 "1,483129403", modify
                label def rate1 5 "1,753653445", modify
                label def rate1 6 "1,969634797", modify
                label def rate1 7 "2,615384615", modify
                label def rate1 8 "3,17667537", modify
                label def rate1 9 "4,457435148", modify
                label def rate1 10 "4,6277666", modify
                label def rate1 11 "7,333021952", modify
                label values rate2 rate2
                label def rate2 1 "10,72879982", modify
                label def rate2 2 "10,87380028", modify
                label def rate2 3 "6,282599926", modify
                label def rate2 4 "6,872300148", modify
                label def rate2 5 "6,926000118", modify
                label def rate2 6 "7,675600052", modify
                label def rate2 7 "8,362199783", modify
                label def rate2 8 "8,46930027", modify
                label def rate2 9 "8,506199837", modify
                label def rate2 10 "9,579199791", modify
                label def rate2 11 "9,718700409", modify
                
                // CONVERT THE RATE VARIABLES TO ACTUAL NUMERIC RATES, NOT VALUE LABELED INTEGERS
                foreach v of varlist rate1 rate2 {
                decode `v', gen(_`v')
                drop `v'
                rename _`v' `v'
                destring `v', dpcomma replace
                }
                isid Country year
                
                // LINK THE TWO DATA SETS AND BRING THE DATASET 1 VARIABLES OVER
                frlink 1:1 Country year, frame(dataset1 country ts)
                frget _all, from(dataset1)
                drop dataset1
                At the end of this code, the data in frame dataset2 is the merged result you asked for.

                There is still one point of unclarity in your request: you do not say what you want to do if there are two (or more) governments that are tied for longest in year duration. You did not indicate which one among those should be selected to receive the match. The code above makes the selection at random (and that means that if this situation occurs in your data, it may be handled differently if you re-run the code). Perhaps this situation never arises. If it does, and you have some rule for breaking the tie, post back and that can be built into the code.

                Turning to the variables date and edate, I see what date is now. This tyhpe of dat3e coding is not useful in Stata. You can sort things into chronological order with it, but you cannot compare it to real Stata date variables, nor do any calculations with it. The easiest way to make it a real Stata date variable would be:

                Code:
                gen mdate = ym(int(date/100), mod(date, 100))
                assert missing(mdate) == missing(date)
                format mdate %tm
                order mdate, before(date)
                drop date
                rename mdate date
                I don't know what to say about edate. I have no sense of what those numbers are. If I treat them as Stata internal format dates, they are in the year 2053, which can't be right. There is no obvious way to interpret the numbers 34041 and 36071 as human-readable dates. So I'll give up on that, but if you need to do anything with this variable, you need to pursue it. You won't get far at all with edate as it stands.


                Thank you a wholeheartedly for the code and work! I really appreciate it and mostly your time spent on my request.

                To start with we can drop edate entirely. No need to go further with it.

                The government that should receive the match is the government with the longest duration.The are case in my data that there have been multiple governments within a year Fortunately, the situation where there are two (or more) governments that are tied for longest in year duration never arises(same number of days)

                I have run the code and reached up to the point of the usage of the frame feature. This is a feature for Stata 16 and above, so a google search suggests. I am running Stata 15.1 and frame it's not available for this version . I am not able to see the finale results after the merging if that is right or not. So without asking much, is there another way to bypass the frame feature in order to use the code for lower version of stata?.

                Also, probably it's just me and my abilities on stata but, I did not see the use dataset2 path line in the your code and where that should enter in the code.

                Many thanks

                Mario

                Comment


                • #9
                  Yes, the Forum FAQ, which everyone is expected to read before posting, advises people that if they are not using the current version of Stata (which, presently, is 16) they should state what version they are using. Here's a rewrite of the code that does not require frames, and uses tempfiles instead.

                  Code:
                  // FOR DATASET 1
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input str9 ccode int ts long(edate date start end) str9 country int startyear byte(startmonth startday tap1 tap2) long(pr gr)
                  "Australia" 1990     .      . 3 6 "Australia" 1987  7 22 1 1 3 3
                  "Australia" 1990     .      . 6 5 "Australia" 1990  4  4 1 1 2 2
                  "Australia" 1991     .      . 6 5 "Australia" 1990  4  4 1 1 2 2
                  "Australia" 1991     .      . 5 3 "Australia" 1991 12 27 1 1 2 2
                  "Australia" 1992     .      . 5 3 "Australia" 1991 12 27 1 1 2 2
                  "Australia" 1993     .      . 5 3 "Australia" 1991 12 27 1 1 2 2
                  "Australia" 1993 34041 199303 4 1 "Australia" 1993  3 24 1 1 1 1
                  "Australia" 1994     .      . 4 1 "Australia" 1993  3 24 1 1 1 1
                  "Australia" 1995     .      . 4 1 "Australia" 1993  3 24 1 1 1 1
                  "Australia" 1996 35126 199603 4 1 "Australia" 1993  3 24 1 1 1 1
                  "Australia" 1996     .      . 1 2 "Australia" 1996  3 11 3 3 4 4
                  "Australia" 1997     .      . 1 2 "Australia" 1996  3 11 3 3 4 4
                  "Australia" 1998 36071 199810 1 2 "Australia" 1996  3 11 3 3 4 4
                  "Australia" 1998     .      . 2 4 "Australia" 1998 10 21 2 2 5 5
                  "Australia" 1999     .      . 2 4 "Australia" 1998 10 21 2 2 5 5
                  "Australia" 2000     .      . 2 4 "Australia" 1998 10 21 2 2 5 5
                  end
                  label values start start
                  label def start 1 "11/3/96", modify
                  label def start 2 "21/10/98", modify
                  label def start 3 "22/7/87", modify
                  label def start 4 "24/3/93", modify
                  label def start 5 "27/12/91", modify
                  label def start 6 "4/4/90", modify
                  label values end end
                  label def end 1 "11/3/96", modify
                  label def end 2 "21/10/98", modify
                  label def end 3 "24/3/93", modify
                  label def end 4 "26/11/01", modify
                  label def end 5 "27/12/91", modify
                  label def end 6 "4/4/90", modify
                  label values pr pr
                  label def pr 1 "-0,165", modify
                  label def pr 2 "-14,9", modify
                  label def pr 3 "-4,5", modify
                  label def pr 4 "22,593", modify
                  label def pr 5 "48,458", modify
                  label values gr gr
                  label def gr 1 "-0,165", modify
                  label def gr 2 "-14,9", modify
                  label def gr 3 "-4,5", modify
                  label def gr 4 "22,593", modify
                  label def gr 5 "48,458", modify
                  
                  //  CREATE USABLE DATE VARIABLES
                  drop start
                  gen start = mdy(startmonth, startday, startyear)
                  assert missing(start) == missing(startmonth, startday, startyear)
                  format start %td
                  decode end, gen(_end)
                  drop end
                  gen end = daily(_end, "DMY", 2020)
                  assert missing(end) == missing(_end)
                  format end %td
                  drop _end startmonth startday startyear // NO LONGER NEEDED
                  gen in_year_duration = min(end, mdy(12, 31, ts)) - max(start, mdy(1, 1, ts))
                  assert !missing(in_year_duration)
                  
                  //  KEEP THE OBSERVATION WITH LONGEST IN YEAR DURATION FOR EACH
                  //  COUNTRY AND YEAR
                  by country ts (in_year_duration), sort: keep if _n == _N
                  tempfile dataset1
                  save `dataset1'
                  
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input str9 Country str3 Symbol int year str6 YearSymbol long(rate1 rate2)
                  "Australia" "AUS" 1990 "YR1990" 11  5
                  "Australia" "AUS" 1991 "YR1991"  8 10
                  "Australia" "AUS" 1992 "YR1992"  3  1
                  "Australia" "AUS" 1993 "YR1993"  5  2
                  "Australia" "AUS" 1994 "YR1994"  6 11
                  "Australia" "AUS" 1995 "YR1995" 10  8
                  "Australia" "AUS" 1996 "YR1996"  7  9
                  "Australia" "AUS" 1997 "YR1997"  1  7
                  "Australia" "AUS" 1998 "YR1998"  2  6
                  "Australia" "AUS" 1999 "YR1999"  4  4
                  "Australia" "AUS" 2000 "YR2000"  9  3
                  end
                  label values rate1 rate1
                  label def rate1 1 "0,224887556", modify
                  label def rate1 2 "0,86013463", modify
                  label def rate1 3 "1,012231126", modify
                  label def rate1 4 "1,483129403", modify
                  label def rate1 5 "1,753653445", modify
                  label def rate1 6 "1,969634797", modify
                  label def rate1 7 "2,615384615", modify
                  label def rate1 8 "3,17667537", modify
                  label def rate1 9 "4,457435148", modify
                  label def rate1 10 "4,6277666", modify
                  label def rate1 11 "7,333021952", modify
                  label values rate2 rate2
                  label def rate2 1 "10,72879982", modify
                  label def rate2 2 "10,87380028", modify
                  label def rate2 3 "6,282599926", modify
                  label def rate2 4 "6,872300148", modify
                  label def rate2 5 "6,926000118", modify
                  label def rate2 6 "7,675600052", modify
                  label def rate2 7 "8,362199783", modify
                  label def rate2 8 "8,46930027", modify
                  label def rate2 9 "8,506199837", modify
                  label def rate2 10 "9,579199791", modify
                  label def rate2 11 "9,718700409", modify
                  
                  //  CONVERT THE RATE VARIABLES TO ACTUAL NUMERIC RATES, NOT VALUE LABELED INTEGERS
                  foreach v of varlist rate1 rate2 {
                      decode `v', gen(_`v')
                      drop `v'
                      rename _`v' `v'
                      destring `v', dpcomma replace
                  }
                  isid Country year
                  rename Country country
                  rename year ts
                  tempfile dataset2
                  save `dataset2'
                  
                  //    LINK THE TWO DATA SETS AND BRING THE DATASET 1 VARIABLES OVER
                  use `dataset1', clear
                  merge 1:1 country ts using `dataset2'
                  This produces the same results as the original code (although the order of the variables in the data set comes out different.)

                  To be clear: where I have put the -dataex- output, you would simply -use- the corresponding real data set you have. You will still need to make those transformations of the dates, etc., and you will then need to save the results in new tempfiles `datset1' and `dataset2' as shown in this code.

                  Comment


                  • #10
                    Originally posted by Clyde Schechter View Post
                    Yes, the Forum FAQ, which everyone is expected to read before posting, advises people that if they are not using the current version of Stata (which, presently, is 16) they should state what version they are using. Here's a rewrite of the code that does not require frames, and uses tempfiles instead.

                    Code:
                    // FOR DATASET 1
                    * Example generated by -dataex-. To install: ssc install dataex
                    clear
                    input str9 ccode int ts long(edate date start end) str9 country int startyear byte(startmonth startday tap1 tap2) long(pr gr)
                    "Australia" 1990 . . 3 6 "Australia" 1987 7 22 1 1 3 3
                    "Australia" 1990 . . 6 5 "Australia" 1990 4 4 1 1 2 2
                    "Australia" 1991 . . 6 5 "Australia" 1990 4 4 1 1 2 2
                    "Australia" 1991 . . 5 3 "Australia" 1991 12 27 1 1 2 2
                    "Australia" 1992 . . 5 3 "Australia" 1991 12 27 1 1 2 2
                    "Australia" 1993 . . 5 3 "Australia" 1991 12 27 1 1 2 2
                    "Australia" 1993 34041 199303 4 1 "Australia" 1993 3 24 1 1 1 1
                    "Australia" 1994 . . 4 1 "Australia" 1993 3 24 1 1 1 1
                    "Australia" 1995 . . 4 1 "Australia" 1993 3 24 1 1 1 1
                    "Australia" 1996 35126 199603 4 1 "Australia" 1993 3 24 1 1 1 1
                    "Australia" 1996 . . 1 2 "Australia" 1996 3 11 3 3 4 4
                    "Australia" 1997 . . 1 2 "Australia" 1996 3 11 3 3 4 4
                    "Australia" 1998 36071 199810 1 2 "Australia" 1996 3 11 3 3 4 4
                    "Australia" 1998 . . 2 4 "Australia" 1998 10 21 2 2 5 5
                    "Australia" 1999 . . 2 4 "Australia" 1998 10 21 2 2 5 5
                    "Australia" 2000 . . 2 4 "Australia" 1998 10 21 2 2 5 5
                    end
                    label values start start
                    label def start 1 "11/3/96", modify
                    label def start 2 "21/10/98", modify
                    label def start 3 "22/7/87", modify
                    label def start 4 "24/3/93", modify
                    label def start 5 "27/12/91", modify
                    label def start 6 "4/4/90", modify
                    label values end end
                    label def end 1 "11/3/96", modify
                    label def end 2 "21/10/98", modify
                    label def end 3 "24/3/93", modify
                    label def end 4 "26/11/01", modify
                    label def end 5 "27/12/91", modify
                    label def end 6 "4/4/90", modify
                    label values pr pr
                    label def pr 1 "-0,165", modify
                    label def pr 2 "-14,9", modify
                    label def pr 3 "-4,5", modify
                    label def pr 4 "22,593", modify
                    label def pr 5 "48,458", modify
                    label values gr gr
                    label def gr 1 "-0,165", modify
                    label def gr 2 "-14,9", modify
                    label def gr 3 "-4,5", modify
                    label def gr 4 "22,593", modify
                    label def gr 5 "48,458", modify
                    
                    // CREATE USABLE DATE VARIABLES
                    drop start
                    gen start = mdy(startmonth, startday, startyear)
                    assert missing(start) == missing(startmonth, startday, startyear)
                    format start %td
                    decode end, gen(_end)
                    drop end
                    gen end = daily(_end, "DMY", 2020)
                    assert missing(end) == missing(_end)
                    format end %td
                    drop _end startmonth startday startyear // NO LONGER NEEDED
                    gen in_year_duration = min(end, mdy(12, 31, ts)) - max(start, mdy(1, 1, ts))
                    assert !missing(in_year_duration)
                    
                    // KEEP THE OBSERVATION WITH LONGEST IN YEAR DURATION FOR EACH
                    // COUNTRY AND YEAR
                    by country ts (in_year_duration), sort: keep if _n == _N
                    tempfile dataset1
                    save `dataset1'
                    
                    * Example generated by -dataex-. To install: ssc install dataex
                    clear
                    input str9 Country str3 Symbol int year str6 YearSymbol long(rate1 rate2)
                    "Australia" "AUS" 1990 "YR1990" 11 5
                    "Australia" "AUS" 1991 "YR1991" 8 10
                    "Australia" "AUS" 1992 "YR1992" 3 1
                    "Australia" "AUS" 1993 "YR1993" 5 2
                    "Australia" "AUS" 1994 "YR1994" 6 11
                    "Australia" "AUS" 1995 "YR1995" 10 8
                    "Australia" "AUS" 1996 "YR1996" 7 9
                    "Australia" "AUS" 1997 "YR1997" 1 7
                    "Australia" "AUS" 1998 "YR1998" 2 6
                    "Australia" "AUS" 1999 "YR1999" 4 4
                    "Australia" "AUS" 2000 "YR2000" 9 3
                    end
                    label values rate1 rate1
                    label def rate1 1 "0,224887556", modify
                    label def rate1 2 "0,86013463", modify
                    label def rate1 3 "1,012231126", modify
                    label def rate1 4 "1,483129403", modify
                    label def rate1 5 "1,753653445", modify
                    label def rate1 6 "1,969634797", modify
                    label def rate1 7 "2,615384615", modify
                    label def rate1 8 "3,17667537", modify
                    label def rate1 9 "4,457435148", modify
                    label def rate1 10 "4,6277666", modify
                    label def rate1 11 "7,333021952", modify
                    label values rate2 rate2
                    label def rate2 1 "10,72879982", modify
                    label def rate2 2 "10,87380028", modify
                    label def rate2 3 "6,282599926", modify
                    label def rate2 4 "6,872300148", modify
                    label def rate2 5 "6,926000118", modify
                    label def rate2 6 "7,675600052", modify
                    label def rate2 7 "8,362199783", modify
                    label def rate2 8 "8,46930027", modify
                    label def rate2 9 "8,506199837", modify
                    label def rate2 10 "9,579199791", modify
                    label def rate2 11 "9,718700409", modify
                    
                    // CONVERT THE RATE VARIABLES TO ACTUAL NUMERIC RATES, NOT VALUE LABELED INTEGERS
                    foreach v of varlist rate1 rate2 {
                    decode `v', gen(_`v')
                    drop `v'
                    rename _`v' `v'
                    destring `v', dpcomma replace
                    }
                    isid Country year
                    rename Country country
                    rename year ts
                    tempfile dataset2
                    save `dataset2'
                    
                    // LINK THE TWO DATA SETS AND BRING THE DATASET 1 VARIABLES OVER
                    use `dataset1', clear
                    merge 1:1 country ts using `dataset2'
                    This produces the same results as the original code (although the order of the variables in the data set comes out different.)

                    To be clear: where I have put the -dataex- output, you would simply -use- the corresponding real data set you have. You will still need to make those transformations of the dates, etc., and you will then need to save the results in new tempfiles `datset1' and `dataset2' as shown in this code.

                    Thank you very much! The code works extremely well ! I am indebted to you and cannot thank you enough!

                    Probably I was not clear enough and I apologize for that . My fault. I want to match to the longest duration for that year but the data if the are multiple observation in dataset1 in a year ( pr gr, dates etc) should not be dropped. Pr gr etc are variables that contain observation-values for all governments no matter their duration and by dropping them I loose important information.
                    They will have to remain even they do not correspond to neither observation to dataset2 . I will have to run a panel afterwards and if those are dropped that will make me loose important information. Actually it modifies seriously the concept of the research project.

                    For example in year 1990 where multiple governments occurred there are two 1990 year entries(or more in other cases). The second government had the longest duration. Data should be merged to the second government without dropping the data for the first government.

                    In sort, to make it clearer, the final database should look like database1 merging the data of database2 only to the longest duration government in database1 Probably an option to keep only those data after the merge command
                    I hope now is better understood.

                    Mario
                    Last edited by Mario Ferri; 04 Feb 2020, 13:01.

                    Comment


                    • #11
                      That's a simple change. Before reducing dataset1 to those observations that have the longest duration, we save a copy of the original data. Then we continue as before, and, at the very end, merge back the original data.

                      Code:
                      // FOR DATASET 1
                      * Example generated by -dataex-. To install: ssc install dataex
                      clear
                      input str9 ccode int ts long(edate date start end) str9 country int startyear byte(startmonth startday tap1 tap2) long(pr gr)
                      "Australia" 1990     .      . 3 6 "Australia" 1987  7 22 1 1 3 3
                      "Australia" 1990     .      . 6 5 "Australia" 1990  4  4 1 1 2 2
                      "Australia" 1991     .      . 6 5 "Australia" 1990  4  4 1 1 2 2
                      "Australia" 1991     .      . 5 3 "Australia" 1991 12 27 1 1 2 2
                      "Australia" 1992     .      . 5 3 "Australia" 1991 12 27 1 1 2 2
                      "Australia" 1993     .      . 5 3 "Australia" 1991 12 27 1 1 2 2
                      "Australia" 1993 34041 199303 4 1 "Australia" 1993  3 24 1 1 1 1
                      "Australia" 1994     .      . 4 1 "Australia" 1993  3 24 1 1 1 1
                      "Australia" 1995     .      . 4 1 "Australia" 1993  3 24 1 1 1 1
                      "Australia" 1996 35126 199603 4 1 "Australia" 1993  3 24 1 1 1 1
                      "Australia" 1996     .      . 1 2 "Australia" 1996  3 11 3 3 4 4
                      "Australia" 1997     .      . 1 2 "Australia" 1996  3 11 3 3 4 4
                      "Australia" 1998 36071 199810 1 2 "Australia" 1996  3 11 3 3 4 4
                      "Australia" 1998     .      . 2 4 "Australia" 1998 10 21 2 2 5 5
                      "Australia" 1999     .      . 2 4 "Australia" 1998 10 21 2 2 5 5
                      "Australia" 2000     .      . 2 4 "Australia" 1998 10 21 2 2 5 5
                      end
                      label values start start
                      label def start 1 "11/3/96", modify
                      label def start 2 "21/10/98", modify
                      label def start 3 "22/7/87", modify
                      label def start 4 "24/3/93", modify
                      label def start 5 "27/12/91", modify
                      label def start 6 "4/4/90", modify
                      label values end end
                      label def end 1 "11/3/96", modify
                      label def end 2 "21/10/98", modify
                      label def end 3 "24/3/93", modify
                      label def end 4 "26/11/01", modify
                      label def end 5 "27/12/91", modify
                      label def end 6 "4/4/90", modify
                      label values pr pr
                      label def pr 1 "-0,165", modify
                      label def pr 2 "-14,9", modify
                      label def pr 3 "-4,5", modify
                      label def pr 4 "22,593", modify
                      label def pr 5 "48,458", modify
                      label values gr gr
                      label def gr 1 "-0,165", modify
                      label def gr 2 "-14,9", modify
                      label def gr 3 "-4,5", modify
                      label def gr 4 "22,593", modify
                      label def gr 5 "48,458", modify
                      
                      //  CREATE USABLE DATE VARIABLES
                      drop start
                      gen start = mdy(startmonth, startday, startyear)
                      assert missing(start) == missing(startmonth, startday, startyear)
                      format start %td
                      decode end, gen(_end)
                      drop end
                      gen end = daily(_end, "DMY", 2020)
                      assert missing(end) == missing(_end)
                      format end %td
                      drop _end startmonth startday startyear // NO LONGER NEEDED
                      gen in_year_duration = min(end, mdy(12, 31, ts)) - max(start, mdy(1, 1, ts))
                      assert !missing(in_year_duration)
                      
                      //  KEEP THE OBSERVATION WITH LONGEST IN YEAR DURATION FOR EACH
                      //  COUNTRY AND YEAR
                      gen long obs_no = _n
                      tempfile original_dataset1
                      save `original_dataset1'
                      by country ts (in_year_duration), sort: keep if _n == _N
                      tempfile reduced_dataset1
                      save `reduced_dataset1'
                      
                      * Example generated by -dataex-. To install: ssc install dataex
                      clear
                      input str9 Country str3 Symbol int year str6 YearSymbol long(rate1 rate2)
                      "Australia" "AUS" 1990 "YR1990" 11  5
                      "Australia" "AUS" 1991 "YR1991"  8 10
                      "Australia" "AUS" 1992 "YR1992"  3  1
                      "Australia" "AUS" 1993 "YR1993"  5  2
                      "Australia" "AUS" 1994 "YR1994"  6 11
                      "Australia" "AUS" 1995 "YR1995" 10  8
                      "Australia" "AUS" 1996 "YR1996"  7  9
                      "Australia" "AUS" 1997 "YR1997"  1  7
                      "Australia" "AUS" 1998 "YR1998"  2  6
                      "Australia" "AUS" 1999 "YR1999"  4  4
                      "Australia" "AUS" 2000 "YR2000"  9  3
                      end
                      label values rate1 rate1
                      label def rate1 1 "0,224887556", modify
                      label def rate1 2 "0,86013463", modify
                      label def rate1 3 "1,012231126", modify
                      label def rate1 4 "1,483129403", modify
                      label def rate1 5 "1,753653445", modify
                      label def rate1 6 "1,969634797", modify
                      label def rate1 7 "2,615384615", modify
                      label def rate1 8 "3,17667537", modify
                      label def rate1 9 "4,457435148", modify
                      label def rate1 10 "4,6277666", modify
                      label def rate1 11 "7,333021952", modify
                      label values rate2 rate2
                      label def rate2 1 "10,72879982", modify
                      label def rate2 2 "10,87380028", modify
                      label def rate2 3 "6,282599926", modify
                      label def rate2 4 "6,872300148", modify
                      label def rate2 5 "6,926000118", modify
                      label def rate2 6 "7,675600052", modify
                      label def rate2 7 "8,362199783", modify
                      label def rate2 8 "8,46930027", modify
                      label def rate2 9 "8,506199837", modify
                      label def rate2 10 "9,579199791", modify
                      label def rate2 11 "9,718700409", modify
                      
                      //  CONVERT THE RATE VARIABLES TO ACTUAL NUMERIC RATES, NOT VALUE LABELED INTEGERS
                      foreach v of varlist rate1 rate2 {
                          decode `v', gen(_`v')
                          drop `v'
                          rename _`v' `v'
                          destring `v', dpcomma replace
                      }
                      isid Country year
                      rename Country country
                      rename year ts
                      tempfile dataset2
                      save `dataset2'
                      
                      //    LINK THE TWO DATA SETS AND BRING THE DATASET 1 VARIABLES OVER
                      use `reduced_dataset1', clear
                      merge 1:1 country ts using `dataset2', nogenerate
                      merge 1:1 obs_no using `original_dataset1', assert(match using) nogenerate
                      drop obs_no
                      Changes from the previous code are shown in bold face.

                      Comment


                      • #12
                        Dear Clyde,

                        Thank you for your tremendous help writing the code! . Without your diligence, I would not have been able to solve the issue and to write the code .
                        I would like to express my deep gratitude to you for your help ! .

                        Best,

                        Mario Ferri

                        Comment


                        • #13
                          Originally posted by Clyde Schechter View Post
                          That's a simple change. Before reducing dataset1 to those observations that have the longest duration, we save a copy of the original data. Then we continue as before, and, at the very end, merge back the original data.

                          Code:
                          // FOR DATASET 1
                          * Example generated by -dataex-. To install: ssc install dataex
                          clear
                          input str9 ccode int ts long(edate date start end) str9 country int startyear byte(startmonth startday tap1 tap2) long(pr gr)
                          "Australia" 1990 . . 3 6 "Australia" 1987 7 22 1 1 3 3
                          "Australia" 1990 . . 6 5 "Australia" 1990 4 4 1 1 2 2
                          "Australia" 1991 . . 6 5 "Australia" 1990 4 4 1 1 2 2
                          "Australia" 1991 . . 5 3 "Australia" 1991 12 27 1 1 2 2
                          "Australia" 1992 . . 5 3 "Australia" 1991 12 27 1 1 2 2
                          "Australia" 1993 . . 5 3 "Australia" 1991 12 27 1 1 2 2
                          "Australia" 1993 34041 199303 4 1 "Australia" 1993 3 24 1 1 1 1
                          "Australia" 1994 . . 4 1 "Australia" 1993 3 24 1 1 1 1
                          "Australia" 1995 . . 4 1 "Australia" 1993 3 24 1 1 1 1
                          "Australia" 1996 35126 199603 4 1 "Australia" 1993 3 24 1 1 1 1
                          "Australia" 1996 . . 1 2 "Australia" 1996 3 11 3 3 4 4
                          "Australia" 1997 . . 1 2 "Australia" 1996 3 11 3 3 4 4
                          "Australia" 1998 36071 199810 1 2 "Australia" 1996 3 11 3 3 4 4
                          "Australia" 1998 . . 2 4 "Australia" 1998 10 21 2 2 5 5
                          "Australia" 1999 . . 2 4 "Australia" 1998 10 21 2 2 5 5
                          "Australia" 2000 . . 2 4 "Australia" 1998 10 21 2 2 5 5
                          end
                          label values start start
                          label def start 1 "11/3/96", modify
                          label def start 2 "21/10/98", modify
                          label def start 3 "22/7/87", modify
                          label def start 4 "24/3/93", modify
                          label def start 5 "27/12/91", modify
                          label def start 6 "4/4/90", modify
                          label values end end
                          label def end 1 "11/3/96", modify
                          label def end 2 "21/10/98", modify
                          label def end 3 "24/3/93", modify
                          label def end 4 "26/11/01", modify
                          label def end 5 "27/12/91", modify
                          label def end 6 "4/4/90", modify
                          label values pr pr
                          label def pr 1 "-0,165", modify
                          label def pr 2 "-14,9", modify
                          label def pr 3 "-4,5", modify
                          label def pr 4 "22,593", modify
                          label def pr 5 "48,458", modify
                          label values gr gr
                          label def gr 1 "-0,165", modify
                          label def gr 2 "-14,9", modify
                          label def gr 3 "-4,5", modify
                          label def gr 4 "22,593", modify
                          label def gr 5 "48,458", modify
                          
                          // CREATE USABLE DATE VARIABLES
                          drop start
                          gen start = mdy(startmonth, startday, startyear)
                          assert missing(start) == missing(startmonth, startday, startyear)
                          format start %td
                          decode end, gen(_end)
                          drop end
                          gen end = daily(_end, "DMY", 2020)
                          assert missing(end) == missing(_end)
                          format end %td
                          drop _end startmonth startday startyear // NO LONGER NEEDED
                          gen in_year_duration = min(end, mdy(12, 31, ts)) - max(start, mdy(1, 1, ts))
                          assert !missing(in_year_duration)
                          
                          // KEEP THE OBSERVATION WITH LONGEST IN YEAR DURATION FOR EACH
                          // COUNTRY AND YEAR
                          gen long obs_no = _n
                          tempfile original_dataset1
                          save `original_dataset1'
                          by country ts (in_year_duration), sort: keep if _n == _N
                          tempfile reduced_dataset1
                          save `reduced_dataset1'
                          
                          * Example generated by -dataex-. To install: ssc install dataex
                          clear
                          input str9 Country str3 Symbol int year str6 YearSymbol long(rate1 rate2)
                          "Australia" "AUS" 1990 "YR1990" 11 5
                          "Australia" "AUS" 1991 "YR1991" 8 10
                          "Australia" "AUS" 1992 "YR1992" 3 1
                          "Australia" "AUS" 1993 "YR1993" 5 2
                          "Australia" "AUS" 1994 "YR1994" 6 11
                          "Australia" "AUS" 1995 "YR1995" 10 8
                          "Australia" "AUS" 1996 "YR1996" 7 9
                          "Australia" "AUS" 1997 "YR1997" 1 7
                          "Australia" "AUS" 1998 "YR1998" 2 6
                          "Australia" "AUS" 1999 "YR1999" 4 4
                          "Australia" "AUS" 2000 "YR2000" 9 3
                          end
                          label values rate1 rate1
                          label def rate1 1 "0,224887556", modify
                          label def rate1 2 "0,86013463", modify
                          label def rate1 3 "1,012231126", modify
                          label def rate1 4 "1,483129403", modify
                          label def rate1 5 "1,753653445", modify
                          label def rate1 6 "1,969634797", modify
                          label def rate1 7 "2,615384615", modify
                          label def rate1 8 "3,17667537", modify
                          label def rate1 9 "4,457435148", modify
                          label def rate1 10 "4,6277666", modify
                          label def rate1 11 "7,333021952", modify
                          label values rate2 rate2
                          label def rate2 1 "10,72879982", modify
                          label def rate2 2 "10,87380028", modify
                          label def rate2 3 "6,282599926", modify
                          label def rate2 4 "6,872300148", modify
                          label def rate2 5 "6,926000118", modify
                          label def rate2 6 "7,675600052", modify
                          label def rate2 7 "8,362199783", modify
                          label def rate2 8 "8,46930027", modify
                          label def rate2 9 "8,506199837", modify
                          label def rate2 10 "9,579199791", modify
                          label def rate2 11 "9,718700409", modify
                          
                          // CONVERT THE RATE VARIABLES TO ACTUAL NUMERIC RATES, NOT VALUE LABELED INTEGERS
                          foreach v of varlist rate1 rate2 {
                          decode `v', gen(_`v')
                          drop `v'
                          rename _`v' `v'
                          destring `v', dpcomma replace
                          }
                          isid Country year
                          rename Country country
                          rename year ts
                          tempfile dataset2
                          save `dataset2'
                          
                          // LINK THE TWO DATA SETS AND BRING THE DATASET 1 VARIABLES OVER
                          use `reduced_dataset1', clear
                          merge 1:1 country ts using `dataset2', nogenerate
                          merge 1:1 obs_no using `original_dataset1', assert(match using) nogenerate
                          drop obs_no
                          Changes from the previous code are shown in bold face.
                          Dear Clyde,

                          Thank you for your tremendous help writing the code! . Without your diligence, I would not have been able to solve the issue and to write the code .
                          I would like to express my deep gratitude to you for your help ! .

                          Best,

                          Mario Ferri

                          Comment


                          • #14
                            Originally posted by Clyde Schechter View Post
                            That's a simple change. Before reducing dataset1 to those observations that have the longest duration, we save a copy of the original data. Then we continue as before, and, at the very end, merge back the original data.

                            Code:
                            // FOR DATASET 1
                            * Example generated by -dataex-. To install: ssc install dataex
                            clear
                            input str9 ccode int ts long(edate date start end) str9 country int startyear byte(startmonth startday tap1 tap2) long(pr gr)
                            "Australia" 1990 . . 3 6 "Australia" 1987 7 22 1 1 3 3
                            "Australia" 1990 . . 6 5 "Australia" 1990 4 4 1 1 2 2
                            "Australia" 1991 . . 6 5 "Australia" 1990 4 4 1 1 2 2
                            "Australia" 1991 . . 5 3 "Australia" 1991 12 27 1 1 2 2
                            "Australia" 1992 . . 5 3 "Australia" 1991 12 27 1 1 2 2
                            "Australia" 1993 . . 5 3 "Australia" 1991 12 27 1 1 2 2
                            "Australia" 1993 34041 199303 4 1 "Australia" 1993 3 24 1 1 1 1
                            "Australia" 1994 . . 4 1 "Australia" 1993 3 24 1 1 1 1
                            "Australia" 1995 . . 4 1 "Australia" 1993 3 24 1 1 1 1
                            "Australia" 1996 35126 199603 4 1 "Australia" 1993 3 24 1 1 1 1
                            "Australia" 1996 . . 1 2 "Australia" 1996 3 11 3 3 4 4
                            "Australia" 1997 . . 1 2 "Australia" 1996 3 11 3 3 4 4
                            "Australia" 1998 36071 199810 1 2 "Australia" 1996 3 11 3 3 4 4
                            "Australia" 1998 . . 2 4 "Australia" 1998 10 21 2 2 5 5
                            "Australia" 1999 . . 2 4 "Australia" 1998 10 21 2 2 5 5
                            "Australia" 2000 . . 2 4 "Australia" 1998 10 21 2 2 5 5
                            end
                            label values start start
                            label def start 1 "11/3/96", modify
                            label def start 2 "21/10/98", modify
                            label def start 3 "22/7/87", modify
                            label def start 4 "24/3/93", modify
                            label def start 5 "27/12/91", modify
                            label def start 6 "4/4/90", modify
                            label values end end
                            label def end 1 "11/3/96", modify
                            label def end 2 "21/10/98", modify
                            label def end 3 "24/3/93", modify
                            label def end 4 "26/11/01", modify
                            label def end 5 "27/12/91", modify
                            label def end 6 "4/4/90", modify
                            label values pr pr
                            label def pr 1 "-0,165", modify
                            label def pr 2 "-14,9", modify
                            label def pr 3 "-4,5", modify
                            label def pr 4 "22,593", modify
                            label def pr 5 "48,458", modify
                            label values gr gr
                            label def gr 1 "-0,165", modify
                            label def gr 2 "-14,9", modify
                            label def gr 3 "-4,5", modify
                            label def gr 4 "22,593", modify
                            label def gr 5 "48,458", modify
                            
                            // CREATE USABLE DATE VARIABLES
                            drop start
                            gen start = mdy(startmonth, startday, startyear)
                            assert missing(start) == missing(startmonth, startday, startyear)
                            format start %td
                            decode end, gen(_end)
                            drop end
                            gen end = daily(_end, "DMY", 2020)
                            assert missing(end) == missing(_end)
                            format end %td
                            drop _end startmonth startday startyear // NO LONGER NEEDED
                            gen in_year_duration = min(end, mdy(12, 31, ts)) - max(start, mdy(1, 1, ts))
                            assert !missing(in_year_duration)
                            
                            // KEEP THE OBSERVATION WITH LONGEST IN YEAR DURATION FOR EACH
                            // COUNTRY AND YEAR
                            gen long obs_no = _n
                            tempfile original_dataset1
                            save `original_dataset1'
                            by country ts (in_year_duration), sort: keep if _n == _N
                            tempfile reduced_dataset1
                            save `reduced_dataset1'
                            
                            * Example generated by -dataex-. To install: ssc install dataex
                            clear
                            input str9 Country str3 Symbol int year str6 YearSymbol long(rate1 rate2)
                            "Australia" "AUS" 1990 "YR1990" 11 5
                            "Australia" "AUS" 1991 "YR1991" 8 10
                            "Australia" "AUS" 1992 "YR1992" 3 1
                            "Australia" "AUS" 1993 "YR1993" 5 2
                            "Australia" "AUS" 1994 "YR1994" 6 11
                            "Australia" "AUS" 1995 "YR1995" 10 8
                            "Australia" "AUS" 1996 "YR1996" 7 9
                            "Australia" "AUS" 1997 "YR1997" 1 7
                            "Australia" "AUS" 1998 "YR1998" 2 6
                            "Australia" "AUS" 1999 "YR1999" 4 4
                            "Australia" "AUS" 2000 "YR2000" 9 3
                            end
                            label values rate1 rate1
                            label def rate1 1 "0,224887556", modify
                            label def rate1 2 "0,86013463", modify
                            label def rate1 3 "1,012231126", modify
                            label def rate1 4 "1,483129403", modify
                            label def rate1 5 "1,753653445", modify
                            label def rate1 6 "1,969634797", modify
                            label def rate1 7 "2,615384615", modify
                            label def rate1 8 "3,17667537", modify
                            label def rate1 9 "4,457435148", modify
                            label def rate1 10 "4,6277666", modify
                            label def rate1 11 "7,333021952", modify
                            label values rate2 rate2
                            label def rate2 1 "10,72879982", modify
                            label def rate2 2 "10,87380028", modify
                            label def rate2 3 "6,282599926", modify
                            label def rate2 4 "6,872300148", modify
                            label def rate2 5 "6,926000118", modify
                            label def rate2 6 "7,675600052", modify
                            label def rate2 7 "8,362199783", modify
                            label def rate2 8 "8,46930027", modify
                            label def rate2 9 "8,506199837", modify
                            label def rate2 10 "9,579199791", modify
                            label def rate2 11 "9,718700409", modify
                            
                            // CONVERT THE RATE VARIABLES TO ACTUAL NUMERIC RATES, NOT VALUE LABELED INTEGERS
                            foreach v of varlist rate1 rate2 {
                            decode `v', gen(_`v')
                            drop `v'
                            rename _`v' `v'
                            destring `v', dpcomma replace
                            }
                            isid Country year
                            rename Country country
                            rename year ts
                            tempfile dataset2
                            save `dataset2'
                            
                            // LINK THE TWO DATA SETS AND BRING THE DATASET 1 VARIABLES OVER
                            use `reduced_dataset1', clear
                            merge 1:1 country ts using `dataset2', nogenerate
                            merge 1:1 obs_no using `original_dataset1', assert(match using) nogenerate
                            drop obs_no
                            Changes from the previous code are shown in bold face.
                            Dear Clyde,

                            I am trying to implement the code on my main final datasets (after some changes imported to the first ones) and while the first time the code worked, when I am now trying to run it in the first part of the code when I reach the assert command(in bold) :
                            Code:
                            //  CREATE USABLE DATE VARIABLES
                            drop start
                            gen start = mdy(startmonth, startday, startyear)
                            assert missing(start) == missing(startmonth, startday, startyear) 
                            It displays :
                            Code:
                            2 contradictions in 1,735 observations
                            assertion is false
                            r(9);
                            I would appreciate any help you can provide

                            thank you

                            Mario Ferri

                            Comment


                            • #15
                              Well, what that command does is verify that the combination of startmonth, startday, and startyear forms a valid date, or else one of them is missing. So the implication is that somewhere in your new data you have some combination of startmonth, startday and startyear that are not valid as a date, e.g. perhaps an April 31 or a month that isn't between 1 and 12.

                              It's easy enough to chase down the problem:
                              Code:
                              browse if missing(start) != missing(startmonth, startday, startyear)
                              will show you the two offending observations. Look at startmonth, startday and startyear there and you'll be able to see the problem. Then you have to fix it.

                              Comment

                              Working...
                              X