Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Converting Date-Month-Year (string) to year

    Dear Stata Members
    I have an issue which I know is the most answered questions in this form, I tried using some past posts but to no avail. My issue is reproduced. Incorp is the variable from which I need to extract year


    Code:
    describe incorp
    
    Variable      Storage   Display    Value
        name         type    format    label      Variable label
    ------------------------------------------------------------------------------------------------------------------
    incorp          str9    %9s
    .

    Code:
    list incorp in 1/40
    
         +-----------+
         |    incorp |
         |-----------|
      1. |           |
      2. |           |
      3. |           |
      4. |           |
      5. |           |
         |-----------|
      6. |           |
      7. |           |
      8. |           |
      9. |           |
     10. |           |
         |-----------|
     11. |           |
     12. |           |
     13. |           |
     14. |           |
     15. |           |
         |-----------|
     16. |           |
     17. |           |
     18. |           |
     19. |           |
     20. |           |
         |-----------|
     21. |           |
     22. |           |
     23. | 05-May-86 |
     24. | 05-May-86 |
     25. | 05-May-86 |
         |-----------|
     26. | 05-May-86 |
     27. | 05-May-86 |
     28. | 05-May-86 |
     29. | 05-May-86 |
     30. | 05-May-86 |
         |-----------|
     31. | 05-May-86 |
     32. | 05-May-86 |
     33. | 05-May-86 |
     34. | 05-May-86 |
     35. | 05-May-86 |
         |-----------|
     36. | 05-May-86 |
     37. | 05-May-86 |
     38. | 05-May-86 |
     39. | 05-May-86 |
     40. | 05-May-86 |
         +-----------+

    I tried using
    Code:
    generate numyear = date(incorp, "MDY")
    (408,405 missing values generated)
    but
    Code:
    . mdesc numyear
    
        Variable    |     Missing          Total     Percent Missing
    ----------------+-----------------------------------------------
            numyear |     408,405        408,405         100.00
    ----------------+-----------------------------------------------
    My issue is to extract 1986 if the year is 05-May-86. That is the calendar year. How to do that

  • #2
    Neelakanda:
    provided that becoming proficient with date-related issues is really challenging even after years of Stata practising (this is my experience, at least), you may want to try:
    Code:
    set obs 1
    . g time_string="05-May-86"
    . generate double timestamp = clock( time_string, "DMY", 1986)
    
    . format timestamp %tcDDmonYY
    
    . list
    
         +----------------------+
         | time_st~g   timest~p |
         |----------------------|
      1. | 05-May-86    05may86 |
         +----------------------+
    
    .
    Studying (and rehearsing when you think you get it) all the date-related entries in Stata .pdf manual is warmly recommended.
    Kind regards,
    Carlo
    (StataNow 18.5)

    Comment


    • #3
      You're not new to Statalist and should be aware of advice at https://www.statalist.org/forums/help#before and https://www.statalist.org/forums/help#stata

      Please look again at the help for daily() or date().

      Code:
      help daily()


      With your kind of daily date, the century is not explicit.

      This should work: year(daily(incorp, "DMY", 2021))

      Code:
      . clear
      
      . set obs 1
      Number of observations (_N) was 0, now 1.
      
      . gen incorp = "05-may-86"
      
      . gen year = year(daily(incorp, "DMY", 2021))
      
      . l
      
           +------------------+
           |    incorp   year |
           |------------------|
        1. | 05-may-86   1986 |
           +------------------+
      This should also work

      Code:
      . gen YEAR = real(substr(incorp, -2, 2))
      
      . replace YEAR = cond(YEAR <= 21, 2000 + YEAR, 1900 + YEAR)
      
      .
      If the code seems obscure, look at the documentation for each function used:


      Code:
      help substr()
      
      help real()
      
      help cond()

      Comment


      • #4
        Dear Carlo Lazzaro, thanks for the timely help, and I missed even my lunch to figure out what is really wrong. Regarding your code, how can I apply it to my general context? Rubbing Salt to Wound, I realized that
        I have both formats, for instance, dateofincorporation is my variable
        Code:
        dateofincorporation
        5/25/1989
        03-11-1983
        I am attaching my sample data

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input str14 ric int year str10 dateofincorporation
        "000002.SZ" 1999 "5/30/1984"
        "000002.SZ" 2000 "5/30/1984"
        "000002.SZ" 2001 "5/30/1984"
        "000002.SZ" 2002 "5/30/1984"
        "000002.SZ" 2003 "5/30/1984"
        "000002.SZ" 2004 "5/30/1984"
        "000002.SZ" 2005 "5/30/1984"
        "000002.SZ" 2006 "5/30/1984"
        "000002.SZ" 2007 "5/30/1984"
        "000002.SZ" 2008 "5/30/1984"
        "000002.SZ" 2009 "5/30/1984"
        "000002.SZ" 2010 "5/30/1984"
        "000002.SZ" 2011 "5/30/1984"
        "000002.SZ" 2012 "5/30/1984"
        "000002.SZ" 2013 "5/30/1984"
        "000002.SZ" 2014 "5/30/1984"
        "000002.SZ" 2015 "5/30/1984"
        "000002.SZ" 2016 "5/30/1984"
        "000002.SZ" 2017 "5/30/1984"
        "000002.SZ" 2018 "5/30/1984"
        "000002.SZ" 2019 "5/30/1984"
        "000002.SZ" 2020 "5/30/1984"
        "000004.SZ" 2000 "05-05-1986"
        "000004.SZ" 2001 "05-05-1986"
        "000004.SZ" 2002 "05-05-1986"
        "000004.SZ" 2003 "05-05-1986"
        "000004.SZ" 2004 "05-05-1986"
        "000004.SZ" 2005 "05-05-1986"
        "000004.SZ" 2006 "05-05-1986"
        "000004.SZ" 2007 "05-05-1986"
        "000004.SZ" 2008 "05-05-1986"
        "000004.SZ" 2009 "05-05-1986"
        "000004.SZ" 2010 "05-05-1986"
        "000004.SZ" 2011 "05-05-1986"
        "000004.SZ" 2012 "05-05-1986"
        "000004.SZ" 2013 "05-05-1986"
        "000004.SZ" 2014 "05-05-1986"
        "000004.SZ" 2015 "05-05-1986"
        "000004.SZ" 2016 "05-05-1986"
        "000004.SZ" 2017 "05-05-1986"
        "000004.SZ" 2018 "05-05-1986"
        "000004.SZ" 2019 "05-05-1986"
        "000004.SZ" 2020 "05-05-1986"
        "000005.SZ" 1999 "05-08-1989"
        "000005.SZ" 2000 "05-08-1989"
        "000005.SZ" 2001 "05-08-1989"
        "000005.SZ" 2002 "05-08-1989"
        "000005.SZ" 2003 "05-08-1989"
        "000005.SZ" 2004 "05-08-1989"
        "000005.SZ" 2005 "05-08-1989"
        "000005.SZ" 2006 "05-08-1989"
        "000005.SZ" 2007 "05-08-1989"
        "000005.SZ" 2008 "05-08-1989"
        "000005.SZ" 2009 "05-08-1989"
        "000005.SZ" 2010 "05-08-1989"
        "000005.SZ" 2011 "05-08-1989"
        "000005.SZ" 2012 "05-08-1989"
        "000005.SZ" 2013 "05-08-1989"
        "000005.SZ" 2014 "05-08-1989"
        "000005.SZ" 2015 "05-08-1989"
        "000005.SZ" 2016 "05-08-1989"
        "000005.SZ" 2017 "05-08-1989"
        "000005.SZ" 2018 "05-08-1989"
        "000005.SZ" 2019 "05-08-1989"
        "000005.SZ" 2020 "05-08-1989"
        "000006.SZ" 1999 "5/25/1989"
        "000006.SZ" 2000 "5/25/1989"
        "000006.SZ" 2001 "5/25/1989"
        "000006.SZ" 2002 "5/25/1989"
        "000006.SZ" 2003 "5/25/1989"
        "000006.SZ" 2004 "5/25/1989"
        "000006.SZ" 2005 "5/25/1989"
        "000006.SZ" 2006 "5/25/1989"
        "000006.SZ" 2007 "5/25/1989"
        "000006.SZ" 2008 "5/25/1989"
        "000006.SZ" 2009 "5/25/1989"
        "000006.SZ" 2010 "5/25/1989"
        "000006.SZ" 2011 "5/25/1989"
        "000006.SZ" 2012 "5/25/1989"
        "000006.SZ" 2013 "5/25/1989"
        "000006.SZ" 2014 "5/25/1989"
        "000006.SZ" 2015 "5/25/1989"
        "000006.SZ" 2016 "5/25/1989"
        "000006.SZ" 2017 "5/25/1989"
        "000006.SZ" 2018 "5/25/1989"
        "000006.SZ" 2019 "5/25/1989"
        "000006.SZ" 2020 "5/25/1989"
        "000007.SZ" 1999 "03-11-1983"
        "000007.SZ" 2000 "03-11-1983"
        "000007.SZ" 2001 "03-11-1983"
        "000007.SZ" 2002 "03-11-1983"
        "000007.SZ" 2003 "03-11-1983"
        "000007.SZ" 2004 "03-11-1983"
        "000007.SZ" 2005 "03-11-1983"
        "000007.SZ" 2006 "03-11-1983"
        "000007.SZ" 2007 "03-11-1983"
        "000007.SZ" 2008 "03-11-1983"
        "000007.SZ" 2009 "03-11-1983"
        "000007.SZ" 2010 "03-11-1983"
        "000007.SZ" 2011 "03-11-1983"
        "000007.SZ" 2012 "03-11-1983"
        "000007.SZ" 2013 "03-11-1983"
        "000007.SZ" 2014 "03-11-1983"
        "000007.SZ" 2015 "03-11-1983"
        "000007.SZ" 2016 "03-11-1983"
        "000007.SZ" 2017 "03-11-1983"
        "000007.SZ" 2018 "03-11-1983"
        "000007.SZ" 2019 "03-11-1983"
        "000007.SZ" 2020 "03-11-1983"
        "000008.SZ" 2000 "10-11-1989"
        "000008.SZ" 2001 "10-11-1989"
        "000008.SZ" 2002 "10-11-1989"
        "000008.SZ" 2003 "10-11-1989"
        "000008.SZ" 2004 "10-11-1989"
        "000008.SZ" 2005 "10-11-1989"
        "000008.SZ" 2006 "10-11-1989"
        "000008.SZ" 2007 "10-11-1989"
        "000008.SZ" 2008 "10-11-1989"
        "000008.SZ" 2009 "10-11-1989"
        "000008.SZ" 2010 "10-11-1989"
        "000008.SZ" 2011 "10-11-1989"
        "000008.SZ" 2012 "10-11-1989"
        "000008.SZ" 2013 "10-11-1989"
        "000008.SZ" 2014 "10-11-1989"
        "000008.SZ" 2015 "10-11-1989"
        "000008.SZ" 2016 "10-11-1989"
        "000008.SZ" 2017 "10-11-1989"
        "000008.SZ" 2018 "10-11-1989"
        "000008.SZ" 2019 "10-11-1989"
        "000008.SZ" 2020 "10-11-1989"
        "000009.SZ" 1999 "10-08-1990"
        "000009.SZ" 2000 "10-08-1990"
        "000009.SZ" 2001 "10-08-1990"
        "000009.SZ" 2002 "10-08-1990"
        "000009.SZ" 2003 "10-08-1990"
        "000009.SZ" 2004 "10-08-1990"
        "000009.SZ" 2005 "10-08-1990"
        "000009.SZ" 2006 "10-08-1990"
        "000009.SZ" 2007 "10-08-1990"
        "000009.SZ" 2008 "10-08-1990"
        "000009.SZ" 2009 "10-08-1990"
        "000009.SZ" 2010 "10-08-1990"
        "000009.SZ" 2011 "10-08-1990"
        "000009.SZ" 2012 "10-08-1990"
        "000009.SZ" 2013 "10-08-1990"
        "000009.SZ" 2014 "10-08-1990"
        "000009.SZ" 2015 "10-08-1990"
        "000009.SZ" 2016 "10-08-1990"
        "000009.SZ" 2017 "10-08-1990"
        "000009.SZ" 2018 "10-08-1990"
        "000009.SZ" 2019 "10-08-1990"
        "000009.SZ" 2020 "10-08-1990"
        "000010.SZ" 1999 "01-09-1989"
        "000010.SZ" 2000 "01-09-1989"
        "000010.SZ" 2001 "01-09-1989"
        "000010.SZ" 2002 "01-09-1989"
        "000010.SZ" 2003 "01-09-1989"
        "000010.SZ" 2004 "01-09-1989"
        "000010.SZ" 2005 "01-09-1989"
        "000010.SZ" 2006 "01-09-1989"
        "000010.SZ" 2007 "01-09-1989"
        "000010.SZ" 2008 "01-09-1989"
        "000010.SZ" 2009 "01-09-1989"
        "000010.SZ" 2010 "01-09-1989"
        "000010.SZ" 2011 "01-09-1989"
        "000010.SZ" 2012 "01-09-1989"
        "000010.SZ" 2013 "01-09-1989"
        "000010.SZ" 2014 "01-09-1989"
        "000010.SZ" 2015 "01-09-1989"
        "000010.SZ" 2016 "01-09-1989"
        "000010.SZ" 2017 "01-09-1989"
        "000010.SZ" 2018 "01-09-1989"
        "000010.SZ" 2019 "01-09-1989"
        "000010.SZ" 2020 "01-09-1989"
        "000011.SZ" 1999 "1/17/1983"
        "000011.SZ" 2000 "1/17/1983"
        "000011.SZ" 2001 "1/17/1983"
        "000011.SZ" 2002 "1/17/1983"
        "000011.SZ" 2003 "1/17/1983"
        "000011.SZ" 2004 "1/17/1983"
        "000011.SZ" 2005 "1/17/1983"
        "000011.SZ" 2006 "1/17/1983"
        "000011.SZ" 2007 "1/17/1983"
        "000011.SZ" 2008 "1/17/1983"
        "000011.SZ" 2009 "1/17/1983"
        "000011.SZ" 2010 "1/17/1983"
        "000011.SZ" 2011 "1/17/1983"
        "000011.SZ" 2012 "1/17/1983"
        "000011.SZ" 2013 "1/17/1983"
        "000011.SZ" 2014 "1/17/1983"
        "000011.SZ" 2015 "1/17/1983"
        "000011.SZ" 2016 "1/17/1983"
        "000011.SZ" 2017 "1/17/1983"
        "000011.SZ" 2018 "1/17/1983"
        "000011.SZ" 2019 "1/17/1983"
        "000011.SZ" 2020 "1/17/1983"
        "000012.SZ" 1999 "09-10-1984"
        "000012.SZ" 2000 "09-10-1984"
        "000012.SZ" 2001 "09-10-1984"
        "000012.SZ" 2002 "09-10-1984"
        "000012.SZ" 2003 "09-10-1984"
        "000012.SZ" 2004 "09-10-1984"
        "000012.SZ" 2005 "09-10-1984"
        "000012.SZ" 2006 "09-10-1984"
        "000012.SZ" 2007 "09-10-1984"
        "000012.SZ" 2008 "09-10-1984"
        "000012.SZ" 2009 "09-10-1984"
        "000012.SZ" 2010 "09-10-1984"
        "000012.SZ" 2011 "09-10-1984"
        "000012.SZ" 2012 "09-10-1984"
        "000012.SZ" 2013 "09-10-1984"
        "000012.SZ" 2014 "09-10-1984"
        "000012.SZ" 2015 "09-10-1984"
        "000012.SZ" 2016 "09-10-1984"
        "000012.SZ" 2017 "09-10-1984"
        "000012.SZ" 2018 "09-10-1984"
        "000012.SZ" 2019 "09-10-1984"
        "000012.SZ" 2020 "09-10-1984"
        "000014.SZ" 1999 "7/27/1987"
        "000014.SZ" 2000 "7/27/1987"
        "000014.SZ" 2001 "7/27/1987"
        "000014.SZ" 2002 "7/27/1987"
        "000014.SZ" 2003 "7/27/1987"
        "000014.SZ" 2004 "7/27/1987"
        "000014.SZ" 2005 "7/27/1987"
        "000014.SZ" 2006 "7/27/1987"
        "000014.SZ" 2007 "7/27/1987"
        "000014.SZ" 2008 "7/27/1987"
        "000014.SZ" 2009 "7/27/1987"
        "000014.SZ" 2010 "7/27/1987"
        "000014.SZ" 2011 "7/27/1987"
        "000014.SZ" 2012 "7/27/1987"
        "000014.SZ" 2013 "7/27/1987"
        "000014.SZ" 2014 "7/27/1987"
        "000014.SZ" 2015 "7/27/1987"
        "000014.SZ" 2016 "7/27/1987"
        "000014.SZ" 2017 "7/27/1987"
        "000014.SZ" 2018 "7/27/1987"
        "000014.SZ" 2019 "7/27/1987"
        "000014.SZ" 2020 "7/27/1987"
        "000016.SZ" 1999 "10-01-1980"
        "000016.SZ" 2000 "10-01-1980"
        "000016.SZ" 2001 "10-01-1980"
        "000016.SZ" 2002 "10-01-1980"
        "000016.SZ" 2003 "10-01-1980"
        "000016.SZ" 2004 "10-01-1980"
        "000016.SZ" 2005 "10-01-1980"
        "000016.SZ" 2006 "10-01-1980"
        "000016.SZ" 2007 "10-01-1980"
        "000016.SZ" 2008 "10-01-1980"
        end
        Now I ran the following commands which I think is the most rudiment way of coding, Pardon me
        Code:
        gen date2 = date(dateofincorporation, "DMY") // to convert those dateofincorporation having Date month year format
        gen date3 = date(dateofincorporation, "MDY") // to convert those dateofincorporation having month date year format
        format date3 %td
        format date2 %td
        /*Now the problem is with the dateofincorporation like 05-08-1989
        
        dateofincorporation date2 date3
        05-08-1989 05aug1989 08may1989
        */
        *since date2 is correct, I had to replace it suitably. Hence I created, a new var
        gen date4=.
        replace date4=date2 if date4==.
        replace date4=date3 if date4==.
        
        format date4 %td
        gen year2=year(date4)
        
        gen age=year-year2 // My needed var
        Am I correct in my codes

        Source:https://stats.oarc.ucla.edu/stata/fa...can-recognize/
        Last edited by Neelakanda Krishna; 28 Dec 2021, 07:17.

        Comment


        • #5
          #5 overlooks #4.

          An answer is: try one solution and then another if that does not work.


          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input str64 incorp
          "05-may-86" 
          "5/30/1984" 
          "05-05-1986"
          end
          
          . gen year = year(daily(incorp), "DMY", 2021)
          (1 missing value generated)
          
          . l
          
               +-------------------+
               |     incorp   year |
               |-------------------|
            1. |  05-may-86   1986 |
            2. |  5/30/1984      . |
            3. | 05-05-1986   1986 |
               +-------------------+
          
          . replace year = year(daily(incorp), "MDY", 2021) if missing(year)
          (1 real change made)
          
          . l
          
               +-------------------+
               |     incorp   year |
               |-------------------|
            1. |  05-may-86   1986 |
            2. |  5/30/1984   1984 |
            3. | 05-05-1986   1986 |
               +-------------------+
          
          .
          But there is no free lunch here if your data have inconsistent formats. It's not obvious whether e.g. "1/11/2020" means 11 January or 1 November if you have a mix of DMY and MDY input.

          Comment


          • #6
            Neelakanda:
            a minor economist (https://en.wikipedia.org/wiki/Milton_Friedman) to whom Nick is probably referring to, said once that "There's no such thing as a free lunch".
            Conversely, most of us know that there is such thing as a skipped lunch (and dinner, when things go really bad) during endless Stata sessions!
            Sticking with your query, I think that Nick (weird indeed ) provided all the answers you need to move forward.
            Kind regards,
            Carlo
            (StataNow 18.5)

            Comment


            • #7
              Dear Carlo Lazzaro and @ Nick Cox. Thanks for the helping hand. Sorry dear Carlo, I was not grumbling but was indeed thankful when you gently step in for help. I was trying to fix the problem myself so that I won't trouble busy Stata forum. I am sorry I sounded like complaining. I really enjoy learning these things myself but need to get a clear answer from people like you who can say if I went wrong. I hope I haven't hurt both Nick and Carlo by my words

              Comment


              • #8
                Neelakanda:
                no need at all to apologize about anything.
                Everything I wrote was just for fun in this otherwise gloomy era.
                Enjoy staying with the Stata forum!
                Kind regards,
                Carlo
                (StataNow 18.5)

                Comment

                Working...
                X