Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Identify internal numeric logic between string and numeric variables?

    Hi Statalisters,

    I have the following dataset with three variables, srcdate, srcyear, and fiscalyear, in which srcyear is a numeric variable that extracts year-only data from string variable srcdate and fiscalyear is an independent variable. See the demo.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str9 srcdate float srcyear int fiscalyear
    "31DEC2014" 2014 2014
    "31DEC2015" 2015 2015
    "31DEC2016" 2016 2016
    "31DEC2017" 2017 2017
    "31DEC2018" 2018 2018
    "31DEC2015" 2015 2015
    "31DEC2016" 2016 2016
    "31DEC2012" 2012 2012
    "31DEC2013" 2013 2013
    "31DEC2016" 2016 2016
    "31DEC2017" 2017 2017
    "31DEC2018" 2018 2018
    "31DEC2017" 2017 2017
    "31DEC2018" 2018 2018
    "31MAR2010" 2010 2009
    "31MAR2011" 2011 2010
    "31MAR2012" 2012 2011
    "31MAR2013" 2013 2012
    "31MAR2014" 2014 2013
    "31MAR2015" 2015 2014
    "31MAR2016" 2016 2015
    "31MAR2017" 2017 2016
    "31MAR2018" 2018 2017
    "30JUN2010" 2010 2010
    "30JUN2011" 2011 2011
    "30JUN2012" 2012 2012
    "30JUN2013" 2013 2013
    end

    However, not all srcyear are equal to fiscalyear, after comparing them, there are only two scenarios, srcyear = fiscalyear and srcyear > fiscalyear. Then I want to dig out the internal logic that what makes the difference. By glancing, I find that if srcdate is between Jan to May, then srcyear = fiscalyear + 1, meaning there exists a lag between these two variables; otherwise, they are equal. But I do not know how to write codes to demo this logic to realize:
    1. Technically find the logic that if srcdate are between Jan and May, then there is one year lag between srcyear and fiscalyear.
    2. Use this logic to identify all srcdate and transfer it to correponding fiscalyear (imagine that we have srcdate only here and want to create corresponding fiscalyear)

    Big thanks in advanced.

  • #2
    So first, you need to create that string variable that reads like a date to human eyes into something that Stata can work with. Once we have a real Stata internal format date, then we can extract the month and condition calculations on the value of the month.

    Code:
    gen sif_srcdate = daily(srcdate, "DMY")
    assert missing(sif_srcdate) == missing(srcdate)
    format sif_srcdate %td
    drop srcdate
    rename sif_srcdate srcdate
    gen moy = month(srcdate) // MONTH OF YEAR, 1 TO 12
    
    
    //  VERIFY THAT IF SRCDATE IS BETWEEN JAN AND MAY, THEN SRCYEAR = FISCALYEAR + 1
    //  AND OTHERWISE SRCYEAR = FISCAL YEAR
    assert srcyear == fiscalyear + inrange(moy, 1, 5) if !missing(fiscalyear, srcyear)
    
    //  USE THIS LOGIC TO IMPUTE FISCAL YEAR IF FISCAL YEAR IS NOT GIVEN
    replace fiscalyear = srcyear - inrange(moy, 1, 5) if missing(fiscalyear)
    Last edited by Clyde Schechter; 04 Mar 2020, 22:23.

    Comment


    • #3
      [QUOTE=Clyde Schechter;n1539789]So first, you need to create that string variable that reads like a date to human eyes into something that Stata can work with. Once we have a real Stata internal format date, then we can extract the month and condition calculations on the value of the month.

      Thanks Clyde. That works.

      Comment

      Working...
      X