
  • [HELP] Creating new variables by splitting another variable

    Hello Statalist--

    I have an appended dataset that has a variable named 'filename' which tracks the original file name for each observation.

    The filenames are structured in the following manner: 'medicaid_co_0219.csv', where the first two digits of '0219' (i.e., 02) are the month and the final two digits (i.e., 19) are the corresponding year.

    I would like to create two variables, one for month and one for year, but have no idea how to go about it. Ideally the year should be shown in four digits (so 2019 instead of 19), but if this is not possible, no worries, because that is an easy fix. My priority is creating those two new variables.

    Any help is greatly appreciated.

  • #2
    Code:
    clear
    input str30 filename
    "medicaid_co_0219.csv"
    "medicaid_co_0218.csv"
    "medicaid_co_1117.csv"
    end
    
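    * Month: the two characters starting 8 from the end ("0219.csv" -> "02")
    gen month = real(substr(filename, -8, 2))
    * Year: the two characters starting 6 from the end ("19.csv" -> "19"), prefixed with "20"
    gen year = real("20" + substr(filename, -6, 2))
    * Combine year and month into a Stata monthly date and display it as such
    gen ym = ym(year, month)
    format ym %tm
    list, noobs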
      +-----------------------------------------------+
      |             filename   month   year        ym |
      |-----------------------------------------------|
      | medicaid_co_0219.csv       2   2019    2019m2 |
      | medicaid_co_0218.csv       2   2018    2018m2 |
      | medicaid_co_1117.csv      11   2017   2017m11 |
      +-----------------------------------------------+
    This rests on the assumption that your dates are always four digits and all filenames end with '.csv'. If in fact it is more complicated than this, a representative data example would be helpful.
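
    If the filenames turn out to vary in length or prefix, matching the pattern with a regular expression may be more forgiving than counting characters from the end. The lines below are only a sketch: they assume every name ends in an underscore, four digits (MMYY), and '.csv', and that all years fall in the 2000s. The variable names month2 and year2 are illustrative and not part of the original answer.

    ```stata
    * Assumption: names end in _MMYY.csv; names that do not match get missing values
    gen month2 = real(regexs(1)) if regexm(filename, "_([0-9][0-9])([0-9][0-9])[.]csv$")
    gen year2  = 2000 + real(regexs(2)) if regexm(filename, "_([0-9][0-9])([0-9][0-9])[.]csv$")

    * Inspect any filenames the pattern failed to match
    list filename if missing(month2, year2)
    ```

    An empty listing from the final command suggests every filename fit the assumed pattern; anything listed would need a closer look before trusting the new variables.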



    • #3
      Thank you for your help Wouter. Worked perfectly!!
