  • Encode not converting string variable to a categorical

    Hi,

    I have a month variable in the form of a string. However, when I encode it, I do not get a categorical variable. I have tried the following:

    1.

    Code:
    tab month_f if missing(real( month_f ))
    
        MONTH_F |      Freq.     Percent        Cum.
    ------------+-----------------------------------
       Apr 2020 |    214,603       25.07       25.07
         Apr-19 |    213,367       24.92       49.99
       Aug 2019 |    213,885       24.98       74.97
       Dec 2019 |    214,310       25.03      100.00
    ------------+-----------------------------------
          Total |    856,165      100.00
    2.

    Code:
    . gen month_f1 = month_f
    
    . replace month_f1 = "" if trim( month_f1 )=="-"
    (0 real changes made)
    There seems to be a special character "-" in some of the month values that I cannot remove. The usual approach, destring month_f, gen(_month_) ignore("-"), has not worked for me.

    Example of the variable:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str8 month_f
    "Apr-19"  
    "Aug 2019"
    "Dec 2019"
    "Apr 2020"
    "Apr-19"  
    "Aug 2019"
    "Dec 2019"
    "Apr 2020"
    "Apr-19"  
    "Aug 2019"
    "Dec 2019"
    "Apr 2020"
    "Apr-19"  
    "Aug 2019"
    "Dec 2019"
    "Apr 2020"
    "Apr-19"  
    "Aug 2019"
    "Dec 2019"
    "Apr 2020"
    "Apr-19"  
    "Aug 2019"
    "Dec 2019"
    "Apr 2020"
    "Apr-19"  
    "Aug 2019"
    "Dec 2019"
    "Apr 2020"
    "Apr-19"  
    "Aug 2019"
    "Dec 2019"
    "Apr 2020"
    "Apr-19"  
    "Aug 2019"
    "Dec 2019"
    "Apr 2020"
    "Apr-19"  
    "Aug 2019"
    "Dec 2019"
    "Apr 2020"
    "Apr-19"  
    "Aug 2019"
    "Dec 2019"
    "Apr 2020"
    end

  • #2
    Stata's "date and time" variables are complicated and there is a lot to learn. If you have not already read the very detailed Chapter 24 (Working with dates and times) of the Stata User's Guide PDF, do so now. If you have, it's time for a refresher. After that, the help datetime documentation will usually be enough to point the way. You can't remember everything; even the most experienced users end up referring to the help datetime documentation or back to the manual for details. But at least you will get a good understanding of the basics and the underlying principles. An investment of time that will be amply repaid.

    All Stata manuals are included as PDFs in the Stata installation and are accessible from within Stata - for example, through the PDF Documentation section of Stata's Help menu.

    Here is how to convert your example data to a Stata monthly variable.
    Code:
    . generate mon = monthly(month_f,"M20Y")
    
    . format %tm mon
    
    . list, clean
    
            month_f       mon  
      1.     Apr-19    2019m4  
      2.   Aug 2019    2019m8  
      3.   Dec 2019   2019m12  
      4.   Apr 2020    2020m4  
      5.     Apr-19    2019m4  
      6.   Aug 2019    2019m8  
      7.   Dec 2019   2019m12  
      8.   Apr 2020    2020m4  
      9.     Apr-19    2019m4  
     10.   Aug 2019    2019m8  
     11.   Dec 2019   2019m12  
     12.   Apr 2020    2020m4  
     13.     Apr-19    2019m4  
     14.   Aug 2019    2019m8  
     15.   Dec 2019   2019m12  
     16.   Apr 2020    2020m4  
     17.     Apr-19    2019m4  
     18.   Aug 2019    2019m8  
     19.   Dec 2019   2019m12  
     20.   Apr 2020    2020m4  
     21.     Apr-19    2019m4  
     22.   Aug 2019    2019m8  
     23.   Dec 2019   2019m12  
     24.   Apr 2020    2020m4  
     25.     Apr-19    2019m4  
     26.   Aug 2019    2019m8  
     27.   Dec 2019   2019m12  
     28.   Apr 2020    2020m4  
     29.     Apr-19    2019m4  
     30.   Aug 2019    2019m8  
     31.   Dec 2019   2019m12  
     32.   Apr 2020    2020m4  
     33.     Apr-19    2019m4  
     34.   Aug 2019    2019m8  
     35.   Dec 2019   2019m12  
     36.   Apr 2020    2020m4  
     37.     Apr-19    2019m4  
     38.   Aug 2019    2019m8  
     39.   Dec 2019   2019m12  
     40.   Apr 2020    2020m4  
     41.     Apr-19    2019m4  
     42.   Aug 2019    2019m8  
     43.   Dec 2019   2019m12  
     44.   Apr 2020    2020m4
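
As a quick check on the conversion above, here is a minimal sketch that reuses the four distinct strings from the data example. It confirms that monthly() parsed every value and that the result is an ordinary integer count of months. The assert lines are only illustrative checks added here, not part of the original answer.

```stata
clear
input str8 month_f
"Apr-19"
"Aug 2019"
"Dec 2019"
"Apr 2020"
end

* Mask "M20Y": month first, then year; a two-digit year is read as 20xx
generate mon = monthly(month_f, "M20Y")
format %tm mon

* Any string that monthly() cannot parse is returned as missing
count if missing(mon)
assert !missing(mon)

* Underneath the %tm format the values are months since 1960m1
assert mon == ym(2019, 4) if month_f == "Apr-19"
list month_f mon, clean
```

If the count is not zero, tabulating month_f for the observations with missing mon shows which strings need a different mask.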



    • #3
      Originally posted by William Lisowski
      Here is how to convert your example data to a Stata monthly variable.
      Code:
      . generate mon = monthly(month_f,"M20Y")

      . format %tm mon

      Dear William Lisowski,

      This method works, but it does not generate a categorical variable. When I try to tostring and destring it, I get something entirely different. My aim is to use the categorical variable in an interaction term.



      • #4
        If using it as a categorical variable is what your heart desires, do
        Code:
        i.date##whatever_other_var
        in your regressions and boom, problem solved.
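
To make that concrete, here is a minimal sketch on simulated data. The variable names y and x and the simulated values are assumptions for illustration only; mon plays the role of the monthly date created in #2 (called date in the snippet above).

```stata
clear
set seed 12345
set obs 200

* Hypothetical data: four monthly dates (2019m4, 2019m8, 2019m12, 2020m4),
* one continuous covariate x and one outcome y
generate mon = ym(2019, 4) + 4 * mod(_n, 4)
format %tm mon
generate x = rnormal()
generate y = 1 + 0.5 * x + rnormal()

* i.mon treats each distinct month as its own category;
* ## gives the main effects plus the interaction with continuous x
regress y i.mon##c.x
```

The i. prefix works here because a monthly date is stored as a nonnegative integer, so no encode or destring step is needed before using it as a factor variable.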



        • #5
          See https://www.stata-journal.com/articl...article=dm0098 for an overview of mapping string variables to numeric.

          One strong theme there is that encode and destring are usually between poor and appalling as solutions for string variables containing dates. (There is an easy exception if years have been read in as strings such as "2009" "2010" and so forth, in which case destring is exactly what you need.)
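
A small sketch of both points, using the four date strings from this thread plus a hypothetical year_s variable that I have made up for the illustration:

```stata
clear
input str8 month_f str4 year_s
"Apr-19"   "2019"
"Aug 2019" "2019"
"Dec 2019" "2019"
"Apr 2020" "2020"
end

* encode numbers the distinct strings in alphabetical order, so
* "Apr 2020" gets code 1 and "Apr-19" gets code 2: the time order is lost
encode month_f, generate(month_e)
list month_f month_e, nolabel

* The easy exception: strings that are nothing but year digits
destring year_s, generate(year)
list year_s year
```

The codes from encode are valid categories, but they carry no date meaning, which is why a dedicated date function is the better route here.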

          William Lisowski has already explained carefully that your date variables need dedicated date functions, including the twist that the century must be supplied for the two-digit year in "Apr-19".

          In the data example there are just four distinct dates, every 4 months from April 2019 to April 2020. Any mapping to integers would create a variable you could use in interaction terms, as Jared Greathouse flags in #4. Here are three possibilities:

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str8 month_f
          "Apr-19"  
          "Aug 2019"
          "Dec 2019"
          "Apr 2020"
          end 
          
          gen mdate = monthly(month_f, "M20Y")
          format mdate %tm 
          gen qdate = qofd(dofm(mdate)) 
          format qdate %tq 
          egen gdate = group(mdate), label 
          
          . list 
          
               +-------------------------------------+
               |  month_f     mdate    qdate   gdate |
               |-------------------------------------|
            1. |   Apr-19    2019m4   2019q2     711 |
            2. | Aug 2019    2019m8   2019q3     715 |
            3. | Dec 2019   2019m12   2019q4     719 |
            4. | Apr 2020    2020m4   2020q2     723 |
               +-------------------------------------+
          
          . list, nola
          
               +-------------------------------------+
               |  month_f     mdate    qdate   gdate |
               |-------------------------------------|
            1. |   Apr-19    2019m4   2019q2       1 |
            2. | Aug 2019    2019m8   2019q3       2 |
            3. | Dec 2019   2019m12   2019q4       3 |
            4. | Apr 2020    2020m4   2020q2       4 |
               +-------------------------------------+

          Note that the included hyphens are not an issue, as monthly() can cope.
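
For anyone puzzled by the value 711 shown for gdate in the first listing, this short sketch works through the arithmetic behind monthly and quarterly dates. It uses only built-in date functions; the particular month is just the first one in the example.

```stata
* Monthly dates count months from 1960m1, which is 0
display ym(2019, 4)
display (2019 - 1960) * 12 + (4 - 1)

* Quarterly dates count quarters from 1960q1, which is 0.
* dofm() gives the first day of the month, qofd() the quarter containing it
display qofd(dofm(ym(2019, 4)))
display yq(2019, 2)
```

The first two lines display the same number, as do the last two, which is all that the %tm and %tq display formats are translating.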

          Your command

          Code:
          replace month_f1 = "" if trim( month_f1 )=="-"
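          did not work because you misunderstood what trim() does: it only strips leading and trailing blanks, so the condition is true only for values that consist of nothing but a hyphen. Had it worked as you intended, it would have replaced all instances of Apr-19 with empty strings, which would not have helped.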
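
A few one-liners, offered only as a sketch, show the difference between trim() and the functions that look inside a string. Run them from a do-file or paste them line by line.

```stata
* trim() removes leading and trailing blanks and nothing else
display trim("  Apr-19  ")

* so this comparison is false (displays 0): the string is not just a hyphen
display trim("Apr-19") == "-"

* strpos() reports where a substring first occurs (0 if absent)
display strpos("Apr-19", "-")

* subinstr() replaces text inside a string; . means all occurrences
display subinstr("Apr-19", "-", " ", .)
```

None of this is needed for the date conversion itself, since monthly() accepts the hyphen as it stands.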


