Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Convert 1K to 1000 and 1M to 1000000

    Hello everyone, I need help with converting a particular variable to numbers. The variable in question is the amount of damage done by different disasters. The values were entered as for example 1K (Meaning $1000) and 1M (Meaning $1000000). I will want to convert these values to just numbers. Is there a way to do this in STATA.
    PS: As the variable is continuous there are different observations with the "K" suffix e.g. 9K, 12K etc. Also there are different observations with the "M" Suffix.
    I will really appreciate your insight on this, as I have a very large data set
    I use STATA16

  • #2
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str5 amount
    "28k"
    "5M"  
    "200K"
    "500"
    "33m"
    end
    
    gen double wanted =cond(regexm(lower(amount), "k"), ///
    real(regexr(lower(amount), "k", ""))*1e+3, ///
    cond(regexm(lower(amount), "m"), ///
    real(regexr(lower(amount), "m", ""))*1e+6, real(amount)))
    Res.:

    Code:
    
    . l
    
         +-------------------+
         | amount     wanted |
         |-------------------|
      1. |    28k      28000 |
      2. |     5M    5000000 |
      3. |   200K     200000 |
      4. |    500        500 |
      5. |    33m   33000000 |
         +-------------------+

    Comment


    • #3
      Originally posted by Andrew Musau View Post
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str5 amount
      "28k"
      "5M"
      "200K"
      "500"
      "33m"
      end
      
      gen double wanted =cond(regexm(lower(amount), "k"), ///
      real(regexr(lower(amount), "k", ""))*1e+3, ///
      cond(regexm(lower(amount), "m"), ///
      real(regexr(lower(amount), "m", ""))*1e+6, real(amount)))
      Res.:

      Code:
      
      . l
      
      +-------------------+
      | amount wanted |
      |-------------------|
      1. | 28k 28000 |
      2. | 5M 5000000 |
      3. | 200K 200000 |
      4. | 500 500 |
      5. | 33m 33000000 |
      +-------------------+
      Dear Andrew,

      It will be great if you can explain how this code is working and what each command is doing. Thank you.

      Comment


      • #4
        What really interests me is how condition is used within a condition, if i identified it correctly

        Comment


        • #5
          Yes, you can specify a condition within a condition. Or several conditions, see

          https://www.stata-journal.com/sjpdf....iclenum=pr0016


          9 Conclusion
          cond() is versatile and not as tricky as you might fear from some of the more extreme examples of its use. Consider adding it to your Stata repertoire.

          Comment


          • #6
            Perhaps the following multi-step approach will be more transparent.
            Code:
            . generate amt = lower(amount)
            
            . replace amt = subinstr(amt,"k","000",.)
            variable amt was str4 now str6
            (2 real changes made)
            
            . replace amt = subinstr(amt,"m","000000",.)
            variable amt was str6 now str8
            (2 real changes made)
            
            . destring amt, generate(amtn)
            amt: all characters numeric; amtn generated as long
            
            . list, clean noobs
            
                amount        amt       amtn  
                   28k      28000      28000  
                    5M    5000000    5000000  
                  200K     200000     200000  
                   500        500        500  
                   33m   33000000   33000000

            Comment


            • #7
              Originally posted by Fahad Mirza View Post

              Dear Andrew,

              It will be great if you can explain how this code is working and what each command is doing. Thank you.
              Thank you very much. It worked but I had to use "lower" to "upper" since the observations I have are upper case

              Comment


              • #8
                If your string values are all upper case, then lower() is precisely what you need to be able to translate k and m to the needed number of zeros.

                Or, much simpler. if your string values are all upper case -- then just use K and M in the commands that follow.

                Comment

                Working...
                X