  • Type mismatch when trying to get substring

    Hi, I'm working with trade data and I want to get a substring from a numeric code that combines the codes for the country of origin, the destination, and the type of product. But I keep getting a type mismatch error. Here's my code:

    gen code1 = real(substr(code,1,3))

  • #2
    Stata is trying to tell you that your variable code is not a string variable. If it appears to contain non-numeric characters when you -list- or -browse- it, then it is a value-labeled variable, and if you need to extract a substring from that, you will have to -decode- it first. If it looks numeric in -list- output or the Data Browser, then it might still be a value-labeled numeric variable, or it might be the actual numbers. In the former case, you would still have to -decode- it and then extract a substring from the decoded string. If it's really the numbers, then selecting certain digits can be done with appropriate combinations of the -floor()- and -mod()- functions.

    With no example data it is impossible to be more specific than that. If you need more detailed help, post back showing example data. In this case, it is absolutely imperative that the example data be shown using the -dataex- command. Anything else will predictably fail to provide the necessary information about the variable, and will just be a waste of your time and effort. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
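
    The three cases described above can be told apart in code. What follows is only a minimal sketch: the toy values, the new variable names code_txt and code1, and the assumption that every code has exactly 5 digits are illustrative and not taken from the original poster's data.

    ```stata
    * Toy data (hypothetical); replace with your own dataset
    clear
    input float code
    40040
    40088
    40105
    end

    describe code                          // shows storage type and any value label

    capture confirm string variable code   // _rc is 0 only if code is a string
    if _rc == 0 {
        * Case 1: a genuine string variable, so substr() applies directly
        gen code1 = real(substr(code, 1, 3))
    }
    else {
        local lbl : value label code       // empty if no value label is attached
        if "`lbl'" != "" {
            * Case 2: value-labeled numeric, so decode before taking a substring
            decode code, gen(code_txt)
            gen code1 = real(substr(code_txt, 1, 3))
        }
        else {
            * Case 3: plain numbers; assumes 5-digit codes
            gen code1 = floor(code/100)
        }
    }
    list
    ```

    With the toy data only the last branch runs, because code is a plain float with no value label attached.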


    • #3
      Thank you for the quick response, here's an example of my data:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float code double v
      40040                  0
      40041                  0
      40042                  0
      40043                  0
      40044                  0
      40045                  0
      40046                  0
      40047                  0
      40048                  0
      40049                  0
      40080                  0
      40081                  0
      40082                  0
      40083                  0
      40084                  0
      40085                  0
      40086                  0
      40087                  0
      40088                  0
      40089                  0
      40100                  0
      40101                  0
      40102                  0
      40103                  0
      40104                  0
      40105                  0


      • #4
        Well, you did not post the complete -dataex- output, and you omitted some of the most crucial information. But from what you did show, it seems most likely that the variable code is actually numeric, not value labeled, and that its numeric values are actually things like 40088 or 40105, etc. On that assumption, to extract the three high-order digits:

        Code:
        gen int code1 = floor(code/100)
        If that does not produce the intended results, please post back with the complete -dataex- output. When you run -dataex-, it includes an initial line that says "copy starting from the next line" and a final line saying "copy up to and including the previous line." If you follow those instructions, you will get exactly what is needed to show the example data with full information.
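
        To see why dividing by 100 works, here is a small self-contained check on three of the values shown in #3. It is a sketch that assumes every code has exactly 5 digits; more generally, keeping the first k digits of an n-digit code means dividing by 10^(n-k) before taking the floor.

        ```stata
        * Three values copied from the example in #3
        clear
        input float code
        40040
        40088
        40105
        end

        * Dividing by 100 shifts the last two digits past the decimal point,
        * and floor() then discards them
        gen int code1 = floor(code/100)
        list

        * Same result written with the general rule, here n = 5 and k = 3
        gen int code1_general = floor(code/10^(5 - 3))
        assert code1 == code1_general
        ```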


        • #5
          It worked perfectly, thank you. What if instead I wanted to get numbers from the middle of the variable code?


          • #6
            That's a bit more complicated. Let's say you wanted to get digits 2 through 4 from these 5-digit numbers:

            Code:
            gen code2 = floor(code/10) // THIS REMOVES THE 5th DIGIT
            replace code2 = mod(code2, 1000) // SELECTS LAST 3 OF THE REMAINING DIGITS
            If you are going to be doing a bunch of things like this, rather than having to puzzle out the correct numbers to use with -floor()- and -mod()- it would make more sense to make a string version of the code variable:

            Code:
            tostring code, gen(code_str) format(%05.0f) // DISPLAY 5 DIGITS AND 0-PAD THE LEFT IF SOME ARE SHORTER
            Then you can just use the -substr()- function on code_str, the way you tried to apply it to code itself in #1.
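
            As a rough illustration of the string route, the sketch below pulls digits 2 through 4 with -substr()- and cross-checks the result against the -floor()-/-mod()- arithmetic. The value 512 is a made-up shorter code, included only to show the zero-padding, and the names mid, mid_num and mid_arith are hypothetical.

            ```stata
            * Two values from #3 plus one hypothetical short code
            clear
            input float code
            40088
            40105
            512
            end

            * String version, zero-padded on the left to 5 characters
            tostring code, gen(code_str) format(%05.0f)

            * Digits 2 through 4: start at position 2, take 3 characters
            gen str3 mid = substr(code_str, 2, 3)
            gen mid_num = real(mid)

            * The same digits via arithmetic, as in the floor()/mod() approach
            gen mid_arith = mod(floor(code/10), 1000)
            assert mid_num == mid_arith
            list
            ```

            Note that mid keeps leading zeros because it is a string, while mid_num and mid_arith do not.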
