
  • substr command not working.

    I have a variable shrcd, which is a two-digit code, but it shows values like 1.1e+01. When I click on a value in editable mode it shows two digits, as shown in the picture.
    As it is a two-digit code, I want to get the first value of this variable, so I applied the substr command (gen aa=substr(shrcd,1,1)) as shown in the picture. It shows type mismatch. I even tried gen aa=real(substr(shrcd,1,1)), but it shows the same message.
    So I have two questions here. Why is it showing values like 1.1e+01?
    How do I apply the substr command in this situation? Thanks in anticipation.
    [Attached screenshot: Screenshot 2017-06-28 16.54.07.png]

  • #2
    Various confusions here.

    Zeroth, please do read https://www.statalist.org/forums/help and particularly https://www.statalist.org/forums/help#stata as we ask all readers to do before posting. After 52 posts here this advice should be familiar to you, so please take it. In particular you are enjoined not to post screenshots which are usually unreadable and only very rarely helpful. The last link explains what you should do, namely use dataex (SSC) to show data examples and CODE delimiters to make code readable.

    First, substr() is a function, not a command. In Stata functions and commands are quite distinct.

    More crucially, you can only extract substrings from strings and -- as Stata is trying to tell you -- you are trying to extract a substring from a numeric argument; hence the error message type mismatch. So, a correct summary would be that you misapplied a function.

    Backing up, you ask why Stata is showing the variable in this way. Presumably that is because it has somehow acquired the corresponding display format. If you run

    Code:
    describe shrcd 
    summarize shrcd
    you will see the display format assigned and we can check your assertion that values are only ever two digits. My wild guess is that at some point, perhaps still, this variable contained some values that were not two digits long.

    Your problem is stated as wanting the first value, which would strictly be the value in the first observation, namely

    Code:
    display shrcd[1] 
    but I guess that you mean the first digit, e.g. mapping from 11 to 1, 42 to 4, or whatever. As your variable is numeric

    Code:
    gen firstdigit = floor(shrcd/10)
    will get you there directly (just divide by 10 and round down) but your own code could be fixed to

    Code:
    gen aa = real(substr(string(shrcd),1,1))
    which is more roundabout, but should work too. Absent a data example, I can't test this on your data.
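
    As a sketch of how the two approaches compare, assuming a small made-up dataset (the values and the names d_floor and d_str below are purely illustrative and are not taken from the poster's data):

    Code:
    clear
    input double shrcd
    11
    42
    5
    end
    * first digit by arithmetic: divide by 10 and round down
    gen byte d_floor = floor(shrcd/10)
    * first digit via strings: convert to string, take the first character, convert back
    gen byte d_str = real(substr(string(shrcd), 1, 1))
    list
    Both give 1 for 11 and 4 for 42. They differ only for a value below 10: for 5, floor(shrcd/10) gives 0, whereas the string route gives 5.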







    • #3
      Thank you, Sir Nick Cox, for the detailed explanation.



      • #4
        Good, although you didn't show us the results of

        Code:
        describe shrcd  
        summarize shrcd
        which might be helpful to others too to see why this arose.



        • #5
          Here is the result of the above code.
          Code:
          . describe shrcd  
          
                        storage   display    value
          variable name   type    format     label      variable label
          -------------------------------------------------------------------------------------------------------------------------------------
          shrcd           double  %2.0g                 Share Code
          
          . 
          . summarize shrcd
          
              Variable |        Obs        Mean    Std. Dev.       Min        Max
          -------------+---------------------------------------------------------
                 shrcd |    126,108    5.967742     8.14504          0         72



          • #6
            OK. So we can confirm that such a display format and numeric values like 11 lead to displays such as

            Code:
            . di %2.0g 11
             1.1e+01
            But that's as much explanation as I can give. You do have two-digit codes (except where they happen to be one-digit codes), yet why that format was applied I can't tell you. You know where the data come from and what you did to the data.

            Note that as you do have, contrary to #1, some one-digit codes they will return 0 from floor(shrcd/10). If that's not what you want, you need something else. I would check on your one-digit codes with

            Code:
            tab shrcd if shrcd < 10
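
            If the display is the only nuisance, a wider format can be assigned. This is a sketch that assumes nothing else relies on the existing %2.0g format; %8.0g is just one reasonable choice:

            Code:
            * assign a wider general format so that 11 is shown as 11
            format shrcd %8.0g
            list shrcd in 1/5
            This changes only how values are shown, not the stored values.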



            • #7
              I obtained these data from WRDS/CRSP. According to the variable definition given on their website, shrcd is a two-digit code. As explained there, if the first digit of the two-digit code is 1, it is a common or ordinary share, so I need to keep these values.
              I need to check why there are so many 0 values.
              Code:
              . tab shrcd if shrcd < 10
              
               Share Code |      Freq.     Percent        Cum.
              ------------+-----------------------------------
                        0 |     49,897      100.00      100.00
              ------------+-----------------------------------
                    Total |     49,897      100.00
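
              Given the rule described above (first digit 1 means common or ordinary share), one possible way to flag those observations is sketched below. It assumes the 0 values are not valid share codes and should be inspected before anything is dropped; the variable name ordinary is illustrative:

              Code:
              * count the unexplained zeros first
              count if shrcd == 0
              * flag codes 10 to 19, i.e. those whose first digit is 1
              gen byte ordinary = inrange(shrcd, 10, 19)
              tab ordinary
              Observations with shrcd equal to 0 or missing get ordinary == 0, so a later keep if ordinary would retain only codes 10 to 19.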
