
  • Destring returns "contains nonnumeric characters; no replace"

    I am trying to destring two variables. One of them is the column for my cross-sectional units and the other is for my observations. They are different counties over a period of time, e.g. “CountyX2006”, “CountyY2007”, etc.
    I can’t understand why there are nonnumeric characters in these variables.
    I tried running:

    destring observation, replace

    But Stata returns “observation contains nonnumeric characters; no replace”.
    Any ideas how to solve this problem? I've never encountered it before.

    I want to destring because the command "xtset" requires a numeric identifier.

    EDIT
    I think I solved this now. I encoded the counties using:
    Code:
    encode county, gen(county1)
    and used county1 in xtset instead.
    Last edited by Eric_B; 15 Jun 2014, 03:48.
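A minimal sketch of this approach, assuming (hypothetically) that each identifier is a county name followed by a four-digit year, as in “CountyX2006”. The year is split off with substr() and real(), the county name is encoded, and the result is passed to xtset. The toy data and the variable names county and year are illustrative only.

```stata
* Toy data shaped like the identifiers in the question (hypothetical values)
clear
input str12 observation
"CountyX2006"
"CountyX2007"
"CountyY2006"
"CountyY2007"
end

* Assumed layout: county name, then a four-digit year at the end
generate county = substr(observation, 1, strlen(observation) - 4)
generate year = real(substr(observation, -4, 4))

* encode maps each distinct county name to an integer with a value label
encode county, generate(county1)

* xtset needs a numeric panel identifier and a numeric time variable
xtset county1 year
```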

  • #2
    You can do a couple of different things here. First, you can use encode . . ., generate() label() to get a numeric variable for xtset.

    You can also examine those observations that have nonnumeric characters in them.
    Code:
    generate byte non_numeric = missing(string(my_variable))
    list my_variable if non_numeric
    or
    Code:
    generate byte non_numeric = indexnot(my_variable, "0123456789.-")
    list my_variable if non_numeric



    • #3
      One of them is the column for my cross-sectional units and the other is for my observations. They are different counties over a period of time, e.g. “CountyX2006”, “CountyY2007” etc.
      I can’t understand why there are nonnumeric characters in these variables.
      All the characters in "CountyX" and "CountyY" count as non-numeric.

      The purpose of destring seems widely misunderstood. It's for variables that are essentially numeric in content, but have been misread somehow. A few individual characters may have caused this, or perhaps metadata have crept into the first few observations. encode is the command for mapping arbitrary string variables to numeric variables with value labels.
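To illustrate the distinction, here is a sketch on hypothetical toy data: a variable whose text is really numbers, which destring converts, next to a variable of county names, which destring leaves alone and encode maps to labelled integers.

```stata
* Hypothetical toy data: numbers stored as text, and county names
clear
input str8 price str8 county
"10" "CountyX"
"12" "CountyY"
"15" "CountyX"
end

* price is essentially numeric in content: destring converts it
destring price, replace

* county holds names: destring only reports "contains nonnumeric
* characters; no replace" and leaves the variable as it is
destring county, replace

* encode is the tool for names: each distinct string gets an integer
* code, and a value label keeps the original text visible
encode county, generate(county_id)
```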

      Joseph's code includes a small slip; note also that no new variable is needed.

      Code:
      list my_variable if missing(real(my_variable))
      Here my_variable is already a string variable, and the function real() extracts its numeric content; it's the failures in that calculation that deserve attention.
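A quick way to see how real() behaves on the kinds of values discussed in this thread (the strings below are illustrative):

```stata
* A string that reads as a number: real() returns the number
display real("2006")

* A county identifier: real() fails, so the result is missing (.)
display real("CountyX2006")

* Thousands separators also make real() fail (compare destring's ignore() option)
display real("1,200")
```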



      • #4
        Whoops. My mistake. Sorry if there was any confusion. Thanks for catching that, Nick, and setting it straight.

        I agree with Nick that a new variable isn't required, but Eric indicated that he was surprised at the presence of nonnumeric characters in the variables. Whenever I encounter something I didn't expect in a dataset, I take a closer, repeated look in order to understand what's going on. I've found that generating an indicator (flag) variable is usually helpful in such circumstances, and that's where it came from.
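A sketch combining the two suggestions, on hypothetical toy data: the flag is built with Nick's real() test rather than string(), and can then be reused for counting or listing. Note that empty strings are flagged too, since real("") is missing.

```stata
* Hypothetical toy data with a mix of clean and problem values
clear
input str8 my_variable
"12"
"3.5"
"N/A"
""
"7-10"
end

* Flag observations whose content real() cannot read as a number
generate byte non_numeric = missing(real(my_variable))

* The flag can be reused as often as needed while investigating
count if non_numeric
list my_variable if non_numeric
```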



        • #5
          Hello,

          I have an issue that I cannot solve. I have a big Excel dataset, and once I copy and paste it into Stata (version 11), the variables that contain numbers are no longer interpreted as numeric. I tried the commands mentioned above, and when I tab the problem observations, nearly all of them are listed.

          If I encode them, the resulting values do not match the true values of the observations. Sorry if this is a basic question.

          Thank you in advance,
          Adam



          • #6
            Adamaki: You should give specific details. Precisely what the problem is, or the problems are, isn't clear from your report.

            Please also note our request that members use full real names.



            • #7
              Adam: why are you using copy/paste rather than insheet? The latter can be used in a do-file (think audit trails and reproducibility!), and provides control over how incoming input is handled.
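A sketch of the do-file route for Stata 11, assuming the Excel sheet has first been saved as a comma-separated file. The file name example_data.csv and its contents are hypothetical; the file is written by the code itself so the example is self-contained. In Stata 13 and later, import delimited supersedes insheet, and import excel can read the workbook directly.

```stata
* Write a small CSV file (hypothetical name and contents)
file open fh using "example_data.csv", write text replace
file write fh "county,year,income" _n
file write fh "CountyX,2006,1200" _n
file write fh "CountyY,2006,950" _n
file close fh

* insheet reads delimited text; the first line supplies variable names,
* and columns that hold only numbers come in as numeric variables
insheet using "example_data.csv", comma clear
describe
```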

              I support Nick's remarks about using your real name (firstname lastname). It's easy to fix: click on “Contact us”, located at the bottom right-hand corner of every page, and make your request. Please see the Forum FAQ for more about this.



              • #8
                Dear Stata Users

                I am analyzing survey data, and some of my variables were not captured correctly. Instead of

                Var_
                1
                2
                3

                they were captured as

                Var_
                A
                B
                C

                Is there a way of recoding them into numeric values?



                • #9
                  Well, it's not clear what you mean when you say that the variables were "not correctly captured," and it also isn't clear what you actually have for Var_. So please post a relevant excerpt of your data set using the -dataex- command, so that we can all see the exact details. Just knowing that it looks like A B C does not tell us what it actually is internally, and the correct solution to your problem depends crucially on that. You can get the -dataex- command by running -ssc install dataex-. It is easy to use: just read the -help dataex- file and follow the instructions there.



                  • #10
                    Thank you, Clyde.

                    This is how the data has been captured:

                    Code:
                    clear
                    input str3(HAA12A HAA12B HAA12C)
                    "A" " " " "
                    "" "B" "C"
                    "A" " " "C"
                    "" " " " "
                    "A" " " " "
                    end

                    I am working with household data on asset ownership, where "A", "B" and "C" were supposed to be captured as 1, 2 and 3, indicating that the household owns the asset, and " " indicates missing data, i.e. no ownership.
                    My question is: is there a way of converting the above such that A=1, B=2, C=3?
                    I hope my problem is clear now.



                    • #11
                      Yes. Start with:
                      Code:
                      label define ABC    1    "A"    2    "B"    3    "C"
                      foreach v of varlist HAA* {
                          encode `v', gen(_`v') label(ABC)
                          drop `v'
                          rename _`v' `v'
                      }
                      Now, the variables will still look like they are "A", "B", and "C", but internally they are coded 1, 2, 3 and you can use them in numeric calculations, etc.

                      You can see this with your own eyes by doing the following:

                      Code:
                      tab1 HAA*
                      tab1 HAA*, nolabel
                      If you find the ABC labeling distracting and you want it to look like the 1, 2, 3 that it is internally, you can do that with:

                      Code:
                      label drop ABC
                      The core of this solution, evidently, is the -encode- command, one of the most important data management commands in Stata. Do read the manual section on -encode- and on its inverse, -decode-. The time spent will be amply rewarded going forward.
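A self-contained sketch that runs Clyde's loop on the data from #10. It adds one step, on the assumption that a lone space should count as missing just like an empty string: trim() strips blanks first, so " " becomes "" and is coded as missing by encode.

```stata
* Data from the dataex excerpt above
clear
input str3(HAA12A HAA12B HAA12C)
"A" " " " "
""  "B" "C"
"A" " " "C"
""  " " " "
"A" " " " "
end

label define ABC 1 "A" 2 "B" 3 "C"
foreach v of varlist HAA* {
    * Assumption: treat a lone space like an empty string (missing)
    replace `v' = trim(`v')
    * Reuse the ABC label so that A, B and C map to 1, 2 and 3
    encode `v', generate(_`v') label(ABC)
    drop `v'
    rename _`v' `v'
}
```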



                      • #12
                        I am trying to merge two datasets to make a panel, but when I run merge it reports that the variable in the using file is a string. When I tried to destring that variable with "destring education, replace", it gives the result "contains nonnumeric characters; no replace".
                        Please suggest a solution to this problem.



                        • #13
                          So the first thing is to identify the observations with non-numeric characters:
                          Code:
                          browse if missing(real(education))
                          will show them to you. At this point there are several possibilities:

                          1. There are extensive problems and the variable is not really numeric in the first place. In that case, it is apparently not a true match to the education variable in the other file. You will have to look into this in greater depth.

                          2. There is a systematic simple problem, such as observations being coded "N/A" or something like that. In that case you can just -replace education = "" if education == "N/A"- and then do your -destring-ing.

                          3. There are sporadic observations where instead of a number there is something else, such as ">50" or "7-10". You will have to decide how you want to handle these, either replacing them by missing values, or by assigning single numbers to replace them. Then write the appropriate series of -replace- commands and then -destring-.

                          4. The decimal point is represented as a comma. Here the solution is to specify the -dpcomma- option in your -destring- command.

                          5. The numbers have embedded spaces, or commas separating groups of digits. Specify the -ignore(", ")- option in -destring-.

                          6. There is nothing apparently wrong: everything you see looks like a properly specified number. This means that somehow your variable has been contaminated with non-printing characters that you cannot see but Stata can. Install -charlist- (by Nick Cox) from SSC if you do not already have it, and run -charlist education if missing(real(education))-. Then -return list- and r(ascii) will show you the numeric codes of all the characters in those observations. Identify the ones which are not properly part of a number, and write -replace- commands to get rid of them.

                          7. Your education variable is a bunch of strings like "some HS", "HS grad", "some college", "college grad", etc. and these correspond to value labels of the education variable in the other data set. In that case, you should not be -destring-ing this variable: you should -encode- it. But when doing that, you must be careful to use the same value label that is used for education in the first data set.
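A sketch of cases 2, 4 and 5 on hypothetical toy data (the variable names score and income, and all values, are made up for illustration):

```stata
* Hypothetical toy data, one variable per case
clear
input str8 education str8 score str12 income
"12"  "12,5" "1,200"
"N/A" "7,25" "15 000"
"9"   "3,0"  "800"
end

* Case 2: a systematic placeholder; blank it out, then destring
replace education = "" if education == "N/A"
destring education, replace

* Case 4: decimal commas ("12,5" means 12.5)
destring score, replace dpcomma

* Case 5: thousands separators and embedded spaces ("1,200", "15 000")
destring income, replace ignore(", ")
```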



                          • #14
                            https://www.statalist.org/forums/for...69#post1482869
                            Dear All, I was trying to apply what is suggested in the link above and wrote the following code:
                            Code:
                            label define educat  1"En  total désaccord"    2"En  désaccord"  3"Neutre" 4"D'accord" 5"Tout à fait d’accord"
                            ********
                            foreach var of varlist Educ_1-Educ_18 {
                                encode `var', gen(`var'_)  label(educat)
                                drop `var'
                                rename `var'_ `var'        
                            }
                            To my great surprise, I get the result below, where options 1 and 2 have disappeared:

                            . tab Educ_16

                               Ensei. compétents |      Freq.     Percent        Cum.
                            ---------------------+-----------------------------------
                                          Neutre |         33       34.74       34.74
                                        D'accord |         46       48.42       83.16
                            Tout à fait d’accord |          2        2.11       85.26
                              En total désaccord |          5        5.26       90.53
                                    En désaccord |          9        9.47      100.00
                            ---------------------+-----------------------------------
                                           Total |         95      100.00

                            . tab Educ_16, nolab

                                 Ensei. |
                             compétents |      Freq.     Percent        Cum.
                            ------------+-----------------------------------
                                      3 |         33       34.74       34.74
                                      4 |         46       48.42       83.16
                                      5 |          2        2.11       85.26
                                      6 |          5        5.26       90.53
                                      7 |          9        9.47      100.00
                            ------------+-----------------------------------
                                  Total |         95      100.00
                            Can someone help me understand what is happening, so that I can make the variables use the label I predefined?
                            Many thanks in advance



                            • #15
                              My guess is that the label is intact; you just don't have these values in the variable Educ_16. To view all values of the label and their descriptions, type:

                              Code:
                              lab list
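One possible explanation, assuming the data contain single spaces: the label define in #14 writes "En  total désaccord" and "En  désaccord" with two spaces. encode matches strings exactly, so those two responses find no match in educat and, by default, are added to the label under new codes (6 and 7 in the output above), while codes 1 and 2 stay unused. The sketch below, on hypothetical toy data, reproduces this and shows the fix of defining the label with the exact text found in the data.

```stata
* Hypothetical toy data with single spaces, as assumed above
clear
input str30 Educ_16
"En total désaccord"
"En désaccord"
"Neutre"
end

* Label text with doubled spaces, as in the label define in #14
label define educat 1 "En  total désaccord" 2 "En  désaccord" 3 "Neutre"

* No exact match for the first two strings, so encode appends new codes above 3
encode Educ_16, generate(bad) label(educat)

* Fix: define the label with exactly the text found in the data
label define educat_ok 1 "En total désaccord" 2 "En désaccord" 3 "Neutre"
encode Educ_16, generate(good) label(educat_ok)
```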
