  • [HELP] Destring Variables Dilemma

    Hello Statalist --

    Hope everyone is doing well. I am currently cleaning a dataset and have run into a small issue. When I tried to destring one of my variables with the code below, I received the following error:

    Code:
    destring under_20_count, replace
    under_20_count contains nonnumeric characters; no replace
    After tabulating the variable I did not spot any nonnumeric characters, so I forced the destring. Rather than replacing the variable, I generated a new one (to avoid data loss), hoping this would help me figure out what was going on:

    Code:
    destring under_20_count, generate(newvar) force
    under_20_count contains nonnumeric characters; newvar generated as int
    (1817 missing values generated)
    After browsing my dataset, I found that Stata was treating any entry containing a "," as nonnumeric. For example, an observation with a value of 1,000 or more was classified as nonnumeric (because of the comma) and was therefore set to missing when I forced the destring. I know it is not ideal to use screenshots since they are not useful to the forum helpers, but my dataset is too big to share and I just wanted to include the screenshot to better illustrate what I am describing.

    As you can see from the screenshot, the original entries without commas were converted successfully, while those with a comma were not.

    Any ideas or suggestions on how I can go about overcoming this?


    [Attached screenshot: Screen Shot 2020-02-26 at 10.27.39 PM.png]

  • #2
    -destring- also has an -ignore()- option which you can use to tell it to disregard the commas. -help destring- for details.
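
    Before relying on -ignore()-, it may be worth confirming that commas are the only characters getting in the way. A minimal sketch, assuming the variable is the string variable under_20_count from the original post: real() returns missing for any string it cannot read as a number, so stripping the commas first and then testing with real() should list only entries that have some other problem.

    ```stata
    * Entries that are still not numeric once commas are removed.
    * If nothing is listed, commas are the only nonnumeric characters.
    list under_20_count ///
        if missing(real(subinstr(under_20_count, ",", "", .))) ///
        & under_20_count != ""
    ```

    If this lists anything (stray spaces, currency symbols, text codes such as "NA"), those characters would need their own handling rather than being lumped into -ignore()-.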

    I know it is not ideal to use screenshots since they are not useful to the forum helpers, but my dataset is too big to share and I just wanted to include the screenshot to better exemplify what I am trying to express.


    In this instance, the screenshot provided adequate information to solve the problem. But in general screenshots do not, and even when they do, they are useless if the solution requires code that is complicated enough to need some testing first. You don't need to share the entire data set in any case. The -dataex- command allows you to specify only those variables that you need to show for the purpose at hand, and it also allows you to select which observations to show, using -if- and -in- conditions just like (most) other Stata commands. It also, by default, shows only the first 100 observations, although you can specify a larger or smaller number if appropriate. If you don't already have -dataex- installed, run -ssc install dataex- to get it. Read -help dataex- to see how utterly simple it is to create example data with it. There really is no excuse for not using -dataex- in situations like yours.
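
    For illustration, here is roughly what that could look like for the variable in this thread. This is a sketch, not the only way to do it: the observation range and the count(50) limit are arbitrary choices, and the -if- condition assumes under_20_count is still a string variable.

    ```stata
    * Install dataex if it is not already available
    ssc install dataex

    * Share only the problem variable, for the first 20 observations
    dataex under_20_count in 1/20

    * Or share only the entries that contain a comma, up to 50 of them
    dataex under_20_count if strpos(under_20_count, ",") > 0, count(50)
    ```

    The output is a block of Stata code that can be pasted straight into a post, so that others can recreate the example data exactly.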



    • #3
      Thank you Clyde! The code worked perfectly. For anyone with this problem in the future, the code I used to ignore the comma is as follows:

      Code:
      destring under_20_count, generate(newvar) ignore(",")
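
      A quick sanity check after the conversion might look like the sketch below. It assumes both under_20_count (the original string) and newvar (the new numeric variable) are still in the dataset, and the 1/50 range is an arbitrary choice.

      ```stata
      * Non-empty original entries that ended up missing in newvar;
      * this should report 0 if every entry was converted
      count if missing(newvar) & under_20_count != ""

      * Spot-check entries that originally contained a comma
      list under_20_count newvar if strpos(under_20_count, ",") > 0 in 1/50
      ```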
