  • 001002 and 1002 in Stata

    Hi, I have a problem. I have a variable named CG_IDFULL. The researcher coded it so that 001002 denotes the second person within the first household.

    My ultimate goal is to create a new variable that contains only the first three digits: 001. Then I can merge the first person in the first household with the household data set.

    I tried to use the code
    Code:
    generate CG_IDHH = substr(CG_IDFULL, 1, 3)
    But Stata shows the error message: type mismatch

    I think the reason is that CG_IDFULL is currently numeric, not string, so I want to convert it to string. I used
    Code:
    tostring CG_IDFULL, generate(CGID)
    Stata shows the message: CGID generated as str5. I don't know what str5 means. The problem is that the new variable contains 1002 rather than 001002. I know the reason is that Stata treats 001002 as equal to 1002.

    What should I do?

  • #2
    Well, something is amiss here. str5 means that the variable is a string with a maximum length of 5 characters. So there is no way that it can hold a value like "001002" with 6 characters.

    So the first concern is that already there is lost information in your data. Now, perhaps these IDs always have 0 as the leading digit. But if that is not predictably true, then you may be in a position where some of your IDs are just plain wrong. You need to check with the source of the data about this issue.
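
    As a rough sketch of how one might check this, the lines below look at the storage types and tabulate how many characters each converted ID has. CGID is the variable created by the -tostring- command in #1; id_length is a hypothetical name introduced only for this illustration.

    Code:
    * show the storage type of the original and the converted variable
    describe CG_IDFULL CGID
    * count the characters in each converted ID (id_length is an illustrative name)
    generate id_length = strlen(CGID)
    tabulate id_length
    * inspect the IDs that have fewer than six characters
    list CG_IDFULL CGID if id_length < 6

    If the tabulation shows more than one length, the number of leading zeros that were dropped is not the same for every ID.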

    I'll be optimistic and assume that all of the IDs if correctly recorded would be 6 characters long, and that it is the first three characters you are interested in.

    Code:
    tostring CG_IDFULL, generate(CGID) format(%06.0f)
    gen CG_IDHH = substr(CGID, 1, 3)
    If this does not solve your problem, post back with a representative sample of the data, using the -dataex- command. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
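
    For anyone who wants to try the two lines above without the original data, here is a minimal sketch on made-up values. The three IDs are hypothetical, and the sketch assumes that every ID should be six digits long. The last -generate- is an alternative added here, not part of the suggestion above: it assumes the last three digits always identify the person, and it keeps the household number numeric in a variable with the illustrative name hh_num.

    Code:
    clear
    * three made-up IDs stored as numbers, so the leading zeros are already gone
    input long CG_IDFULL
    1002
    1003
    12001
    end
    * pad back to six digits while converting to string
    tostring CG_IDFULL, generate(CGID) format(%06.0f)
    * the first three characters are the household part, e.g. "001" for 1002
    gen CG_IDHH = substr(CGID, 1, 3)
    * alternative: numeric household number (hh_num is an illustrative name)
    gen long hh_num = floor(CG_IDFULL / 1000)
    list

    Which version is more convenient depends on how the household ID is stored in the data set you plan to merge with, since the key variable must have the same type in both files.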


    • #3
      Hi Clyde,

      All values of my ID variable have leading 0s, and your code solves my problem. Thanks!
