
  • how to change a nonnumeric String variable into a numeric variable

    I am using PRAMS Phase 8 data to complete a research project for class, but I am running into a problem. I am trying to build my analytic sample, which includes only certain states, but when I do this, Stata does not recognize the state abbreviations. I have looked for YouTube videos that might help, but found none. I have tried the destring and encode commands, but to no avail. Log below.

    . tab STATE

     STATE NAME |      Freq.     Percent        Cum.
    ------------+-----------------------------------
             AK |      5,338        2.63        2.63
             AL |      3,138        1.55        4.18
             AR |      3,704        1.83        6.01
             AZ |        794        0.39        6.40
             CO |      6,033        2.98        9.37
             CT |      6,767        3.34       12.71
             DC |      1,593        0.79       13.50
             DE |      4,455        2.20       15.70
             FL |      2,151        1.06       16.76
             GA |      3,208        1.58       18.34
             HI |      2,932        1.45       19.78
             IA |      4,436        2.19       21.97
             IL |      6,352        3.13       25.11
             IN |        865        0.43       25.53
             KS |      4,154        2.05       27.58
             KY |      3,172        1.56       29.15
             LA |      4,411        2.18       31.32
             MA |      7,267        3.58       34.91
             MD |      5,217        2.57       37.48
             ME |      4,226        2.08       39.56
             MI |      8,351        4.12       43.68
             MN |      2,963        1.46       45.14
             MO |      5,671        2.80       47.94
             MS |      3,485        1.72       49.66
             MT |      3,183        1.57       51.23
             NC |      2,783        1.37       52.60
             ND |      3,016        1.49       54.09
             NE |      5,754        2.84       56.93
             NH |      3,039        1.50       58.43
             NJ |      6,019        2.97       61.40
             NM |      5,763        2.84       64.24
             OK |      4,854        2.39       66.63
             OR |      5,749        2.84       69.47
             PA |      5,612        2.77       72.24
             PR |      3,893        1.92       74.16
             RI |      4,338        2.14       76.30
             SD |      4,138        2.04       78.34
             TN |      1,296        0.64       78.98
             TX |      1,849        0.91       79.89
             UT |      7,232        3.57       83.46
             VA |      4,843        2.39       85.84
             VT |      4,245        2.09       87.94
             WA |      6,084        3.00       90.94
             WI |      6,211        3.06       94.00
             WV |      2,785        1.37       95.38
             WY |      2,621        1.29       96.67
             YC |      6,755        3.33      100.00
    ------------+-----------------------------------
          Total |    202,745      100.00


    . gen stated=.
    (202,745 missing values generated)

    . replace stated=1 if STATE==AL | STATE==AR | STATE==DE | STATE==DC | STATE==FL
    > | STATE==GA| STATE==KY| STATE==LA| STATE==MD| STATE==MS| STATE==NC| STATE==O
    > K| STATE==TN| STATE==TX| STATE==VA| STATE==WV
    AL ambiguous abbreviation
    r(111);

    . tab STATE=1
    invalid syntax
    r(198);

    . describe STATE

    Variable      Storage   Display    Value
        name         type    format    label      Variable label
    -------------------------------------------------------------
    STATE           str2    %2s                   STATE NAME

    . destring STATE, generate(staten)
    STATE: contains nonnumeric characters; no generate

    . encode STATE, generate(state_n)

    . list STATE state_n in 1/10

         +-----------------+
         | STATE   state_n |
         |-----------------|
      1. |    AK        AK |
      2. |    AK        AK |
      3. |    AK        AK |
      4. |    AK        AK |
      5. |    AK        AK |
         |-----------------|
      6. |    AK        AK |
      7. |    AK        AK |
      8. |    AK        AK |
      9. |    AK        AK |
     10. |    AK        AK |
         +-----------------+

    After trying everything I know, I am still unable to generate my new state variable because Stata is not recognizing the abbreviations. Please help.

  • #2
    Why was -encode- of no avail? It is the usual way people create numeric variables out of this kind of non-numeric string. It exists precisely for the purpose of creating a numeric category variable out of a string variable.
    Code:
    encode STATE, gen(n_state)
    should do exactly what you want. What was unsatisfactory about the results you got? Please respond with example data that illustrates your difficulty, using the -dataex- command, the exact code you used, and an explanation of how the results differ from what you want.
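
    As a minimal sketch of what -encode- produces, the toy data below (made-up rows, not the PRAMS file) shows that the new variable holds integers and only displays the abbreviations through its value label. The variable name n_state follows the code above; everything else is illustrative.

    ```stata
    * Toy data for illustration only
    clear
    input str2 STATE
    "AK"
    "AL"
    "AR"
    "AL"
    end

    * Create a labeled numeric variable from the string variable
    encode STATE, gen(n_state)

    * Storage type is long, with a value label named n_state attached
    describe n_state

    * Show the integer-to-abbreviation mapping stored in the label
    label list n_state

    * Show the underlying integers instead of the labels
    list STATE n_state, nolabel
    ```

    By default -encode- assigns codes in alphabetical order of the strings, so in this toy example AK, AL and AR become 1, 2 and 3. That is why an ordinary -list- looks identical for the string and the encoded variable.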

    If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.



    • #3
      So -encode- did change the variable type to long, thank you. I think my problem is the state abbreviations:

      replace stated=1 if staten ==AL | staten ==AR | staten ==DE | staten ==DC | staten ==FL | staten ==GA| staten ==KY| staten ==LA| staten ==MD| staten ==MS| st
      > aten ==NC| staten ==OK| staten ==TN| staten ==TX| staten ==VA| staten ==WV
      AL ambiguous abbreviation
      r(111);

      "staten" is the new variable I made with the encode command.

      . describe staten

      Variable      Storage   Display    Value
          name         type    format    label      Variable label
      -------------------------------------------------------------
      staten          long    %8.0g      staten     STATE NAME

      I think the problem is the two-letter state abbreviations. I guess I have to figure out how to code those, or figure out which numbers are assigned to which states.



      • #4
        OK, you can't use the value labels in staten the way you are trying to. When you write -replace stated=1 if staten ==AL...- Stata does not consider that AL might be a value label on the variable staten. An unquoted AL is read as a variable name, so Stata thinks you want to compare staten to some other variable whose name is, or begins with, AL. Apparently your data contain two or more variables whose names begin with AL, so Stata does not know which of them you mean, hence the "ambiguous abbreviation" error. Of course, you don't mean any of those. You have to use a different notation to tell Stata to refer to that value label.
        Code:
        replace stated=1 if staten =="AL":staten | ...
        Now, you can finish the statement that way. But you can also accomplish the same thing with a lot less typing if you do this:
        Code:
        replace stated = 1 if inlist(staten, "AL":staten, "AR":staten, "DE":staten, "DC":staten, "FL":staten, ///
            "GA":staten, "KY":staten, "LA":staten, "MD":staten, "MS":staten, "NC":staten, "OK":staten, ///
            "TN":staten, "TX":staten, "VA":staten, "WV":staten)
        Added: I want to emphasize that the staten that follows the : in this syntax is the name of the value label, not the name of the variable. When you create a variable with -encode-, the value label name and the variable name will be the same unless you specify otherwise. But there are other ways to have value-labeled numeric variables, and the names can be different. Take a look at the variable foreign in the auto.dta data set, whose value label is named "origin".
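
        A minimal sketch of both points, under stated assumptions: the first part uses auto.dta, which ships with Stata, to show a value label whose name differs from the variable's name; the second part uses made-up rows (not the PRAMS data) to show the "text":labelname lookup next to a plain comparison on the original string variable with quoted literals. The names stated2 and the toy rows are illustrative only.

        ```stata
        * Part 1: label name differs from variable name
        sysuse auto, clear
        describe foreign
        label list origin
        count if foreign == "Foreign":origin

        * Part 2: toy data, hypothetical rows standing in for the real file
        clear
        input str2 STATE
        "AK"
        "AL"
        "TX"
        "WY"
        end
        encode STATE, gen(staten)

        * Select states through the value label attached by -encode-
        gen stated = .
        replace stated = 1 if inlist(staten, "AL":staten, "TX":staten)

        * Select the same states from the string variable with quoted literals
        gen stated2 = .
        replace stated2 = 1 if inlist(STATE, "AL", "TX")

        * Both approaches should flag the same observations
        assert stated == stated2
        list, nolabel
        ```

        If you go the string route, check -help inlist- first: as far as I recall, inlist() accepts far fewer arguments when they are strings than when they are numbers, so a list of 16 states would probably need to be split across two inlist() calls joined with |.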
