Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • String to Numeric-Need UNIQUE values---Encode/Destring/Real is not working

    Hi all,

    I am working with the following dataset:


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str30(PART_ID GIVEN_VOUCHER1 GIVEN_VOUCHER2 GIVEN_VOUCHER3)
    "M0001" "M1158" "M9817" "M6879"
    "M0002" "M3584" "M2138" "M4076"
    "M0003" "M6562" "M6584" "M2333"
    "M0004" "M2646" "M9405" "M5265"
    "M0005" "M6299" "M1237" "M1848"
    end
    Each ID and voucher ID is unique. I need to convert these string variables into numeric variables to calculate RDS weights (STATA command rds_network and rds). I used "encode/real/destring" command but i am getting duplicate numeric values (e.g., PART_ID==7; GIVEN_VOUCHER1=42 GIVEN_VOUCHER2=325; GIVEN_VOUCHER3=7 or PART_ID==12; GIVEN_VOUCHER1=442 GIVEN_VOUCHER2=48; GIVEN_VOUCHER3=48) .

    How can I create unique values for each ID and each voucher? Thanks in advance.

    Best,

    Syed
    Last edited by Syed W Noor; 22 Apr 2019, 18:15.

  • #2
    After reading your post several times I remain confused as to what you did and what you want. There is no "encode/real/destring" command. There are -encode- and -destring- commands, and a -real()- function. Neither -real()- nor -destring- has any apparent applicability here. Only -encode- might be useful. Indeed, it's not clear why "it's not working" and your illustration of what is wrong really doesn't help me understand the problem.

    Please post back showing the exact code you tried, and then show the results you got from it and explain why they are not satisfactory (and, better still, show what you actually want). If you do those things, I think you will get a timely and helpful response.

    Also, it is the norm in this community to use our real first and last names as our userid, in order to promote collegiality and professionalism. The Forum software will not allow you to make the change by editing your profile. But you can click on CONTACT US in the lower right corner of this page and message the administrator requesting the change. Thank you.

    Comment


    • #3
      I agree with Clyde, your statement of your problem is unclear. Here is my guess.

      You have a set of string values consisting of the letter M followed by 4 digits. These values are used for the PART_ID value and in the three GIVEN_VOUCHER values. You need to create numeric values corresponding to these string values, and the value assigned to, for example, M1234 must be the same whether it is used for PART_ID, GIVEN_VOUCHER1, ... .

      If that is what you seek, then perhaps the following does what you need.
      Code:
      . destring PART_ID, replace ignore("M")
      PART_ID: character M removed; replaced as byte
      
      . destring GIVEN_VOUCHER1, replace ignore("M")
      GIVEN_VOUCHER1: character M removed; replaced as int
      
      . destring GIVEN_VOUCHER2, replace ignore("M")
      GIVEN_VOUCHER2: character M removed; replaced as int
      
      . destring GIVEN_VOUCHER3, replace ignore("M")
      GIVEN_VOUCHER3: character M removed; replaced as int
      
      . list, clean noobs
      
          PART_ID   GIVEN_~1   GIVEN_~2   GIVEN_~3  
                1       1158       9817       6879  
                2       3584       2138       4076  
                3       6562       6584       2333  
                4       2646       9405       5265  
                5       6299       1237       1848
      If my understanding of your data or your objective is incorrect, you will need to explain in more detail.

      Comment


      • #4
        Hi Clyde,

        My sincere apologies. Thanks for the suggestion, I will contact the administrator to change my user id.

        Below are the data for two participants: M1477 and M3063.


        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str30(PART_ID GIVEN_VOUCHER1 GIVEN_VOUCHER2 GIVEN_VOUCHER3 GIVEN_VOUCHER4 GIVEN_VOUCHER5 GIVEN_VOUCHER6)
        "M1477" "M1375" "M8889" "M9456" "M7584" "M6058" "M4622"
        "M3063" "M7348" "M9917" "M3206" "M3979" "M3468" "M1495"
        end
        I used the following codes to convert the string variables to numeric variables.

        encode PART_ID, g(Part_Id)
        encode GIVEN_VOUCHER1, g(vouch1)
        encode GIVEN_VOUCHER2, g(vouch2)
        encode GIVEN_VOUCHER3, g(vouch3)
        encode GIVEN_VOUCHER4, g(vouch4)
        encode GIVEN_VOUCHER5, g(vouch5)
        encode GIVEN_VOUCHER6, g(vouch6)


        Here are the modified data for these two participants.

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str30 PART_ID long(Part_Id vouch1 vouch2 vouch3 vouch4 vouch5 vouch6)
        "M1477" 128 128 1011 1047 848 634 448
        "M3063" 342 814 1124  300 415 342 107
        end
        label values Part_Id Part_Id
        label def Part_Id 128 "M1477", modify
        label def Part_Id 342 "M3063", modify
        label values vouch1 vouch1
        label def vouch1 128 "M1375", modify
        label def vouch1 814 "M7348", modify
        label values vouch2 vouch2
        label def vouch2 1011 "M8889", modify
        label def vouch2 1124 "M9917", modify
        label values vouch3 vouch3
        label def vouch3 300 "M3206", modify
        label def vouch3 1047 "M9456", modify
        label values vouch4 vouch4
        label def vouch4 415 "M3979", modify
        label def vouch4 848 "M7584", modify
        label values vouch5 vouch5
        label def vouch5 342 "M3468", modify
        label def vouch5 634 "M6058", modify
        label values vouch6 vouch6
        label def vouch6 107 "M1495", modify
        label def vouch6 448 "M4622", modify
        Now, you can see that the participant M1477 has same Id and voucher 1 value (128) and the participant M3063 has same Id and voucher 5 value (342).


        When I ran the following rds_network command:

        "rds_network eth_white, id(ID) coupon(vouch) ncoupon(6) degree(net_size) ///
        > ancestor(seed_ID) depth(reqt_chain) recruiter_id(reqt_Id) recruiter_var(reqt_var)"


        I got this error message:

        "There are self-references (a coupon of a respondent refers to his id)"


        Now my question is how can I set the data to have unique values for each participant.


        I hope this clarifies the problem.


        Thanks,

        Syed W Noor
        Ryerson University
        Toronto, Canada





        Comment


        • #5
          Hi William,

          Thanks for your help. I have participants with 'T' also not only 'M'.

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str30 PART_ID long(Part_Id vouch1 vouch2 vouch3 vouch4 vouch5 vouch6)
          "T0028" 1205 1346 1320 1218 1354 1205 1275
          "T1312" 1262 1177 1262 1399 1474 1328 1250
          end
          label values Part_Id Part_Id
          label def Part_Id 1205 "T0028", modify
          label def Part_Id 1262 "T1312", modify
          label values vouch1 vouch1
          label def vouch1 1177 "T2208", modify
          label def vouch1 1346 "T5237", modify
          label values vouch2 vouch2
          label def vouch2 1262 "T3708", modify
          label def vouch2 1320 "T4607", modify
          label values vouch3 vouch3
          label def vouch3 1218 "T2722", modify
          label def vouch3 1399 "T5737", modify
          label values vouch4 vouch4
          label def vouch4 1354 "T5434", modify
          label def vouch4 1474 "T7318", modify
          label values vouch5 vouch5
          label def vouch5 1205 "T2180", modify
          label def vouch5 1328 "T4487", modify
          label values vouch6 vouch6
          label def vouch6 1250 "T3206", modify
          label def vouch6 1275 "T3634", modify

          Comment


          • #6
            OK, I understand it better now. Here's an approach that will work with M's and T's and anything else you can throw at it and produces a one-to-one correspondence between the strings you have and the numbers you get:

            Code:
            rename PART_ID GIVEN_VOUCHER0
            gen long obs_no = _n
            reshape long GIVEN_VOUCHER, i(obs_no) j(_j)
            encode GIVEN_VOUCHER, gen(voucher)
            drop GIVEN_VOUCHER
            reshape wide voucher, i(obs_no) j(_j)
            rename voucher0 part_id
            The only thing I can think of that might break this code is if there are more than 65,536 distinct values of these variables. In that case, -encode- will fail. In that case, you can use -egen, group()- to get the numeric coding instead, but you will not be able to create a value label for more than 65,536 distinct values no matter what.

            Comment


            • #7
              Hi Clyde,

              Thanks a lot, it worked. However, it is assigning value of 1 for blank cells of string variables (i.e. GIVEN_VOUCHER) instead of assigning "." as missing value. Any suggestion (instead of replacing 1 with '.')?


              Also, chiming in with William's suggestion, I am wondering can I do something like this:

              Replacing 'M' with 11, 'T' with 22 and 'V' with 33 and then use the 'real' function. so, PART_ID M001 will be 11001 and T001 will be 22001 and GIVEN_VOUCHER M1178 will be 111178 and T1471 will be 221471 and so on. Will this create unique numeric participant IDs and voucher IDs? if so, would you please help me with the code?

              Thanks in advance,

              Best,

              Syed W Noor
              Toronto, Canada



              Comment


              • #8
                Consider the following sample data.
                Code:
                clear
                input str6 s
                ""
                " "
                "  "
                "M123"
                "M123 "
                Code:
                . encode s, generate(e)
                
                . dataex
                
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input str6 s long e
                ""      .
                " "     1
                "  "    2
                "M123"  3
                "M123 " 4
                end
                label values e e
                label def e 3 "M123", modify
                label def e 4 "M123", modify
                The first observation demonstrates that Stata missing string values are encoded to Stata missing numeric values.

                The second and third observations demonstrate that encode treats blanks as it does any other character. Note that 2 blanks is considered as a different value than 1 blank, and note that encode does not label those values.

                The fourth and fifth observations demonstrate that inclusion of trailing blanks results in a different encoded value than when they are missing; however, the label excludes the trailing blank, which probably explains why labels were not created for the second and third observations.

                Using the trim() function to eliminate leading and trailing blanks before using encode solves the problem,
                Code:
                . replace s = trim(s)
                (3 real changes made)
                
                . encode s, generate(e)
                
                . dataex
                
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input str6 s long e
                ""     .
                ""     .
                ""     .
                "M123" 1
                "M123" 1
                end
                label values e e
                label def e 1 "M123", modify
                I do not address your question about replacing letters with numeric strings because that depends on assumptions about your data, and it seems that with each post we learn more about your data. In post #1 it appeared your data consisted of the letter M followed by 4 digits. I inquired about that in post #3 and in post #5 you tell us you have a T as well as an M. In post #7 you further tell us you have a V, you have some 3-digit numbers, and you have some values that are one or more spaces.

                Comment


                • #9
                  Hi Clyde and William,

                  Thanks a lot for your help. There were some data entry errors, but I took care of them. Finally, I have managed to get the desired output. Below is the code (for reference or for someone else to use), that I used to get the output:

                  Code:

                  sort PART_ID

                  foreach var of varlist PART_ID REFERRER_ID GIVEN_VOUCHER*{
                  replace `var' = trim(`var')

                  gen new_`var' = `var'
                  replace new_`var' = subinstr(new_`var',"M","11",.)
                  replace new_`var' = subinstr(new_`var',"T","22",.)
                  replace new_`var' = subinstr(new_`var',"V","33",.)

                  g num_new_`var'=real(new_`var')

                  list `var' new_`var' num_new_`var' in 200/210

                  }

                  list PART_ID Part_Id GIVEN_VOUCHER* new_* num_* if PART_ID=="M1477", nolab


                  ren (num_new_PART_ID num_new_REFERRER_ID num_new_GIVEN_VOUCHER1 ///
                  num_new_GIVEN_VOUCHER2 num_new_GIVEN_VOUCHER3 num_new_GIVEN_VOUCHER4 ///
                  num_new_GIVEN_VOUCHER5 num_new_GIVEN_VOUCHER6 ) (part_id ref_id voucher1 ///
                  voucher2 voucher3 voucher4 voucher5 voucher6)


                  list PART_ID Part_Id GIVEN_VOUCHER* new_* vouch* if PART_ID=="M1477", nolab


                  foreach var of varlist PART_ID REFERRER_ID voucher*{
                  sort `var'
                  quietly by `var': gen dup_`var' = cond(_N==1,0,_n)
                  tab dup_`var'

                  }


                  keep PART_ID Part_Id REFERRER_ID part_id ref_id voucher*


                  Thanks again.

                  Best,

                  Syed W Noor
                  Toronto, Canada


                  Comment

                  Working...
                  X