Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Concatenate of a string and a number

    Dear statalister,

    I am trying to merge to database, I would like to try using a concatenate of country and year, first is a string and the second a number. Are there any function or command to do it? I tested strcat and something else I found in the fórum but one is for two strings and the second for two numbers.
    Thank you for your kind help.

    Best regards,
    Alejandro

  • #2
    Run -help egen- and look at the -concat()- function there.

    Comment


    • #3
      thank you Clyde, I found a
      Code:
      catenar
      instruction. But it didn't work for the merge anyway.
      Thank you

      Comment


      • #4
        I don't know what a -catenar- instruction, but it is clearly not what Clyde advised; please (re-)read the FAQ and follow its advice

        Comment


        • #5
          Alejandro Torres ,

          As Clyde indicated, egen, concat allows you to concatenate strings with numbers. (It even automatically converts numbers to strings before concatenating them).

          Some toy data:
          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str9 country int year
          "Australia" 2010
          "Australia" 2011
          "Australia" 2012
          "Canada"    2010
          "Canada"    2011
          "Canada"    2012
          "US"        2010
          "US"        2011
          "US"        2012
          end
          Code:
          egen country_yr = concat(country year), punct(-)  // I separated them by a "-", you can change to whatever you want
          * Here year is an integer
          list, sepby(country) noobs
          
            +-----------------------------------+
            |   country   year       country_yr |
            |-----------------------------------|
            | Australia   2010   Australia-2010 |
            | Australia   2011   Australia-2011 |
            | Australia   2012   Australia-2012 |
            |-----------------------------------|
            |    Canada   2010      Canada-2010 |
            |    Canada   2011      Canada-2011 |
            |    Canada   2012      Canada-2012 |
            |-----------------------------------|
            |        US   2010          US-2010 |
            |        US   2011          US-2011 |
            |        US   2012          US-2012 |
            +-----------------------------------+

          Comment


          • #6
            I would like to try using a concatenate of country and year
            What I fail to understand is why one would need to or want to concatenate the two merge keys country and year, since the merge command is capable of matching on more than one variable.

            Comment


            • #7
              Hello William, I am testing different ways to merge two datasets, when I use joinby there are a duplicate of data, when I use merge by two variables I receive a message saying: variables country year do not uniquely identify observations in the using data, and I dont know what does it mean. I did it once at it Works, but a second time is not longer working again, it is frustraiting.
              Thank you

              Comment


              • #8
                If the variables country and year do not uniquely identify observations, then sticking them together into a single variable will not magically make the result unique. There is absolutely nothing to be gained, either with merge or with joinby, by doing so.

                You do not solve this problem by guessing differnt things to try until something works - if you get it wrong it will "work" in the sense of "not fail with an error message" but you won't know if it has done what you need. Ideally, you solve the problem by reading the documentation and understanding what it is instructing you to do. But none of us are good enough to be able to do that all the time: sometimes we all need someone else's perspective to help us understand what we are failing to understand. For those working in an environment with friends and colleagues who can help, we often turn to them for a second set of eyes to look at our work. When that isn't possible, for whatever reason, then we turn to sites like Statalist.

                So to solve this problem you need to describe your datasets adequately, provide sample data to help us understand it, show commands you have run and their results, and when suggestions are made, you act on them and report back the results.

                Looking at this topic and your most recent previous topics, what is striking is that in asking for help, you have not provided us either samples of your data or examples of the code you have run and what Stata has told you.

                Please spend some time to review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question. Section 12.1 is particularly pertinent

                12.1 What to say about your commands and your problem

                Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!
                You will do that by running your merge command and then copying the command and its output from Stata's Results window, pasting it into a post, surrounded by code delimiters [CODE] and [/CODE]. For example,

                [CODE]
                . sysuse auto, clear
                (1978 Automobile Data)

                . describe make price

                storage display value
                variable name type format label variable label
                -----------------------------------------------------------------
                make str18 %-18s Make and Model
                price int %8.0gc Price
                [/CODE]

                will be presented in the post as the following:
                Code:
                . sysuse auto, clear
                (1978 Automobile Data)
                
                . describe make price
                
                              storage   display    value
                variable name   type    format     label      variable label
                -----------------------------------------------------------------
                make            str18   %-18s                 Make and Model
                price           int     %8.0gc                Price
                Once we can see exactly what command you have given we can ask questions precisely aimed at that command. For example, there are three versions of the merge command and it makes a difference which one you are using.
                Code:
                merge 1:1 ...
                merge m:1 ...
                merge 1:m ...
                Then, there's your data. Even the best descriptions of data are no substitute for an actual example of the data.

                Please be sure to use the dataex command to show your example data. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run ssc install dataex to get it. Either way, run help dataex and read the simple instructions for using it. dataex will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

                When asking for help with code, always show example data. When showing example data, always use dataex.

                The more you help others understand your problem, the more likely others are to be able to help you solve your problem.

                Comment


                • #9
                  Hi Alejandro Torres

                  "variables country year do not uniquely identify observations in the using data"
                  Hopefully this will help.

                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input str9 country int year
                  "Australia" 2010
                  "Australia" 2011
                  "Australia" 2012
                  "Australia" 2012
                  "Canada"    2010
                  "Canada"    2011
                  "Canada"    2012
                  "Canada"    2012
                  "Canada"    2012
                  "US"        2010
                  "US"        2011
                  "US"        2012
                  "US"           .
                  "US"           .
                  end
                  
                  sort country year
                  isid country year  // checking to see if country year uniquely identify obs.  They don't :-)
                  variables country year should never be missing
                  r(459);
                  
                  . list country year, sepby(country )
                  
                       +------------------+
                       |   country   year |
                       |------------------|
                    1. | Australia   2010 |
                    2. | Australia   2011 |
                    3. | Australia   2012 |
                    4. | Australia   2012 |
                       |------------------|
                    5. |    Canada   2010 |
                    6. |    Canada   2011 |
                    7. |    Canada   2012 |
                    8. |    Canada   2012 |
                    9. |    Canada   2012 |
                       |------------------|
                   10. |        US   2010 |
                   11. |        US   2011 |
                   12. |        US   2012 |
                   13. |        US      . |
                   14. |        US      . |
                       +------------------+
                  Stata is telling you that your data has one or both of the problems above. Note that Australia 2012 & Canada 2012 are listed 2x. Also note that year is missing 2x for US. If country year should uniquely identify the data, then you will want to look for duplicates (help duplicates).


                  Code:
                  help duplicates
                  duplicates tag country year, gen(dup1)
                  
                  list, sepby(country )
                  
                       +-------------------------+
                       |   country   year   dup1 |
                       |-------------------------|
                    1. | Australia   2010      0 |
                    2. | Australia   2011      0 |
                    3. | Australia   2012      1 |
                    4. | Australia   2012      1 |
                       |-------------------------|
                    5. |    Canada   2010      0 |
                    6. |    Canada   2011      0 |
                    7. |    Canada   2012      2 |
                    8. |    Canada   2012      2 |
                    9. |    Canada   2012      2 |
                       |-------------------------|
                   10. |        US   2010      0 |
                   11. |        US   2011      0 |
                   12. |        US   2012      0 |
                   13. |        US      .      1 |
                   14. |        US      .      1 |
                       +-------------------------+
                  Then you can brow where dup1>=1 and see whether some can be combined or dropped or whatever you need to do with them.

                  If that doesn't solve your problem, come back and post a sample of your data using Stata's dataex command. If you need help with that, I created a Youtube tutorial on it at https://youtu.be/bXfaRCAOPbI (definitely watch it at 2x speed :-). You can probably just watch the first 5 minutes and be good). It also shows you how to use the code delimiters that William mentioned in post #8.

                  Good luck!
                  Last edited by David Benson; 28 Dec 2018, 13:59.

                  Comment


                  • #10
                    wow this so helpful thanks

                    Comment


                    • #11
                      Please, can I take this opportunity to ask how I can proceed to "deconcatenate" a string (str11) variable to 11 binary (byte) variables ? Thanks.

                      Comment


                      • #12
                        If the string consists of numeric characters, e,g, "01234567890", then you can go


                        Code:
                        forval j = 1/11 { 
                              gen byte foo`j' = real(substr(foo, `j', 1))
                        }
                        If that's not the answer, please give a data example that clarifies what you want.

                        Comment


                        • #13
                          Thank you. It is so helpful.

                          Comment

                          Working...
                          X