Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Splitting a string variable in different observations, maintaining constant the rest of the variables

    Hello,

    I show an example of the database I have:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str1 region int year byte constructions str46 constructionassociated1 str30 constructionassociated2 byte typeofconstruction float(latitude longtitude)
    "A" 2006 1 "Firm A; Firm B; Firm C"                         "Firm 1; Firm 2"                 3 3.45  .9
    "A" 2007 2 "Firm A"                                         "Firm 1"                         4 3.56 2.3
    "A" 2008 3 "Firm D"                                         "Firm 3"                         5 4.56 4.5
    "A" 2009 2 "Firm H"                                         "Firm 4"                         2 5.78 6.7
    "B" 2006 2 "Firm B"                                         "Firm 2; Firm 4; Firm 7"         2  .04 5.3
    "B" 2008 2 "Firm G; Firm J; Firm B"                         "Firm 2"                         2  4.5   4
    "B" 2014 3 "Firm T"                                         "Firm 8"                         1  3.2   5
    "C" 2012 3 "Firm L"                                         "Firm 2"                         3  5.4 3.4
    "C" 2012 4 "Firm G; Firm T; Firm A"                         "Firm 4; Firm 2; Firm 3; Firm 1" 6  2.2 2.3
    "D" 2005 1 "Firm A"                                         "Firm 9"                         7  1.1   0
    "D" 2004 1 "Firm C"                                         "Firm 0"                         7  3.4 7.8
    "D" 2003 2 "Firm H; Firm L"                                 "Firm 3"                         8  6.7   1
    "H" 2002 4 "Firm A; Firm B; Firm H; Firm T; Firm L; Firm Z" "Firm 5; Firm 9"                 8    7 2.3
    "H" 2003 5 "Firm Q"                                         "Firm 1"                         9    6   4
    "H" 2006 5 "Firm W"                                         "Firm 5; Firm 9"                 3  .01   5
    end
    If you observe variables "construction associated 1" and construction associated 2" have some values separated by ;. This is not a mistake of importation, it is originally like that. I would like to create new observations separating the values of the variables "construction associated 1" and construction associated 2" by ;. I would like that these new observations maintained the same values of the rest of the variables. A more concrete example:

    Imagine observation 1, which region is A, year 2006, etc. In column constructionassociated1, there are Firm A; Firm B; Firm C. I would like to split that in three observations, Firm A, Firm B and Firm C, maintaining the value of the rest of the variables, region, year, constructions, latitude, longtitude, etc. The same applies for variable "constructionassociated2".

    Any idea/suggestion of how deal with it? Please, let me know if my question is not clear.

    Diego.
    Last edited by Diego Malo; 25 Feb 2022, 03:14.

  • #2
    The string could be split with -split- like this:

    Code:
    split constructionassociated1, p(;)
    l region year constructionassociated1* in 1,noobs

    Comment


    • #3
      Mmm... It would be a possibility Jakob Petersenn . Are you suggesting to split the variables with split command and after create the new observations with reshape?

      Comment


      • #4
        This may help:

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input str1 region int year byte constructions str46 constructionassociated1 str30 constructionassociated2 byte typeofconstruction float(latitude longtitude)
        "A" 2006 1 "Firm A; Firm B; Firm C"                         "Firm 1; Firm 2"                 3 3.45  .9
        "A" 2007 2 "Firm A"                                         "Firm 1"                         4 3.56 2.3
        "A" 2008 3 "Firm D"                                         "Firm 3"                         5 4.56 4.5
        "A" 2009 2 "Firm H"                                         "Firm 4"                         2 5.78 6.7
        "B" 2006 2 "Firm B"                                         "Firm 2; Firm 4; Firm 7"         2  .04 5.3
        "B" 2008 2 "Firm G; Firm J; Firm B"                         "Firm 2"                         2  4.5   4
        "B" 2014 3 "Firm T"                                         "Firm 8"                         1  3.2   5
        "C" 2012 3 "Firm L"                                         "Firm 2"                         3  5.4 3.4
        "C" 2012 4 "Firm G; Firm T; Firm A"                         "Firm 4; Firm 2; Firm 3; Firm 1" 6  2.2 2.3
        "D" 2005 1 "Firm A"                                         "Firm 9"                         7  1.1   0
        "D" 2004 1 "Firm C"                                         "Firm 0"                         7  3.4 7.8
        "D" 2003 2 "Firm H; Firm L"                                 "Firm 3"                         8  6.7   1
        "H" 2002 4 "Firm A; Firm B; Firm H; Firm T; Firm L; Firm Z" "Firm 5; Firm 9"                 8    7 2.3
        "H" 2003 5 "Firm Q"                                         "Firm 1"                         9    6   4
        "H" 2006 5 "Firm W"                                         "Firm 5; Firm 9"                 3  .01   5
        end
        
        gen long id = _n
        save safecopy, replace
        split constructionassociated1, parse(;) gen(ca1)
        drop constructionassociated1
        reshape long ca1, i(id) j(which)
        drop if missing(ca1)
        drop which
        gen ID = _n
        split constructionassociated2, parse(;) gen(ca2)
        drop constructionassociated2
        reshape long ca2, i(ID) j(which)
        drop if missing(ca2)
        drop which
        list, sepby(id)
        
             +-----------------------------------------------------------------------------------------+
             | ID   id   region   year   constr~s   typeof~n   latitude   longti~e       ca1       ca2 |
             |-----------------------------------------------------------------------------------------|
          1. |  1    1        A   2006          1          3       3.45         .9    Firm A    Firm 1 |
          2. |  1    1        A   2006          1          3       3.45         .9    Firm A    Firm 2 |
          3. |  2    1        A   2006          1          3       3.45         .9    Firm B    Firm 1 |
          4. |  2    1        A   2006          1          3       3.45         .9    Firm B    Firm 2 |
          5. |  3    1        A   2006          1          3       3.45         .9    Firm C    Firm 1 |
          6. |  3    1        A   2006          1          3       3.45         .9    Firm C    Firm 2 |
             |-----------------------------------------------------------------------------------------|
          7. |  4    2        A   2007          2          4       3.56        2.3    Firm A    Firm 1 |
             |-----------------------------------------------------------------------------------------|
          8. |  5    3        A   2008          3          5       4.56        4.5    Firm D    Firm 3 |
             |-----------------------------------------------------------------------------------------|
          9. |  6    4        A   2009          2          2       5.78        6.7    Firm H    Firm 4 |
             |-----------------------------------------------------------------------------------------|
         10. |  7    5        B   2006          2          2        .04        5.3    Firm B    Firm 2 |
         11. |  7    5        B   2006          2          2        .04        5.3    Firm B    Firm 4 |
         12. |  7    5        B   2006          2          2        .04        5.3    Firm B    Firm 7 |
             |-----------------------------------------------------------------------------------------|
         13. |  8    6        B   2008          2          2        4.5          4    Firm G    Firm 2 |
         14. |  9    6        B   2008          2          2        4.5          4    Firm J    Firm 2 |
         15. | 10    6        B   2008          2          2        4.5          4    Firm B    Firm 2 |
             |-----------------------------------------------------------------------------------------|
         16. | 11    7        B   2014          3          1        3.2          5    Firm T    Firm 8 |
             |-----------------------------------------------------------------------------------------|
         17. | 12    8        C   2012          3          3        5.4        3.4    Firm L    Firm 2 |
             |-----------------------------------------------------------------------------------------|
         18. | 13    9        C   2012          4          6        2.2        2.3    Firm G    Firm 4 |
         19. | 13    9        C   2012          4          6        2.2        2.3    Firm G    Firm 2 |
         20. | 13    9        C   2012          4          6        2.2        2.3    Firm G    Firm 3 |
         21. | 13    9        C   2012          4          6        2.2        2.3    Firm G    Firm 1 |
         22. | 14    9        C   2012          4          6        2.2        2.3    Firm T    Firm 4 |
         23. | 14    9        C   2012          4          6        2.2        2.3    Firm T    Firm 2 |
         24. | 14    9        C   2012          4          6        2.2        2.3    Firm T    Firm 3 |
         25. | 14    9        C   2012          4          6        2.2        2.3    Firm T    Firm 1 |
         26. | 15    9        C   2012          4          6        2.2        2.3    Firm A    Firm 4 |
         27. | 15    9        C   2012          4          6        2.2        2.3    Firm A    Firm 2 |
         28. | 15    9        C   2012          4          6        2.2        2.3    Firm A    Firm 3 |
         29. | 15    9        C   2012          4          6        2.2        2.3    Firm A    Firm 1 |
             |-----------------------------------------------------------------------------------------|
         30. | 16   10        D   2005          1          7        1.1          0    Firm A    Firm 9 |
             |-----------------------------------------------------------------------------------------|
         31. | 17   11        D   2004          1          7        3.4        7.8    Firm C    Firm 0 |
             |-----------------------------------------------------------------------------------------|
         32. | 18   12        D   2003          2          8        6.7          1    Firm H    Firm 3 |
         33. | 19   12        D   2003          2          8        6.7          1    Firm L    Firm 3 |
             |-----------------------------------------------------------------------------------------|
         34. | 20   13        H   2002          4          8          7        2.3    Firm A    Firm 5 |
         35. | 20   13        H   2002          4          8          7        2.3    Firm A    Firm 9 |
         36. | 21   13        H   2002          4          8          7        2.3    Firm B    Firm 5 |
         37. | 21   13        H   2002          4          8          7        2.3    Firm B    Firm 9 |
         38. | 22   13        H   2002          4          8          7        2.3    Firm H    Firm 5 |
         39. | 22   13        H   2002          4          8          7        2.3    Firm H    Firm 9 |
         40. | 23   13        H   2002          4          8          7        2.3    Firm T    Firm 5 |
         41. | 23   13        H   2002          4          8          7        2.3    Firm T    Firm 9 |
         42. | 24   13        H   2002          4          8          7        2.3    Firm L    Firm 5 |
         43. | 24   13        H   2002          4          8          7        2.3    Firm L    Firm 9 |
         44. | 25   13        H   2002          4          8          7        2.3    Firm Z    Firm 5 |
         45. | 25   13        H   2002          4          8          7        2.3    Firm Z    Firm 9 |
             |-----------------------------------------------------------------------------------------|
         46. | 26   14        H   2003          5          9          6          4    Firm Q    Firm 1 |
             |-----------------------------------------------------------------------------------------|
         47. | 27   15        H   2006          5          3        .01          5    Firm W    Firm 5 |
         48. | 27   15        H   2006          5          3        .01          5    Firm W    Firm 9 |
             +-----------------------------------------------------------------------------------------+

        .

        Comment


        • #5
          It really helps Nick Cox . Thank you!

          Comment


          • #6
            Thank you Nick Cox for the code provided. I used it also for my data, and after a few changes to reflect my dataset it worked perfectly!

            Comment

            Working...
            X