Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Help with expand lines using foreach

    Hi everyone.

    I am using Stata 14.0 on Mac 10.12.

    I working with a dataset of around 4000 entries. I want to create a loop which reads through a specific subset of observations and then using expand, produces n-1 amount of lines as the value presented in another variable. To be more precise, I have created a dummy variable called multi-year_duration_dum which assigns 1 to all entries which span more than a year so that they are easily identifiable. I have also created a variable called cal_year_duration which denotes the number of years an entry spans. Now, what I want to do is create as many copies of each multi year project (categorical variable multi_year_duration_dum) as the number of year it spans (cal_year_duration).

    Thank you so much in advance, and let me know if I have to provide further information.


    Code:
    * Example generated by -dataex-. 
    clear
    input float(start_year end_year cal_year_duration same_year_duration_dum multi_year_duration_dum)
    1975    . . 1 0
    1975    . . 1 0
    1975    . . 1 0
    1975    . . 1 0
    1975    . . 1 0
    1975    . . 1 0
    1975    . . 1 0
    1975    . . 1 0
    1975    . . 1 0
    1975    . . 1 0
    1976    . . 1 0
    1976    . . 1 0
    1976    . . 1 0
    1976    . . 1 0
    1976    . . 1 0
    1976    . . 1 0
    1976    . . 1 0
    1976    . . 1 0
    1976    . . 1 0
    1976    . . 1 0
    1976    . . 1 0
    1976    . . 1 0
    1976    . . 1 0
    1976    . . 1 0
    1976    . . 1 0
    1976    . . 1 0
    1976    . . 1 0
    1977    . . 1 0
    1977    . . 1 0
    1977    . . 1 0
    1977    . . 1 0
    1977    . . 1 0
    1977    . . 1 0
    1977    . . 1 0
    1977    . . 1 0
    1977    . . 1 0
    1977    . . 1 0
    1977    . . 1 0
    1977    . . 1 0
    1977    . . 1 0
    1977    . . 1 0
    1977    . . 1 0
    1977    . . 1 0
    1977    . . 1 0
    1977    . . 1 0
    1977    . . 1 0
    1977    . . 1 0
    1977    . . 1 0
    1977    . . 1 0
    1977    . . 1 0
    1977    . . 1 0
    1978    . . 1 0
    1978    . . 1 0
    1978    . . 1 0
    1978    . . 1 0
    1978    . . 1 0
    1978    . . 1 0
    1978    . . 1 0
    1978    . . 1 0
    1978    . . 1 0
    1978    . . 1 0
    1978    . . 1 0
    1978    . . 1 0
    1978    . . 1 0
    1978    . . 1 0
    1978    . . 1 0
    1978    . . 1 0
    1978    . . 1 0
    1978    . . 1 0
    1978    . . 1 0
    1978    . . 1 0
    1978    . . 1 0
    1978    . . 1 0
    1978    . . 1 0
    1978    . . 1 0
    1979    . . 1 0
    1979    . . 1 0
    1979    . . 1 0
    1979    . . 1 0
    1979    . . 1 0
    1979    . . 1 0
    1979    . . 1 0
    1979    . . 1 0
    1979    . . 1 0
    1979    . . 1 0
    1979    . . 1 0
    1979    . . 1 0
    1980    . . 1 0
    1980 1980 1 1 0
    1980    . . 1 0
    1980    . . 1 0
    1980    . . 1 0
    1980    . . 1 0
    1980    . . 1 0
    1980    . . 1 0
    1980    . . 1 0
    1980    . . 1 0
    1980    . . 1 0
    1980    . . 1 0
    1980    . . 1 0
    end

  • #2
    Thanks for using the -dataex- command on your first posting. Nevertheless, I find it hard to understand what you want, and I suspect others who might help are having a similar experience. Beyond the difficulty of describing what you want in words, I'd note that while cal_year_duration is presumably a *variable* in your actual data, it's a *constant* in your example data; similarly, I'd note that multi_year_duration_dum is 0 for all the observations in your example data. I'd encourage you to post a different set of example data, one that illustrates some of the variation in your actual data. In addition, showing what you want the expanded data to look like would help. In doing this, I'd suggest making the smallest example you can that illustrates the variations in your data.

    By the way: I would guess that a loop is not necessary. What you want *may* be as simple as
    Code:
    expand cal_year_duration

    Comment


    • #3
      Dear Mike, thank you so much for your answer. You are right the example data I provided aren't really indicative of the dataset. I hope this helps more:
      . dataex start_year end_year cal_year_duration same_year_duration_dum multi_year_d
      > uration_dum, count (200)

      ----------------------- copy starting from the next line -----------------------
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(start_year end_year cal_year_duration same_year_duration_dum multi_year_duration_dum)
      1983 1986 4 0 1
      1983 1987 5 0 1
      1983 1985 3 0 1
      1983 1984 2 0 1
      1983 1986 4 0 1
      1983 1985 3 0 1
      1983 1986 4 0 1
      1983 1987 5 0 1
      1983 1984 2 0 1
      1983 1984 2 0 1
      1983 1985 3 0 1
      1983 1986 4 0 1
      1983 1986 4 0 1
      1983 1987 5 0 1
      1983 1984 2 0 1
      1983 1987 5 0 1
      1983 1984 2 0 1
      1983 1984 2 0 1
      1983    . . 1 0
      1983    . . 1 0
      1983 1985 3 0 1
      1983 1987 5 0 1
      1983 1987 5 0 1
      1984 1988 5 0 1
      1984 1986 3 0 1
      1984 1988 5 0 1
      1984 1986 3 0 1
      1984 1986 3 0 1
      1984 1989 6 0 1
      1984 1990 7 0 1
      1984 1989 6 0 1
      1984 1985 2 0 1
      1984 1986 3 0 1
      1984 1990 7 0 1
      1984 1988 5 0 1
      1984 1988 5 0 1
      1984 1989 6 0 1
      1984 1988 5 0 1
      1984 1988 5 0 1
      1984 1987 4 0 1
      1984 1989 6 0 1
      1984 1987 4 0 1
      1985 1987 3 0 1
      1985 1988 4 0 1
      1985 1990 6 0 1
      1985 1988 4 0 1
      1985 1992 8 0 1
      1985 1987 3 0 1
      1985 1987 3 0 1
      1985 1990 6 0 1
      1985 1986 2 0 1
      1985 1988 4 0 1
      1985 1988 4 0 1
      end
      ------------------ copy up to and including the previous line ------------------

      Listed 200 out of 4014 observations

      .

      Comment


      • #4
        If you mean extra observations should be created corresponding to cal_year_duration, then Mike's code should do it. If not, explain how it fails.

        Last edited by Andrew Musau; 17 Dec 2021, 04:12.

        Comment


        • #5
          Dear Andrew, thank you for your response. What I actually meant is that for all projects where multi_year_duration_dum= 1, I want to create the number of lines that cal_year_duration includes. In this way, multi_year_duration =1 is only there as an identifier of all multi year projects, and then the actual number of lines I want to add is the number under cal_year_duration. In the first observation, for example cal_year_duration == 4 and so I want 4 lines added, while in the second cal_year_duration == 5, so I want 5 rows added. I hope this helps, thank you.

          Comment


          • #6
            Ok, slight modification to #2

            Code:
            expand cal_year_duration+1

            Comment


            • #7
              That was what I had in mind too, but then my question is, wouldn't that expand all the lines in the cal_year_duration variable? I only want to expand those lines that are essentially more than 1.

              Would the code then be
              expand if cal_year_duration >1 cal_year_duration +1

              Comment


              • #8
                In your example, cal_year_duration is missing if multi_year_duration_dum=0. So these observations are ignored and no expansion occurs. If that is not true generally, then you can do:

                Code:
                replace cal_year_duration= . if !multi_year_duration_dum
                expand cal_year_duration+1
                or along your lines

                Code:
                expand cal_year_duration+1 if  multi_year_duration_dum

                Comment


                • #9
                  Dear Andrew, thank you again for your help. The code does work, but just to verify could you explain why the "+1" portion of the code works exactly> Just want to make sure I understand the process correctly. Thank you!

                  Comment


                  • #10
                    I would also like to ask if there is a way to mark only the cells resulting from the expansion, and not the original ones

                    Comment


                    • #11
                      Originally posted by Eleni Moschou View Post
                      Dear Andrew, thank you again for your help. The code does work, but just to verify could you explain why the "+1" portion of the code works exactly> Just want to make sure I understand the process correctly. Thank you!
                      The value that follows expand defines the total number of observations, including the original observation. So expand 2 creates 1 new observation, expand 3 creates 2 new observations, and so on. Since you wanted the the number of observations to be created to be equal to those defined by the variable "cal_year_duration", then following this logic, we need to add 1 to each entry of the variable.


                      I would also like to ask if there is a way to mark only the cells resulting from the expansion, and not the original ones
                      Yes, the option -generate()- does that. Also, if you do not have an observation identifier, create one. Below, new=1 defines a new observation and 0 the original.

                      Code:
                      * Example generated by -dataex-. To install: ssc install dataex
                      clear
                      input float(start_year end_year cal_year_duration same_year_duration_dum multi_year_duration_dum)
                      1983 1986 4 0 1
                      1983 1987 5 0 1
                      1983 1985 3 0 1
                      1983 1984 2 0 1
                      1983 1986 4 0 1
                      1983 1985 3 0 1
                      1983 1986 4 0 1
                      1983 1987 5 0 1
                      end
                      
                      gen obs_no=_n
                      expand cal_year_duration+1, gen(new)
                      sort obs_no new
                      l, sepby(obs_no)
                      Res.:

                      Code:
                      . l, sepby(obs_no)
                      
                           +---------------------------------------------------------------------+
                           | start_~r   end_year   cal_ye~n   same_y~m   multi_~m   obs_no   new |
                           |---------------------------------------------------------------------|
                        1. |     1983       1986          4          0          1        1     0 |
                        2. |     1983       1986          4          0          1        1     1 |
                        3. |     1983       1986          4          0          1        1     1 |
                        4. |     1983       1986          4          0          1        1     1 |
                        5. |     1983       1986          4          0          1        1     1 |
                           |---------------------------------------------------------------------|
                        6. |     1983       1987          5          0          1        2     0 |
                        7. |     1983       1987          5          0          1        2     1 |
                        8. |     1983       1987          5          0          1        2     1 |
                        9. |     1983       1987          5          0          1        2     1 |
                       10. |     1983       1987          5          0          1        2     1 |
                       11. |     1983       1987          5          0          1        2     1 |
                           |---------------------------------------------------------------------|
                       12. |     1983       1985          3          0          1        3     0 |
                       13. |     1983       1985          3          0          1        3     1 |
                       14. |     1983       1985          3          0          1        3     1 |
                       15. |     1983       1985          3          0          1        3     1 |
                           |---------------------------------------------------------------------|
                       16. |     1983       1984          2          0          1        4     0 |
                       17. |     1983       1984          2          0          1        4     1 |
                       18. |     1983       1984          2          0          1        4     1 |
                           |---------------------------------------------------------------------|
                       19. |     1983       1986          4          0          1        5     0 |
                       20. |     1983       1986          4          0          1        5     1 |
                       21. |     1983       1986          4          0          1        5     1 |
                       22. |     1983       1986          4          0          1        5     1 |
                       23. |     1983       1986          4          0          1        5     1 |
                           |---------------------------------------------------------------------|
                       24. |     1983       1985          3          0          1        6     0 |
                       25. |     1983       1985          3          0          1        6     1 |
                       26. |     1983       1985          3          0          1        6     1 |
                       27. |     1983       1985          3          0          1        6     1 |
                           |---------------------------------------------------------------------|
                       28. |     1983       1986          4          0          1        7     0 |
                       29. |     1983       1986          4          0          1        7     1 |
                       30. |     1983       1986          4          0          1        7     1 |
                       31. |     1983       1986          4          0          1        7     1 |
                       32. |     1983       1986          4          0          1        7     1 |
                           |---------------------------------------------------------------------|
                       33. |     1983       1987          5          0          1        8     0 |
                       34. |     1983       1987          5          0          1        8     1 |
                       35. |     1983       1987          5          0          1        8     1 |
                       36. |     1983       1987          5          0          1        8     1 |
                       37. |     1983       1987          5          0          1        8     1 |
                       38. |     1983       1987          5          0          1        8     1 |
                           +---------------------------------------------------------------------+
                      
                      .
                      Last edited by Andrew Musau; 17 Dec 2021, 07:24.

                      Comment


                      • #12
                        That worked, thank you for all the help!

                        Comment

                        Working...
                        X