Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Combining panel datasets with the same IDs and years to create advanced graphs

    Dear members,

    I have a panel dataset and I would like to append another dataset to it which has the same set of identifiers and years. I would like to merge it using Stata instead of manually entering it as the actual dataset is huge. It is essentially a duplicate which has different values for the x variables. The reason I want to do this is so that I can create xtline graphs to compare the two sets of data for the variable x, where the IDs are on the y axis and years on the x axis. For example, two red lines for ID 1, two green lines for ID 2.
    I have figured out how to plot separate graphs and put it side by side for comparison. But to enable better comparison, I would like to combine the datasets and plot them together on the same graph. Any help on how to combine the two datasets and obtain a single graph comparing them would be greatly appreciated.
    Example data

    I want to turn this:

    ​​​​​​clear
    input int(id year x_1)
    1 1992 160
    1 1993 120
    1 1994 179
    1 1995 78
    1 1996 121
    2 1992 140
    2 1993 187
    2 1994 230
    2 1995 308
    2 1996 80
    3 1992 76
    3 1993 200
    3 1994 190
    3 1995 120
    3 1996 82
    end

    into this:

    ​​​​​​clear
    input int(id year x_1)
    1 1992 160
    1 1992 180
    1 1993 120
    1 1993 123
    1 1994 179
    1 1994 180
    1 1995 78
    1 1995 90
    1 1996 121
    1 1996 130
    2 1992 140
    2 1992 180
    2 1993 187
    2 1993 198
    2 1994 230
    2 1994 160
    2 1995 308
    2 1995 210
    2 1996 80
    2 1996 120
    3 1992 76
    3 1992 134
    3 1993 200
    3 1993 208
    3 1994 190
    3 1994 209
    3 1995 120
    3 1995 170
    3 1996 82
    3 1996 120
    end
    Last edited by Lily Ksh; 07 Apr 2022, 23:35.

  • #2
    So, I imagine you start with two datasets that look something like this:
    Code:
    clear*
    input int(id year x_1)
    1 1992 160
    1 1993 120
    1 1994 179
    1 1996 121
    2 1992 140
    2 1993 187
    2 1994 230
    2 1995 308
    2 1996 80
    3 1992 76
    3 1993 200
    3 1994 190
    3 1995 120
    3 1996 82
    end
    tempfile dataset1
    save `dataset1'
    
    input int(id year x_1)
    1 1992 180
    1 1993 123
    1 1995 180
    1 1998 190
    1 1996 130
    2 1992 180
    2 1993 198
    2 1994 160
    2 1995 210
    2 1996 120
    3 1992 134
    3 1993 208
    3 1994 209
    3 1995 170
    3 1996 120
    end
    
    
    tempfile dataset2
    save `dataset2'
    I honestly can't say I understand the description of the graph you want to create. I can't visualize anything that meets that description that makes sense. I'm not sure what I'm missing.

    But here's how I would combine these sets to enable a comparison of the two x's over the years in each id.
    Code:
    clear
    use `dataset1'
    rename x_1 x_1_1
    merge 1:1 id year using `dataset2'
    rename x_1 x_1_2
    
    xtset id year
    
    xtline x_1_*
    Last edited by Clyde Schechter; 07 Apr 2022, 23:41.

    Comment


    • #3
      Sorry, Clyde. My question was not very clear. How do I get an overlayed line plot of x_1_1 and x_1_2 over the years with legends specifying which ID and dataset it belongs to? I would to specify different colours for the 6 lines/plots too.
      Last edited by Lily Ksh; 08 Apr 2022, 00:56.

      Comment


      • #4
        Well, here's how you can do that
        Code:
        clear*
        input int(id year x_1)
        1 1992 160
        1 1993 120
        1 1994 179
        1 1996 121
        2 1992 140
        2 1993 187
        2 1994 230
        2 1995 308
        2 1996 80
        3 1992 76
        3 1993 200
        3 1994 190
        3 1995 120
        3 1996 82
        end
        tempfile dataset1
        save `dataset1'
        
        clear
        input int(id year x_1)
        1 1992 180
        1 1993 123
        1 1995 180
        1 1998 190
        1 1996 130
        2 1992 180
        2 1993 198
        2 1994 160
        2 1995 210
        2 1996 120
        3 1992 134
        3 1993 208
        3 1994 209
        3 1995 170
        3 1996 120
        end
        
        
        tempfile dataset2
        save `dataset2'
        
        clear
        use `dataset1'
        rename x_1 x_1_1_
        merge 1:1 id year using `dataset2', nogenerate
        rename x_1 x_1_2_
        reshape wide x*_, i(year) j(id)
        forvalues i = 1/3 {
            forvalues d = 1/2 {
                label var x_1_`d'_`i' "Dataset `d', ID `i'"
            }
        }
        
        graph twoway line x_1* year, sort
        That said, this sounds like a really bad idea. Even with just three IDs the graph is a spaghetti mess, and if the real data set contains more than that it will be more like a rat's nest.

        Comment

        Working...
        X