Question about data management with panels

Daniel Perez

Join Date: Nov 2014

Posts: 46
#1

26 Feb 2020, 08:20

Hi all,

Sometimes I have to deal with panel data with several cross-section units (i = 1, ..., I), time periods (y = 1, ..., Y), and variables (v1, ..., vN).
When processing the data, I often need to consolidate it (i.e., run calculations that group cross-section units for a specific time period). I do this inside loops and store the results in scalars. If I used variables instead of scalars, each calculated value would be repeated along the time-series dimension instead of appearing as one single value. However, scalars exist only for the current session, so when I need the values again after closing Stata, I have to rerun the commands.

Here is my question: what kind of storage do you recommend when you go from one dimension of the panel to another and need to keep the results for later?

Thank you very much.

William Lisowski

Join Date: Dec 2014
Posts: 10150

26 Feb 2020, 09:15

You should store the results in variables rather than in scalars, and then use if clauses to subset the data appropriately for the calculations you do subsequently.

Code:

. * Example generated by -dataex-. To install: ssc install dataex
. clear

. input float(id wave x)

            id       wave          x
  1. 1 1  6
  2. 1 2  5
  3. 2 1  3
  4. 2 2  1
  5. 3 1  6
  6. 3 2  2
  7. 4 1  6
  8. 4 2  9
  9. 5 1  4
 10. 5 2 10
 11. end

. 
. egen x_id = sum(x), by(id)

. egen x_wave = sum(x), by(wave)

. 
. sort id

. list id wave x x_id, sepby(id)

     +-----------------------+
     | id   wave    x   x_id |
     |-----------------------|
  1. |  1      1    6     11 |
  2. |  1      2    5     11 |
     |-----------------------|
  3. |  2      1    3      4 |
  4. |  2      2    1      4 |
     |-----------------------|
  5. |  3      1    6      8 |
  6. |  3      2    2      8 |
     |-----------------------|
  7. |  4      1    6     15 |
  8. |  4      2    9     15 |
     |-----------------------|
  9. |  5      1    4     14 |
 10. |  5      2   10     14 |
     +-----------------------+

. sort wave

. list id wave x x_wave, sepby(wave)

     +-------------------------+
     | id   wave    x   x_wave |
     |-------------------------|
  1. |  3      1    6       25 |
  2. |  1      1    6       25 |
  3. |  4      1    6       25 |
  4. |  2      1    3       25 |
  5. |  5      1    4       25 |
     |-------------------------|
  6. |  3      2    2       27 |
  7. |  2      2    1       27 |
  8. |  4      2    9       27 |
  9. |  1      2    5       27 |
 10. |  5      2   10       27 |
     +-------------------------+

. 
. tab x_id if wave==1

       x_id |      Freq.     Percent        Cum.
------------+-----------------------------------
          4 |          1       20.00       20.00
          8 |          1       20.00       40.00
         11 |          1       20.00       60.00
         14 |          1       20.00       80.00
         15 |          1       20.00      100.00
------------+-----------------------------------
      Total |          5      100.00

. tab x_wave if id==1

     x_wave |      Freq.     Percent        Cum.
------------+-----------------------------------
         25 |          1       50.00       50.00
         27 |          1       50.00      100.00
------------+-----------------------------------
      Total |          2      100.00

.
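
The two concerns in the question, repeated values and results that vanish with the session, can also be handled along the following lines. This is only a minimal sketch: it assumes the example data above (with x_id and x_wave already created) are in memory, and flag_id, flag_wave, and the file name panel_totals are made-up names for illustration.

```stata
* Assumes id, wave, x, x_id and x_wave from the example above are in memory.
* flag_id, flag_wave and panel_totals are hypothetical names.

* Flag exactly one observation per id and one per wave
egen byte flag_id = tag(id)
egen byte flag_wave = tag(wave)

* Each stored total is now used once, even though it is repeated in the data
summarize x_id if flag_id
list wave x_wave if flag_wave

* Save the dataset so the stored results are available in later sessions
save panel_totals, replace
```

Flagging one observation per group with tag() does the same job as the if wave==1 and if id==1 conditions above, but it does not depend on every id being observed in wave 1.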


Daniel Perez

Join Date: Nov 2014

Posts: 46
#3

26 Feb 2020, 13:01

Thank you William!