Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging yearly with repeated quarterly data

    Hey guys! I'm Dennis Stap from the Netherlands, and have quite the issue momentarily.

    My main dataset is composed of annual data and (having removed most other variables for simplicity) looks like this:

    total_curr year cusip
    265.849 2006 00036020

    However, my secondary database is composed of yearly data, which is then split into quarters on a manager level:

    cusip mgrno quarter Year PCTQ1 PCTQ2
    00036020 26600 1 2006 0,10 -
    00036020 10681 2 2006 - 0,20

    It may look a little weird, but the main idea was that I computed totals per company(cusip) in quarter x. These totals are computed by summing all the manager (mgrno) level data for cusip x, in year x, in quarter x. For simplicity sake, PCTQ1 and PCTQ2 are the variables I need, and are the same in any observation in year x and quarter x. I tried to make this clear by showing that PCTQ1 is not filled in quarter 2, and PCTQ2 is not filled in quarter 1.

    My end goal would be that my main dataset would look like this:

    total_curr year cusip PCTQ1 PCTQ2
    265.849 2006 00036020 0,10 0,20

    I tried merging the datasets, but then quarters started appearing in my main dataset. I think I have to reduce my secondary dataset to look like this, but I am not really sure at this moment:

    year cusip PCTQ1 PCTQ2
    2006 00036020 0,10 0,20


    If anybody has an idea how to tackle this problem, I would be very grateful! Thanks in advance

  • #2
    What you seem to need is to collapse your "secondary database", by doing:
    Code:
    collapse (sum) pctq1 pctq2, by(cusip year)
    NB: save this dataset separately, as you might be throwing away a lot of info with your collapse step.

    After the collapse, save, and merge like you tried before. Should have single observations per cusip and year now.

    Also, in future posts, please use dataex for posting example data. Makes life a lot easier for people trying to answer your questions, and improves the chances they will take the effort. Read more about this in the FAQ: https://www.statalist.org/forums/help#stata

    Example:
    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long cusip int mgrno byte quarter int year double(pctq1 pctq2)
    36020 26600 1 2006 .1  .
    36020 10681 2 2006  . .2
    end
    ------------------ copy up to and including the previous line ------------------

    Comment


    • #3
      Originally posted by Jorrit Gosens View Post
      What you seem to need is to collapse your "secondary database", by doing:
      Code:
      collapse (sum) pctq1 pctq2, by(cusip year)
      NB: save this dataset separately, as you might be throwing away a lot of info with your collapse step.

      After the collapse, save, and merge like you tried before. Should have single observations per cusip and year now.

      Also, in future posts, please use dataex for posting example data. Makes life a lot easier for people trying to answer your questions, and improves the chances they will take the effort. Read more about this in the FAQ: https://www.statalist.org/forums/help#stata

      Example:
      ----------------------- copy starting from the next line -----------------------
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input long cusip int mgrno byte quarter int year double(pctq1 pctq2)
      36020 26600 1 2006 .1 .
      36020 10681 2 2006 . .2
      end
      ------------------ copy up to and including the previous line ------------------
      Thank you very much for this answer! I was afraid that the collapse command would somehow sum up all the individual percentages, as in this case every observation with the same cusip/year/quarter has the same percentages like this(using the dataex command now!):

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str8 cusip float(Year quarter PCTA PCTT PCTTS
      "00036020" 1993 2  .00290175  .9906353          .
      "00036020" 1993 2  .00290175  .9906353          .
      "00036020" 1993 2  .00290175  .9906353          .
      "00036020" 1993 3  .00290175  .9906353          .
      "00036020" 1993 3  .00290175  .9906353          .
      "00036020" 1993 3  .00290175  .9906353          .
      "00036020" 1993 4  .00290175  .9906353          .
      "00036020" 1993 4  .00290175         . .009364664
      "00036020" 1994 1  .00819866         .   .4429643
      "00036020" 1994 1  .00819866 .55703574          .
      "00036020" 1994 1  .00819866         .   .4429643
      "00036020" 1994 1  .00819866 .55703574          .
      "00036020" 1994 1  .00819866         .   .4429643
      "00036020" 1994 1  .00819866         .   .4429643          .
      end
      I ended up omitting the (sum) after collapse, and it seems to have yielded the correct figures.

      Comment


      • #4
        There was actually not much reason for me to assume 'sum' was the best option here. Default for collapse is the mean, which works well with this example.

        Comment

        Working...
        X