  • Crash after tsset for fund-data

    I am writing my bachelor's thesis and I have to analyze monthly fund data for several funds over a given period. Because the date variable "overlaps" across funds, I was told to use tsset fundid date to declare the data so that I can use it for my regressions, analysis, etc.
    However, every time I run this code, Stata just closes itself and my progress is lost. Why is this? And more importantly, how do I fix it?
    FYI, the data has a total of ~270,000 rows of monthly fund returns.

    Thank you in advance!

  • #2
    Hard to say without a sight of some of the data. See FAQ Advice #12 and please give a data example. The requirement for tsset is that each distinct (identifier, date) pair occurs once within the dataset. If that's not true, Stata would normally issue an error message.

    Additionally,

    Code:
    duplicates report fundid date
    might be instructive.
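
    As a minimal sketch of that uniqueness check, the following builds a small toy dataset (hypothetical values, not the poster's data) in which one (fundid, date) pair occurs twice, then shows three ways of spotting it. The variable names fundid and date follow the thread; the variable dup is an illustrative name.

    ```stata
    * Toy data only: note that -clear- drops whatever is in memory
    clear
    set obs 4
    gen str1 fundid = cond(_n <= 3, "A", "B")
    gen date = cond(_n == 4, 1, min(_n, 2))   // rows: A 1, A 2, A 2, B 1

    * Summary table of how many copies each (fundid, date) pair has
    duplicates report fundid date

    * dup is 0 for unique pairs and 1 for a pair that occurs twice
    duplicates tag fundid date, generate(dup)
    list fundid date dup if dup > 0

    * isid exits with an error if the pair does not identify observations
    capture noisily isid fundid date
    ```

    On real data, only the last three commands would be needed; if isid is silent, the (identifier, date) requirement of tsset is met.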



    • #3
      Thank you for the fast reply! I think you might just have helped me understand it. The date column actually has duplicates (I know that); however, when I try to use the duplicates command, Stata again crashes.
      I guess I have to rearrange the data so that I have the date in the rows and fundid as a column. For now it looks like this:

      [Image: Screenshot 2024-06-21 141247.png]


      However, Stata crashing still confuses me. (By the way, I am completely new to Stata.)



      • #4
        No; as said, it is not duplicates on date that would be a problem; it is whether there are duplicates on date and identifier.

        Also, your data layout is fine on the evidence you show.

        At least one thing is quite wrong here but entirely understandable as a beginner error too.

        You have monthly data and need to work with monthly dates. Asking for tsset with a date-time variable implies that less than 1 part in 1 billion of your data is present. Lag 1 = 1 millisecond. That's what you asked for.

        Your identifier appears to be a string variable. If so, fix that with encode.

        Code:
        encode fundid, gen(fund)
        Map to monthly dates and try again.

        Code:
        gen mdate = mofd(dofc(date))
        format mdate %tm
        
        tsset fund mdate
        Beginner lesson: In Stata, rows are observations, columns are variables.
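
        To illustrate the point about units, here is a small sketch using two hypothetical date-times, one month apart. It assumes, as the code above does, that date is a Stata date-time (%tc) variable; the values are made up for illustration.

        ```stata
        * Toy data only: note that -clear- drops whatever is in memory
        clear
        set obs 2
        * Hypothetical date-times: midnight on 1 Jan 2024 and 1 Feb 2024
        gen double date = cond(_n == 1, tc(01jan2024 00:00:00), tc(01feb2024 00:00:00))
        format date %tc

        * As date-times the two values are 31 days = 2,678,400,000 ms apart
        display %15.0fc date[2] - date[1]

        * As monthly dates they are exactly 1 apart, so lag 1 means one month
        gen mdate = mofd(dofc(date))
        format mdate %tm
        display mdate[2] - mdate[1]
        list
        ```

        With the date-time variable, tsset would treat each month as billions of time steps with almost all of them missing; with mdate, consecutive months are consecutive integers.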
        Last edited by Nick Cox; 21 Jun 2024, 07:27.



        • #5
          Thank you very much! I hope this will work. I assume my PC has a problem handling such a large dataset, though, as Stata just keeps crashing without giving any error code whatsoever when running tsset or duplicates report.

          I guess I will have to resolve this first before I am able to continue with the data.



          • #6
            I can't help there. Stata technical support may be able to advise given a sight of your licence information. Or your IT support at your university may have a steer.



            • #7
              I will try that. Again, thank you!
