  • Loading CSV vs. Excel files into Stata: risks

    I have a large patient-level COVID-19 test dataset that is over 400 MB. It is in Excel format, and loading it into Stata takes a long time: when I timed it once, it took more than 2 hours.

    Next, I tried saving the file as CSV and loading that into Stata, and it took less than 2 minutes.

    My question is: what risks, in terms of data loss, exist when converting files to CSV and loading them into Stata? I have noticed date formats changing to strings, which is fixable. My concern is more about losing information or changing the quality of the data.

    Appreciate any thoughts on this.


  • #2
    A CSV file is a plain-text file. What you lose when you convert an Excel file to CSV depends on what you have stored in the Excel file. If it holds only values that can be saved as plain text, you lose nothing; however, cell formatting (such as color) and the underlying Excel formulas will be lost.
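    To illustrate the plain-text point: a CSV stores every cell as text, so type information such as "this column is a date" is not kept and has to be restored on import, which is why dates can arrive as strings. Below is a minimal sketch in Python using only the standard library (not Stata; in Stata the analogous step is usually done with the date() function). The column names and values are hypothetical, and the in-memory buffer merely stands in for a CSV file on disk.

```python
import csv
import io
from datetime import date, datetime

# Hypothetical patient-level rows; names and values are made up for illustration.
rows = [
    {"patient_id": "P001", "test_date": date(2020, 11, 3), "result": 1},
    {"patient_id": "P002", "test_date": date(2020, 11, 5), "result": 0},
]

# Write the rows to an in-memory CSV (stands in for exporting to a .csv file).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["patient_id", "test_date", "result"])
writer.writeheader()
writer.writerows(rows)

# Read the CSV back: every value comes back as a str, including dates and numbers.
buffer.seek(0)
loaded = list(csv.DictReader(buffer))
print({name: type(value).__name__ for name, value in loaded[0].items()})

# Restore the types explicitly: parse the date strings and convert the result codes.
for row in loaded:
    row["test_date"] = datetime.strptime(row["test_date"], "%Y-%m-%d").date()
    row["result"] = int(row["result"])

# After conversion, the reloaded values match the originals.
assert [r["test_date"] for r in loaded] == [r["test_date"] for r in rows]
```

    The values themselves survive the round trip; only the column types must be re-declared, which matches the "fixable" date issue described in the question.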



    • #3
      If you can safely export your Excel worksheet(s) to CSV, I would do it. Besides loading faster into Stata, a CSV file can be opened without any special software.

