  • Data management on flat data files

    Hi all-

    I need to merge, clean, and label covariates in a few flat data files from the CDC NCHS Vital Statistics dataset, specifically the 2022 period/2021 cohort files. I converted them to .txt format and managed to import them into Stata 18. However, the imported data appeared as a single column with 3,669,929 observations. I need to label the covariates based on the data dictionary and would appreciate advice on how to do this. Thanks!
    import delimited "/Users/yc765/Downloads/VS2021LINK.Public.USDENPUB_add_Seq_for2021Cohort_r 2024_04_15.txt", emptylines(include)
    > clear
    (encoding automatically selected: UTF-8)
    (1 var, 3,669,929 obs)

  • #2
    I tried to use dataex to extract a sample of the data to show here, but because the column is so wide, I got an error message that the line size was exceeded...


    • #3
      It looks like you will have to use infile with a dictionary based on the "File Layout" that starts on page 22 of the PDF you linked.

      If a Stata dictionary file for that particular dataset does not exist you may have to code it yourself.

      See

      https://www.stata.com/manuals/dinfil...le(fixedformat)

      It looks like there are some other useful guides and examples online such as

      https://www.stata.com/support/faqs/d...d-format-data/

      and

      https://www.stata.com/support/faqs/d...onary-options/


      It looks as though NBER created dictionary (.dct) files through 2011. Those may be helpful if it turns out you need to write your own dictionary file:

      https://www.nber.org/research/data/l...th-cohort-data
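
For anyone writing the dictionary by hand, a minimal sketch of what the .dct file might look like follows. The data file name, column positions, storage types, variable names, and labels here are all hypothetical placeholders; the real start positions and widths have to be read off the File Layout section of the NCHS documentation.

```stata
dictionary using "example_fixed.txt" {
* Hypothetical layout: every position, name, and label is a placeholder.
* Lines starting with * are comments inside a dictionary file.
    _column(1)   int   ex_year    %4f   "Year (hypothetical field)"
    _column(5)   byte  ex_month   %2f   "Month (hypothetical field)"
    _column(7)   str1  ex_flag    %1s   "Flag (hypothetical field)"
    _column(8)   long  ex_count   %6f   "Count (hypothetical field)"
}
```

If this were saved as example_layout.dct (again a made-up name), `infile using example_layout.dct, clear` would read the data file named on the first line. Fields that are not needed can simply be left out, since `_column()` jumps straight to the stated position.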


      Last edited by Bert Lloyd; 16 May 2024, 12:49.


      • #4
        Thank you Bert, I'll try your suggestion.


        • #5
          Have you looked at whether this resource would work? https://www.nber.org/research/data/m...use-death-data
          Unfortunately it looks like they don't have the 2022 data yet?


          • #6
            Erik, you are right. No 2022 data yet on NBER website, thanks though.


            • #7
              Bert Lloyd Just an update: I had to create the dictionary myself for use with the infile command. The .dta file, though, needed to be changed to .raw in order for the .dct to work with infile. Hope this will be helpful for someone else doing the same in the future. Thanks again.
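
On the .raw point, the infile documentation says that a data file named without an extension is assumed to end in .raw, so spelling out the extension in using() may avoid the rename. The do-file below is a rough, self-contained sketch of that idea and was not run against the NCHS file from this thread. It writes a tiny two-record sample and a matching dictionary, then reads them back and attaches value labels. All file names, column positions, variable names, and codes are hypothetical.

```stata
* Sketch only: file names, layout, and codes are hypothetical,
* not taken from the NCHS File Layout.
clear all

* Write a two-record fixed-width sample so the example runs on its own.
file open fh using "example_fixed.txt", write text replace
file write fh "20211" _n "20212" _n
file close fh

* Write a matching dictionary: year in columns 1-4, a code in column 5.
file open dh using "example_fixed.dct", write text replace
file write dh "dictionary {" _n
file write dh `"    _column(1) int  ex_year %4f "Year (hypothetical)""' _n
file write dh `"    _column(5) byte ex_code %1f "Code (hypothetical)""' _n
file write dh "}" _n
file close dh

* Name the data file with its extension; without one, infile assumes .raw.
infile using "example_fixed.dct", using("example_fixed.txt") clear

* Variable labels come from the dictionary; value labels are added separately.
label define ex_code_lbl 1 "Category A" 2 "Category B"
label values ex_code ex_code_lbl

describe
list
```

For the real file, only the infile line and the label commands would be needed, with the dictionary and the value-label codes filled in from the published layout.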
