Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • importing of data using a text line as marker of column names row

    Dear Statlist
    I have output data from two different instruments, that I need to import to stata. They are in the same format, but one instrument output has 40 rows of text prior to the actual table data and the other has 42 rows of text prior to the table data. Both outputs have the line " [Results] " before the column names.

    * Signal Smoothing On = true
    * Stage/ Cycle where Analysis is performed = Stage 2, Step 2
    * User Name = CBV/KATL
    [Results]
    Well Well Position
    1 A1 false
    2 A2 false

    Can I import the files specifying that the data read should begin after the line [Results] ? and that the line after [Results] is the column names

    Thanks
    Marie

  • #2
    Welcome to Statalist.

    You will want to use the import delimited command and use the rowrange option to specify that your data start on row 42 and the varnames option to specify that variable names being on row 41. Here is an example using your data from post #1.
    Code:
    . type ~/Downloads/sample.txt
    * Signal Smoothing On = true
    * Stage/ Cycle where Analysis is performed = Stage 2, Step 2
    * User Name = CBV/KATL
    [Results]
    Well_a Well_b Position
    1 A1 false
    2 A2 false
    
    . import delimited ~/Downloads/sample.txt, rowrange(6) varnames(5) delimiters(" ")
    (3 vars, 2 obs)
    
    . list
    
         +----------------------------+
         | well_a   well_b   position |
         |----------------------------|
      1. |      1       A1      false |
      2. |      2       A2      false |
         +----------------------------+
    The output of help import delimited will give you more information about the import delimited command.

    Comment


    • #3
      Thank you for the answer.

      My issue is, that I run a script, that imports and append all files of the given type.

      !dir "*Sample*.txt" /a-d/b >filelist.txt

      drop _all
      file open lister using filelist.txt, read
      file read lister line
      import delimited "`line'", varnames(1)
      gen plate="`line'"
      save sample_data.dta, replace

      drop _all

      file read lister line
      while r(eof)==0 {
      import delimited "`line'", varnames(1)
      gen plate="`line'"
      append using sample_data.dta, force
      save sample_data.dta, replace
      drop _all
      file read lister line
      }


      file close lister

      As the rowrange and varname line differ between the outputfiles and will be 42 for half the files and 40 for the other half I am not able to use the rowrange(line number)/ varnames(linenumber) command. Is there a way to define varnames as row below the [Results] line ?

      Comment


      • #4
        As the rowrange and varname line differ between the outputfiles and will be 42 for half the files and 40 for the other half I am not able to use the rowrange(line number)/ varnames(linenumber) command. Is there a way to define varnames as row below the [Results] line ?
        You need to be able to somehow read the data before the import. The following uses infix and is not tested.

        Code:
        infix str data 1-200000 using filelist.txt
        gen line= regexm(data,"[Results]")
        local line= cond(line==1 & _n==40, 41, 43)
        clear
        import delimited filelist.txt, rowrange(`=`line'+1') varnames(`=`line'') delimiters(" ")

        Comment


        • #5
          Is there a way to define varnames as row below the [Results] line?
          the following will strip off the first lines, save the remaining lines to a temporary file to be imported
          Code:
          local fn = "sample.txt"
          tempfile s
          qui di filewrite("`s'", ustrregexrf(fileread("`fn'"), "(?s).*\[Results\]", "" ))
          import delimited "`s'" , delim(" ") varnames(1)

          Comment

          Working...
          X