  • Data from txt file

    Hi

    I have a large number of .txt files. I want to import them into Stata, but the import is not working properly. Please help.


  • #2
    You're not new here, so you should know better than to post attachments. "It's not working" isn't good enough!!! It tells us nothing about the problem.

    Where did you get these files from? A zip file from a URL? If so, give the link. If not, tell us what "not properly working" means.

    • #3
      I am getting these files from NetCDF files, which I convert to CSV format. When I try to import the data into Stata, I get the problem shown in the attached image.

      The following is the output of dataex. The csv/txt file contains only the variables date, lon, lat, and value, but some of the values that belong in v5 end up in v6.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input byte v1 str10 v2 float(v3 v4 v5 v6) byte(date v8 v9 v10 lon v12 v13 v14 lat v16 v17 v18 value v20)
      . "2016-01-01" 73.7573 33.5146 288.6302       . . . . . . . . . . . . . . .
      . "2016-01-01" 73.7573 33.5146 289.9873       . . . . . . . . . . . . . . .
      . "2016-02-01" 73.7573 33.5146 291.4626       . . . . . . . . . . . . . . .
      . "2016-02-01" 73.7573 33.5146 292.8294       . . . . . . . . . . . . . . .
      . "2016-03-01" 73.7573 33.5146 294.2896       . . . . . . . . . . . . . . .
      . "2016-03-01" 73.7573 33.5146 295.6251       . . . . . . . . . . . . . . .
      . "2016-04-01" 73.7573 33.5146 298.6492       . . . . . . . . . . . . . . .
      . "2016-04-01" 73.7573 33.5146 300.1137       . . . . . . . . . . . . . . .
      . "2016-05-01" 73.7573 33.5146 303.5309       . . . . . . . . . . . . . . .
      . "2016-05-01" 73.7573 33.5146 305.0149       . . . . . . . . . . . . . . .
      . "2016-06-01" 73.7573 33.5146 304.4824       . . . . . . . . . . . . . . .
      . "2016-06-01" 73.7573 33.5146        . 305.941 . . . . . . . . . . . . . .
      . "2016-07-01" 73.7573 33.5146 301.8569       . . . . . . . . . . . . . . .
      . "2016-07-01" 73.7573 33.5146 303.2939       . . . . . . . . . . . . . . .
      . "2016-08-01" 73.7573 33.5146 301.7478       . . . . . . . . . . . . . . .
      . "2016-08-01" 73.7573 33.5146 303.1923       . . . . . . . . . . . . . . .
      . "2016-09-01" 73.7573 33.5146 301.0252       . . . . . . . . . . . . . . .
      . "2016-09-01" 73.7573 33.5146 302.4876       . . . . . . . . . . . . . . .
      . "2016-10-01" 73.7573 33.5146 300.2055       . . . . . . . . . . . . . . .
      . "2016-10-01" 73.7573 33.5146 301.6139       . . . . . . . . . . . . . . .
      . "2016-11-01" 73.7573 33.5146 296.2871       . . . . . . . . . . . . . . .
      . "2016-11-01" 73.7573 33.5146 297.6145       . . . . . . . . . . . . . . .
      . "2016-12-01" 73.7573 33.5146        . 293.532 . . . . . . . . . . . . . .
      . "2016-12-01" 73.7573 33.5146 294.8982       . . . . . . . . . . . . . . .
      end
      • #4
        Code:
        replace v5 = v6 if v5 == .

        • #5
          Here is what one of the two text files looks like.
          Code:
          #      date    lon    lat    value 
           2016-01-01 73.7573 33.5146 288.6302 
           2016-01-01 73.7573 33.5146 289.9873 
           2016-02-01 73.7573 33.5146 291.4626 
           2016-02-01 73.7573 33.5146 292.8294 
           2016-03-01 73.7573 33.5146 294.2896 
           2016-03-01 73.7573 33.5146 295.6251 
           2016-04-01 73.7573 33.5146 298.6492 
           2016-04-01 73.7573 33.5146 300.1137 
           2016-05-01 73.7573 33.5146 303.5309 
           2016-05-01 73.7573 33.5146 305.0149 
           2016-06-01 73.7573 33.5146 304.4824 
           2016-06-01 73.7573 33.5146  305.941 
           2016-07-01 73.7573 33.5146 301.8569 
           2016-07-01 73.7573 33.5146 303.2939 
           2016-08-01 73.7573 33.5146 301.7478 
           2016-08-01 73.7573 33.5146 303.1923 
           2016-09-01 73.7573 33.5146 301.0252 
           2016-09-01 73.7573 33.5146 302.4876 
           2016-10-01 73.7573 33.5146 300.2055 
           2016-10-01 73.7573 33.5146 301.6139 
           2016-11-01 73.7573 33.5146 296.2871 
           2016-11-01 73.7573 33.5146 297.6145 
           2016-12-01 73.7573 33.5146  293.532 
           2016-12-01 73.7573 33.5146 294.8982 
          cdo    outputtab: Processed 24 values from 1 variable over 12 timesteps [0.01s 44MB].
          The other file is identical in layout. For files with this layout the following will work, and hopefully it will provide a starting point for files with a different layout.
          Code:
          import delimited "~/Downloads/580_33.514591_73.75728486_out.txt", ///
            delimiter(whitespace, collapse) varnames(1) rowrange(1:25) colrange(2)
          // one extra variable is created because of the blank character at the end of each line
          keep (date-value)
          list, clean
          Code:
          . import delimited "~/Downloads/580_33.514591_73.75728486_out.txt", ///
          >   delimiter(whitespace, collapse) varnames(1) rowrange(1:25) colrange(2)
          (encoding automatically selected: ISO-8859-1)
          (5 vars, 24 obs)
          
          . // one extra variable is created because of the blank character at the end of each line
          . keep (date-value)
          
          . list, clean 
          
                       date       lon       lat      value  
            1.   2016-01-01   73.7573   33.5146   288.6302  
            2.   2016-01-01   73.7573   33.5146   289.9873  
            3.   2016-02-01   73.7573   33.5146   291.4626  
            4.   2016-02-01   73.7573   33.5146   292.8294  
            5.   2016-03-01   73.7573   33.5146   294.2896  
            6.   2016-03-01   73.7573   33.5146   295.6251  
            7.   2016-04-01   73.7573   33.5146   298.6492  
            8.   2016-04-01   73.7573   33.5146   300.1137  
            9.   2016-05-01   73.7573   33.5146   303.5309  
           10.   2016-05-01   73.7573   33.5146   305.0149  
           11.   2016-06-01   73.7573   33.5146   304.4824  
           12.   2016-06-01   73.7573   33.5146    305.941  
           13.   2016-07-01   73.7573   33.5146   301.8569  
           14.   2016-07-01   73.7573   33.5146   303.2939  
           15.   2016-08-01   73.7573   33.5146   301.7478  
           16.   2016-08-01   73.7573   33.5146   303.1923  
           17.   2016-09-01   73.7573   33.5146   301.0252  
           18.   2016-09-01   73.7573   33.5146   302.4876  
           19.   2016-10-01   73.7573   33.5146   300.2055  
           20.   2016-10-01   73.7573   33.5146   301.6139  
           21.   2016-11-01   73.7573   33.5146   296.2871  
           22.   2016-11-01   73.7573   33.5146   297.6145  
           23.   2016-12-01   73.7573   33.5146    293.532  
           24.   2016-12-01   73.7573   33.5146   294.8982  
          
          .

          • #6
            Thanks @William Lisowski
            I have a large number of files, named like 579_33.51330265_73.90576445_out.txt, that is, clusterlocation_lat_longitude_variablename.txt.

            How can I process all of the files in a foreach loop?



            • #7
              How large is the number of files? Are they all in the same directory, or are they perhaps sorted into subdirectories?

              • #8
                580 files

                I am using the commands below, and they work. But when I add an append command at the end, I get an error. I need to append the data from all the files into one file and name it after the variablename part of clusterlocation_lat_longitude_variablename.txt.

                Code:
                clear
                set trace on
                set more off
                global location "H:/netcdf/1"

                cd "$location"
                local f: dir "$location" files "*.txt"
                foreach file of local f {
                    import delimited `file', delimiter(whitespace, collapse) varnames(1) rowrange(1:25) colrange(2)
                    keep (date-value)
                    list, clean
                    save `file'.dta, replace
                }

                • #9
                  So it appears to me that your post #8 answers your question in post #6.

                  In post #8 you ask a new question about append, but you don't show us the command(s) you are using. I will guess that the additions below (the lines involving out.dta, highlighted in red in the original post) will start you in a useful direction.

                  Code:
                  cd "$location"
                  clear
                  save out.dta, replace emptyok
                  local f: dir "$location" files "*.txt"
                  foreach file of local f{
                      import delimited `file', delimiter(whitespace, collapse) varnames(1) rowrange(1:25) colrange(2)
                      keep (date-value)
                      list, clean
                      save `file'.dta,replace
                      append using out.dta
                      save out.dta, replace
                  }

                  • #10
                    Hi

                    I am trying to run the above code on ALL of my files, but I get the following error:



                    . local f: dir "$location" files "*_out.txt"
                    too many filenames
                    r(134);

                    end of do-file

                    r(134);

                    How can I solve this issue, please? It worked fine when I tested it on a few files.

                    Code:
                    cd "$location"
                    clear
                    save out.dta, replace emptyok
                    local f: dir "$location" files "*_out.txt"
                    foreach file of local f {
                        import delimited `file', delimiter(whitespace, collapse) varnames(1) rowrange(1:25) colrange(2) clear
                        keep (date-value)
                        list, clean
                        save `file'.dta, replace
                        append using out.dta
                        save out.dta, replace
                        erase `file'.dta
                    }

                    • #11
                      I do not get that error when I run this code, which attempts to simulate what you have.
                      Code:
                      set obs 10
                      generate x = 42
                      forvalues f=1/580 {
                          quietly save ~/Downloads/580files/`f'_33.514591_73.75728486_out.dta, replace
                      }
                      
                      global location "~/Downloads/580files"
                      
                      local f: dir "$location" files "*_out.dta"
                      
                      macro list _f
                      So I ask again: in the directory given by $location, how many files are there that end in _out.txt? 580 does not seem to be enough to cause this sort of problem.

                      • #12
                        Sorry, approximately 18,000 files.

                        • #13
                          Sigh.

                          Perhaps this example code will handle your 18000 files.
                          Code:
                          * run the following ssc command once to install 
                          *   the community contributed filelist command
                          
                          * ssc install filelist
                          
                          global location ~/Downloads/2files/
                          
                          cd "$location"
                          clear
                          save out.dta, replace emptyok
                          
                          capture frame drop files
                          frame create files
                          frame files {
                              filelist, pattern(*_out.txt) directory("$location") norecursive
                              local nfiles = c(N)
                              list in 1/2
                          }
                          
                          forvalues i = 1/`nfiles' {
                              frame files: local file = dirname[`i']+filename[`i']
                              local file : subinstr local file ".txt" ""
                              display "file `i': `file'"
                              quietly {
                                  import delimited "`file'.txt", delimiter(whitespace, collapse) varnames(1) rowrange(1:25) colrange(2) clear
                                  keep (date-value)
                                  save "`file'.dta", replace
                                  append using out.dta
                                  save out.dta, replace
                              }
                          }
                          
                          frame drop files
                          describe
                          Code:
                          . global location ~/Downloads/2files/
                          
                          . 
                          . cd "$location"
                          /Users/lisowskiw/Downloads/2files
                          
                          . clear
                          
                          . save out.dta, replace emptyok
                          (dataset contains 0 observations)
                          file out.dta saved
                          
                          . 
                          . capture frame drop files
                          
                          . frame create files
                          
                          . frame files {
                          .     filelist, pattern(*_out.txt) directory("$location") norecursive
                          Number of files found = 2
                          .     local nfiles = c(N)
                          .     list in 1/2
                          
                               +-------------------------------------------------------------------+
                               | dirname               filename                              fsize |
                               |-------------------------------------------------------------------|
                            1. | ~/Downloads/2files/   579_33.51330265_73.90576445_out.txt   1,034 |
                            2. | ~/Downloads/2files/   580_33.514591_73.75728486_out.txt     1,034 |
                               +-------------------------------------------------------------------+
                          . }
                          
                          . 
                          . forvalues i = 1/`nfiles' {
                            2.     frame files: local file = dirname[`i']+filename[`i']
                            3.     local file : subinstr local file ".txt" ""
                            4.     display "file `i': `file'
                            5.     quietly {
                            6.         import delimited "`file'.txt", delimiter(whitespace, collapse) varnames(1) rowrange
                          > (1:25) colrange(2) clear
                            7.         keep (date-value)
                            8.         save "`file'.dta", replace
                            9.         append using out.dta
                           10.         save out.dta, replace
                           11.     }
                           12. }
                          file 1: ~/Downloads/2files/579_33.51330265_73.90576445_out
                          file 2: ~/Downloads/2files/580_33.514591_73.75728486_out
                          
                          . 
                          . frame drop files
                          
                          . describe
                          
                          Contains data from out.dta
                           Observations:            48                  
                              Variables:             4                  12 Dec 2022 16:02
                          ------------------------------------------------------------------------------------------------
                          Variable      Storage   Display    Value
                              name         type    format    label      Variable label
                          ------------------------------------------------------------------------------------------------
                          date            str10   %10s                  
                          lon             float   %9.0g                 
                          lat             float   %9.0g                 
                          value           float   %9.0g                 
                          ------------------------------------------------------------------------------------------------
                          Sorted by: 
                          
                          .

                          • #14
                            Thanks

                            import delimited "`file'.txt", delimiter(whitespace, collapse) varnames(1) rowrange(1:25) colrange(2) clear

                            Rather than fixing the rowrange to a specific number, how can I specify a rowrange from 1 to the second-to-last row, please?



                            • #15
                              William Lisowski Thanks

                              I have another set of files where the number of rows is not the same for all the files. How can I import the data from those files, given that the last row is a text line?
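
One possible approach to the varying-length question, offered as an untested sketch and not something posted by the thread's respondents: leave the upper bound off rowrange(), read every column as a string with stringcols(_all), and then drop any row whose date field does not parse as a date. This assumes the files follow the layout shown in post #5 and that the cdo footer is the only non-data row. The file name demo_out.txt and its three data rows are made up purely for illustration.

```stata
* Sketch only: build a small test file that mimics the layout in post #5.
* demo_out.txt and its contents are hypothetical.
tempname fh
file open `fh' using "demo_out.txt", write text replace
file write `fh' "#      date    lon    lat    value " _n
file write `fh' " 2016-01-01 73.7573 33.5146 288.6302 " _n
file write `fh' " 2016-01-01 73.7573 33.5146 289.9873 " _n
file write `fh' " 2016-02-01 73.7573 33.5146 291.4626 " _n
file write `fh' "cdo    outputtab: Processed 3 values from 1 variable over 2 timesteps [0.01s 44MB]." _n
file close `fh'

* No rowrange(): every row is read, including the footer.
* stringcols(_all) keeps all columns as strings for now; they are
* converted back to numeric below, once the footer row is gone.
import delimited "demo_out.txt", delimiter(whitespace, collapse) ///
    varnames(1) colrange(2) stringcols(_all) clear
keep date lon lat value

* Drop rows whose date field is not a valid year-month-day date
* (under the stated assumption, this is just the footer).
drop if missing(date(date, "YMD"))

* Convert the remaining columns back to numeric.
destring lon lat value, replace
list, clean
```

In the loop from post #13, the same idea would amount to replacing rowrange(1:25) with stringcols(_all) on the import line and adding the drop and destring lines before the save. Because the filter looks at the content of each row rather than its position, it should not matter how many data rows a given file has.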
