  • #16
    There is no option to end at the next-to-last line. I am working on modifying the code I gave you yet another time, but I have been occupied with my own research today.

    • #17
      The following code expects your text files to meet these requirements, which are based on the 2 of your 18,000 files that you shared.
      1. The first line has a "#" in column 1 followed by the variable names
      2. The last line is to be ignored.
      3. All other lines contain
        1. a space in the first column
        2. as many data items as there are variables, separated by spaces
        3. no data item has an embedded space, or if it does, the item is enclosed in quotation marks.
      Code:
      global location ~/Downloads/2files/
      
      cd "$location"
      clear
      save out.dta, replace emptyok
      
      tempfile lines
      
      capture frame drop files
      frame create files
      frame files {
          filelist, pattern(*_out.txt) directory("$location") norecursive
          local nfiles = c(N)
          list in 1/2
      }
      
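      forvalues i = 1/`nfiles' {
          // build the full path of file i, without the .txt extension
          frame files: local file = dirname[`i']+filename[`i']
          local file : subinstr local file ".txt" ""
          display "file `i': `file'"
          quietly {
              // read each line of the text file as a single string
              infix str line 1-100 using "`file'.txt", clear
              // drop the last line (lowercase L refers to the last observation)
              drop in l
              // remove the first column ("#" or a space) and surrounding blanks
              replace line = trim(substr(line,2,.))
              // write the cleaned lines to a temporary file and import them
              outfile using "`lines'", noquote replace
              import delimited "`lines'", delimiter(whitespace, collapse) varnames(1) clear
              // save this file's data, then add it to the combined dataset
              save "`file'.dta", replace
              append using out.dta
              save out.dta, replace
          }
      }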
      
      frame drop files
      describe
      Code:
      . global location ~/Downloads/2files/
      
      .
      . cd "$location"
      /Users/lisowskiw/Downloads/2files
      
      . clear
      
      . save out.dta, replace emptyok
      (dataset contains 0 observations)
      file out.dta saved
      
      .
      . tempfile lines
      
      .
      . capture frame drop files
      
      . frame create files
      
      . frame files {
      .     filelist, pattern(*_out.txt) directory("$location") norecursive
      Number of files found = 2
      .     local nfiles = c(N)
      .     list in 1/2
      
           +-------------------------------------------------------------------+
           | dirname               filename                              fsize |
           |-------------------------------------------------------------------|
        1. | ~/Downloads/2files/   579_33.51330265_73.90576445_out.txt   1,034 |
        2. | ~/Downloads/2files/   580_33.514591_73.75728486_out.txt     1,034 |
           +-------------------------------------------------------------------+
      . }
      
      .
      . forvalues i = 1/`nfiles' {
        2.     frame files: local file = dirname[`i']+filename[`i']
        3.     local file : subinstr local file ".txt" ""
        4.     display "file `i': `file'
        5.     quietly {
        6.         infix str line 1-100 using "`file'.txt", clear
        7.         drop in l
        8.         replace line = trim(substr(line,2,.))
        9.         outfile using "`lines'", noquote replace
       10.         import delimited "`lines'", delimiter(whitespace, collapse) varnames(1) clear
       11.         save "`file'.dta", replace
       12.         append using out.dta
       13.         save out.dta, replace
       14.     }
       15. }
      file 1: ~/Downloads/2files/579_33.51330265_73.90576445_out
      file 2: ~/Downloads/2files/580_33.514591_73.75728486_out
      
      .
      . frame drop files
      
      . describe
      
      Contains data from out.dta
       Observations:            48                  
          Variables:             4                  13 Dec 2022 18:14
      ------------------------------------------------------------------------------------------------
      Variable      Storage   Display    Value
          name         type    format    label      Variable label
      ------------------------------------------------------------------------------------------------
      date            str9    %9s                  
      lon             float   %9.0g                
      lat             float   %9.0g                
      value           float   %9.0g                
      ------------------------------------------------------------------------------------------------
      Sorted by:
      
      .
      Last edited by William Lisowski; 13 Dec 2022, 16:26.

      • #18
        William Lisowski Thanks

        • #19
          Hi @William Lisowski

          Is it possible to take the latitude and longitude values from the file name and create lat and lon variables from them? The values of the lat and lon variables inside the files are trimmed.

          Attached Files

          • #20
            Latitude and longitude in the files are trimmed because that is the limit of the accuracy of the system you extracted your data from.

            1 degree of latitude at the equator is about 111,000 meters. The latitudes and longitudes are reported to four digits to the right of the decimal point, and .0001 of a degree is about 11.1 meters.

            You fed into their system latitudes and longitudes with eight digits to the right of the decimal point, and .00000001 of a degree is about .00111 meters, roughly a millimeter. I doubt that your data was that accurate.
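
            The arithmetic can be checked directly in Stata. This is a minimal sketch that assumes the round figure of 111,000 meters per degree used in this post; the local macro name m_per_degree is only illustrative.

            ```stata
            * Ground distance represented by one unit in the last decimal place,
            * assuming about 111,000 meters per degree of latitude
            local m_per_degree = 111000
            display "4 decimal places: " `m_per_degree'*0.0001 " meters"
            display "8 decimal places: " `m_per_degree'*0.00000001 " meters"
            ```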

            • #21
              Thanks @William Lisowski I will look into it.

              Actually, the rounding of lat and lon is creating problems when I merge the data. What changes do I have to make so that the lat and lon variables are not rounded and are imported into Stata with full precision from the following files?

              Attached Files

              • #22
                The following code expects your text files to meet these requirements, which are based on the 2 of your 18,000 files that you shared.
                1. The first line has a "#" in column 1 followed by the variable names
                2. The last line is to be ignored.
                3. All other lines contain
                  1. not necessarily a space in the first column
                  2. as many data items as there are variables, separated by spaces
                  3. no data item has an embedded space, or if it does, the item is enclosed in quotation marks
                The key is to import your numeric values as double rather than float.
                Code:
                global location ~/Downloads/2filesv2/
                
                cd "$location"
                clear
                save out.dta, replace emptyok
                
                tempfile lines
                
                capture frame drop files
                frame create files
                frame files {
                    filelist, pattern(*_out.txt) directory("$location") norecursive
                    local nfiles = c(N)
                    list in 1/2
                }
                
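                forvalues i = 1/`nfiles' {
                    // build the full path of file i, without the .txt extension
                    frame files: local file = dirname[`i']+filename[`i']
                    local file : subinstr local file ".txt" ""
                    display "file `i': `file'"
                    quietly {
                        // read each line of the text file as a single string
                        infix str line 1-200 using "`file'.txt", clear
                        // drop the last line (lowercase L refers to the last observation)
                        drop in l
                        // remove the "#" from the header line only, then trim every line
                        replace line = substr(line,2,.) in 1
                        replace line = trim(line)
                        outfile using "`lines'", noquote replace
                        list in 1/2
                        // asdouble stores numeric values as double rather than float
                        import delimited "`lines'", delimiter(whitespace, collapse) varnames(1) clear asdouble
                        save "`file'.dta", replace
                        append using out.dta
                        save out.dta, replace
                    }
                }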
                
                frame drop files
                describe
                format lat lon %16.8f
                list lat lon in 1, clean
                Code:
                . global location ~/Downloads/2filesv2/
                
                . 
                . cd "$location"
                /Users/lisowskiw/Downloads/2filesv2
                
                . clear
                
                . save out.dta, replace emptyok
                (dataset contains 0 observations)
                file out.dta saved
                
                . 
                . tempfile lines
                
                . 
                . capture frame drop files
                
                . frame create files
                
                . frame files {
                .     filelist, pattern(*_out.txt) directory("$location") norecursive
                Number of files found = 2
                .     local nfiles = c(N)
                .     list in 1/2
                
                     +------------------------------------------------------------------------+
                     | dirname                 filename                                 fsize |
                     |------------------------------------------------------------------------|
                  1. | ~/Downloads/2filesv2/   1_2003_36.44991823_72.57155787_out.txt   6,244 |
                  2. | ~/Downloads/2filesv2/   1_2004_36.44991823_72.57155787_out.txt   6,240 |
                     +------------------------------------------------------------------------+
                . }
                
                . 
                . forvalues i = 1/`nfiles' {
                  2.     frame files: local file = dirname[`i']+filename[`i']
                  3.     local file : subinstr local file ".txt" ""
                  4.     display "file `i': `file'
                  5.     quietly {
                  6.         infix str line 1-200 using "`file'.txt", clear
                  7.         drop in l
                  8.         replace line = substr(line,2,.) in 1
                  9.         replace line = trim(line)
                 10.         outfile using "`lines'", noquote replace
                 11.         list in 1/2
                 12.         import delimited "`lines'", delimiter(whitespace, collapse) varnames(1) clear asdou
                > ble
                 13.         save "`file'.dta", replace
                 14.         append using out.dta
                 15.         save out.dta, replace
                 16.     }
                 17. }
                file 1: ~/Downloads/2filesv2/1_2003_36.44991823_72.57155787_out
                file 2: ~/Downloads/2filesv2/1_2004_36.44991823_72.57155787_out
                
                . 
                . frame drop files
                
                . describe
                
                Contains data from out.dta
                 Observations:           142                  
                    Variables:            12                  15 Dec 2022 14:20
                ------------------------------------------------------------------------------------------------
                Variable      Storage   Display    Value
                    name         type    format    label      Variable label
                ------------------------------------------------------------------------------------------------
                date            str10   %10s                  
                time            str8    %9s                   
                year            int     %8.0g                 
                month           byte    %8.0g                 
                day             byte    %8.0g                 
                x               double  %10.0g                
                y               double  %10.0g                
                lon             double  %10.0g                
                lat             double  %10.0g                
                lev             byte    %8.0g                 
                name            str4    %9s                   
                value           double  %10.0g                
                ------------------------------------------------------------------------------------------------
                Sorted by: 
                
                . format lat lon %16.8f
                
                . list lat lon in 1, clean
                
                               lat           lon  
                  1.   36.44991823   72.57155787  
                
                .
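
                As a quick illustration of why the storage type matters, here is a small sketch that is separate from the import code and uses the coordinate from the example file names. Stata's float() function rounds a number to float precision, which holds only about 7 significant digits, fewer than these coordinates carry.

                ```stata
                * The same coordinate shown as a double and after rounding to float precision
                display %16.8f 36.44991823
                display %16.8f float(36.44991823)

                * Displays 1 if the two values are equal and 0 if they differ;
                * a difference here is what can break an exact merge on lat and lon
                display 36.44991823 == float(36.44991823)
                ```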

                • #23
                  Thanks @

                  The code misses some files; please see the image. There are a total of 10,659 files, but the code imports only 10,000 of them.




                  Attached Files

                  • #24
                    The output of help filelist tells us
                    Code:
                    Note however that filelist is written in Mata and unfortunately
                    the dir() function can only return 10,000 filenames from a single directory.
                    So you should do this task in two parts:
                    • move all the files beginning with 60 or later to a second directory
                    • use your current code to append the files remaining in the current directory
                    • change $location in your code and use it to append the files in the second directory
                    • append the two datasets just created
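
                    The last step could look something like the following sketch. It assumes the import code has already been run once in each directory, leaving an out.dta in each. The two directory names and the file name all.dta are hypothetical.

                    ```stata
                    * Hypothetical directory names; change them to wherever the files actually are
                    global location1 "~/Downloads/part1/"
                    global location2 "~/Downloads/part2/"

                    * Combine the out.dta created in each directory into a single dataset
                    use "${location1}out.dta", clear
                    append using "${location2}out.dta"
                    save "${location1}all.dta", replace
                    describe
                    ```

                    The braces in ${location1} are needed so that Stata does not read the file name as part of the macro name.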

                    • #25
                      Hi William Lisowski

                      Thanks

                      Is there a way to import an unlimited number of files? My number of files is growing larger than 60,000.

                      Comment


                      • #26
                        Not that I know of.

                        • #27
                          Deleted response; I missed the second page of the thread

                          • #28
                            Muhammad Ramzan, instead of trying to list all the files in the directory with a single call to filelist, I would suggest identifying patterns in the filenames and then using loops. For instance, you might do

                            Code:
                            forvalues i = 1/70 {
                                forvalues j = 2003/2020 {
                                    filelist , pattern(`i'_`j'_*_out.txt) directory("$location") norecursive
                                    ...
                                }
                            }
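
                            One way the loop body might be filled in is sketched below. This is only an assumption about how the pieces could fit together: each call to filelist handles one filename pattern, so it stays well under the 10,000-file limit, and the results are accumulated in a temporary file. It assumes $location is defined as in the earlier code, and the name allfiles is illustrative.

                            ```stata
                            * Start with an empty dataset that will accumulate the list of files
                            clear
                            tempfile allfiles
                            save "`allfiles'", emptyok

                            forvalues i = 1/70 {
                                forvalues j = 2003/2020 {
                                    * list only the files matching this prefix and year
                                    clear
                                    filelist , pattern(`i'_`j'_*_out.txt) directory("$location") norecursive
                                    * add them to the accumulated list
                                    append using "`allfiles'"
                                    save "`allfiles'", replace emptyok
                                }
                            }

                            * The complete list of files, one observation per file
                            use "`allfiles'", clear
                            count
                            ```

                            The import loop from the earlier posts could then run over the observations of this combined list rather than over the result of a single filelist call.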

                            • #29
                              William Lisowski

                              Sir, last time you provided me this code to import the data from the text files. Is it possible to have an additional column in Stata containing the first number from the file name? For example:
                              file 1: ~/Downloads/2filesv2/1_2003_36.44991823_72.57155787_out
                              file 2: ~/Downloads/2filesv2/1_2004_36.44991823_72.57155787_out
                              I want a column in Stata that is 1 if the data come from the first file and 2 if the data come from the second file, please.

                              Thanks
                              Code:
                              clear
                              set trace off
                              set more off
                              global location "H:\netcdf\temperature\2001"

                              cd "$location"
                              clear
                              save out.dta, replace emptyok

                              tempfile lines

                              capture frame drop files
                              frame create files
                              frame files {
                                  filelist, pattern(*_out.txt) directory("$location") norecursive
                                  local nfiles = c(N)
                                  list in 1/2
                              }

                              forvalues i = 1/`nfiles' {
                                  frame files: local file = dirname[`i']+filename[`i']
                                  local file : subinstr local file ".txt" ""
                                  display "file `i': `file'
                                  quietly {
                                      infix str line 1-200 using "`file'.txt", clear
                                      replace line = substr(line,2,.) in 1
                                      replace line = trim(line)
                                      outfile using "`lines'", noquote replace
                                      list in 1/2
                                      import delimited "`lines'", delimiter(whitespace, collapse) varnames(1) clear asdouble
                                      save "`file'.dta", replace
                                      append using out.dta
                                      save out.dta, replace
                                  }
                              }
