Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Can someone append for me?

    I am trying to append this dataset but am stumbling into so many issues. At first I tried using this code and it did not work

    Code:
    xls2dta , clear generate (newvar1): append using "/Users/brady/Desktop/TN"
    variable A is str11 in master but byte in using data
        You could specify append's force option to ignore this string/numeric mismatch.  The using variable
        would then be treated as if it contained "".
    Then I tried a loop and that did not work either, it just made the dataset all glitchy and unusable (see image)

    Code:
    local filenames: dir "." files "*.xls"
    drop _all
    tempfile building
    quietly save `building', emptyok
    foreach f of local filenames {
    import delim using `"`f'"', clear
    gen source_file = `"`f'"'
    append using `building'
    save `"`building'"', replace
    }
    
    variable v117 is byte in master but str2 in using data
        You could specify append's force option to ignore this numeric/string mismatch.  The using variable
        would then be treated as if it contained numeric missing value.
    
    Click image for larger version

Name:	Screen Shot 2020-02-28 at 5.42.40 PM.png
Views:	1
Size:	41.8 KB
ID:	1538993
    At this point I am getting too frustrated so will take a pause. But if anyone can show me how to do this (I have attached a zipfile with dummy numbers) please let me know.

  • #2
    There was no zip file attached. But it wouldn't help anyway.

    Your loop is wrong, because you cannot use -import delimited- to read Excel files. -import delimited- is for delimited text files only. So for that loop to work, you have to change -import delimtied- to -import excel-. But your corrected loop will give you the same error message you are getting with -xls2dta-. The real problem is that your data sets have incompatibilities among themselves, and you have to fix and conform them before you can put them together. This is an extremely common problem that arises when trying to mass import and combine a bunch of files, particularly Excel files.

    My advice is to use -xls2dta- to save each spreadsheet as a separate Stata .dta file first. Then take a look at variable A (The one mentioned in the error message) in one of the files and determine whether, based on its content, it should be string or numeric. Then go through all the files and use -destring- or -tostring- (as the case may be) to fix it in those files where it is of the wrong type.

    Most commonly, with Excel files, this problem arises when what should be a numeric variable has gotten contaminated with non-numeric content. For example, somebody may have put N/A (or something like that) in a cell where the numeric value was missing. But N/A is string text, not numeric, so Stata imports the variable as a string. In this situation -destring- will not work immediately: it will complain that there is non-numeric content. Also, a blank cell in the first row of a spreadsheet column may cause that column to be read in as a string by -import excel-. Or it may be a mistyped number (e.g. 2.5 mistyped as 2..5). So you have to find the offending values:

    Code:
    tab A if missing(real(A))
    will show you the values of A that -destring- cannot accept. Then you can either -replace- them with appropriate numbers, or, if it is always a matter of N/A or something like that, you can use -destring-'s -ignore()- option to tell Stata to skip over the N/A's. Read -help destring- so you know exactly how to do this.

    I suggest that you then take a look at all the other variables in the data sets, using -describe- to see if there are other variables that have inconsistencies of this nature. It is likely you will find some. They, too, will need to be fixed in the same way.

    Once you have fixed up your .dta data sets, you can then write a new loop to append them all together.
    Last edited by Clyde Schechter; 28 Feb 2020, 16:44.

    Comment


    • #3
      In addition to Clyde's excellent advice, I would add that in order to use a loop in this context, you need to have basic consistency in the data-management practices that created your source datasets. By their essential nature, workbooks (Microsoft Excel included) don't promote such practices, and you'd be well advised to avoid them like the plague for any work that matters.

      I don't know which worksheet you're trying to collect, but across workbooks, you don't have consistent existence of worksheets, don't have consistent names for the worksheets that do exist, don't have consistent data structure in the worksheet named "Data" (where such a worksheet exists), and don't have consistent datatype for columns with the same name in the "Data" worksheet.

      You seem to be using a Macintosh, but substitute your superdirectory's name in for "F:/" in the code below, if you want to run it.
      Code:
      version 16.1
      
      clear *
      
      local dir F:/TN/
      local workbooks : dir "`dir'" files "*.xls"
      
      foreach workbook of local workbooks {
          display in smcl as result _newline(1) "`workbook'"
          quietly import excel "`dir'`workbook'", describe
          forvalues i = 1/`r(N_worksheet)' {
              display in smcl as text "`r(worksheet_`i')'"
          }
      }
      
      tempfile tmpfil0
      quietly save `tmpfil0', emptyok
      
      foreach workbook of local workbooks {
          capture noisily import excel "`dir'`workbook'", sheet(Data) firstrow allstring clear
          
          if !_rc {
              generate str workbook = "`workbook'"
              generate long row_nr = _n
      
              quietly {
                  append using `tmpfil0'
                  save `tmpfil0', replace
              }
          }
      }
      
      exit

      Comment

      Working...
      X