  • Stata CSV Import Issue: Unmatched Quote Error, maxquotedrows()

    Hi everyone,

    I'm running into an issue when importing a CSV file into Stata. I get the following error:
    Code:
     
     Unmatched quote exceeded 8 lines while processing row 1468; there may be a problem with your data or perhaps you have a quoted string with too many lines. You may specify maxquotedrows() to override the default behavior.
    Here’s my current syntax:
    Code:
    import delimited "filepath\filename.csv", delimiter(comma) bindquote(strict) varnames(1) stripquote(yes) encoding(UTF-8) maxquotedrows(8)

    I tried increasing maxquotedrows() as suggested, but the problem still persists — sometimes the file imports, sometimes it doesn't. I suspect the issue is due to improperly formatted quotes within certain cells.

    What I want to achieve:
    • Keep the comma (,) as the delimiter.
    • Completely ignore double quotes ("), especially when they are mismatched.
    Is there a way to force Stata to either truncate overly long rows or ignore problematic quotes altogether? Any alternative options or tweaks to the import command would be appreciated!

    Thanks in advance for your help!

  • #2
    Completely ignore double quotes (") (even more if they are mismatched).
    I think this is what the -bindquote(nobind)- option of -import delimited- does.


    • #3
      Originally posted by Clyde Schechter
      I think this is what the -bindquote(nobind)- option of -import delimited- does.
      Clyde Schechter Thank you for the advice; that was exactly what I was looking for. With that option, the error I was getting is gone. Unfortunately, however, I'm now facing a misalignment in the data columns (understandable, but I was hoping the commas alone would be sufficient). Ideally, I'm looking for a combination of options that would allow the import to bypass the error:

      "Unmatched quote exceeded 8 lines while processing row 1468; there may be a problem with your data or perhaps you have a quoted string with too many lines. You may specify maxquotedrows() to override the default behavior."

      In my previous import attempt, at least the data was structured correctly, but I can't quite figure out why it failed or how to handle it properly. Any suggestions?


      • #4
        I'm not surprised this happened. If your data contains variables that are free-form text, the probability that there are real commas contained in some of those variables is high, and if you then ignore quotes, those variable-internal commas get treated as variable separators and the data becomes misaligned. I do a fair amount of work with delimited text files, and, luckily for me, there is a relatively small number of people who provide me with those. So I've been able to work out an understanding with them not to use commas as the delimiter, at least not if the file contains long text fields. I've never encountered this problem using | as the delimiter--that character just about never shows up in text, so quotes are not needed, and if present can be ignored without causing misalignment.

        This is a difficult situation. Clearly the input data are flawed. There is probably some string variable with an internal comma and a missing end quote, and everything spirals downhill from there.

        Here are a few things I would try, in no particular order.
        1. Contact the source of the data, explain the problem, and ask them to provide you with a repaired data set. If feasible, this is the easiest solution for you. The downside is that even assuming the source of the data can be contacted and is agreeable, it might be a low priority for them, and you might have to wait an unacceptably long time for their response.
        2. Since the original error message shows that things break around line 1468 with -maxquotedrows(8)-, the problem probably arises in line 1461. So using the -rowrange()- option, import lines 1 through 1460 and then separately lines 1462 through the end of the data, and append those results together. This still leaves out line 1461 itself, but perhaps that is a tolerable data loss. It is possible that the problem is not in line 1461, but in some line near that, so some experimentation might find a way to bring in all of the data except the bad line if you can isolate it.
        3. Open the .csv file in Excel: sometimes Excel can read these files correctly. Scroll down to rows around 1460 and see if that happens. If so, save the file as .xlsx and then bring it into Stata with -import excel-. Make sure you document this operation as part of your audit trail.
        4. If Excel doesn't read the file correctly either, you might, nevertheless, still be able to see from what Excel shows you, just what the problem is. It might even be "obvious" where the missing quote should be. In that case you could make a copy of the original .csv file and perform "surgery" on that line in the copy to correct the problem. Then use -import delimited- on the copy. Again, be sure to document this as part of your audit trail.
        Others may have additional ways of dealing with this problem, and, I hope, will chime in here.
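        Since the file was generated programmatically anyway, one way to isolate the offending line (as in points 2 and 4) is to scan for physical lines containing an odd number of double quotes; those are the lines where a quoted field opens, or fails to close, across a line break. A rough Python sketch (the path is a placeholder; note that properly escaped quotes, written as "", contribute an even count, so they do not trigger false positives):

        ```python
        # Sketch: list the physical line numbers where the number of double
        # quotes is odd, i.e. where a quoted field opens or closes across a
        # line break. The first line of a run that never closes is usually
        # where the unmatched quote starts.
        def find_unbalanced_lines(path):
            suspects = []
            with open(path, encoding="utf-8") as f:
                for lineno, line in enumerate(f, start=1):
                    if line.count('"') % 2 == 1:
                        suspects.append(lineno)
            return suspects
        ```

        Suspect lines that come in pairs are usually legitimate multi-line quoted fields; an odd total means the file ends inside an open quote, and the last unpaired line number points at the problem.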


        • #5
          Thank you Clyde Schechter !!! These are really helpful insights, and I’ve learned a lot. It turns out Excel can read the files, and from there I moved them into Stata (the import is now correctly structured). Thanks to your explanation, I now understand the issue with commas in the text (which are indeed present and causing the misalignment).

          Unfortunately, I generated the data myself by reading an HTML data source with Python. I might try replacing the delimiters with | to see if that helps. I need to automate the process since I have many files, but if all else fails, going through Excel seems like a good fallback.
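          For the automated rewrite, something along these lines might work at the point where the scraped rows are written out (a sketch; the function name and rows are placeholders): letting the csv module handle the quoting avoids mismatched quotes in the first place, and the pipe delimiter sidesteps the embedded-comma problem.

          ```python
          import csv

          # Sketch: write rows pipe-delimited instead of comma-delimited.
          # csv.writer handles quoting, so embedded commas and quote marks
          # in text fields are written safely and read back intact.
          def write_pipe_delimited(rows, path):
              with open(path, "w", newline="", encoding="utf-8") as f:
                  csv.writer(f, delimiter="|").writerows(rows)
          ```

          On the Stata side, the import would then use -delimiter("|")- instead of -delimiter(comma)-.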

          Thanks again. Your comments were truly helpful, and these strategies will help me in the future! I hope others will join in with more ideas!
