  • Stopwords Removal with Txttool

    I would like to remove stopwords from strings, and I was advised to use txttool. However, Stata reports "unmatched quote" when I run the command. The strings contain the text of annual report files that were stored as strings in Stata using Wordstat. I use Stata version 16. The text has been converted to lower case in the variable document_lc.

    I counted the total number of words with wordcount, and now I want to create a variable that holds the number of words without stopwords.

    This is the command I used:

    txttool document_lc, generate(text_wo_stopwords_german) noclean nooutput stopwords("/Volumes/Elements//Stopwords/German stopwords.txt)

    Is it possible that the strings are too long? What might be a solution?

    Thank you

    Robert
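
    A minimal sketch of the counting step described above, assuming txttool has already created text_wo_stopwords_german. It uses Stata's built-in wordcount() function; n_words_all and n_words_wo_stop are hypothetical variable names, not names from the original post.

    ```stata
    * Word count of the original lower-cased text
    generate long n_words_all = wordcount(document_lc)

    * Word count after stopword removal (variable created by txttool's generate() option)
    generate long n_words_wo_stop = wordcount(text_wo_stopwords_german)
    ```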

  • #2
    I am not familiar with this command at all (as the FAQ requests, please tell us where you obtained community-contributed commands), but the problem appears in your command: at the beginning of the parenthesized material after "stopwords" you have a double quotation mark, but you don't have one at the end.
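
    For reference, a sketch of the command from post #1 with the closing double quote added after the file name. Everything else, including the path, is copied unchanged from that post; this fixes only the "unmatched quote" message and is not a claim that the command then runs without further errors.

    ```stata
    * Same command as in post #1, with the missing closing quote restored
    txttool document_lc, generate(text_wo_stopwords_german) noclean nooutput stopwords("/Volumes/Elements//Stopwords/German stopwords.txt")
    ```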



    • #3
      Thank you.
      I have found the command by searching online for text analysis in Stata.
      Now, I receive the following error when performing the command, with quotation marks as well as without.

      st_addvar(): 3300 argument out of range
      mm_txttool(): - function returned error
      <istmt>: - function returned error

      I do not understand what I should change. Is the size of the string the problem?
      I have used 6 observations only to test the command.



      • #4
        You will increase your chances of a useful answer by following the FAQ on asking questions: post Stata code in code delimiters, readable Stata output, and sample data using dataex. Being able to replicate your problem is often essential to helping you.

        What Rick was trying to get you to do was to tell us where you got this procedure from; "I got it on the Internet" is not helpful.

        With user-written procedures, help depends to some extent on whether someone active on the list actually uses that procedure. For technical problems like this, you either have to try to understand the program in detail and debug it yourself, or contact the program's authors.



        • #5
          Originally posted by Robert Adrian Piper:
          Thank you.
          I have found the command by searching online for text analysis in Stata.
          Now, I receive the following error when performing the command, with quotation marks as well as without.

          st_addvar(): 3300 argument out of range
          mm_txttool(): - function returned error
          <istmt>: - function returned error

          I do not understand what I should change. Is the size of the string the problem?
          I have used 6 observations only to test the command.

          Did you find a solution? I am facing a similar problem with txttool.



          • #6
            Also following this thread because I have a similar issue.



            • #7
              Robert Adrian Piper
              Carlo Koos
              Diana Hechavarria

              Apparently either of the commands
              Code:
              net install dm0077, from(http://www.stata-journal.com/software/sj14-4) // Stata Journal distribution
              ssc install txttool // SSC distribution
              installs version 1.1 of the txttool package. (This is often not the case, which is why the FAQ asks for specific information about the source of the code.)
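
              To report which copy is actually installed on a given machine, Stata's built-in which and ado commands can be used. A minimal sketch; what it prints depends on the local installation:

              ```stata
              * Show the location of the installed txttool.ado and any version lines it carries
              which txttool

              * List installed community-contributed packages together with their sources
              ado dir
              ```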

              The error message reported in post #3 is not a message generated by txttool to be delivered to the txttool user to enable them to correct their use of txttool.

              Instead, it is a message generated by Mata's st_addvar() function directed to the txttool developer using Mata to create the mm_txttool() function for use by txttool. When a developer sees such a message, they change their program so that the problem does not recur, either by correcting an error of theirs or by better checking the input provided by the user that triggered the problem.

              The output of help txttool gives the developer's name and email address. Perhaps in response to an email from you they can locate the source of the problem and either provide a fix or a workaround.
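
              The thread does not establish what triggers the st_addvar() error. Since post #3 asks whether string size matters, one thing that could be checked and included in the email is the storage type and length of the input variable. Stata's fixed-length string types end at str2045, with longer text held as strL; whether txttool 1.1 copes with such lengths is an assumption to verify, not something confirmed here. doc_len is a hypothetical helper variable.

              ```stata
              * Storage type of the text variable (str# versus strL)
              describe document_lc

              * Length of each document as returned by strlen()
              generate long doc_len = strlen(document_lc)
              summarize doc_len

              * Number of documents longer than the str2045 limit for fixed-length strings
              count if doc_len > 2045
              ```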



              • #8
                Thank you, William Lisowski. I appreciate the insight! Will do.
