
  • Exporting observations to separate .txt files

    Dear Statalist users,

    First of all thank you in advance for your time and help. For my thesis I have collected news articles and to process them I need to export them from Stata into .txt files.
    Using the command 'outfile' I am able to export all of the observations/articles into one enormous .txt file, but I cannot seem to find a way of exporting every observation into a different .txt file.
    My question then would be, what commands do I need to use for this? Or is it simply not possible using Stata?

    Again, thank you for your time.

    -Tom

  • #2
    Tom: please post some example lines from your original data, together with lines showing what you want to achieve. Your current description is confusing (to me).



    • #3
      My apologies for the confusion; I will give some more background information.
      As of now I have merely collected the data and have yet to write a do-file or anything of the kind. As an example I have copied a line from my dataset below.
      The last entry is, as you can see, a news article. What I want to do is use another software package to determine whether the news article has a positive or negative tone.
      To be able to do this, all these articles need to be in separate .txt files, which should be possible by using the export function. My supervisor told me this was possible through Stata, but after searching for many hours I have yet to find a way to do it.

      Article: 1
      CompanyTicker: 005930
      Companyname: Samsung Electronics Co., Ltd.
      Source: Wall Street Journal
      Title: Advertising's Best and Worst of 2013 --- Industry....
      Dateofpublication: 12/30/2013
      WriterofArticle: Suzanne Vranica
      Fulltext: Advertising's Best and Worst of 2013 --- Industry Executives Share Views About Where Marketers and Their Agencies Went Right or Wrong

      By Suzanne Vranica
      1,493 words
      30 December 2013
      The Wall Street Journal
      J
      B1
      English
      (Copyright (c) 2013, Dow Jones & Company, Inc.)

      Even as digital technology makes placing ads more of a science than an art, creating a successful ad remains very much a hit-or-miss business.

      Samsung Electronics Co., for instance, has successfully taken on Apple Inc. in the smartphone business over the past few years, partly by matching the iPhone maker in the quality of its marketing. But this year a Samsung commercial was picked as one of the worst ads of the year in our annual survey of marketing executives, as were spots from Kmart and Mazda Motor Corp. One of the best ads of the year, on the other hand, came from a startup toymaker.

      Some of the top executives in the advertising business gave us the following picks for best and worst ads of 2013:

      etc.



      • #4
        OK. I'm not sure Stata is the best tool for this kind of text processing. But, I think you can do what you want as follows:

        Code:
        count
        forvalues j = 1/`r(N)' {
            export delimited using article`j'.txt in `j', replace
        }
        This assumes that each article is a single observation in your data set. It then will export the information on each article into a separate text file whose name includes the observation number corresponding to the article.

        Also, look at the help for -export delimited- to see if you want to apply some of the other options available to that command that will affect the format of your output files.
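As one illustration of those options, here is a minimal sketch that writes only the article text to each file and suppresses the variable-name header row. The toy data and the variable names Source and Fulltext are assumptions for illustration only. Depending on the content, -export delimited- may still wrap a string in double quotes (for example when it contains a comma), so inspect an output file before relying on this.

```stata
* Toy data, for illustration only (not from the original thread)
clear
input str10 Source str40 Fulltext
"WSJ" "First placeholder article text"
"FT"  "Second placeholder article text"
end

* One file per observation: export only Fulltext, without a header row
forvalues j = 1/`=_N' {
    export delimited Fulltext using "article`j'.txt" in `j', novarnames replace
}
```

Listing a varlist before -using- restricts the export to those variables, and novarnames drops the first line of variable names.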

        Hope this helps.



        • #5
          I think that Stata 13 actually has interesting possibilities for text processing, and might make a good tool here. I'm thinking of the fileread() and filewrite() features in particular. To know whether they are relevant, and whether using -export- as Clyde Schechter correctly suggests will do the trick for you, I would want to know more:

          To wit: You don't indicate the kind of format your other software package expects. For example, does it want a literal text file, with end-of-line characters and all intact (i.e., as if you had just saved the text from some word or text processor)? Or does it expect something with lines of text etc. delimited by tabs or commas or something else? Do you want the header material (Article, CompanyTicker, ...) included in each file? And the structure of your Stata file with all the articles is still unclear to me: is the full text of each article contained in a single long string variable called "Fulltext"? If so, the means (import? fileread?) by which you input that full text into a Stata file would matter, as it would affect whether end-of-line characters are included in the Fulltext variable, which might or might not matter. Depending on the answers to these questions, and most particularly the format required by your destination software, -export- may or may not do the trick for you.

          Regards,
          Mike



          • #6
            Thank you all very much for your help! Clyde's code worked like a charm.
            Mike, the other package does not require a specific format other than that it has to be in a .txt file. Simply the text should be enough! As the software is going to focus on the words used in the article to determine the tone I have opted to exclude the header material. The full text is indeed a single long string variable called "Fulltext".
            In my case it appears that it would not matter but you raised a good point as I might want to change the way of importing the data anyway so thank you for that.
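For readers who want only the article text in each file, the filewrite() function Mike mentions can be sketched as follows. This is a minimal example under stated assumptions, not the code used in this thread: the toy data and the variable name bytes_written are hypothetical, and Fulltext is assumed to be a string (str# or strL) variable with one article per observation.

```stata
* Toy data, for illustration only (not from the original thread)
clear
input str40 Fulltext
"First placeholder article text."
"Second placeholder article text."
end

* filewrite(filename, string, 1) writes the string to the file, replacing
* any existing file, and returns the number of bytes written
generate long bytes_written = filewrite("article" + string(_n) + ".txt", Fulltext, 1)

* A negative return value would indicate that a write failed
assert bytes_written >= 0
```

Because filewrite() writes the contents of the string as they are, no header row or delimiter is added, and any end-of-line characters stored in Fulltext should be carried into the file.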



            • #7
              Tom, note that your article does not have to be a plain-text string. It could just as easily be a Word or PDF document. Chuck Huber (StataCorp) explains how to handle BLOBs in Stata, using MRI images as an example, and at 4:20 uses the filewrite() function to extract a particular patient's data from the Stata dataset to a file on disk for external programs.

              Best, Sergiy Radyakin



              • #8
                Originally posted by Clyde Schechter
                OK. I'm not sure Stata is the best tool for this kind of text processing. But, I think you can do what you want as follows:

                Code:
                count
                forvalues j = 1/`r(N)' {
                    export delimited using article`j'.txt in `j', replace
                }
                This assumes that each article is a single observation in your data set. It then will export the information on each article into a separate text file whose name includes the observation number corresponding to the article.

                Also, look at the help for -export delimited- to see if you want to apply some of the other options available to that command that will affect the format of your output files.

                Hope this helps.
                ____________________________________________________

                Hello Clyde and other Statalisters,
                I have been searching for a solution to a similar problem and have tried to adapt this solution from 2014, which I found on the forum, but sadly it didn't work for me.
                My data is very simple, with only two variables: 'cpidrid', a string in the format ///103446100/// (the /// is required by the CPIDR software I am using to analyse the language sample), and 'lccomm', also a string, holding the participant's written comment in lower case. I need to save each observation as a separate .txt file whose name contains the 'cpidrid' in some form; just the number part is fine. Any hints?

                Kind regards,
                Stephanie.



                • #9
                  The problem is that / characters are not legal in filenames, because / is treated as a directory separator in a path. So Stata is unable to save the files unless you strip out those characters.

                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input str15 cpidrid str22 lccomm
                  "///103446100///" "text of first comment" 
                  "///103446101///" "text of second comment"
                  "///103446102///" "test of third comment" 
                  end
                  
                  count
                  forvalues i = 1/`r(N)' {
                      local filename = subinstr(cpidrid[`i'], "/", "", .)
                      export delimited using `filename'.txt in `i', replace
                  }
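If each file should contain only the comment text, with no header row and no cpidrid column, a hedged alternative is filewrite(). This is a sketch only: whether CPIDR also needs the ///...///  identifier inside the file is not stated in the thread, and the variable names filename and bytes are introduced here for illustration.

```stata
* Same example data as above
clear
input str15 cpidrid str22 lccomm
"///103446100///" "text of first comment"
"///103446101///" "text of second comment"
"///103446102///" "test of third comment"
end

* Build a legal filename from the numeric part of cpidrid
generate filename = subinstr(cpidrid, "/", "", .) + ".txt"

* Write only lccomm to each file; 1 = replace an existing file
generate long bytes = filewrite(filename, lccomm, 1)

* A negative return value would indicate that a write failed
assert bytes >= 0
```

If the identifier is needed in the file as well, the second argument could be a concatenation such as cpidrid + " " + lccomm, which is again an assumption about the format CPIDR expects.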
                  Let's take a moment to improve your posting skills. First, never say something "didn't work." It's just too uninformative--did Stata crash? Did it give you error messages--if so what were they? Did it produce results but they weren't what you wanted? If so, what did you get, and why was it not what you expected? Did something undesirable other than that happen? Always describe exactly what happened. Remember, nobody else can see your computer screen, nor read your mind. In fact, better than describing is showing. Copy/paste the command(s) and output from the Results window directly here into the Forum editor, surrounding it by code delimiters. If you read the Forum FAQ before your next post, you will find this advice and much more that will, if followed, enhance your likelihood of getting timely and helpful responses from your posts.

                  Also, when asking for help with code, always show example data, and always use -dataex- to do that, as I have done above. If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.



                  When asking for help with code, always show example data. When showing example data, always use -dataex-.
