
  • Word cloud and sentiment analysis using Stata

    Dear Stata Forum members,


    I am looking for a user-written program to produce high-quality "word cloud" graphics, as well as to perform sentiment analysis, in Stata.

    I did find package dm0094 from SJ-18-2, which is described as able to "draw a simple word cloud graph for visual analysis".

    I installed the ado-file, but when I type - help wordcloud - as recommended in the instructions, I get no help file, just a link to install the ado-file, which I have already done.

    I'm using Stata 15 IC and I would like to know if there are other user-written programs to draw word clouds in Stata.

    Thank you in advance.
    Best regards,

    Marcos

  • #2
    Hi Marcos,

    I think you may need to install the package from that article once again. It should install two ado-files (one each for wordfreq and wordcloud) and their respective help files. There is a toy example at the bottom of the help file for wordcloud that requires first determining word frequency. I quickly did this on my copy of Stata 15 and had success, so I wonder whether you have not installed all the necessary files.



    • #3
      Leonardo Guizzetti, thank you for your suggestion. Actually, I already had the program installed, but I took care to update the ado-file today. The command - wordfreq - worked fine, but there was no sign of - wordcloud -. Typing - help wordcloud - sent me to the download link, and I tried it again.

      Maybe I'll need to uninstall and reinstall it and check whether that solves the issue.

      That said, the program is described as capable of a "simple" word cloud graph (unfortunately, I couldn't verify how simple it is), and I need a high-quality word cloud graph, like those we get with some text mining packages in R. If possible, I'd rather create such a graph in Stata.
      Best regards,

      Marcos



      • #4
        I expect Stata 17 has such a function!



        • #5
          In Stata 16, you can use Stata's Python integration to create a word cloud. Here is an example https://huapeng01016.github.io/chicago19/#/thanks and the do-file:

          Code:
          cscript
          
          version 16
          
          python:
          import nltk   
          import requests
          from bs4 import BeautifulSoup
          
          
          url = "https://www.stata.com/new-in-stata/python-integration/"    
          html = requests.get(url)  
          text = BeautifulSoup(html.text).get_text() 
          print(text)
          
          from wordcloud import WordCloud
          
          wordcloud = WordCloud(max_font_size=75, max_words=100, background_color="white").generate(text)
          
          from sfi import Platform
          import matplotlib
          if Platform.isWindows():
              matplotlib.use('TkAgg')
          import matplotlib.pyplot as plt
          
          plt.imshow(wordcloud, interpolation='bilinear')
          plt.axis("off")
          plt.savefig("words.png")
          
          end
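For anyone who wants the counting step mentioned in #2 on the Python side too, here is a minimal sketch using only the standard library. The function name `word_frequencies` and the `min_length` cutoff are illustrative choices of mine; `WordCloud.generate(text)` in the code above does its own tokenizing, so this is only needed if you want to inspect or adjust the counts first.

```python
import re
from collections import Counter

def word_frequencies(text, min_length=3):
    """Count lower-cased words of at least min_length characters."""
    # Keep runs of letters and apostrophes; everything else is a separator.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if len(w) >= min_length)

# Illustrative sample text, not taken from the page scraped above.
sample = "Stata and Python: Python code runs inside Stata, and Stata reads the results."
freqs = word_frequencies(sample)
print(freqs.most_common(3))
```

A dictionary of counts like this can be passed to `WordCloud.generate_from_frequencies` in place of `generate(text)`.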



          • #6
            This is really good news.

            Hopefully someday we will have this directly in Stata, for those who, like me, have no experience with Python.
            Best regards,

            Marcos



            • #7
              Originally posted by Hua Peng (StataCorp)
              In Stata 16, you may use Stata Python integration to create Word cloud. Here is an example https://huapeng01016.github.io/chicago19/#/thanks and the do file:

              Code:
              cscript
              
              version 16
              
              python:
              import nltk
              import requests
              from bs4 import BeautifulSoup
              
              
              url = "https://www.stata.com/new-in-stata/python-integration/"
              html = requests.get(url)
              text = BeautifulSoup(html.text).get_text()
              print(text)
              
              from wordcloud import WordCloud
              
              wordcloud = WordCloud(max_font_size=75, max_words=100, background_color="white").generate(text)
              
              from sfi import Platform
              import matplotlib
              if Platform.isWindows():
              matplotlib.use('TkAgg')
              import matplotlib.pyplot as plt
              
              plt.imshow(wordcloud, interpolation='bilinear')
              plt.axis("off")
              plt.savefig("words.png")
              
              end
              Hi,

              It looks like this code doesn't work anymore. Is there an update?

              Best,
              Kabira



              • #8
                Still works for me in both Stata 16 and 17. First, there is an indentation issue with the above code: you need a tab or 4 spaces in front of matplotlib.use('TkAgg'). I do not know whether that is in the code you ran or is due to copy/paste.

                Code:
                if Platform.isWindows():
                    matplotlib.use('TkAgg')
                If this isn't it, then what is the error you get?
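A quick way to check whether copy/paste flattened the indentation, as a sketch: Python's built-in `compile` parses a snippet without running it, so neither Stata nor matplotlib is needed for the check. The `use_tk` name is a placeholder of mine standing in for `Platform.isWindows()`.

```python
# Two versions of the same lines: flattened as in the quote in #7,
# and indented as in #5. compile() only parses; nothing is imported or run.
flattened = (
    "import matplotlib\n"
    "if use_tk:\n"
    "matplotlib.use('TkAgg')\n"
)
indented = (
    "import matplotlib\n"
    "if use_tk:\n"
    "    matplotlib.use('TkAgg')\n"
)

def parses(source):
    """Return True if Python can parse the source, False on a syntax error."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False

print(parses(flattened))  # False
print(parses(indented))   # True
```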



                • #9
                  Dear @Hua Peng, how can we embed the Python code in #5 into an ado-file? For example, I want to embed it in a command named wordcloud, whose syntax might be as below:
                  Code:
                  wordcloud, url(string) save(string)



                  • #10
                    Code:
                    capture program drop wordcloud
                    
                    program define wordcloud
                    
                    syntax [anything], url(string asis)
                    
                    local url `url'
                    display "`url'"
                    
                    python:
                    import nltk  
                    import requests
                    from sfi import Macro
                    from bs4 import BeautifulSoup
                    
                    url = Macro.getLocal("url")
                    html = requests.get(url)  
                    text = BeautifulSoup(html.text).get_text()
                    
                    from wordcloud import WordCloud
                    
                    wordcloud = WordCloud(max_font_size=75, max_words=100, background_color="white").generate(text)
                    
                    from sfi import Platform
                    import matplotlib
                    if Platform.isWindows():
                        matplotlib.use('TkAgg')
                    import matplotlib.pyplot as plt
                    
                    plt.imshow(wordcloud, interpolation='bilinear')
                    plt.axis("off")
                    plt.savefig("words.png")
                    
                    end
                    Code:
                    . wordcloud, url(https://www.statalist.org/forums/forum/general-stata-discussio
                    > n/general/1509235-word-cloud-and-sentiment-analysis-using-stata)
                    https://www.statalist.org/forums/forum/general-stata-discussion/general/1509235
                    > -word-cloud-and-sentiment-analysis-using-stata
                      File "<stdin>", line 1
                        import nltk
                        ^
                    IndentationError: unexpected indent
                    r(7102);
                    Last edited by Chen Samulsion; 14 Feb 2023, 20:07.



                    • #11
                      The WordCloud module for Python seems to require some particular Windows libraries (C++) that are not part of a standard Windows 11 installation ("pip install wordcloud" exits with errors). Has anyone run into and solved this?



                      • #12
                        I solved my question in #9 and #10 in a clumsy way that is nonetheless useful for me. Thank you to Dominique Bourget, who provided this suggestion in a related thread about Python-Stata integration:
                        https://www.statalist.org/forums/for...ach-stata-loop
                        Originally posted by Dominique Bourget
                        Hello,
                        I resolved this issue by using 'python:' with a single-line Python command, instead of using the (python: / end) block.

                        Thank you,
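One way to read the single-line suggestion, as a minimal sketch rather than the solution actually used here: keep the Python in a function and call that function from the program in one `python:` line. The module name `wc_helper.py`, the function `make_wordcloud`, and the `outfile` argument are hypothetical. The sketch also saves the image with `WordCloud.to_file` instead of the matplotlib calls in #5, which is a substitution of mine.

```python
# wc_helper.py: hypothetical helper module. The file and function names are
# illustrative assumptions, not part of any solution posted in this thread.

def make_wordcloud(url, outfile="words.png"):
    """Download a page, strip its HTML, and save a word cloud image."""
    # Third-party imports stay inside the function so that importing this
    # module does not fail before the packages are installed.
    import requests
    from bs4 import BeautifulSoup
    from wordcloud import WordCloud

    html = requests.get(url)
    # Naming the parser avoids BeautifulSoup's "no parser specified" warning.
    text = BeautifulSoup(html.text, "html.parser").get_text()

    cloud = WordCloud(max_font_size=75, max_words=100,
                      background_color="white").generate(text)
    # to_file writes the image directly, so no matplotlib backend is needed.
    cloud.to_file(outfile)
    return outfile
```

Assuming the module sits somewhere Python can import it from, the program body would then need only one `python:` line that imports it and calls `make_wordcloud` with the contents of the `url` macro, so no python: / end block appears inside the program.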

