
  • No module error of python in Stata

    Hello.

    I am trying to perform sentiment analysis using Python code in Stata.
    I have confirmed that it runs smoothly in Python's Jupyter notebook.
    However, I encountered errors starting from importing python packages when attempting to do everything in Stata, which I am more familiar with.
    Code:
    python :
    
    import pandas as pd
    import numpy as np
    import nltk
    nltk.download('vader_lexicon')
    from nltk.sentiment.vader import SentimentIntensityAnalyzer
    
    end
    Code:
    >>> import pandas as pd
    >>> import numpy as np
    >>> import nltk
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ModuleNotFoundError: No module named 'nltk'
    (4 lines skipped)
    Basic packages like pandas and numpy import without errors in Stata, but the nltk package raises a module-not-found error.
    I checked the directory, but I could not find any hint for resolving the error.
    Code:
    . python which pandas
    <module 'pandas' from 'C:\\Users\\inho8\\AppData\\Local\\Programs\\Python\\Python38\\lib\\site-packa
    > ges\\pandas\\__init__.py'>
    
    . python which numpy
    <module 'numpy' from 'C:\\Users\\inho8\\AppData\\Local\\Programs\\Python\\Python38\\lib\\site-packag
    > es\\numpy\\__init__.py'>
    
    . python which nltk
    Python module nltk not found
    I am unsure if it is a path configuration issue or another type of error.
    If I cannot resolve the issue, I may have to handle data preprocessing in Stata, sentiment analysis in Python, and then import the analysis results back into Stata for additional analysis.

    I would greatly appreciate any help.

  • #2
    You can have multiple Python installations and multiple Python environments in each installation. IPython (the interactive Python shell that Jupyter is built on) is very often run inside virtual environments. It is possible that Stata is using one environment where nltk is not installed and your Jupyter notebook is using another where nltk is installed.

    I might just start by navigating to this folder:

    Code:
    C:\Users\inho8\AppData\Local\Programs\Python\Python38\lib\site-packages\
    Then check whether you have an nltk folder there. If not, then the package is not installed where Stata expects it.
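
The same check can be done from inside Python rather than by browsing folders. The following is only a minimal sketch: the helper names `locate` and `environment_report` are made up for illustration, and it uses nothing beyond the standard library. Running it once in a Stata `python:` block and once in the Jupyter notebook, then comparing the two printouts, should show whether the two are using different interpreters. (Stata's `python query` command also reports which Python installation Stata is linked to.)

```python
import importlib.util
import sys


def locate(name):
    """Return the file a module would be imported from, or None if it cannot be found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # Namespace packages have no single origin file.
    return spec.origin or "(namespace package)"


def environment_report(modules):
    """Collect the interpreter version, its install prefix, and module locations."""
    return {
        "python_version": sys.version.split()[0],
        "prefix": sys.prefix,
        "modules": {name: locate(name) for name in modules},
    }


report = environment_report(["pandas", "numpy", "nltk"])
print("Python", report["python_version"], "installed at", report["prefix"])
for name, origin in report["modules"].items():
    print(f"{name}: {origin if origin is not None else 'not found in this interpreter'}")
```

If the prefix differs between Stata and Jupyter, nltk most likely needs to be installed into the installation Stata reports, or Stata needs to be pointed at the other one.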

    As an aside, it is a best practice to always put imports first in Python. I would not recommend executing nltk.download() before importing SentimentIntensityAnalyzer from nltk.sentiment.vader as you do in #1. In a setting where you have a persistent global environment in memory (as with data analysis tasks/IPython stuff), I recommend you completely separate all imports into their own code block. You generally want all of your dependencies loaded before executing your own code.


    • #3
      "I may have to handle data preprocessing in Stata, sentiment analysis in Python, and then import the analysis results back into Stata for additional analysis."
      By the way, I actually tend to prefer this approach when working across multiple platforms, rather than using some kind of platform-to-platform interface. I think this approach tends to involve fewer moving parts behind the scenes and is generally less likely to cause a headache. Just my two cents - your programming philosophy may differ.
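
For what it is worth, the Python half of such a file-based handoff can be quite small. This is only a sketch under assumptions: the function `score_file`, the column name `compound`, and the stand-in `toy_scorer` are all made up for illustration, and the CSV is assumed to come from something like Stata's `export delimited` and to go back via `import delimited`.

```python
import csv


def toy_scorer(text):
    """Hypothetical stand-in scorer: +1 per 'good', -1 per 'bad'. Not a real sentiment model."""
    words = text.lower().split()
    return float(words.count("good") - words.count("bad"))


def score_file(in_path, out_path, text_column, scorer):
    """Copy a CSV row by row, appending a 'compound' column computed from text_column."""
    with open(in_path, newline="", encoding="utf-8") as src, \
            open(out_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        fieldnames = list(reader.fieldnames or []) + ["compound"]
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        for row in reader:
            row["compound"] = scorer(row[text_column])
            writer.writerow(row)


# With nltk installed, the real scorer from #1 could be swapped in, for example:
#   from nltk.sentiment.vader import SentimentIntensityAnalyzer
#   sia = SentimentIntensityAnalyzer()
#   score_file("in.csv", "out.csv", "text", lambda t: sia.polarity_scores(t)["compound"])
```

Passing the scorer in as an argument keeps the file handling independent of nltk, so the plumbing can be tried out even on a machine where the nltk import still fails.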


      • #4
        Daniel Schaefer

        1. Thank you for your comment. I have not changed the path yet, but my code ran without problems in Stata on another laptop.
        Code:
        . python :
        ----------------------------------------------- python (type end to exit) ----------
        >>> 
        >>> import pandas as pd
        >>> import numpy as np
        >>> import nltk
        >>> nltk.download('vader_lexicon')
        [nltk_data] Downloading package vader_lexicon to C:\Users\Inho
        [nltk_data]     Lee\AppData\Roaming\nltk_data...
        [nltk_data]   Package vader_lexicon is already up-to-date!
        True
        >>> from nltk.sentiment.vader import SentimentIntensityAnalyzer
        >>> 
        >>> end
        ------------------------------------------------------------------------------------
        
        . 
        end of do-file
        I am guessing it is some (unknown) environment setting issue on each PC. I will solve the path setting problem on the original PC and share later whether the code runs well.

        2. Your tips about using multiple programs are also very interesting. I heard that Python can be used in Stata, so I tried it out of curiosity, but I think I will settle for whichever workflow does not cause headaches. Stata is not to blame. I will have to study more so that I can use Stata and Python together freely.
