
  • #16
    Red Owl: Thank you very much for the information!
    Best regards,

    Marcos



    • #17
      Marcos Almeida

      Please let us know about your experience with KH Coder if you decide to try it.
      I use it more often than WordStat for Stata (or the free-standing WordStat) because I don't want to ask my doctoral dissertation students to purchase WordStat.

      Disclaimer: I am not in any way associated with KH Coder or its author except as a very grateful and satisfied user.

      Cheers,
      Red Owl



      • #18
        Red Owl, thanks for introducing KH Coder. KH Coder is excellent; however, when I use it to analyze Chinese (simplified) text stored in UTF-8, I run into problems. Pre-processing succeeds in extracting words only on rare occasions; all too often it fails. In the failed runs, it reports Tokens of 15 (12) and Types of 15 (12), with an elapsed time of 00:00:07. The problem seems to be caused by Java (see the picture attached below), but I cannot fix it. Could you please give me some advice? Thank you.
        [Attached image: mmexport1565968910439.png]



        • #19
          Chen Samulsion

          I'm not sure what is causing this problem, but the author of KH Coder, Koichi Higuchi, is very responsive to questions. You can post a question at:

          https://github.com/ko-ichi-h/khcoder...3ANon-English+

          Good luck.

          Cheers,
          Red Owl



          • #20
            Red Owl, thank you very much. It seems that my problem was caused by antivirus or security software. I suspended both the antivirus and the security software on my computer and tried running KH Coder again. This time the failure rate is approximately one in ten; that is, I got 9 successful pre-processing runs and only 1 failure. I contacted Koichi Higuchi and got his reply: https://github.com/ko-ichi-h/khcoder/issues/76



            • #21
              Marcos Almeida Hi Marcos, I have very similar needs for text mining in Stata. Mine include text classification (supervised machine learning) and topic modeling (unsupervised machine learning). I found Stata commands for both purposes.

              There are also a few commands for preparing string variables:
              -ngram- and -txttool-

              Supervised ML for text classification:
              -svmachines-

              Unsupervised topic modeling (cluster identification):
              -ldagibbs-

              Document similarity analysis:
              -lsemantica-

              If you Google using these specific keywords, you will find PDF documentation by the authors.

              Hope you find these useful.

              Xiaodong



              • #22
                caixiaodong Thank you very much for the suggestions. I have experience with txttool and ngram, but the others are new to me. I'll certainly give them a try. Thanks again.
                Best regards,

                Marcos

