How to find all user-written programs related to Text Mining / Content Analysis

Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#16

16 Aug 2019, 03:03

Red Owl: Thank you very much for the information!

Best regards,

Marcos
Comment
Red Owl

Join Date: Nov 2016

Posts: 127
#17

16 Aug 2019, 06:43

Marcos Almeida

Please let us know about your experience with KH Coder if you decide to try it.
I use it more often than WordStat for Stata (or the free-standing WordStat) because I don't want to ask my doctoral dissertation students to purchase WordStat.

Disclaimer: I am not in any way associated with KH Coder or its author except as a very grateful and satisfied user.

Cheers,
Red Owl
1 like
Comment
Chen Samulsion

Join Date: Jan 2018

Posts: 869
#18

16 Aug 2019, 09:28

Red Owl, thanks for introducing KH Coder. KH Coder is excellent; however, when I use it to analyze Chinese (simplified) text (stored in UTF-8), I run into problems. Only on rare occasions does pre-processing extract words successfully. Far more often, pre-processing fails to extract anything. In these failure cases, it reports Tokens of 15 (12) and Types of 15 (12), with an elapsed time of 00:00:07. The problem seems to be caused by Java (see the picture attached below), but I cannot fix it. Could you please give me some advice? Thank you.
Comment
Red Owl

Join Date: Nov 2016

Posts: 127
#19

16 Aug 2019, 09:43

Chen Samulsion

I'm not sure what is causing this problem, but the author of KH Coder, Higuchi Koichi, is very responsive to questions. You can post a question at:

https://github.com/ko-ichi-h/khcoder...3ANon-English+

Good luck.

Cheers,
Red Owl
1 like
Comment
Chen Samulsion

Join Date: Jan 2018

Posts: 869
#20

16 Aug 2019, 19:42

Red Owl, thank you very much. It seems that my problem was caused by antivirus or security software. I suspended both the antivirus and the security software on my computer and tried running KH Coder again. This time pre-processing failed roughly one time in ten; that is, I got 9 successful runs and only 1 failure. I contacted Koichi Higuchi and got his reply: https://github.com/ko-ichi-h/khcoder/issues/76
Comment
caixiaodong

Join Date: Aug 2014

Posts: 6
#21

17 Sep 2019, 08:56

Marcos Almeida Hi Marcos, I have very similar needs in text mining using Stata. Mine include text classification (supervised machine learning) and topic modeling (unsupervised machine learning). I found Stata commands for both purposes.

There are also a few commands for preparing string variables:
-ngram- and -txttool-

Supervised ML for text classification:
-svmachines-

Unsupervised topic modeling (cluster identification):
-ldagibbs-

Document similarity analysis:
-lsemantica-

If you Google using these specific keywords, you will find PDF documentation by the authors.

Hope you find these useful.

Xiaodong
4 likes
Comment
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#22

17 Sep 2019, 09:21

caixiaodong Thank you very much for the suggestions. I have experience with txttool and ngram, but the others are new to me. I'll certainly give them a try. Thanks again.

Best regards,

Marcos
Comment
