An accurate and efficient Part of Speech (POS) Tagger is a valuable tool for various areas of natural language processing. I use POS tagging as a means of detecting invalid text, but there are many other possible uses as well. Regardless of how you are using a POS tagger, you may find this benchmark of two Ruby POS tagging libraries helpful.

There are several taggers available, but I settled on testing the following two, which are available as gems and seemingly robust:

- EngTagger: a corpus-trained, probabilistic tagger (port of Perl Lingua::EN::Tagger).
- RubyTagger (rb-brill-tagger): a rule-based tagger.

Some of the ones excluded that you may be interested in considering:

- YamCha - Yet Another Multipurpose CHunk Analyzer

Both EngTagger and RubyTagger provide a simple API and could be easily installed as gems. (NOTE: I believe the RubyTagger gem has a C dependency).

In order to get a fairly accurate idea of performance, the gems were tested under 4 scenarios:

- 1-A: 10 times create an instance of the tagger and tag a short piece of text
- 1-B: 10 times create an instance of the tagger and tag a long piece of text
- 2-A: create an instance of the tagger once and tag a short piece of text 10 times
- 2-B: create an instance of the tagger once and tag a long piece of text 10 times

Before looking at the results, let's examine the main portion of the benchmark.rb file:

    # Scenario 1-A: load the tagger each time before processing text
    # use SHORT text for tagging
    b.report("1-A: eng tagger")

The EngTagger result is adjusted so that the output matches that of the RubyTagger, which is an array of 2-item arrays.

The output of the benchmark script is below (and slightly tidied up).
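Separately from the original benchmark.rb, the four-scenario structure described above can be sketched with Ruby's standard `Benchmark` module. This is a minimal sketch under stated assumptions, not the script used for this post: `StubTagger`, `SHORT_TEXT`, `LONG_TEXT`, `load_each_time`, and `load_once` are hypothetical names, and the stub stands in for a real tagger so the example runs without installing either gem. Its timings say nothing about EngTagger or RubyTagger.

```ruby
require "benchmark"

# Hypothetical stand-in for a real tagger (assumption: not part of either gem).
# It builds a tiny lexicon on construction and tags unknown words as "NN".
class StubTagger
  def initialize
    @lexicon = Hash.new("NN")
    @lexicon.merge!("the" => "DT", "a" => "DT", "is" => "VBZ", "quick" => "JJ")
  end

  # Returns an array of 2-item arrays: [[word, tag], ...]
  def tag(text)
    text.split.map { |word| [word, @lexicon[word.downcase]] }
  end
end

# Hypothetical sample inputs for the short and long cases.
SHORT_TEXT = "The quick fox is a fox"
LONG_TEXT  = ([SHORT_TEXT] * 200).join(" ")
RUNS       = 10

# Scenarios 1-A / 1-B: create a new tagger instance on every run.
def load_each_time(text, runs = RUNS)
  runs.times.map { StubTagger.new.tag(text) }
end

# Scenarios 2-A / 2-B: create the tagger once and reuse it for every run.
def load_once(text, runs = RUNS)
  tagger = StubTagger.new
  runs.times.map { tagger.tag(text) }
end

def run_benchmark
  Benchmark.bm(18) do |b|
    b.report("1-A: stub tagger") { load_each_time(SHORT_TEXT) }
    b.report("1-B: stub tagger") { load_each_time(LONG_TEXT) }
    b.report("2-A: stub tagger") { load_once(SHORT_TEXT) }
    b.report("2-B: stub tagger") { load_once(LONG_TEXT) }
  end
end

run_benchmark if __FILE__ == $PROGRAM_NAME
```

To compare real libraries, the stub would be replaced by each gem's own tagger class, with one set of `report` lines per gem. Separating the "load each time" and "load once" cases shows how much of the total time is spent constructing the tagger rather than tagging.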