The N-Gram Processor (NGP) is a set of scripts and a Perl module for creating
and processing n-gram lists from text files. Its feature set is deliberately
simple:
- creation of word n-gram lists out of input text, with n-gram frequencies
- listing of document counts (the number of documents in which an n-gram occurs)
- a menu-based interface (new to version 0.6)
- Unicode support
- support for processing of large corpora, hardware allowing
- support for processing of annotated corpora
Please refer to the manual for a more detailed description.
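To make the difference between n-gram frequency and document count concrete, here is a minimal sketch in Python. It is not the NGP's own code (the NGP is written in Perl and bash), and the function names `ngrams` and `count_ngrams`, the lowercasing, and the whitespace tokenization are all assumptions made for illustration only.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token windows of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(docs, n):
    """Return (frequency, document count) Counters for word n-grams.

    docs is a list of strings, one per document.
    """
    freq = Counter()       # total occurrences across all documents
    doc_count = Counter()  # number of documents containing the n-gram
    for text in docs:
        # Assumption: lowercase and split on whitespace; a real tool
        # would likely use a configurable token definition.
        tokens = text.lower().split()
        grams = ngrams(tokens, n)
        freq.update(grams)
        doc_count.update(set(grams))  # each document counts at most once
    return freq, doc_count


docs = ["the cat sat on the mat", "the cat ate the fish"]
freq, doc_count = count_ngrams(docs, 2)
for gram, f in sorted(freq.items()):
    print(" ".join(gram), f, doc_count[gram])
```

In this sketch an n-gram that repeats within a single document raises its frequency each time but adds only one to its document count, which is why the two figures are kept in separate counters.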
The NGP is a branch of the Ngram Statistics Package (NSP, v1.09) by Ted Pedersen and
collaborators, and it includes code from the v1.10 re-write by Bjoern Wilmsmann.
The N-Gram Processor is cross-platform and currently available as beta
software. It was tested under Mac OS X and Xubuntu Linux, but should work on any
platform that can run Perl and bash shell code.
download | project page on github | about the author