N-Gram Processor, licensed under the GPL v. 3

The N-Gram Processor (NGP) is a set of scripts and a Perl module for creating and
processing n-gram lists from text files. Its feature set is simple:

- creation of word n-gram lists from input text, with n-gram frequencies
- listing of document counts (the number of documents in which each n-gram occurs)
- a menu-based interface (new in version 0.6)
- Unicode support
- support for processing large corpora, hardware permitting
- support for processing annotated corpora
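To make the first two features concrete, here is a minimal Python sketch of what "n-gram frequencies" and "document counts" mean over a small corpus. It is an illustration only, not the NGP's Perl interface: the function names `word_ngrams` and `ngram_stats` and the simple regex tokenization are hypothetical assumptions, and the NGP's own tokenizer and output format may differ.

```python
import re
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams of text as a list of tuples.

    Assumed tokenization (hypothetical, not the NGP's): lowercase the text,
    then take runs of Unicode word characters as tokens.
    """
    tokens = re.findall(r"\w+", text.lower())
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_stats(documents, n):
    """Compute overall n-gram frequencies and per-n-gram document counts."""
    freq = Counter()       # total occurrences of each n-gram in the corpus
    doc_count = Counter()  # number of documents each n-gram occurs in
    for doc in documents:
        grams = word_ngrams(doc, n)
        freq.update(grams)
        doc_count.update(set(grams))  # count each n-gram at most once per document
    return freq, doc_count


docs = ["the cat sat on the mat", "the cat ran", "a dog sat on the mat"]
freq, doc_count = ngram_stats(docs, 2)
for gram, f in freq.most_common(3):
    print(" ".join(gram), f, doc_count[gram])
```

The distinction matters for repeated n-grams: a bigram that occurs twice in one document adds 2 to its frequency but only 1 to its document count.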
 
Please refer to the manual for a more detailed description.

The NGP is a branch of the Ngram Statistics Package (NSP, v1.09) by Ted Pedersen and
collaborators, and it includes code from the v1.10 rewrite by Bjoern Wilmsmann.

N-Gram Processor is cross-platform and currently available as beta software. It has
been tested on Mac OS X and Xubuntu Linux, but should work on any platform that can
run Perl and bash shell scripts.

last update: 2016-02-14