- Fix Javadoc for building under Java 8 (Issue 18).
- Allow setting flush sequence in DefaultModel (Issue 19).
- Changed license to Apache License version 2. (Issue 17)
- Note that TreeTagger itself is subject to different license terms, which are available from the TreeTagger website.
- Fixed bug: model files cannot be read unless assertions are enabled (Issue 16)
- Initial support for reading TreeTagger model files. (Issue 14)
- Support for getting multiple tags/lemmas and their probabilities. This feature requires a TreeTagger binary newer than 2012-04-25. When used with older binaries, processing will just hang. At the time of writing, the TreeTagger versions for OS X (Intel), Windows and Linux support this feature. The versions for Solaris and OS X (PPC) may not be updated to support it. TT4J continues to work with other/older TreeTagger versions as long as this feature is not used. (Issue 13)
- Improved parsing of TreeTagger output.
- Changed default flush sequence to work with the TreeTagger model for Chinese (Issue 6 - thanks Jérôme)
- Added detection of communication with TreeTagger running out of sync due to odd characters appearing in tokens. This strict mode can be disabled, but it is on by default.
- Added setting for the maximum token length (default: 90000 bytes). TreeTagger seems to have a limit of 99998 bytes per token and crashes when this is exceeded.
- Improved handling of crashed TreeTagger process
- Generate a better exception message when no model or executable could be found.
- Provide getters for all properties of TreeTaggerWrapper.
- Updated license header and added it where missing.
- Fixed typo in Javadoc.
- Set plugin versions in POM to make Maven 3 happy.
- Set source encoding to UTF-8 in POM.
- Fixed TT4J hanging indefinitely when writer thread crashes during processing.
- Reader and writer threads are now monitored during processing.
- Fixed regression: Linux accidentally detected as Solaris.
- Fixed bug: DefaultModelResolver uses wrong file name.
- Fixed bug: DefaultModelResolver fails on Windows when path contains a colon.
- Fixed bug: Resource not properly destroyed when an exception is thrown in reader/writer thread.
- Improvement: Try harder to get to end-of-text mark.
- Improvement: Added tracing of start and end marks.
- Improvement: Massively improved throughput when processing a large number of documents.
- Improvement: Try to gracefully handle cases where TT does not produce a “token tag lemma” line. Return null for tag and lemma in these cases.
- Improvement: Ease integration of custom model resolvers.
- Improvement: Added tracing.
- Improvement: Improved robustness ignoring illegal tokens (e.g. containing tabs or line breaks).
- Improvement: Added performance mode which does not check for illegal tokens.
- Improvement: Allow setting the parameters -eps and -hyphen-heuristics needed to use TT4J with chunker models. Now a chunker can be built on top of TT4J.
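
Several entries above deal with illegal tokens (containing tabs or line breaks) and with the maximum token length. The following is a minimal sketch of what such a check could look like. It is not TT4J's actual implementation: the class `TokenCheck` and its methods are hypothetical names, and measuring the length in UTF-8 bytes is an assumption, since the entries above do not state the encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the TT4J API.
final class TokenCheck {
    // Default taken from the changelog entry on the maximum token length.
    static final int DEFAULT_MAX_TOKEN_BYTES = 90000;

    private TokenCheck() {
    }

    // A token is treated as illegal if it is null, contains a tab or a
    // line break, or is longer than maxBytes when encoded as UTF-8
    // (the encoding is an assumption of this sketch).
    static boolean isLegal(String token, int maxBytes) {
        if (token == null) {
            return false;
        }
        if (token.indexOf('\t') >= 0 || token.indexOf('\n') >= 0
                || token.indexOf('\r') >= 0) {
            return false;
        }
        return token.getBytes(StandardCharsets.UTF_8).length <= maxBytes;
    }

    // Returns only the legal tokens, preserving their order.
    static List<String> filterLegal(List<String> tokens, int maxBytes) {
        List<String> legal = new ArrayList<>();
        for (String token : tokens) {
            if (isLegal(token, maxBytes)) {
                legal.add(token);
            }
        }
        return legal;
    }
}
```

A caller could pass its token list through `TokenCheck.filterLegal(tokens, TokenCheck.DEFAULT_MAX_TOKEN_BYTES)` before handing it to the tagger. Skipping this step would correspond to the performance mode mentioned above, which trades the check for speed.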