Welcome to Apache OpenNLP!

The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text.

This toolkit is written completely in Java and provides support for common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, coreference resolution, language detection and more!

These tasks are usually required to build more advanced text processing services.

The goal of the OpenNLP project is to be a mature toolkit for the above-mentioned tasks.

An additional goal is to provide a large number of pre-built models for a variety of languages, as well as the annotated text resources that those models are derived from.

Presently, OpenNLP includes common classifiers such as Maximum Entropy, Perceptron and Naive Bayes.

OpenNLP can be used either programmatically through its Java API or from a terminal through its CLI. The OpenNLP API can be easily plugged into distributed streaming data pipelines such as Apache Flink, Apache NiFi, and Apache Spark.

Useful Links

For additional information, visit the OpenNLP Home Page

You can use OpenNLP with any language; demo models are provided here.

The models are fully compatible with the latest release and can be used for testing or getting started.

Note: Please train your own models for all other use cases.

Documentation, including JavaDocs, code usage, and command-line interface examples, is available here.

For recent news, updates and topics, you can:
- join the regular mailing lists,
- follow the project's Bluesky social media channel, or
- join the Slack channel (available to people with an @apache.org email address or upon invitation).

Please also check the Stack Overflow community's OpenNLP questions and answers.

Overview

Currently, the library has different modules:

Getting Started

You can import the core toolkit directly from Maven or Gradle:

Maven

<dependency>
    <groupId>org.apache.opennlp</groupId>
    <artifactId>opennlp-runtime</artifactId>
    <version>3.0.0-M1</version>
</dependency>
<!-- if model support is needed -->
<dependency>
    <groupId>org.apache.opennlp</groupId>
    <artifactId>opennlp-models</artifactId>
    <version>3.0.0-M1</version>
</dependency>

Note: opennlp-runtime ships with the MaxEnt ML implementation by default. If you need other ML implementations, please add the corresponding dependencies as well.

Gradle

implementation group: "org.apache.opennlp", name: "opennlp-runtime", version: "3.0.0-M1"
implementation group: "org.apache.opennlp", name: "opennlp-models", version: "3.0.0-M1"

For more details, please check our documentation.
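
Once the dependency is on the classpath, a first program can be very small. The following is a minimal sketch, not an official example: it assumes that the rule-based `opennlp.tools.tokenize.SimpleTokenizer` (known from the 2.x `opennlp-tools` artifact) is available through `opennlp-runtime`, and the class name `TokenizeSketch` is made up for illustration. No trained model is needed for this tokenizer.

```java
import opennlp.tools.tokenize.SimpleTokenizer;
import opennlp.tools.tokenize.Tokenizer;

// Hypothetical class name, used only for this sketch.
public class TokenizeSketch {

    // Splits text into tokens with the rule-based SimpleTokenizer,
    // which separates runs of letters, digits, and other characters.
    static String[] tokenize(String text) {
        Tokenizer tokenizer = SimpleTokenizer.INSTANCE;
        return tokenizer.tokenize(text);
    }

    public static void main(String[] args) {
        // Print one token per line.
        for (String token : tokenize("OpenNLP splits text into tokens.")) {
            System.out.println(token);
        }
    }
}
```

For model-based components, such as a statistical tokenizer or sentence detector, a trained model has to be loaded first; the documentation describes how.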

Migrating from 2.x to 3.x

The 3.x release line of Apache OpenNLP introduces no known breaking changes but modularizes the project for better usage as a library and to support future extensibility. The core API remains stable and compatible with 2.x, but the project structure has been reorganized into multiple modules.

This means that you can continue to use the previous opennlp-tools artifact as a dependency. However, we strongly recommend switching to the new modular structure and importing only the components you need, which will result in a smaller dependency footprint.

Only opennlp-runtime needs to be added as a dependency, and you can add additional modules (e.g. opennlp-ml-maxent, opennlp-models, etc.) as required by your project. For users of the traditional CLI toolkit, nothing changes with the 3.x release line. CLI usage remains stable as described in the project's dev manual.

Heads up

The Apache OpenNLP team is planning to change the package namespace from opennlp to org.apache.opennlp in a future release (potentially 4.x). This change will be made to align with standard Java package naming conventions and to avoid potential conflicts with other libraries.

In addition, the Apache OpenNLP team is considering raising the minimum Java version to JDK 21+ in a future release (potentially 4.x) to take advantage of the latest language features and improvements.

Branches and Merging Strategy

To support ongoing development and stable maintenance of Apache OpenNLP, the project follows a dual-branch model:

Branch overview

Workflow summary

Building OpenNLP

At least JDK 17 and Maven 3.3.9 are required to build the library.

After cloning the repository, go into the destination directory and run:

mvn install

Additional Development Information

Contributing

The Apache OpenNLP project is developed by volunteers and is always looking for new contributors to work on all parts of the project. Every contribution is welcome and needed to make it better. A contribution can be anything from a small documentation typo fix to a new component.

If you would like to get involved, please follow the instructions here.

How to Report Issues

The Apache OpenNLP project uses JIRA for issue tracking. Please report any issues you find at http://issues.apache.org/jira/browse/opennlp

List of JIRA Issues Fixed in this Release

See issuesFixed/jira-report.html for the list of issues fixed in this release.