Toolset for working with spoken language corpora
GPL-2.0
Yes
Desktop-based application
Windows
macOS
Linux
The software is freely available online (with sign-up option and/or social logins)
Yes
Raw text
XML
TSV
Audio
Video
Image
The data can be exported.
Language-independence
Writing script support
LTR
Unicode support
English
German
French
Chinese
word-level
sentence-level
Tokens (e.g. part-of-speech tag, morphological features)
Spans (e.g. named entities)
Free span annotation, independent of sentences
Yes
Keyboard
Mouse (e.g. drag & drop)
Yes
Table
Musical score visualisation
No
No
No
No
No
No
Yes
Yes
No
No
Text Annotation Platform
Apache-2.0
Yes
Web-based application
Windows
macOS
Linux
The software should be run by the user (local machine or own server)
some public instances available
No
Raw text
CONLL-U
XML
TSV
Audio
Video
Image
External databases (e.g. dictionaries)
The data can be exported.
Language-independence
Writing script support
LTR
RTL
Unicode support
English
Tokens (e.g. part-of-speech tag, morphological features)
Spans (e.g. named entities)
Multi-tokens (e.g. multi-word expressions)
Relations (e.g. dependencies)
Chains (e.g. coreferences)
Annotation inside sentences
Annotation across sentences
Annotation of full sentences/paragraphs/documents
Yes
Keyboard
Mouse (e.g. drag & drop)
Yes
Image (e.g. a graph or coloured text)
partial
partial
annotators working independently on their own annotations of the same data.
Yes
Yes
partial
Yes
Yes
Yes
Lemmas
UPOS
FEATS
XPOS
Multi-word tokens
Basic dependencies
Enhanced dependencies
No
relation between spans
chains
document-level
Nested expressions
Overlapping expressions
partial
INCEpTION caters to a wide range of individual users who find different aspects of the tool useful. I would say that among the most attractive features are probably the ability to link to knowledge bases, the ability to work on sensitive data, the ability to use ML/AI to support the annotation process, the multi-user capabilities, curation support, agreement calculation support, and the range of supported data formats. INCEpTION also caters to institutional users who need to integrate the tool into their existing infrastructure, by supporting e.g. single sign-on mechanisms, Docker-based deployment, a remote API, etc.
Data labeling tool for all data types (computer vision, natural language processing, speech, voice, and video models)
Apache-2.0
Yes
Web-based application
Windows
macOS
Linux
The software should be run by the user (local machine or own server)
Yes
Raw text
CONLL-U
XML
TSV
Audio
Video
Image
The data can be exported.
The data can be stored with a third party storage platform (e.g. Git repository)
Language-independence
Writing script support
LTR
RTL
Unicode support
English
word-level
sentence-level
Tokens (e.g. part-of-speech tag, morphological features)
Spans (e.g. named entities)
Multi-tokens (e.g. multi-word expressions)
Relations (e.g. dependencies)
Chains (e.g. coreferences)
Annotation inside sentences
Annotation across sentences
Annotation of full sentences/paragraphs/documents
Yes
Keyboard
Mouse (e.g. drag & drop)
Touchscreen
Yes
Image (e.g. a graph or coloured text)
Table
Raw format (e.g. CONLL-U)
No
Yes
annotators working independently on their own annotations of the same data
annotators working together on the same shared annotation of the data.
Yes
Yes
No
No
Yes
Yes
Lemmas
UPOS
FEATS
XPOS
Multi-word tokens
Basic dependencies
Metadata
No
Text classification
NER
Discontinuous expressions
Nested expressions
Overlapping expressions
No
Annotate linguistically rich data
CC_BY
Yes
Web-based application
Windows
macOS
Linux
The software is freely available online (without sign-up option)
Yes
CONLL-U
The data can be exported.
Language-independence
Writing script support
LTR
Unicode support
English
word-level
Tokens (e.g. part-of-speech tag, morphological features)
Multi-tokens (e.g. multi-word expressions)
Relations (e.g. dependencies)
Annotation inside sentences
Yes
Keyboard
Yes
Image (e.g. a graph or coloured text)
Raw format (e.g. CONLL-U)
No
Yes
No
No
No
No
No
No
Lemmas
UPOS
FEATS
XPOS
Multi-word tokens
Basic dependencies
Enhanced dependencies
Empty nodes
Metadata
Yes
NIL
No
A fully customizable and programmable graphical editor and viewer for tree-like structures.
GPL-2.0
Yes
Desktop-based application
Windows
Linux
The software should be run by the user (local machine or own server)
Yes
Raw text
CONLL-U
XML
External databases (e.g. dictionaries)
The data can be exported.
Language-independence
Writing script support
LTR
RTL
Unicode support
English
Tokens (e.g. part-of-speech tag, morphological features)
Spans (e.g. named entities)
Multi-tokens (e.g. multi-word expressions)
Relations (e.g. dependencies)
Chains (e.g. coreferences)
Annotation inside sentences
Annotation across sentences
Yes
Keyboard
Mouse (e.g. drag & drop)
Yes
Graphical representation of a tree
No
No
No
partial
Yes
No
Yes
Yes
Lemmas
UPOS
FEATS
XPOS
Multi-word tokens
Basic dependencies
Enhanced dependencies
Empty nodes
No
No
PALMYRA is a platform-independent graphical tool for syntactic dependency annotation supporting languages that require complex morphological tokenization.
MIT
Yes
Web-based application
Windows
macOS
Linux
The software is freely available online (with sign-up option and/or social logins)
Yes
CONLL-U
The data can be exported.
Language-independence
Writing script support
LTR
RTL
Unicode support
English
word-level
Tokens (e.g. part-of-speech tag, morphological features)
Relations (e.g. dependencies)
Annotation inside sentences
Yes
Keyboard
Mouse (e.g. drag & drop)
Yes
Image (e.g. a graph or coloured text)
Raw format (e.g. CONLL-U)
Yes
No
No
No
No
No
No
No
UPOS
FEATS
XPOS
Basic dependencies
Empty nodes
Metadata
No
No
We recently added Undo/Redo functionality and the ability to synchronize files on Google Drive (https://aclanthology.org/2024.lrec-main.1101.pdf).
Utility for annotating and visualizing dependency graphs of natural-language sentences.
not specified
No
Desktop-based application
Windows
macOS
Linux
The software should be run by the user (local machine or own server)
Yes
XML
CONLL-X
The data can be exported.
Language-independence
Writing script support
LTR
Unicode support
English
Tokens (e.g. part-of-speech tag, morphological features)
Relations (e.g. dependencies)
Annotation inside sentences
Yes
Mouse (e.g. drag & drop)
No
Image (e.g. a graph or coloured text)
No
No
No
No
No
No
No
No
Lemmas
UPOS
Basic dependencies
No
No
Windows
macOS
Linux
A version-controlled annotation interface for XML and spreadsheet data, as well as nested NER
Apache-2.0
Yes
Web-based application
Windows
macOS
Linux
The software should be run by the user (local machine or own server)
No
Raw text
XML
Corpus Workbench vertical format
The data can be exported.
Language-independence
Writing script support
LTR
RTL
Unicode support
English
word-level
sentence-level
Tokens (e.g. part-of-speech tag, morphological features)
Spans (e.g. named entities)
Chains (e.g. coreferences)
Annotation inside sentences
Annotation across sentences
Annotation of full sentences/paragraphs/documents
Yes
Keyboard
Mouse (e.g. drag & drop)
No
Image (e.g. a graph or coloured text)
Table
Raw format (e.g. CONLL-U)
Yes
No
annotators working independently on their own annotations of the same data.
annotators working together on the same shared annotation of the data.
Yes
Yes
Yes
No
No
No
Lemmas
UPOS
FEATS
XPOS
Metadata
No
concurrent/conflicting span annotations
nested NER
entity linking
Nested expressions
Overlapping expressions
No
A comprehensive linguistic software tool designed for documenting, analyzing, and managing lexical and textual data, particularly for under-documented languages.
GNU LGPL
Yes
Desktop-based application
Windows
Linux
The software should be run by the user (local machine or own server)
Yes
Raw text
XML
SFM
Toolbox files
Praat files
Audio
Video
Image
External databases (e.g. dictionaries)
The data can be exported.
Language-independence
Writing script support
LTR
RTL
Unicode support
English
word-level
sentence-level
Tokens (e.g. part-of-speech tag, morphological features)
Spans (e.g. named entities)
Multi-tokens (e.g. multi-word expressions)
Annotation inside sentences
Annotation across sentences
Annotation of full sentences/paragraphs/documents
Yes
Keyboard
Mouse (e.g. drag & drop)
Yes
Table
Raw format (e.g. CONLL-U)
partial
No
partial
No
partial
No
Yes
Yes
Metadata
No
No
An open-source no-code system for text annotation and building text classifiers
Apache-2.0
Yes
Web-based application
Windows
macOS
Linux
The software should be run by the user (local machine or own server)
Yes
Raw text
CONLL-U
XML
TSV
The data can be exported.
Language-independence
Writing script support
LTR
RTL
Unicode support
English
Spans (e.g. named entities)
Annotation of full sentences/paragraphs/documents
Yes
Keyboard
Mouse (e.g. drag & drop)
Yes
Image (e.g. a graph or coloured text)
No
Yes
annotators working together on the same shared annotation of the data.
Yes
Yes
Yes
No
Yes
Yes
Metadata
No
No
A graphical tool to annotate sentences in dependency syntax
BSD 3-Clause
Yes
Web-based application
Windows
macOS
Linux
The software should be run by the user (local machine or own server)
No
CONLL-U
CoNLL-U plus
The data can be exported.
Language-independence
Writing script support
LTR
RTL
Unicode support
English
word-level
sentence-level
Tokens (e.g. part-of-speech tag, morphological features)
Spans (e.g. named entities)
Multi-tokens (e.g. multi-word expressions)
Relations (e.g. dependencies)
Annotation inside sentences
Yes
Keyboard
Mouse (e.g. drag & drop)
Yes
Image (e.g. a graph or coloured text)
Table
Raw format (e.g. CONLL-U)
SD
LaTeX (TikZ)
No
No
partial
No
No
Yes
No
Yes
Yes
Lemmas
UPOS
FEATS
XPOS
Multi-word tokens
Basic dependencies
Enhanced dependencies
Empty nodes
Metadata
Yes
BIO-like annotations
No
Git support (automatic git add/git commit); automatic transliteration of non-Latin scripts into Latin script (Translit= fields in MISC and transliteration of the entire sentence); statistics (number/percentage of UPOS tags, deprels, etc.). The tool can also be used as a front end to a parser that returns its results in CoNLL-U format.
NER annotation
other
Yes
Web-based application
Windows
macOS
Linux
The software should be run by the user (local machine or own server)
No
Raw text
CONLL-U
The data can be exported.
Language-independence
Writing script support
LTR
RTL
English
Tokens (e.g. part-of-speech tag, morphological features)
Spans (e.g. named entities)
Multi-tokens (e.g. multi-word expressions)
Annotation inside sentences
No
Mouse (e.g. drag & drop)
Touchscreen
No
Image (e.g. a graph or coloured text)
Yes
No
annotators working independently on their own annotations of the same data.
No
Yes
No
No
No
Yes
UPOS
XPOS
Multi-word tokens
Metadata
No
NER
partial
A linguistic data management tool designed for building lexicons, analyzing interlinear texts, and organizing fieldwork data, developed by SIL International.
other
No
Desktop-based application
Windows
The software should be run by the user (local machine or own server)
Yes
Raw text
The data can be exported.
Language-independence
Writing script support
LTR
RTL
Unicode support
English
word-level
sentence-level
Tokens (e.g. part-of-speech tag, morphological features)
Spans (e.g. named entities)
Multi-tokens (e.g. multi-word expressions)
Interlinear glossed text annotations
Annotation inside sentences
Annotation across sentences
Annotation of full sentences/paragraphs/documents
Yes
Keyboard
Yes
Table
No
No
No
No
No
No
Yes
Yes
Metadata
No
xs
No
Tool for the creation, annotation, and analysis of complex multimedia data, enabling time-aligned transcription and linguistic analysis.
GPL-3.0
Yes
Desktop-based application
Windows
macOS
Linux
The software should be run by the user (local machine or own server)
Yes
Raw text
XML
.eaf
TextGrid
CHAT
Shoebox/Toolbox
CSV
Audio
Video
Image
The data can be exported.
Language-independence
Writing script support
LTR
RTL
Unicode support
English
word-level
sentence-level
Tokens (e.g. part-of-speech tag, morphological features)
Spans (e.g. named entities)
Multi-tokens (e.g. multi-word expressions)
Relations (e.g. dependencies)
Annotation inside sentences
Annotation across sentences
Annotation of full sentences/paragraphs/documents
Yes
Keyboard
Mouse (e.g. drag & drop)
Touchscreen
Yes
Image (e.g. a graph or coloured text)
Table
CSV
Yes
No
annotators working independently on their own annotations of the same data.
No
No
No
No
Yes
Yes
Lemmas
UPOS
FEATS
XPOS
Multi-word tokens
Basic dependencies
Metadata
No
Time-aligned annotations
multimodal annotations
Discontinuous expressions
Nested expressions
Overlapping expressions
No
Multimedia synchronization; customizable annotation workflows; metadata management, including support for standards such as IMDI and CMDI.
Collaborative grammatical annotation tool
MIT
Yes
Web-based application
Windows
macOS
Linux
The software should be run by the user (local machine or own server)
No
Raw text
CONLL-U
The data can be exported.
Language-independence
Writing script support
LTR
RTL
Unicode support
English
word-level
Tokens (e.g. part-of-speech tag, morphological features)
Spans (e.g. named entities)
Relations (e.g. dependencies)
Chains (e.g. coreferences)
Annotation inside sentences
Yes
Keyboard
Yes
Image (e.g. a graph or coloured text)
Table
tokenization
No
annotators working independently on their own annotations of the same data.
partial
No
Yes
partial
Yes
No
Lemmas
UPOS
FEATS
XPOS
Basic dependencies
Enhanced dependencies
Empty nodes
Metadata
Yes
No
A tool for manual linguistic annotation of corpora, which also enables advanced queries on top of these annotations.
Apache-2.0
Yes
Desktop-based application
Windows
The software should be run by the user (local machine or own server)
Yes
CONLL-U
XML
The data can be exported.
Language-independence
Writing script support
LTR
Unicode support
English
Tokens (e.g. part-of-speech tag, morphological features)
Spans (e.g. named entities)
Multi-tokens (e.g. multi-word expressions)
Relations (e.g. dependencies)
Annotation inside sentences
Yes
Keyboard
Mouse (e.g. drag & drop)
No
Image (e.g. a graph or coloured text)
No
No
annotators working independently on their own annotations of the same data.
No
No
No
No
Yes
No
Lemmas
UPOS
Multi-word tokens
Basic dependencies
No
No
Web-based linguistic annotation environment based around the FoLiA format (http://proycon.github.io/folia), a rich XML-based format for linguistic annotation.
GPL-3.0
Legacy
Web-based application
Windows
macOS
Linux
The software should be run by the user (local machine or own server)
No
Raw text
CONLL-U
XML
TSV
FoLiA XML
The data can be exported.
The data can be stored with a third party storage platform (e.g. Git repository)
Language-independence
Writing script support
LTR
RTL
Unicode support
English
word-level
Tokens (e.g. part-of-speech tag, morphological features)
Spans (e.g. named entities)
Multi-tokens (e.g. multi-word expressions)
Relations (e.g. dependencies)
Chains (e.g. coreferences)
Annotation inside sentences
Yes
Mouse (e.g. drag & drop)
No
No
No
annotators working independently on their own annotations of the same data.
partial
Yes
No
No
Yes
Yes
Lemmas
UPOS
FEATS
XPOS
Multi-word tokens
Basic dependencies
No
Discontinuous expressions
Nested expressions
Overlapping expressions
No
Online environment for collaborative text annotation
MIT
No
Web-based application
Windows
macOS
Linux
The software should be run by the user (local machine or own server)
No
Raw text
.txt+.ann
External databases (e.g. dictionaries)
The data can be exported.
Language-independence
Writing script support
LTR
RTL
Unicode support
English
Tokens (e.g. part-of-speech tag, morphological features)
Spans (e.g. named entities)
Multi-tokens (e.g. multi-word expressions)
Relations (e.g. dependencies)
Chains (e.g. coreferences)
Annotation inside sentences
Annotation across sentences
Annotation of full sentences/paragraphs/documents
Yes
Keyboard
Mouse (e.g. drag & drop)
No
Image (e.g. a graph or coloured text)
Yes
No
annotators working independently on their own annotations of the same data.
No
No
Yes
No
Yes
No
Lemmas
UPOS
FEATS
XPOS
Basic dependencies
Empty nodes
No
events
relations
chains
attributes
normalization
Discontinuous expressions
Nested expressions
Overlapping expressions
No
Export of visualisations (images).
A web-based collaborative tool for linguistic annotation that supports various annotation tasks, including POS tagging, dependency parsing, and coreference resolution. It works well with formats such as CoNLL-U.
Apache-2.0
Yes
Web-based application
Windows
macOS
Linux
The software is available upon request (the user should ask to get an account)
No
Raw text
CONLL-U
XML
TSV
External databases (e.g. dictionaries)
The data can be exported.
The data can be stored with a third party storage platform (e.g. Git repository)
Language-independence
Writing script support
LTR
RTL
Unicode support
English
word-level
sentence-level
Tokens (e.g. part-of-speech tag, morphological features)
Spans (e.g. named entities)
Multi-tokens (e.g. multi-word expressions)
Relations (e.g. dependencies)
Chains (e.g. coreferences)
Annotation inside sentences
Annotation across sentences
Annotation of full sentences/paragraphs/documents
Yes
Keyboard
Mouse (e.g. drag & drop)
Yes
Image (e.g. a graph or coloured text)
Table
Raw format (e.g. CONLL-U)
Yes
Yes
annotators working independently on their own annotations of the same data
annotators working together on the same shared annotation of the data.
Yes
Yes
Yes
Yes
Yes
No
Lemmas
UPOS
FEATS
XPOS
Multi-word tokens
Basic dependencies
Enhanced dependencies
Empty nodes
Metadata
Yes
NER
coreference
MWE
Discontinuous expressions
Nested expressions
Overlapping expressions
Yes
WebAnno supports a rich set of features, including customizable annotation layers, integration with external data sources, support for multiple annotators, and visualization of annotated data. It also supports automatic pre-annotation and inter-annotator agreement calculation.
Collaborative UD treebank annotation and search tool.
AGPL-3.0
Yes
Web-based application
Windows
macOS
Linux
The software is freely available online (with sign-up option and/or social logins)
No
Raw text
CoNLL-U
The data can be exported.
The data can be stored with a third party storage platform (e.g. Git repository)
Accessibility
English
French
word segmentation
sentence segmentation
Tokens (e.g. part-of-speech tag, morphological features)
Multi-tokens (e.g. multi-word expressions)
Relations (e.g. dependencies)
Annotation inside sentences
Import custom
Mouse (e.g. drag & drop)
Yes
Image (e.g. a graph or coloured text)
Table
Raw format (e.g. CONLL-U)
partial
No
multiple annotators working independently on their own annotations of the same data.
Yes
Yes
Yes
Yes
Yes
partial
Lemmas
UPOS
FEATS
XPOS
Basic dependencies
Multi-word tokens
Metadata
Yes
Yes
ArboratorGrew can also be used to teach syntax: it has a mode in which users can create exercises with different levels of complexity and export students' results at the end. Users can also generate lexicons from their annotated data. To support effective collaboration, ArboratorGrew provides a tag system.
Client-side, browser-only, language-independent tool for editing dependency trees according to the guidelines established by the Universal Dependencies project.
GPL-3.0
Yes
Web-based application
Windows
macOS
Linux
The software is freely available online (with sign-up option and/or social logins)
Yes
Raw text
CONLL-U
VISL CG3
SDParse
Bracket notation
The data can be exported.
Language-independence
Writing script support
LTR
RTL
Unicode support
English
Tokens (e.g. part-of-speech tag, morphological features)
Multi-tokens (e.g. multi-word expressions)
Relations (e.g. dependencies)
Annotation inside sentences
Yes
Keyboard
Mouse (e.g. drag & drop)
No
Image (e.g. a graph or coloured text)
Table
Raw format (e.g. CONLL-U)
No
No
Yes
No
No
No
No
No
No
Lemmas
UPOS
FEATS
XPOS
Multi-word tokens
Basic dependencies
Enhanced dependencies
Empty nodes
No
No
GPL-2.0
2
Apache-2.0
6
CC_BY
1
MIT
3
not specified
1
GNU LGPL
1
BSD 3-Clause
1
other
2
GPL-3.0
3
AGPL-3.0
1
Yes
17
No
3
Legacy
1
Desktop-based application
7
Web-based application
14
Windows
22
macOS
18
Linux
20
The software is freely available online (with sign-up option and/or social logins)
4
The software should be run by the user (local machine or own server)
15
some public instances available
1
The software is freely available online (without sign-up option)
1
The software is available upon request (the user should ask to get an account)
1
Yes
12
No
9
Raw text
16
XML
12
TSV
6
2
CONLL-U
13
CONLL-X
1
Corpus Workbench vertical format
1
SFM
1
Toolbox files
1
Praat files
1
CoNLL-U plus
1
.eaf
1
TextGrid
1
CHAT
1
Shoebox/Toolbox
1
CSV
1
FoLiA XML
1
.txt+.ann
1
CoNLL-U
1
VISL CG3
1
SDParse
1
Bracket notation
1
Audio
5
Video
5
Image
5
3
External databases (e.g. dictionaries)
5
The data can be exported.
21
The data can be stored with a third party storage platform (e.g. Git repository)
4
Language-independence
20
Writing script support
20
LTR
20
Unicode support
19
RTL
16
Accessibility
1
English
21
German
1
French
2
Chinese
1
word-level
12
sentence-level
8
word segmentation
1
sentence segmentation
1
Tokens (e.g. part-of-speech tag, morphological features)
20
Spans (e.g. named entities)
16
Multi-tokens (e.g. multi-word expressions)
15
Relations (e.g. dependencies)
15
Chains (e.g. coreferences)
8
Interlinear glossed text annotations
1
Free span annotation, independent of sentences
1
Annotation inside sentences
19
Annotation across sentences
9
Annotation of full sentences/paragraphs/documents
9
Yes
19
No
1
Import custom
1
Lemmas
15
UPOS
17
FEATS
14
XPOS
15
Multi-word tokens
12
Basic dependencies
15
Enhanced dependencies
7
Metadata
13
Empty nodes
8
No
16
Yes
5
Nested expressions
7
Overlapping expressions
7
Discontinuous expressions
5
xs
1
No
17
partial
2
Yes
2