File Format

The messages have been tagged with part-of-speech tags using TreeTagger and are stored in a basic XML format.
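As a rough illustration of how such a file might be read, the sketch below parses a small XML sample with Python's standard library. The exact schema is not documented here, so the element name `message`, the `id` attribute, and the sample content are all assumptions made for illustration. The only thing taken as given is TreeTagger's usual tab-separated output of token, tag, and lemma, one token per line.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the <corpus>/<message> element names and the "id"
# attribute are assumptions, not the corpus's documented schema.
SAMPLE = """<corpus>
  <message id="1">
I\tPP\tI
like\tVVP\tlike
it\tPP\tit
  </message>
</corpus>"""


def read_messages(xml_text):
    """Return {message id: [(token, tag, lemma), ...]} from tagged XML."""
    root = ET.fromstring(xml_text)
    messages = {}
    for msg in root.iter("message"):
        tokens = []
        for line in (msg.text or "").splitlines():
            parts = line.strip().split("\t")
            # Keep only well-formed token/tag/lemma lines; skip blank lines.
            if len(parts) == 3:
                tokens.append(tuple(parts))
        messages[msg.get("id")] = tokens
    return messages


if __name__ == "__main__":
    for msg_id, tokens in read_messages(SAMPLE).items():
        print(msg_id, tokens)
```

If the real files use different element names or keep the tags in attributes, only the `root.iter(...)` call and the line-splitting step should need to change.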

For copyright reasons, we cannot upload the full corpus to our website. If you are interested in working with the corpus, please contact us directly by email.

A small preview can be viewed here.