File Format

The corpus is stored in XML format.

The corpus is divided into three sub-corpora: messages from the morning, the afternoon and the evening.

The metadata for each message includes the time it was sent, the anonymised username and the sub-corpus ID. There is no additional annotation.
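
As a rough illustration of how the three metadata fields might be read, here is a minimal Python sketch. The element name (message), the attribute names (time, user, subcorpus) and the sample values are all assumptions made for this example; the actual schema is not specified here, so check the names against the corpus file before reusing the code.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample: the element and attribute names are assumptions,
# not the corpus's documented schema.
SAMPLE = """\
<corpus>
  <message time="08:15:00" user="user_001" subcorpus="morning">Good morning</message>
  <message time="14:30:00" user="user_002" subcorpus="afternoon">Lunch was good</message>
  <message time="21:05:00" user="user_001" subcorpus="evening">Good night</message>
</corpus>
"""


def read_messages(xml_text):
    """Return one dict per message with its metadata and text."""
    root = ET.fromstring(xml_text)
    messages = []
    for elem in root.iter("message"):  # assumed element name
        messages.append({
            "time": elem.get("time"),            # assumed attribute name
            "user": elem.get("user"),            # assumed attribute name
            "subcorpus": elem.get("subcorpus"),  # assumed attribute name
            "text": (elem.text or "").strip(),
        })
    return messages


messages = read_messages(SAMPLE)

# Count how many messages fall into each sub-corpus.
counts = Counter(m["subcorpus"] for m in messages)
```

For a corpus file on disk, ET.parse(path).getroot() could replace ET.fromstring, with the rest of the function unchanged.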