Average length of listings in the four subcorpora:
- e05p: 43 tokens
- e17p: 49 tokens
- e17x: 177 tokens
- e18v: 97 tokens
Listing length also varies by category.
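The averages above are means of per-listing token counts. The sketch below shows one way to compute such means grouped by subcorpus and category. It assumes whitespace tokenization, because the tokenizer behind the figures above is not specified. It also assumes listings are already loaded as `(subcorpus, category, text)` tuples. The function name `average_lengths` and the sample listings are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def average_lengths(listings):
    """Mean token count per (subcorpus, category) pair.

    Hypothetical helper: tokenizes on whitespace, which may not match
    the tokenizer used for the corpus statistics.
    """
    lengths = defaultdict(list)
    for subcorpus, category, text in listings:
        lengths[(subcorpus, category)].append(len(text.split()))
    return {key: mean(values) for key, values in lengths.items()}

# Hypothetical toy listings, not taken from the corpus.
sample = [
    ("e05p", "voiture et moto", "Vends moto en bon état"),
    ("e05p", "voiture et moto", "Voiture 5 portes"),
    ("e05p", "PC et téléphone", "Téléphone neuf jamais servi, facture fournie"),
]

for key, avg in sorted(average_lengths(sample).items()):
    print(key, avg)
```

To get one figure per subcorpus instead, group on the subcorpus alone rather than on the pair.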
The XML file stores the following metadata for each listing: a unique ID, the year and month of collection, and the category the listing belongs to.
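The sketch below reads these per-listing metadata with Python's standard `xml.etree.ElementTree`. The XML schema used here is an assumption: a `<listing>` element with `id`, `year`, `month` and `category` attributes and the listing text as element content. The element and attribute names, the `Listing` class and the sample document are all hypothetical and must be adapted to the file's actual markup.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class Listing:
    id: str
    year: int
    month: int
    category: str
    text: str

def parse_listings(xml_string):
    """Parse listing metadata from an XML string.

    The tag and attribute names (listing, id, year, month, category)
    are hypothetical placeholders for the corpus's real schema.
    """
    root = ET.fromstring(xml_string)
    return [
        Listing(
            id=el.get("id"),
            year=int(el.get("year")),
            month=int(el.get("month")),
            category=el.get("category"),
            text=(el.text or "").strip(),
        )
        for el in root.iter("listing")
    ]

# Hypothetical sample document, not taken from the corpus.
SAMPLE = """<corpus>
  <listing id="ex1" year="2017" month="6" category="PC et téléphone">Téléphone neuf</listing>
  <listing id="ex2" year="2017" month="7" category="voiture et moto">Vends moto</listing>
</corpus>"""

for listing in parse_listings(SAMPLE):
    print(listing.id, listing.year, listing.month, listing.category)
```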
Distribution of categories in the first three subcorpora (e05p, e17c, e17p):
| Category | | |
|---|---|---|
| voiture et moto (cars and motorbikes) | | 21 |
| PC et téléphone (computers and phones) | | 20 |
Some subcorpora have additional metadata, listed below:
- e18v: the number of ratings the user who posted the listing has received
- e05p: ‘svo’ (0/1), indicating whether the listing contains at least one well-formed subject-verb-object sentence
- e17p: ‘text’ (Y/N), indicating whether the listing resembles running text with sentences and punctuation
- e17p: the listing text is split into segments labelled either ‘inf’ or ‘ad’; ‘inf’ covers copy-pasted information or numerical details (e.g. dimensions), and ‘ad’ covers everything written by the user
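The per-subcorpus flags listed above lend themselves to filtering. The sketch below selects listings by a flag value such as `text="Y"` or `svo="1"`, then keeps only the user-written ‘ad’ segments of an e17p listing. The markup is an assumption: the flags are taken to be attributes of a `<listing>` element, and the segments are taken to be `<ad>` and `<inf>` child elements. The helper names `select` and `user_written_text` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def select(root, attribute, value):
    """Return listings whose flag attribute (e.g. 'svo' or 'text') equals value.

    Assumes flags are stored as attributes on <listing> (hypothetical schema).
    """
    return [el for el in root.iter("listing") if el.get(attribute) == value]

def user_written_text(listing_el):
    """Join the 'ad' segments of a listing, skipping 'inf' segments.

    Assumes segments are <ad> and <inf> child elements (hypothetical schema).
    """
    return " ".join((seg.text or "").strip() for seg in listing_el.findall("ad"))

# Hypothetical sample document, not taken from the corpus.
SAMPLE = """<corpus>
  <listing id="a" text="Y"><ad>Vends vélo peu servi.</ad><inf>Taille : 54 cm</inf><ad>Prix ferme.</ad></listing>
  <listing id="b" text="N"><inf>Marque : X</inf></listing>
</corpus>"""

root = ET.fromstring(SAMPLE)
for listing in select(root, "text", "Y"):
    print(listing.get("id"), user_written_text(listing))
```

Keeping only the ‘ad’ segments separates the user's own writing from copy-pasted or numerical details. This can matter when measuring properties of user-written language such as listing length.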