Average length of listings in the four subcorpora:

  • e05p: 43 tokens
  • e17p: 49 tokens
  • e17x: 177 tokens
  • e18v: 97 tokens

Listing length also varies by category.

The XML file contains metadata for each listing: a unique ID, the year and month in which it was collected, and the category the listing belongs to.
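As a rough illustration of how this per-listing metadata could be used, the sketch below reads listings from XML and computes the average listing length per category. The element name `listing`, the attribute names `id`, `year`, `month` and `category`, and the sample listings are all hypothetical, since the actual schema of the corpus file is not described here. Tokens are approximated by splitting on whitespace, which may differ from the tokenization behind the figures above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample data: the element and attribute names are assumptions
# for illustration only, not the documented schema of the corpus.
SAMPLE = """<corpus>
  <listing id="a1" year="2017" month="03" category="voiture et moto">Renault Clio 2012 bon état</listing>
  <listing id="a2" year="2017" month="03" category="PC et téléphone">iPhone 6 16 Go très bon état avec chargeur</listing>
  <listing id="a3" year="2017" month="04" category="voiture et moto">Scooter 50 cm3 à vendre</listing>
</corpus>"""


def mean_tokens_by_category(xml_text):
    """Return the mean whitespace-token count of listings, keyed by category."""
    root = ET.fromstring(xml_text)
    lengths = defaultdict(list)
    for listing in root.iter("listing"):
        # Whitespace splitting is a simplification of real tokenization.
        tokens = (listing.text or "").split()
        lengths[listing.get("category")].append(len(tokens))
    return {cat: sum(counts) / len(counts) for cat, counts in lengths.items()}


averages = mean_tokens_by_category(SAMPLE)
for category, mean_length in averages.items():
    print(f"{category}: {mean_length:.1f} tokens")
```

The same loop can group by `year` or `month` instead of `category` by changing the attribute passed to `listing.get`.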

Distribution of categories in the first three subcorpora (e05p, e17c, e17p):

  • voiture et moto (cars and motorcycles): 21
  • PC et téléphone (PCs and phones): 20

Additional metadata

Some subcorpora have additional metadata, listed below:

  • e18v: number of ratings the user has
  • e05p: ‘svo’ – 0/1, whether the listing contains at least one well-formed subject-verb-object sentence
  • e17p: ‘text’ – Y/N, whether the listing resembles a text with sentences and punctuation
  • e17p: ‘inf’ / ‘ad’ – the content of each listing is split into two parts: ‘inf’ covers information that is either copy-pasted or consists of numerical details (e.g. dimensions), and ‘ad’ covers everything written by the user