File Format

The corpus is available to download in XML format upon registration.

The file contains all four subcorpora. Each listing has a unique ID, the year and month in which it was collected, the category the listing comes from, and some additional tags (as explained here).
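As a rough illustration of how the per-listing metadata described above might be read, here is a minimal Python sketch using the standard library. The actual schema is not specified in this section, so the element names (corpus, subcorpus, listing), the attribute names (id, year, month, category) and the sample data are all hypothetical placeholders; adjust them to match the downloaded file.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample: the real element and attribute names may differ.
SAMPLE = """<corpus>
  <subcorpus name="A">
    <listing id="a1" year="2018" month="03" category="housing">Example text one.</listing>
    <listing id="a2" year="2018" month="04" category="jobs">Example text two.</listing>
  </subcorpus>
  <subcorpus name="B">
    <listing id="b1" year="2019" month="01" category="housing">Example text three.</listing>
  </subcorpus>
</corpus>"""


def read_listings(xml_text):
    """Return one dict per <listing> element (assumed element name)."""
    root = ET.fromstring(xml_text)
    rows = []
    for listing in root.iter("listing"):
        rows.append({
            "id": listing.get("id"),              # unique ID
            "year": listing.get("year"),          # collection year
            "month": listing.get("month"),        # collection month
            "category": listing.get("category"),  # source category
            "text": (listing.text or "").strip(),
        })
    return rows


listings = read_listings(SAMPLE)
# Count listings per category across all subcorpora.
per_category = Counter(row["category"] for row in listings)
print(per_category)
```

For the real file, ET.parse(path).getroot() could replace ET.fromstring, and any additional tags would need to be read in the same way once their names are known.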

PDF screenshots of the listings are also available for all subcorpora, although, for technical reasons, only the first 249 are available for the 2018 subcorpus.