
Data Collection

The first wave of the corpus (e05p), collected in 2005, consists of 300 listings. An empty search was submitted, which returned all active listings on the site and therefore covered a wide range of categories. To exclude professional eBay sellers, the listings were pruned so that only users with fewer than 200 ratings were included, and each user featured only once in the corpus. Listings with extensive delivery or returns information were also excluded, as were listings from shops.

This corpus was replicated in 2017 (e17p), and an additional corpus of listings from professional users was also created (e17x; users with a shop and more than 200 listings, checked manually). An empty search returning all listings was no longer possible in 2017; a category had to be chosen instead. The distribution of categories in the 2005 corpus was therefore used to create the 2017 corpus.
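The text does not specify how the 2005 category distribution was carried over to 2017. One possible way to do this is proportional allocation with largest-remainder rounding, sketched below. The function name `category_quotas` and the category labels and counts in the usage note are hypothetical placeholders, not figures from the corpus.

```python
def category_quotas(reference_counts, target_size):
    """Split target_size across categories in proportion to reference_counts.

    Illustrative sketch only: the rounding rule (largest remainder) is an
    assumption, not the procedure documented for the corpus.
    """
    total = sum(reference_counts.values())
    if total <= 0:
        raise ValueError("reference_counts must contain at least one listing")

    # Integer arithmetic avoids floating-point rounding surprises.
    quotas = {c: (n * target_size) // total for c, n in reference_counts.items()}
    remainders = {c: (n * target_size) % total for c, n in reference_counts.items()}

    # Hand the listings lost to rounding down to the largest remainders.
    leftover = target_size - sum(quotas.values())
    for c in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        quotas[c] += 1
    return quotas
```

For example, with made-up reference counts `{"A": 150, "B": 100, "C": 50}` and a target of 100 listings, the function allocates 50, 33 and 17 listings respectively, so the quotas always sum to the target size.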

In 2018, we used the web scraping tool ParseHub to collect more listings automatically. We used the search term vraiment to select listings that were more likely to come from private sellers. More than 10,000 listings were collected. We then filtered the results to keep only one listing per user and to exclude users with more than 1000 ratings. Listings containing descriptions that were copy-pasted from elsewhere, or expressions that indicated a professional activity (mon stock, mes photos, ma boutique, mes autres, shipping, tracklist, welcome, ask, regroupez, regroupe, ASUS GTX), were excluded. We manually annotated the use of vraiment as a stance marker and excluded listings in which vraiment was used in a different way. This left 356 listings.
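The automatic part of this filtering can be sketched as follows. This is a minimal illustration, not the script used for the corpus: the `Listing` fields, the whole-word, case-insensitive matching of the excluded expressions, and the choice to keep the first retained listing per user are all assumptions. The manual annotation of vraiment is not modelled.

```python
import re
from dataclasses import dataclass

# Expressions named in the text as signs of professional activity or pasted content.
EXCLUDED_EXPRESSIONS = [
    "mon stock", "mes photos", "ma boutique", "mes autres", "shipping",
    "tracklist", "welcome", "ask", "regroupez", "regroupe", "ASUS GTX",
]

# Assumption: expressions are matched as whole words, ignoring case.
EXCLUDED_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(e) for e in EXCLUDED_EXPRESSIONS) + r")\b",
    re.IGNORECASE,
)


@dataclass
class Listing:
    # Hypothetical record layout for one scraped listing.
    seller: str
    ratings: int
    description: str


def filter_listings(listings, max_ratings=1000):
    """Drop high-volume sellers and flagged descriptions, one listing per seller."""
    seen_sellers = set()
    kept = []
    for listing in listings:
        if listing.ratings > max_ratings:
            continue  # more than 1000 ratings: treated as likely professional
        if EXCLUDED_PATTERN.search(listing.description):
            continue  # contains one of the excluded expressions
        if listing.seller in seen_sellers:
            continue  # assumption: the first retained listing per seller wins
        seen_sellers.add(listing.seller)
        kept.append(listing)
    return kept
```

Whole-word matching is chosen here so that a short expression such as ask does not exclude unrelated words that merely contain it; plain substring matching would be stricter and would remove more listings.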