Annotation

The corpus has been annotated for various features.

The tags and their meanings are as follows:

  • ann: abbreviations or ‘for sale’ equivalents (tbe, je vends, vds)
  • bon: use of an evaluative attribute at the very beginning of the listing
  • ego: use of je
  • stn/sty: non-standard or standard usage of past participle agreement or negation
  • pre: presentatives (il y a, c’est)
  • vst: vraiment as a stance marker ("it’s really nice")
  • emo: emoticons
  • enc: use of bonnes enchères (happy bidding)
  • imp: most frequent imperative forms (hésitez, consulter, regardez)
  • att: evaluative attributes (not at the beginning of the listing)

In addition to these tags, which are used consistently throughout all four subcorpora, the first subcorpus (2005) contains extra tags:

  • acc: missing or non-standard accents
  • ang: anglicisms
  • con: contact details
  • inf: information
  • lex: informal lexical items
  • ort: orthographical ‘mistakes’
  • pub: marketing language
  • slo: use of slogans
  • syn: syntax, topicalisation
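
For anyone processing the annotation programmatically, the two tag sets above could be held in a small lookup table. The following is only a sketch: the names COMMON_TAGS, EXTRA_TAGS_2005, tagset and describe are hypothetical, the descriptions are shortened from the lists above, and the reading of stn as non-standard and sty as standard is an assumption based on the order in which the two are listed. How the tags are actually encoded in the corpus files is not described here, so no parsing is attempted.

```python
# Tags used consistently throughout all four subcorpora.
COMMON_TAGS = {
    "ann": "abbreviations or 'for sale' equivalents",
    "bon": "evaluative attribute at the very beginning of the listing",
    "ego": "use of je",
    # Assumption: stn = non-standard, sty = standard (order of the source list).
    "stn": "non-standard past participle agreement or negation",
    "sty": "standard past participle agreement or negation",
    "pre": "presentatives",
    "vst": "vraiment as a stance marker",
    "emo": "emoticons",
    "enc": "use of bonnes enchères",
    "imp": "most frequent imperative forms",
    "att": "evaluative attributes (not at the beginning of the listing)",
}

# Extra tags found only in the first subcorpus (2005).
EXTRA_TAGS_2005 = {
    "acc": "missing or non-standard accents",
    "ang": "anglicisms",
    "con": "contact details",
    "inf": "information",
    "lex": "informal lexical items",
    "ort": "orthographical 'mistakes'",
    "pub": "marketing language",
    "slo": "use of slogans",
    "syn": "syntax, topicalisation",
}


def tagset(first_subcorpus: bool = False) -> dict:
    """Return the tags valid for a subcorpus (the 2005 one has extras)."""
    tags = dict(COMMON_TAGS)
    if first_subcorpus:
        tags.update(EXTRA_TAGS_2005)
    return tags


def describe(tag: str, first_subcorpus: bool = False) -> str:
    """Look up a tag, raising KeyError if it is not valid for the subcorpus."""
    tags = tagset(first_subcorpus)
    if tag not in tags:
        raise KeyError(f"unknown tag for this subcorpus: {tag!r}")
    return tags[tag]
```

Keeping the 2005-only tags in a separate table makes it easy to restrict a comparison across subcorpora to the tags they all share, for example by calling tagset() without arguments.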