Authors: J Drchal, H Ullrich, T Mlynář, V Moravec
Venue: Neural Computing and Applications 36 (30), 19023-19054
Year: 2024
Citations: 9
Abstract
The paper presents an end-to-end pipeline for building automated fact-checking datasets in almost any language. The pipeline covers claim harvesting, normalization, evidence linking, and verification labeling. It prioritizes scalable data creation while retaining enough quality control for the resulting datasets to support model training and evaluation.
Resources
- Video: TODO
- Slides: TODO
- Code: TODO
- Dataset: TODO
Notes
For us, this was a key “from prototype to production” step: we moved from one-off dataset creation to a repeatable multilingual workflow that can be adapted to new domains and languages with manageable effort.