Pipeline and dataset generation for automated fact-checking in almost any language

Authors: J Drchal, H Ullrich, T Mlynář, V Moravec
Venue: Neural Computing and Applications 36 (30), 19023-19054
Year: 2024
Citations: 9

Abstract

The paper presents an end-to-end pipeline for building automated fact-checking datasets in almost any language, covering claim harvesting, normalization, evidence linking, and verification labeling. It emphasizes scalable data creation while preserving enough quality control to support robust model training and evaluation.
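The four stages named above (harvesting, normalization, evidence linking, verification labeling) can be chained as plain functions. What follows is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the claims have already been harvested into a list and that the evidence corpus is a small in-memory dict of paragraphs. Every name here (`Example`, `normalize`, `link_evidence`, `build_dataset`, `verdict_fn`) is hypothetical. Token-overlap ranking stands in for a real retriever, and `verdict_fn` stands in for a trained model or a human annotator. Two simple quality-control filters, dropping empty or duplicate claims and dropping claims with no linked evidence, illustrate where checks could sit.

```python
import re
import unicodedata
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional, Set

@dataclass
class Example:
    claim: str
    language: str
    evidence_ids: List[str] = field(default_factory=list)
    label: Optional[str] = None

def normalize(text: str) -> str:
    # Language-agnostic normalization: Unicode NFC, then collapse whitespace.
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()

def tokens(text: str) -> Set[str]:
    # \w is Unicode-aware for str patterns, so non-Latin scripts also tokenize.
    return set(re.findall(r"\w+", text.lower()))

def link_evidence(claim: str, corpus: Dict[str, str], k: int = 2) -> List[str]:
    # Hypothetical retriever: rank paragraphs by token overlap with the claim
    # (a stand-in for BM25 or dense retrieval), keeping the top k with overlap > 0.
    q = tokens(claim)
    scored = [(len(q & tokens(p)), pid) for pid, p in corpus.items()]
    scored = [(s, pid) for s, pid in scored if s > 0]
    scored.sort(key=lambda x: (-x[0], x[1]))
    return [pid for _, pid in scored[:k]]

def build_dataset(raw_claims: Iterable[str], language: str, corpus: Dict[str, str],
                  verdict_fn: Callable[[str, List[str]], str], k: int = 2) -> List[Example]:
    seen: Set[str] = set()
    out: List[Example] = []
    for raw in raw_claims:
        claim = normalize(raw)
        key = claim.lower()
        if not claim or key in seen:  # QC: drop empty and duplicate claims
            continue
        seen.add(key)
        ev = link_evidence(claim, corpus, k)
        if not ev:  # QC: drop claims with no linked evidence
            continue
        label = verdict_fn(claim, [corpus[i] for i in ev])
        out.append(Example(claim, language, ev, label))
    return out

# Usage with a toy corpus and a placeholder verdict function (hypothetical).
corpus = {
    "p1": "Prague is the capital of the Czech Republic.",
    "p2": "Bratislava is the capital of Slovakia.",
}
claims = ["Prague is  the capital of Czechia.", "prague is the capital of czechia.", ""]

def toy_verdict(claim: str, paragraphs: List[str]) -> str:
    # Placeholder only; a real pipeline would use a trained model or annotators.
    return "SUPPORTS" if any("capital" in p for p in paragraphs) else "NOT ENOUGH INFO"

data = build_dataset(claims, "en", corpus, toy_verdict)
```

Keeping each stage a separate function lets one stage, such as the retriever or the labeler, be swapped per language or domain without touching the others.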

Resources

Video: TODO
Slides: TODO
Code: TODO
Dataset: TODO

Notes

For us, this was a key “from prototype to production” step: we moved from one-off dataset creation to a repeatable multilingual workflow that can be adapted to new domains and languages with manageable effort.
