Herbert Ullrich

Pipeline and dataset generation for automated fact-checking in almost any language

Authors: J Drchal, H Ullrich, T Mlynář, V Moravec
Venue: Neural Computing and Applications 36 (30), 19023-19054
Year: 2024
Citations: 9

Abstract

The paper presents an end-to-end pipeline for building automated fact-checking datasets in almost any language, covering claim harvesting, normalization, evidence linking, and verification labeling. It emphasizes scalable data creation while preserving enough quality control to support robust model training and evaluation.

Notes

For us, this was a key “from prototype to production” step: we moved from one-off dataset creation to a repeatable multilingual workflow that can be adapted to new domains and languages with manageable effort.


