Authors: F Seitl, T Kovářík, S Mirshahi, J Kryštůfek, R Dujava, M Ondreička, et al.
Venue: arXiv preprint arXiv:2404.04068
Year: 2024
Citations: 11
Links
- arXiv: 2404.04068
Abstract
This work studies how to evaluate information extraction outputs beyond simple exact matching. We discuss practical quality dimensions (correctness, completeness, consistency, and usefulness), compare metric choices, and show where automatic scores diverge from human judgment for real extraction tasks.
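The abstract's point that automatic scores can diverge from exact matching is easy to see concretely. The sketch below (my own illustration, not code from the paper) compares strict exact match against token-level F1 for a single extracted field; the field values are invented examples.

```python
# Minimal sketch (not the paper's code): exact match vs token-level F1
# for one extracted field, showing how the two scores can diverge.

def exact_match(pred: str, gold: str) -> float:
    """1.0 only if prediction and gold agree after trivial normalization."""
    return float(pred.strip().lower() == gold.strip().lower())

def token_f1(pred: str, gold: str) -> float:
    """Harmonic mean of token precision and recall (bag-of-tokens overlap)."""
    pred_toks = pred.lower().split()
    gold_toks = gold.lower().split()
    # Count overlapping tokens with multiplicity.
    gold_counts: dict[str, int] = {}
    for t in gold_toks:
        gold_counts[t] = gold_counts.get(t, 0) + 1
    common = 0
    for t in pred_toks:
        if gold_counts.get(t, 0) > 0:
            common += 1
            gold_counts[t] -= 1
    if common == 0:
        return 0.0
    precision = common / len(pred_toks)
    recall = common / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

# A prediction that a human would call essentially correct:
pred, gold = "Dr. Jane Smith", "Jane Smith"
print(exact_match(pred, gold))          # 0.0 — exact match calls it wrong
print(round(token_f1(pred, gold), 2))   # 0.8 — partial credit for overlap
```

This is the kind of divergence the paper's quality dimensions (correctness vs usefulness) are meant to make explicit: exact match scores the prediction as a total failure, while a softer overlap metric, or a human, would not.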
Resources
- Video: TODO
- Slides: TODO
- Code: TODO
- Dataset: TODO
Notes
I like this paper as a “methodology anchor” for later projects: before optimizing extraction models, make the evaluation itself explicit and reliable. That framing helped us design cleaner experiments in downstream fact-checking and claim verification pipelines.