🐉 DRAGON. Dynamic RAG Benchmark On News

This leaderboard compares RAG systems on generation and retrieval metrics across different question types (simple, comparison, multi-hop, conditional, etc.).

  • Questions are automatically generated from news sources.
  • The question dataset is updated regularly, and metrics for open models are recalculated.
  • Each user submission shows the latest metrics calculated for it.
  • To recalculate a previously submitted configuration with the latest data version, use the submit_id received during the initial submission via the client (see instructions below).
  • Version 1.34.1 → 600 questions, generated from news sources → 3 July 2025
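
To illustrate what comparing systems across question types can look like, here is a minimal Python sketch that averages a per-question score within each question type. The record layout (`question_type`, `score`), the function name `mean_score_by_type`, and the sample values are assumptions made for this illustration; they are not the benchmark's actual data format or metric definitions.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_type(results):
    """Average a per-question score within each question type.

    `results` is a list of dicts with the hypothetical keys
    "question_type" and "score" (assumed layout, not the benchmark's).
    """
    buckets = defaultdict(list)
    for row in results:
        buckets[row["question_type"]].append(row["score"])
    return {qtype: mean(scores) for qtype, scores in buckets.items()}


# Made-up per-question scores for a single RAG system.
results = [
    {"question_type": "simple", "score": 1.0},
    {"question_type": "simple", "score": 0.0},
    {"question_type": "multi-hop", "score": 0.5},
    {"question_type": "comparison", "score": 1.0},
]

print(mean_score_by_type(results))
```

Reporting one average per question type, rather than a single overall number, makes it visible when a system handles simple questions well but struggles on, say, multi-hop ones.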

    Generation Metrics

    Retrieval Metrics

    Citation

    @misc{chernogorskii2025dragondynamicragbenchmark,
          title={DRAGON: Dynamic RAG Benchmark On News}, 
          author={Fedor Chernogorskii and Sergei Averkiev and Liliya Kudraleeva and Zaven Martirosian and Maria Tikhonova and Valentin Malykh and Alena Fenogenova},
          year={2025},
          eprint={2507.05713},
          archivePrefix={arXiv},
          primaryClass={cs.CL},
          url={https://arxiv.org/abs/2507.05713}, 
    }
    

    Version Selection

    Start counting from the current dataset version

    Click on models in the table to add them to the charts

    DRAGON. Dynamic RAG Benchmark Leaderboard