Reverse Dependencies of rouge-score
The following projects have a declared dependency on rouge-score:
- adapter-transformers — A friendly fork of HuggingFace's Transformers, adding Adapters to PyTorch language models
- adapters — A Unified Library for Parameter-Efficient and Modular Transfer Learning
- adaptor — Adaptor: Objective-centric Adaptation Framework for Language Models.
- agi-med-metrics — Utils for agi-med team metric calculation
- ai-metrics — A library for basic NLP metric score implementations
- ai2-catwalk — A library for evaluating language models.
- al360-trustworthyai-text — SDK API to assess text Machine Learning models.
- alexa-teacher-models — Alexa Teacher Models
- anoteai — An SDK for interacting with the Anote API
- arize — A helper library to interact with Arize AI APIs
- asia-ds1-toolbox — Toolbox for Asia data science challenge
- assert-llm-tools — Automated Summary Scoring & Evaluation of Retained Text
- AutoRAG — Automatically evaluate RAG pipelines with your own data and find the optimal structure for a new RAG product.
- autotrain-advanced — no summary
- autotransformers — a Python package for automatic training and benchmarking of Language Models.
- azureml-metrics — Contains the ML and non-Azure specific common code associated with AzureML metrics.
- biochatter — Backend library for conversational AI in biomedicine
- chateval — Evaluation Framework for Chatbots in Generative AI
- classy-core — A powerful tool to train and use your classification models.
- codegen-metrics — Package for computation of code generation metrics
- cody-adapter-transformers — A friendly fork of HuggingFace's Transformers, adding Adapters to PyTorch language models
- Comprehensive-RAG-Evaluation-Metrics — This library provides a comprehensive suite of metrics to evaluate the performance of Retrieval-Augmented Generation (RAG) systems. RAG systems, which combine information retrieval with text generation, present unique evaluation challenges beyond those found in standard language generation tasks
- compression-distance — A compression-based edit distance metric for text comparison
- crfm-helm — Benchmark for language models
- cv-parsing — NLP application to parse CVs (curricula vitae) for the HR department
- danoliterate — Benchmark of Generative Large Language Models in Danish
- dfcx-scrapi — A high level scripting API for bot builders, developers, and maintainers.
- dimweb-persona-bot — A dialogue bot with a personality
- diversity — no summary
- docuverse — State-of-the-art Retrieval/Search engine models, including ElasticSearch, ChromaDB, Milvus, and PrimeQA
- domino-code-assist — no summary
- easy-testing — Framework for testing
- easyinstruct — An Easy-to-use Instruction Processing Framework for Large Language Models.
- easyllm-kit — Easy code recipes for large language models
- entail — Python Distribution Utilities
- eval-mm — eval-mm is a tool for evaluating Multi-Modal Large Language Models.
- evalRagPk — A library for evaluating Retrieval-Augmented Generation (RAG) systems
- evalscope — EvalScope: Lightweight LLMs Evaluation Framework
- evaluate — HuggingFace community-driven open-source library for evaluation
- evaluator-blog — no summary
- examinationrag — XRAG: eXamining the Core - Benchmarking Foundational Component Modules in Advanced Retrieval-Augmented Generation
- fanoutqa — The companion code for the FanOutQA dataset + benchmark for LLMs.
- fate-llm — Federated Learning for Large Language Models
- flagai — FlagAI aims to help researchers and developers to freely train and test large-scale models for NLP/CV/VL tasks.
- FLAML — A fast library for automated machine learning and tuning
- fmeval — Amazon Foundation Model Evaluations
- furiosa-llm-models — Furiosa LLM
- gem-metrics-fork — GEM Challenge metrics
- geniusrise-text — Text bolts for geniusrise
- geniusrise-vision — Huggingface bolts for geniusrise
- grag — A simple package for implementing RAG
- h2ogpt — no summary
- hezar — Hezar: The all-in-one AI library for Persian, supporting a wide variety of tasks and modalities!
- hinteval — A Python framework designed for both generating and evaluating hints.
- indic-eval — A package to make LLM evaluation easier
- instruct-qa — Empirical evaluation of retrieval-augmented instruction-following models.
- instructlab — Core package for interacting with InstructLab
- jury — Evaluation toolkit for neural language generation.
- keras-hub — Industry-strength Natural Language Processing extensions for Keras.
- keras-hub-nightly — Industry-strength Natural Language Processing extensions for Keras.
- koai — Korean AI Project
- kogito — A Python NLP Commonsense Knowledge Inference Toolkit
- kurenai — Thin wrapper of google-research's rouge-score; supports non-ascii by default
- langcheck — Simple, Pythonic building blocks to evaluate LLM-based applications
- langeval — An evaluation library for NLP models
- langfair — LangFair is a Python library for conducting use-case level LLM bias and fairness assessments
- LangRAGEval — LangRAGEval is a library for evaluating responses based on faithfulness, context recall, answer relevancy, and context relevancy.
- langsafe — An evaluation library for NLP models
- langtests — An evaluation library for NLP models
- lighteval — A lightweight and configurable evaluation package
- llama-cookbook — Llama-cookbook is a companion project to the Llama models. Its goal is to provide examples that show how to quickly get started with fine-tuning for domain adaptation and how to run inference with the fine-tuned models.
- llama-recipes — Llama-recipes is a companion project to the Llama models. Its goal is to provide examples that show how to quickly get started with fine-tuning for domain adaptation and how to run inference with the fine-tuned models.
- llm-blender — LLM-Blender is an ensembling framework that attains consistently superior performance by leveraging the diverse strengths and weaknesses of multiple open-source large language models (LLMs). It mitigates individual weaknesses through ranking and combines strengths through generation fusion to enhance the overall capability of LLMs.
- llm-toolkit — LLM Finetuning resource hub + toolkit
- llmebench — A Flexible Framework for Accelerating LLMs Benchmarking
- llmetrics — A metrics and evaluation library for LLMs
- llmsanitize — LLMSanitize: a package to detect contamination in LLMs
- llmuses — Eval-Scope: Lightweight LLMs Evaluation Framework
- lm-eval — A framework for evaluating language models
- lm-polygraph — Uncertainty Estimation Toolkit for Transformer Language Models
- longeval — Prepare your summarization data in a format compatible with the longeval guidelines for human evaluation.
- luna-nlg — Source code for the LUNA project
- mallm — Multi-Agent Large Language Models for Collaborative Task-Solving.
- mblm — Multiscale Byte Language Model
- ml-init — Install the main ML libraries
- modelscope — ModelScope: bring the notion of Model-as-a-Service to life.
- ms-opencompass — A lightweight toolkit for evaluating LLMs based on OpenCompass.
- mw-adapter-transformers — A friendly fork of HuggingFace's Transformers, adding Adapters to PyTorch language models
- mygpt — A locally runnable LLM
- nemo-toolkit — NeMo - a toolkit for Conversational AI
- neuroprompt — A smart prompt compression and optimization tool for LLMs
- nlg-metricverse — An End-to-End Library for Evaluating Natural Language Generation.
- nlpboost — A package for automatic training of NLP (transformers) models
- nutmegredundancysolver — no summary
- oarelatedworkevaluator — Package for evaluation of OARelatedWork dataset.
- ohmeow-blurr — A library designed for fastai developers who want to train and deploy Hugging Face transformers
- opencompass — A comprehensive toolkit for large model evaluation
- parsbench — ParsBench provides toolkits for benchmarking LLMs on Persian-language tasks.
- primeqa — State-of-the-art Question Answering
- pt-pump-up — Hub for Portuguese NLP resources
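All of the projects above declare a dependency on the same small library: google-research's rouge-score, which computes ROUGE metrics for comparing generated text against references. For context, here is a minimal sketch of how a downstream project typically uses it; the example strings are illustrative only.

```python
# Minimal sketch of the rouge-score API that these projects depend on.
# Assumes `pip install rouge-score`; the package imports as `rouge_score`.
from rouge_score import rouge_scorer

# Request ROUGE-1, ROUGE-2, and ROUGE-L; stemming is optional.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

reference = "The quick brown fox jumps over the lazy dog."
prediction = "A quick brown fox jumped over a lazy dog."

# score() returns a dict mapping each ROUGE type to a named tuple
# with precision, recall, and fmeasure fields.
scores = scorer.score(reference, prediction)
for name, score in scores.items():
    print(f"{name}: P={score.precision:.3f} R={score.recall:.3f} F1={score.fmeasure:.3f}")
```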