# evaluation

Here are 1,035 public repositories matching this topic...

mrgloom / awesome-semantic-segmentation

🤘 awesome-semantic-segmentation

benchmark evaluation deeplearning semantic-segmentation

Updated May 8, 2021

Knetic / govaluate

Arbitrary expression evaluation for golang

go parsing evaluation expression

Updated Jan 22, 2024
Go

sdiehl / write-you-a-haskell

Building a modern functional compiler from first principles. (http://dev.stephendiehl.com/fun/)

compiler functional-programming book lambda-calculus evaluation type-theory type pdf-book type-checking haskel type-system functional-language hindley-milner type-inference intermediate-representation

Updated Jan 11, 2021
Haskell

MichaelGrupp / evo

Python package for the evaluation of odometry and SLAM

benchmark robotics tum mapping metrics evaluation ros slam trajectory-analysis odometry trajectory ros2 kitti euroc trajectory-evaluation

Updated Mar 8, 2024
Python

viebel / klipse

Klipse is a JavaScript plugin for embedding interactive code snippets in tech blogs.

react javascript ruby python scheme clojure lua clojurescript reactjs common-lisp ocaml brainfuck evaluation prolog codemirror-editor reasonml interactive-snippets code-evaluation klipse-plugin

Updated Oct 7, 2022
HTML

zzw922cn / Automatic_Speech_Recognition

End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow

audio deep-learning tensorflow paper end-to-end evaluation cnn lstm speech-recognition rnn automatic-speech-recognition feature-vector data-preprocessing phonemes timit-dataset layer-normalization rnn-encoder-decoder chinese-speech-recognition

Updated Mar 24, 2023
Python

CLUEbenchmark / SuperCLUE

SuperCLUE: a comprehensive benchmark for general-purpose Chinese large models | A Benchmark for Foundation Models in Chinese

evaluation chinese gpt-4 foundation-models chatgpt

Updated Mar 11, 2024

open-compass / opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

benchmark evaluation openai llm chatgpt large-language-model llama2

Updated Mar 19, 2024
Python

promptfoo / promptfoo

Test your prompts, models, RAGs. Evaluate and compare LLM outputs, catch regressions, and improve prompt quality. LLM evals for OpenAI/Azure GPT, Anthropic Claude, VertexAI Gemini, Ollama, Local & private models like Mistral/Mixtral/Llama with CI/CD

testing ci evaluation ci-cd cicd prompts evaluation-framework rag llm prompt-engineering llmops prompt-testing llm-eval llm-evaluation llm-evaluation-framework

Updated Mar 19, 2024
TypeScript

microsoft / promptbench

A unified evaluation framework for large language models

benchmark evaluation prompt robustness adversarial-attacks large-language-models prompt-engineering chatgpt

Updated Mar 19, 2024
Python

uptrain-ai / uptrain

UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them.

machine-learning monitoring evaluation experimentation jailbreak-detection autoevaluation root-cause-analysis prompt-engineering llmops openai-evals llm-prompting llm-eval llm-test hallucination-detection

Updated Mar 19, 2024
Python

huggingface / evaluate

🤗 Evaluate: A library for easily evaluating machine learning models and datasets.

machine-learning evaluation

Updated Mar 18, 2024
Python

Cloud-CV / EvalAI

☁️ 🚀 📊 📈 Evaluating state of the art in AI

python angularjs docker challenge machine-learning django ai reproducible-research leaderboard evaluation artificial-intelligence ai-challenges reproducibility evalai angular7

Updated Mar 18, 2024
Python

ContinualAI / avalanche

Avalanche: an End-to-End Library for Continual Learning based on PyTorch.

training library framework deep-learning metrics evaluation pytorch benchmarks strategies lifelong-learning continual-learning continualai

Updated Mar 18, 2024
Python

xinshuoweng / AB3DMOT

(IROS 2020, ECCVW 2020) Official Python Implementation for "3D Multi-Object Tracking: A Baseline and New Evaluation Metrics"

tracking machine-learning real-time computer-vision robotics evaluation evaluation-metrics multi-object-tracking kitti 3d-tracking 3d-multi-object-tracking 2d-mot-evaluation 3d-mot 3d-multi kitti-3d

Updated May 24, 2023
Python

ianarawjo / ChainForge

An open-source visual programming environment for battle-testing prompts to LLMs.

ai evaluation large-language-models prompt-engineering llms llmops

Updated Mar 19, 2024
TypeScript

sepandhaghighi / pycm

Multi-class confusion matrix library in Python

Updated Feb 14, 2024
Python

Maluuba / nlg-eval

Evaluation code for various unsupervised automated metrics for Natural Language Generation.

nlp natural-language-processing meteor machine-translation dialogue evaluation dialog rouge natural-language-generation nlg cider rouge-l skip-thoughts skip-thought-vectors bleu-score bleu task-oriented-dialogue

Updated Mar 15, 2024
Python

abo-abo / lispy

Short and sweet LISP editing

refactoring python scheme clojure navigation emacs-lisp common-lisp evaluation

Updated Mar 2, 2024
Emacs Lisp

MLGroupJLU / LLM-eval-survey

The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models".

benchmark evaluation model-assessment large-language-models llm llms

Updated Dec 28, 2023
