#rlhf

Here are 61 public repositories matching this topic...

LAION-AI / Open-Assistant

OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.

python machine-learning ai nextjs discord-bot assistant language-model chatgpt rlhf

Updated Aug 5, 2023
Python

RUCAIBox / LLMSurvey

The official GitHub page for the survey paper "A Survey of Large Language Models".

natural-language-processing pre-training pre-trained-language-models in-context-learning large-language-models llm llms chain-of-thought chatgpt rlhf instruction-tuning

Updated Aug 2, 2023
Python

hiyouga / ChatGLM-Efficient-Tuning

Fine-tuning ChatGLM-6B with PEFT | Efficient ChatGLM fine-tuning based on PEFT

transformers pytorch lora language-model alpaca fine-tuning peft huggingface chatgpt rlhf chatglm qlora chatglm2

Updated Aug 4, 2023
Python

argilla-io / argilla

✨Argilla: the open-source data curation platform for LLMs

nlp machine-learning natural-language-processing ai weak-supervision developer-tools active-learning annotation-tool text-annotation weakly-supervised-learning human-in-the-loop mlops text-labeling gpt-4 llm langchain rlhf

Updated Aug 6, 2023
Python

hiyouga / LLaMA-Efficient-Tuning

Easy-to-use fine-tuning framework using PEFT (PT+SFT+RLHF with QLoRA), supporting LLaMA-2, BLOOM, Falcon, Baichuan, and Qwen

bloom transformers falcon llama quantization language-model fine-tuning peft pre-training llm rlhf qlora baichuan-7b llama2 qwen-7b

Updated Aug 6, 2023
Python

opendilab / awesome-RLHF

A curated list of reinforcement learning from human feedback (RLHF) resources (continually updated)

reinforcement-learning deep-learning deep-reinforcement-learning large-language-models human-feedback rlhf

Updated Jul 24, 2023

THUDM / WebGLM

WebGLM: An Efficient Web-enhanced Question Answering System (KDD 2023)

llm chatgpt rlhf webglm

Updated Jul 29, 2023
Python

Docta-ai / docta

A Doctor for your data

data language-model data-curation data-centric-ai data-diagnosis data-centric-machine-learning rlhf

Updated Aug 5, 2023
Python

PKU-Alignment / safe-rlhf

Safe-RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback

Updated Aug 1, 2023
Python

THUDM / ImageReward

ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation

generative-model diffusion-models human-preferences rlhf

Updated Jul 11, 2023
Python

tatsu-lab / alpaca_eval

An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.

nlp deep-learning leaderboard evaluation instruction-following foundation-models large-language-models rlhf

Updated Aug 4, 2023
Jupyter Notebook

xtreme1-io / xtreme1

Xtreme1: the next-generation platform for multimodal training data. Supports 3D annotation, 3D segmentation, lidar-camera fusion annotation, image annotation, and RLHF tools.

machine-learning computer-vision data-visualization language-model annotation-tool 3d-annotation 3d-segmentation multimodal lidar-camera-fusion rlhf

Updated Jul 21, 2023
TypeScript

voidful / TextRL

Implementation of ChatGPT-style RLHF (Reinforcement Learning from Human Feedback) on any generation model in Hugging Face Transformers (bloomz-176B/bloom/gpt/bart/T5/MetaICL)

nlp reinforcement-learning pytorch nlg language-model gpt-2 gpt-3 controlled-nlg chatgpt rlhf

Updated Aug 6, 2023
Python

jerry1993-tech / Cornucopia-LLaMA-Fin-Chinese

聚宝盆 (Cornucopia): a LLaMA model fine-tuned on Chinese financial knowledge; covers SFT, RLHF, GPU training and deployment, and more

nlp finance qa transformers text-generation chinese llama sft large-language-models rlhf

Updated Jun 30, 2023
Python

GaryYufei / AlignLLMHumanSurvey

Aligning Large Language Models with Human: A Survey

awesome survey llama gpt-4 large-language-models llms chatgpt rlhf supervised-finetuning llama2 chinese-llama

Updated Aug 4, 2023

WangRongsheng / MedQA-ChatGLM

🛰️ Fine-tuning ChatGLM on real medical dialogue data with LoRA, P-Tuning V2, Freeze, RLHF, and other methods; our vision goes beyond medical question answering

medical dataset transformer freeze lora fine-tuning huggingface large-language-models llms chatgpt rlhf chatglm-6b

Updated Jul 24, 2023
Python

jackaduma / Vicuna-LoRA-RLHF-PyTorch

A full pipeline to fine-tune the Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Vicuna architecture. Basically ChatGPT, but with Vicuna.

pytorch llama gpt lora finetune ppo peft vicuna llm chatgpt rlhf reward-models vicuna-7b

Updated Apr 28, 2023
Python

lhao499 / chain-of-hindsight

Chain-of-Hindsight, a simpler and more effective alternative to RLHF

large-language-models learning-from-human-feedback rlhf

Updated Jun 14, 2023
Python

xrsrke / instructGOOSE

Implementation of Reinforcement Learning from Human Feedback (RLHF)

reinforcement-learning chatgpt human-feedback rlhf instructgpt

Updated Apr 7, 2023
Jupyter Notebook

jianzhnie / open-chatgpt

An open-source implementation of ChatGPT, Alpaca, Vicuna, and an RLHF pipeline. Building a ChatGPT from scratch.

llama gpt lora ppo peft llm chatgpt rlhf stanford-alpaca

Updated Jun 1, 2023
Python