Plug and play modules to optimize the performance of your AI systems
Lossy PNG compressor: the pngquant command-line tool, based on the libimagequant library
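For example, a minimal sketch of invoking pngquant from Python via subprocess; the input and output file names are placeholders:

    import subprocess

    # Compress input.png (placeholder name) targeting quality 65-80.
    # --quality takes a min-max range; --output names the destination file.
    subprocess.run(
        ["pngquant", "--quality=65-80", "--output", "out.png", "input.png"],
        check=True,
    )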
Neural Network Distiller by Intel AI Lab: a Python package for neural network compression research. https://intellabs.github.io/distiller
Chinese LLaMA & Alpaca large language models, with local CPU deployment (Chinese LLaMA & Alpaca LLMs)
A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks
Pretrained language models and related optimization techniques developed by Huawei Noah's Ark Lab.
Base pretrained models and datasets in pytorch (MNIST, SVHN, CIFAR10, CIFAR100, STL10, AlexNet, VGG16, VGG19, ResNet, Inception, SqueezeNet)
micronet, a model compression and deployment library. Compression: 1) quantization: quantization-aware training (QAT), high-bit (>2b) (DoReFa; "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference") and low-bit (≤2b)/ternary and binary (TWN/BNN/XNOR-Net); post-training quantization (PTQ), 8-bit (TensorRT); 2) pruning: normal, reg…
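As a concept illustration (not micronet's API), a minimal sketch of the symmetric 8-bit fake quantization that quantization-aware training simulates in the forward pass:

    import numpy as np

    def fake_quantize(w, num_bits=8):
        # Symmetric per-tensor quantization: map floats to signed integers
        # in [-(2^(b-1) - 1), 2^(b-1) - 1], then back to floats ("fake"
        # quantization), so training sees the rounding error that real
        # int8 inference would introduce.
        qmax = 2 ** (num_bits - 1) - 1
        scale = np.abs(w).max() / qmax
        q = np.clip(np.round(w / scale), -qmax, qmax)
        return q * scale

    w = np.random.randn(4, 4).astype(np.float32)
    print(fake_quantize(w) - w)  # the quantization error the network learns around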
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
Inference runtime offering GPU-class performance on CPUs and APIs to integrate ML into your application
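A minimal sketch of DeepSparse's Pipeline API; the task name is real, but the model path is a placeholder that would need to point at a SparseZoo stub or a local ONNX export:

    from deepsparse import Pipeline

    # Placeholder model path; substitute a SparseZoo stub or local ONNX file.
    pipeline = Pipeline.create(
        task="text-classification",
        model_path="path/to/model.onnx",
    )
    print(pipeline(sequences=["The inference was fast on CPU."]))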
A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.
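A minimal sketch of the toolkit's Keras quantization-aware-training entry point; the toy model is illustrative:

    import tensorflow as tf
    import tensorflow_model_optimization as tfmot

    # Toy Keras model standing in for a real one.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
        tf.keras.layers.Dense(1),
    ])

    # Wrap the model with fake-quantization ops for quantization-aware training.
    qat_model = tfmot.quantization.keras.quantize_model(model)
    qat_model.compile(optimizer="adam", loss="mse")
    qat_model.summary()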
PaddleSlim is an open-source library for deep model compression and architecture search.
Train, Evaluate, Optimize, Deploy Computer Vision Models via OpenVINO™
Intel® Neural Compressor (formerly Intel® Low Precision Optimization Tool) aims to provide unified APIs for network compression techniques, such as low-precision quantization, sparsity, pruning, and knowledge distillation, across different deep learning frameworks, in pursuit of optimal inference performance.
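A minimal sketch of post-training static quantization, assuming the library's 2.x-style fit API; the toy PyTorch model and calibration data are placeholders:

    import torch
    from neural_compressor import PostTrainingQuantConfig, quantization

    # Toy FP32 model and calibration data standing in for real ones.
    model = torch.nn.Sequential(
        torch.nn.Linear(8, 16), torch.nn.ReLU(), torch.nn.Linear(16, 2)
    )
    calib_loader = torch.utils.data.DataLoader(
        torch.utils.data.TensorDataset(
            torch.randn(32, 8), torch.zeros(32, dtype=torch.long)
        ),
        batch_size=8,
    )

    # Calibrate and quantize; defaults use static post-training quantization.
    q_model = quantization.fit(
        model=model,
        conf=PostTrainingQuantConfig(),
        calib_dataloader=calib_loader,
    )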
A list of papers, docs, and code about model quantization. This repo aims to collect resources for model quantization research and is continuously improved. PRs contributing works (papers, repositories) the repo has missed are welcome.
Faster Whisper transcription with CTranslate2
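A minimal sketch of the faster-whisper API with int8 compute on CPU; the audio file name is a placeholder:

    from faster_whisper import WhisperModel

    # Run the "base" model on CPU with int8 weights (CTranslate2 quantization).
    model = WhisperModel("base", device="cpu", compute_type="int8")

    segments, info = model.transcribe("audio.wav")  # placeholder file name
    print(f"Detected language: {info.language}")
    for segment in segments:
        print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")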
Brevitas: quantization-aware training in PyTorch
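A minimal sketch of a quantized model built from Brevitas layers; the bit widths and layer sizes are illustrative:

    import torch
    from brevitas.nn import QuantLinear, QuantReLU

    # 4-bit weights and activations; quantization is simulated during training.
    model = torch.nn.Sequential(
        QuantLinear(8, 16, bias=True, weight_bit_width=4),
        QuantReLU(bit_width=4),
        QuantLinear(16, 2, bias=True, weight_bit_width=4),
    )
    print(model(torch.randn(1, 8)))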
PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.
A list of high-quality (newest) AutoML works and lightweight models, including 1) neural architecture search, 2) lightweight structures, 3) model compression, quantization, and acceleration, 4) hyperparameter optimization, and 5) automated feature engineering.