Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spar…
#hadoop
Repositories 1,390
Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, work…
Python
Updated Sep 13, 2018
Deeplearning4j, ND4J, DataVec and more - deep learning & linear algebra for Java/Scala with GPUs + Spark - From Skymind
Distributed SQL query engine for big data
Java
Updated Sep 21, 2018
Alluxio (formerly Tachyon): unify data at memory speed
Java
Updated Sep 21, 2018
Open Source Fast Scalable Machine Learning Platform For Smarter Applications (Deep Learning, Gradient Boosting, Rando…
h2o
machine-learning
data-science
deep-learning
big-data
ensemble-learning
gbm
random-forest
naive-bayes
pca
opensource
distributed
multi-threading
java
python
r
hadoop
spark
gpu
automatic
Java
Updated Sep 21, 2018
Hue is an open source Workbench for developing and accessing Data Apps.
Python
Updated Sep 21, 2018
BigDL: Distributed Deep Learning Library for Apache Spark
Scala
Updated Sep 21, 2018
Example source code accompanying O'Reilly's "Hadoop: The Definitive Guide" by Tom White
Makefile
Updated Oct 13, 2017
A large-scale entity and relation database supporting aggregation of properties
Java
Updated Sep 21, 2018
AI on Hadoop
Java
Updated Jun 26, 2018
A pandas-like deferred expression system, with first-class SQL support (Impala, PostgreSQL, SQLite, ...)
Hadoop, Docker, Kafka, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, D…
nagios-plugins
zookeeper
hadoop
hbase
cloudera
hbase-client
jenkins
travis-ci
nagios-plugin
hortonworks
ambari
cassandra
elasticsearch
docker
kafka
solr
redis
rabbitmq
consul
datastax
Perl
Updated Aug 31, 2018
LizardFS is an open source distributed file system licensed under GPLv3.
c-plus-plus
gplv3
nas
macosx
linux
posix
distributed-systems
distributed-computing
fault-tolerance
high-performance
high-availability
snapshot
qos
erasure-coding
replication
replicas
geo-replication
hsm
hierarchical-storage
hadoop
C++
Updated Sep 3, 2018
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
machine-learning
deep-learning
apache-spark
data-parallelism
distributed-optimizers
keras
optimization-algorithms
tensorflow
data-science
hadoop
Python
Updated Jul 25, 2018
Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on A…
Java
Updated Sep 21, 2018
MooseFS - Open Source Network Distributed File System. MooseFS 3.0 is stable and recommended for production environme…
dfs
software-defined-storage
posix
filesystem
file-system
distributed-file-system
clustering
distributed-storage
distributed-computing
c
fuse
big-data
snapshot
storage-tiering
ha
high-availability
scalability
storage
moosefs
hadoop
C
Updated Jul 23, 2018
Resource scheduling and cluster management for AI
kubernetes
gpu-cluster
resource-management
scheduling
deep-learning
big-data
machine-learning
hadoop
tensorflow
cntk
cluster-manager
gpu
model-training
kubernetes-deployment
ai
artificial-intelligence
yarn
pytorch
jupyter
Java
Updated Sep 21, 2018
The GIS Tools for Hadoop are a collection of GIS tools for spatial analysis of big data.
Updated Jan 9, 2018
DockerHub public images - Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr / SolrCloud, Presto, Apache Drill, Nifi, S…
hadoop
hbase
cassandra
solr
solrcloud
kafka
consul
superset
zookeeper
apache-drill
nifi
docker-image
dockerhub
docker
rabbitmq-cluster
nagios-plugins
spark
presto
rabbitmq
linux
Shell
Updated Aug 25, 2018
Cascading is a feature-rich API for defining and executing complex and fault-tolerant data processing workflows on va…
Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.
Java
Updated Apr 25, 2018
Ytk-learn is a distributed machine learning library which implements most of the popular machine learning algorithms (GBDT…
Java
Updated May 10, 2018
Lightweight, simple structured NoSQL database for Android
android
nosql
sql
data
local
saver
shared
preferences
path
uri
simple
cassandra
firebase
mongo
db
mongodb
hadoop
cassandra-database
elastic
Java
Updated Jan 4, 2018
Kafka Connect HDFS connector
Create clusters of VMs on the cloud and configure them with Ansible.
A Spark-based movie recommendation system, including a crawler project, a web site, an admin back-end system, and a Spark recommendation engine
Java
Updated Sep 5, 2018
A tool for provisioning and managing Apache Hadoop clusters in the cloud. Cloudbreak, as part of the Hortonworks Data…
Java
Updated Sep 21, 2018
Spark Library for Hadoop Upserts And Incrementals
Java
Updated Sep 20, 2018
hadoop-mini-clusters provides an easy way to test Hadoop projects directly in your IDE
Java
Updated Sep 14, 2018