Updated Apr 10, 2021 - C
#bigdata
Here are 1,432 public repositories matching this topic...
An open-source big data platform designed and optimized for the Internet of Things (IoT).
A curated list of awesome big data frameworks, resources and other awesomeness.
data-science, data, awesome, database, data-stream, bigdata, series-database, data-visualization, data-warehouse, stream-processing, data-analytics, awesome-list, distributed-database, visualize-data, streaming-data
Updated Mar 30, 2021
Upserts, Deletes And Incremental Processing on Big Data.
bigdata, stream-processing, data-integration, datalake, apachespark, hudi, apachehudi, incremental-processing, apacheflink
Updated Apr 10, 2021 - Java
An easy-to-use BI server built for SQL lovers. Power data analysis in SQL and gain faster business insights.
sql, spring-boot, dashboard, reactjs, jdbc, reporting, bigdata, data-visualization, business-intelligence, sql-editor
Updated Mar 31, 2021 - Java
k82cn commented Dec 28, 2020
Is this a BUG REPORT or FEATURE REQUEST?:
/kind feature
What happened:
Automatically set GOMAXPROCS to match Linux container CPU quota, xref https://github.com/uber-go/automaxprocs
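The arithmetic behind this request can be sketched as follows. This is a minimal illustration in Python, not the API of the linked Go library: the function name `cpus_from_cpu_max` is hypothetical, the input is assumed to be the contents of a cgroup v2 `cpu.max` file (`"<quota> <period>"` or `"max <period>"`), and rounding down with a floor of one worker is an assumption of this sketch. In a Go program, the resulting value is what would be passed to `runtime.GOMAXPROCS`.

```python
def cpus_from_cpu_max(cpu_max: str, host_cpus: int) -> int:
    """Derive a worker count from the contents of a cgroup v2 cpu.max file.

    cpu_max   -- text such as "200000 100000" (quota and period in
                 microseconds) or "max 100000" (no quota configured)
    host_cpus -- number of CPUs visible on the host, used as the upper bound
    """
    quota, period = cpu_max.split()
    if quota == "max":
        # No CPU quota is set, so fall back to the host CPU count.
        return host_cpus
    # Assumption of this sketch: round fractional CPUs down.
    limit = int(quota) // int(period)
    # Never return fewer than one worker or more than the host offers.
    return max(1, min(limit, host_cpus))
```

For example, a container limited to 2.5 CPUs (`"250000 100000"`) on an 8-core host would get 2 workers under these assumptions, instead of the 8 it would see without the adjustment.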
GoEddie commented Dec 30, 2019
This is to track implementation of the ML-Features: https://spark.apache.org/docs/latest/ml-features
Bucketizer has been implemented in dotnet/spark#378 but there are more features that should be implemented.
- Feature Extractors
  - TF-IDF
  - Word2Vec (dotnet/spark#491)
  - CountVectorizer (https://github.com/dotnet/spark/p
Distributed Big Data Orchestration Service
java, distributed-systems, cloud, microservices, big-data, spring-boot, microservice, bigdata, configuration, orchestration, configuration-management, netflixoss, netflix-oss
Updated Apr 1, 2021 - Java
GridDB is a next-generation open source database that makes time series IoT and big data fast and easy.
Updated Mar 31, 2021 - C++
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
python, data-science, machine-learning, big-data, spark, notebook, ipython, bigdata, ipython-notebook, pyspark, mllib, data-analysis
Updated Apr 7, 2021 - Jupyter Notebook
The Programming Language Designed For Big Data and AI
Updated Apr 6, 2021 - JavaScript
data-science, machine-learning, spark, bigdata, data-transformation, pyspark, data-extraction, data-analysis, data-wrangling, dask, data-exploration, data-preparation, data-cleaning, data-profiling, data-cleansing, big-data-cleaning, data-cleaner, cudf, dask-cudf
Updated Apr 8, 2021 - Jupyter Notebook
Google, Naver multiprocess image web crawler (Selenium)
python, crawler, google, deep-learning, bigdata, thread, selenium, chromedriver, customizable, image-crawler, multiprocess
Updated Dec 23, 2020 - Python
C# and F# language binding and extensions to Apache Spark
streaming, spark, apache-spark, csharp, fsharp, bigdata, dataset, spark-streaming, eventhubs, mapreduce, dataframe, rdd, dstream, mobius, kafka-streaming, near-real-time
Updated Jan 29, 2021 - C#
[DEPRECATED] Detect threats with log data and improve cloud security posture
react, python, go, graphql, aws, security, typescript, serverless, etl, bigdata, compliance, security-automation, auto-remediation
Updated Apr 6, 2021 - Go
A Kubernetes batch scheduler for high-performance workloads, e.g. AI/ML, big data, and HPC
Updated Feb 15, 2021 - Go
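Notes from my studies, together with e-books I have read, video resources, and the blogs, websites, and tools I have collected and consider worthwhile. Covers the major big data components, Python machine learning and data analysis, Linux, operating systems, algorithms, networking, and more.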
Updated Mar 29, 2021 - Python
Lightweight real-time big data streaming engine over Akka
Updated Mar 19, 2021 - Scala
An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Updated Oct 10, 2020 - Jupyter Notebook
A book about running Elasticsearch
Updated Mar 17, 2021
Fast topic modeling platform
python, c-plus-plus, machine-learning, text-mining, bigdata, topic-modeling, python-api, bigartm, regularizer
Updated Nov 26, 2020 - C++
Data syncing in golang for ClickHouse.
Updated Apr 8, 2021 - Go
Hello,
Considering your amazing efficiency with pandas, numpy, and more, it would make sense for your module to work with even bigger data, such as audio (for example .mp3 and .wav). This would help a lot considering the nature of audio (i.e., one of the lowest and most common sampling rates is still 44,100 samples/sec). For a use case, I would consider vaex.open('Hu