Data quality profiling and exploratory data analysis for Pandas and Spark DataFrames in one line of code.
Updated Nov 21, 2023 - Python
Easy Machine Learning is a general-purpose dataflow-based system for easing the process of applying machine learning algorithms to real world tasks.
A Data Analysis Board in Vue.
vineyard (v6d): an in-memory immutable data manager. (Project under CNCF, TAG-Storage)
Powerful & Easy way for big data discovery
A multi-cloud framework for big data analytics and embarrassingly parallel jobs that provides a universal API for building parallel applications in the cloud ☁️🚀
A data-driven method combining symbolic regression and compressed sensing for accurate & interpretable models.
Graph Sampling is a Python package containing various approaches that sample the original graph at different sample sizes.
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Coursework from courses taken on Coursera. All answers were written by me.
Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.
The Pandata scalable open-source analysis stack
Big data projects implemented by Maniram Yadav
Egis - a handy Ruby interface for AWS Athena
Real-time Packet Observation Tool
Course covers big data fundamentals, processes, technologies, platform ecosystem, and management for practical application development.
Visual, interactive queries against big databases
Open-source tools for interacting with IBM PAIRS:
TeraHeap: Reducing Memory Pressure in Managed Big Data Frameworks