apache-spark

Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.

Here are 1,132 public repositories matching this topic...

dbczumar commented Sep 18, 2021

MLflow Roadmap Item

This is an MLflow Roadmap item that has been prioritized by the MLflow maintainers. We're seeking help with the implementation of roadmap items tagged with the help wanted label.

For requirements clarifications and implementation questions, or to request a PR review, please tag @BenWilson2 in your communications related to this issue.

Proposal Summary

Includ

lakeFS
ozkatz commented Nov 7, 2021

What

Being able to take a data object (or a prefix, such as a partition) and get back the commit that added or modified it.

Why

This is valuable lineage information that is already available in lakeFS but not easily exposed. The feature would mimic the behavior of git blame.

How

Given the lakeFS API already supports listing the log of commits for an object or prefix (🎉), this could be a `

Created by Matei Zaharia

Released May 26, 2014

Repository
apache/spark
Website
spark.apache.org
Wikipedia
Wikipedia

Related Topics

hadoop scala