[SPARK-32170][CORE] Improve the speculation for the inefficient tasks by the task metrics. #28994


Closed
wants to merge 1 commit

Conversation

Contributor

@weixiuli weixiuli commented Jul 4, 2020

What changes were proposed in this pull request?

Improve speculation for inefficient tasks by using task metrics.

Why are the changes needed?

  1. Currently, tasks are speculated once they meet certain timing conditions, regardless of whether they are actually inefficient, which can be a huge waste of cluster resources.
  2. In production, a speculative copy launched for an already-efficient task is eventually killed, which is unnecessary, wastes cluster resources, and sometimes interferes with the scheduling of other tasks.
  3. Therefore, we should first evaluate whether a task is inefficient using the metrics of successfully completed tasks, and only then decide whether to speculate it. Speculating only the inefficient tasks makes better use of cluster resources.
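
A minimal, self-contained sketch of this idea (not the PR's actual implementation): estimate each task's progress rate as records processed per millisecond, and allow speculation only for tasks whose rate falls well below the average rate of already-successful tasks. The names `TaskSnapshot`, `avgProgressRate`, `shouldSpeculate`, and the `efficiencyFraction` threshold are hypothetical.

```scala
// Hypothetical snapshot of a task's record count and elapsed run time.
case class TaskSnapshot(recordsProcessed: Long, runTimeMs: Long)

object SpeculationSketch {
  // Average records/ms across successful tasks; 0.0 if none have finished.
  def avgProgressRate(successful: Seq[TaskSnapshot]): Double = {
    val timed = successful.filter(_.runTimeMs > 0)
    if (timed.isEmpty) 0.0
    else timed.map(t => t.recordsProcessed.toDouble / t.runTimeMs).sum / timed.size
  }

  // Speculate only when the running task is markedly slower than its peers.
  // With no successful tasks to compare against, fall back to speculating
  // (the pre-change behavior).
  def shouldSpeculate(
      running: TaskSnapshot,
      successful: Seq[TaskSnapshot],
      efficiencyFraction: Double = 0.5): Boolean = {
    val avg = avgProgressRate(successful)
    if (avg <= 0.0 || running.runTimeMs <= 0) true
    else (running.recordsProcessed.toDouble / running.runTimeMs) < avg * efficiencyFraction
  }
}
```

For example, a task that processed 10 records in the time its peers processed 1000 would be speculated, while one that processed 900 would not.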

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added unit tests.

@weixiuli weixiuli marked this pull request as draft July 4, 2020 11:21
@weixiuli
Contributor Author

weixiuli commented Jul 4, 2020

@cloud-fan @dongjoon-hyun kindly review, thanks.

@AmplabJenkins

Can one of the admins verify this patch?

@mridulm
Contributor

mridulm commented Jul 8, 2020

@venkata91 You might be interested in this.

@weixiuli
Contributor Author

@maropu @cloud-fan @gatorsmile @mridulm @dongjoon-hyun Could you help check this PR? Thanks.

}.map(_.taskMetrics).filter(_.isDefined).map(_.get).foreach { task =>
if (task.inputMetrics != null) {
sumInputRecords += task.inputMetrics.recordsRead
}
Contributor

What about recordsWritten? Should that also be considered with respect to progress, and likewise shuffleRecordsWritten?
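
A hedged sketch of what that broader accounting might look like, counting records from all four metric sources instead of input alone. The `Metrics` case class here is a simplified stand-in for the relevant fields of Spark's `TaskMetrics`, not its real API.

```scala
object RecordAccounting {
  // Simplified stand-in for a task's record-count metrics.
  case class Metrics(
      recordsRead: Long,           // cf. inputMetrics.recordsRead
      recordsWritten: Long,        // cf. outputMetrics.recordsWritten
      shuffleRecordsRead: Long,    // cf. shuffleReadMetrics.recordsRead
      shuffleRecordsWritten: Long) // cf. shuffleWriteMetrics.recordsWritten

  // Total records the task has processed across all sources, so that
  // write-heavy and shuffle-heavy tasks also register progress.
  def totalRecords(m: Metrics): Long =
    m.recordsRead + m.recordsWritten + m.shuffleRecordsRead + m.shuffleRecordsWritten
}
```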

Contributor

Even caching can take time when written to disk; does that need to be taken into consideration? Similarly, GC time, shuffle read blocked time, etc. could also impact task progress.

} else if (taskData != null && taskData.contains(tid) && taskData(tid) != null &&
taskData(tid).taskMetrics.isDefined) {
val taskMetrics = taskData(tid).taskMetrics.get
val currentTaskProgressRate = (taskMetrics.inputMetrics.recordsRead +
Contributor

Would it make sense to add taskProgress as part of taskMetrics, so that it can also be shown in the Spark UI? Although taskProgress would be hard to measure for tasks that don't involve input/output/shuffle records.

@venkata91
Contributor

venkata91 commented Jul 17, 2020

This is an interesting idea and a good start. Considering only the runTime of a task might not be useful in many cases. Thanks!

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Oct 26, 2020
@github-actions github-actions bot closed this Oct 27, 2020