Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updates Spark configuration heuristic severity calculations #229

Merged
merged 7 commits into from
Apr 20, 2017

Conversation

shkhrgpt
Copy link
Contributor

As discussed in document, updating the spark configuration heuristic. The main changes are based on the following logics:

  • Serializer: Suggest using Kryo serializer because its higher performance than the Java Serializer. Maybe a moderate severity if not using Kryo. Config key spark.serializer should be set to org.apache.spark.serializer.KryoSerializer

  • Shuffle Service: If not set -> Moderate -> Suggest configuring a Shuffle service so that jobs can survive an executor failure. spark.shuffle.service.enabled should be set to true
    If Spark Dynamic Allocation is on, and no Shuffle Service configured then Severe severity.

Also updates unit tests.

@shankar37 @akshayrai @superbobry

@@ -128,18 +135,29 @@ object ConfigurationHeuristic {
lazy val serializer: Option[String] = getProperty(SPARK_SERIALIZER_KEY)

lazy val serializerSeverity: Severity = serializer match {
case None => Severity.NONE
case None => Severity.MODERATE
case Some(`serializerIfNonNullRecommendation`) => Severity.NONE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is not part of your PR, but it looks suspicious nonetheless: IIUC the third branch is unreachable because the pattern Some(`foo`) is equavalent to Some(_) modulo variable name.

scala> Some(42) match {
     | case Some(`foo`) => "first"
     | case Some(_) => "second"
     | }
res1: String = first

Copy link
Contributor Author

@shkhrgpt shkhrgpt Mar 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if that's true, otherwise the unit tests will fail. I think it all depends on the value of foo. I tried the code locally, and got the following outputs:

scala> val foo = 50
foo: Int = 50

scala>     Some(42) match {
     |       case Some(`foo`) => "first" 
     |       case Some(_) => "second"
     |     }
res1: String = second


scala> val foo = 42
foo: Int = 42

scala>     Some(42) match {
     |       case Some(`foo`) => "first"
     |       case Some(_) => "second"
     |     }
res2: String = first

I am going to leave this code unchanged.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, evil Scala...

case Some(_) => DEFAULT_SERIALIZER_IF_NON_NULL_SEVERITY_IF_RECOMMENDATION_UNMET
}

lazy val isDynamicAllocationEnabled: Option[Boolean] = getProperty(SPARK_DYNAMIC_ALLOCATION_ENABLED).map(_.toBoolean)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for not applying defaults at declaration site? I might be wrong, but it looks like in the following match statement you treat None as false for both properties.

Here's a rewritten match:

 lazy val shuffleAndDynamicAllocationSeverity = (isDynamicAllocationEnabled, isShuffleServiceEnabled) match {
   case (_, Some(true)) | (None, Some(true)) => Severity.NONE
   case (Some(true), Some(false)) | (Some(true), None) => Severity.SEVERE
   case (Some(false), Some(false)) => Severity.MODERATE
   case (Some(false), None) => Severity.MODERATE
   case (None, Some(false)) => Severity.MODERATE
   case (None, None) => Severity.MODERATE
 }

The first case is essentially

case (_, true) => Severity.None

the next bunch is

case (true, false) => Severity.SEVERE

and finally

case (false, false) => Severity.MODERATE

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion of using default value instead of None. It simplified the pattern matching.

lazy val isDynamicAllocationEnabled: Option[Boolean] = getProperty(SPARK_DYNAMIC_ALLOCATION_ENABLED).map(_.toBoolean)
lazy val isShuffleServiceEnabled: Option[Boolean] = getProperty(SPARK_SHUFFLE_SERVICE_ENABLED).map(_.toBoolean)

lazy val shuffleAndDynamicAllocationSeverity = (isDynamicAllocationEnabled, isShuffleServiceEnabled) match {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it could be useful to have a brief comment here outlining the rationale. The reader could of course trace it back to that commit, but a comment is more reader-friendly in my opinion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@superbobry
Copy link
Contributor

I don't have an opinion on dynamic allocation/shuffle service, but the Kryo heuristic looks very useful 👍 .

@shkhrgpt
Copy link
Contributor Author

Thanks @superbobry for the review.

@@ -77,6 +74,14 @@ class ConfigurationHeuristic(private val heuristicConfigurationData: HeuristicCo
new HeuristicResultDetails(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add recommendations to the heuristic details. If multiple of them fails, we take the max and user has to fix one to know there is more to fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shankar37 I did not understand this comment. Where should I add recommendations to the heuristic details?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all configs that we are checking, we should add a detail like of key-value pair like

whenever we find that check not yielding none.
thanks
shankar

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shankar37 I am sorry, I still don't understand your comment. Can you please give an example?
Thank you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shankar37 Can you please provide an example here. I am not able to understand your comment. Thank you.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, missed your reply. An example would be

if( serializer check fails)
HeuristicResult.add( new HeuristicResultDetails(
"Serializer",
"KyroSerializer Not Enabled.") )
if( dynamicallocation check fails) {
HeuristicResult.add (
new HeuristicResultDetails(
"DynamicAllocation without Shuffle",
"DynamicAllocation is enabled but."))
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why do we need to have an if statement here, we don't have it in any other case. I am already adding details like the following:

new HeuristicResultDetails(
SPARK_DYNAMIC_ALLOCATION_ENABLED,
formatProperty(evaluator.isDynamicAllocationEnabled.map(.toString))
),
new HeuristicResultDetails(
SPARK_SHUFFLE_SERVICE_ENABLED,
formatProperty(evaluator.isShuffleServiceEnabled.map(
.toString))
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the scenario. If I have a job that fails both dynamic allocation check and kyroserializer check you will take the max of severity and show that. Let's say that is dynamic allocation check. Once that is fixed, they will rerun the job and find it failing the kyroserializer check now. I am asking to make it obvious to the user they have multiple issues to fix within this one heuristic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shankar37 I think I got it. Thank you very much. I've updated the PR. Plese have a look.

@akshayrai
Copy link
Contributor

+1 LGTM

@akshayrai akshayrai merged commit 7c373d4 into linkedin:master Apr 20, 2017
@akshayrai
Copy link
Contributor

Thanks @shkhrgpt.

skakker pushed a commit to skakker/dr-elephant that referenced this pull request Dec 14, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants