Dynamic Resources
Dynamic vs Static Resource Allocation
Two applications asking for resources at the same time

App 2 will not have enough resources to run in this case.
To avoid this, we use dynamic allocation, where executor resources are scaled up or down depending on the workload. A minimal sketch of switching it on follows (assuming PySpark and a cluster manager that supports scaling; the app name and the choice of shuffle tracking over an external shuffle service are illustrative):
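```python
from pyspark.sql import SparkSession

# Sketch: enable dynamic allocation so the executor count follows the
# workload instead of being fixed up front.
spark = (
    SparkSession.builder
    .appName("dynamic-allocation-demo")  # hypothetical app name
    .config("spark.dynamicAllocation.enabled", "true")
    # Dynamic allocation needs a way to preserve shuffle data when
    # executors are removed: either an external shuffle service or
    # shuffle tracking (covered below).
    .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
    .getOrCreate()
)
```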
Spark Dynamic Allocation Properties
spark.dynamicAllocation.enabled
Whether to use dynamic resource allocation, which scales the number of executors registered with this application up and down based on the workload.
spark.dynamicAllocation.executorIdleTimeout
If dynamic allocation is enabled and an executor has been idle for more than this duration, the executor will be removed.
Default is 60s
Note that, under most circumstances, this condition is mutually exclusive with the request condition, in that an executor should not be idle if there are still pending tasks to be scheduled.
spark.dynamicAllocation.cachedExecutorIdleTimeout
If dynamic allocation is enabled and an executor which has cached data blocks has been idle for more than this duration, the executor will be removed.
Default is infinity
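For illustration, the two idle timeouts can be set together at session creation; the values below are arbitrary examples, not recommendations:

```python
from pyspark.sql import SparkSession

# Illustrative values: remove plain idle executors after 2 minutes,
# but keep executors holding cached blocks alive for up to 30 minutes.
spark = (
    SparkSession.builder
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.dynamicAllocation.executorIdleTimeout", "120s")
    .config("spark.dynamicAllocation.cachedExecutorIdleTimeout", "30m")
    .getOrCreate()
)
```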
spark.dynamicAllocation.initialExecutors
Initial number of executors to run if dynamic allocation is enabled.
If --num-executors (or spark.executor.instances) is set and larger than this value, it will be used as the initial number of executors.
Default is spark.dynamicAllocation.minExecutors
spark.dynamicAllocation.maxExecutors
Upper bound for the number of executors if dynamic allocation is enabled.
Default is infinity
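A quick sketch of setting the scaling bounds together (the specific numbers are made-up examples, not recommendations):

```python
from pyspark.sql import SparkSession

# Illustrative bounds: start with 2 executors, never scale below 1
# or above 20.
spark = (
    SparkSession.builder
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.dynamicAllocation.minExecutors", "1")
    .config("spark.dynamicAllocation.initialExecutors", "2")
    .config("spark.dynamicAllocation.maxExecutors", "20")
    .getOrCreate()
)
```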
spark.dynamicAllocation.executorAllocationRatio
By default, dynamic allocation requests enough executors to maximize parallelism according to the number of tasks to process. While this minimizes job latency, with small tasks it can waste a lot of resources due to executor allocation overhead, as some executors might not do any work at all.
This setting lets you set a ratio that reduces the number of executors with respect to full parallelism. It defaults to 1.0 for maximum parallelism; for example, 0.5 will divide the target number of executors by 2. The target number of executors computed by dynamic allocation can still be overridden by the spark.dynamicAllocation.minExecutors and spark.dynamicAllocation.maxExecutors settings.
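A back-of-the-envelope sketch of how the ratio scales the target; the helper below is an approximation for illustration, not Spark's internal code:

```python
import math

def target_executors(pending_tasks, cores_per_executor, ratio,
                     min_executors, max_executors):
    """Approximate the executor target: enough executors for full
    parallelism, scaled down by the allocation ratio, then clamped
    to the configured min/max bounds. Illustrative only."""
    full_parallelism = math.ceil(pending_tasks / cores_per_executor)
    target = math.ceil(full_parallelism * ratio)
    return max(min_executors, min(target, max_executors))

# 1000 pending tasks, 4 cores per executor:
print(target_executors(1000, 4, 1.0, 1, 500))  # 250 (full parallelism)
print(target_executors(1000, 4, 0.5, 1, 500))  # 125 (ratio halves the target)
print(target_executors(1000, 4, 0.5, 1, 100))  # 100 (capped by maxExecutors)
```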
spark.dynamicAllocation.schedulerBacklogTimeout
If dynamic allocation is enabled and there have been pending tasks backlogged for more than this duration, new executors will be requested.
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout
Same as spark.dynamicAllocation.schedulerBacklogTimeout, but used only for subsequent executor requests.
Spark requests executors in rounds. The actual request is triggered when there have been pending tasks for spark.dynamicAllocation.schedulerBacklogTimeout seconds, and then triggered again every spark.dynamicAllocation.sustainedSchedulerBacklogTimeout seconds thereafter if the queue of pending tasks persists. Additionally, the number of executors requested in each round increases exponentially from the previous round. For instance, an application will add 1 executor in the first round, and then 2, 4, 8 and so on executors in the subsequent rounds.
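A small simulation of the exponential ramp-up described above (the five-round cutoff is arbitrary; in practice rounds continue while tasks stay backlogged):

```python
# Each round requests twice as many executors as the previous one:
# 1, 2, 4, 8, ... as long as the pending-task queue persists.
requested_total = 0
per_round = 1
for round_number in range(1, 6):
    requested_total += per_round
    print(f"round {round_number}: request {per_round}, total {requested_total}")
    per_round *= 2
# round 1: request 1, total 1
# round 2: request 2, total 3
# round 3: request 4, total 7
# ...
```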
spark.dynamicAllocation.shuffleTracking.enabled
Enables shuffle file tracking for executors, which allows dynamic allocation without the need for an external shuffle service. This option will try to keep alive executors that are storing shuffle data for active jobs.
spark.dynamicAllocation.shuffleTracking.timeout
When shuffle tracking is enabled, controls the timeout for executors that are holding shuffle data. The default value means that Spark will rely on the shuffles being garbage collected to be able to release executors. If for some reason garbage collection is not cleaning up shuffles quickly enough, this option can be used to control when to time out executors even when they are storing shuffle data.
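Putting the two shuffle-tracking properties together, a sketch of dynamic allocation without an external shuffle service (the 1-hour timeout is an illustrative value):

```python
from pyspark.sql import SparkSession

# Sketch: shuffle tracking keeps executors holding live shuffle data
# alive; the timeout forces them out after 1 hour even if their
# shuffle files have not been garbage collected yet.
spark = (
    SparkSession.builder
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
    .config("spark.dynamicAllocation.shuffleTracking.timeout", "1h")
    .getOrCreate()
)
```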

After 60 seconds of idleness (the default executorIdleTimeout), the executors are killed.
Sometimes the executors take time to be killed due to the garbage-collection process.
