4. How does the DAG work in Spark?
The interpreter is the first layer: Spark uses a Scala interpreter, with some modifications, to interpret your code.
Spark creates an operator graph as you enter your code in the Spark console. Transformations only extend this graph; when we call an action on a Spark RDD, Spark submits the operator graph to the DAG Scheduler.
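A minimal sketch of this laziness in the Spark shell, where sc (the SparkContext) is predefined; the file name is illustrative:

// Transformations are lazy: each call only extends the operator graph.
val lines = sc.textFile("data.txt")          // hypothetical input file
val words = lines.flatMap(_.split(" "))
val pairs = words.map(word => (word, 1))

// Nothing has executed yet. Calling an action submits the operator
// graph to the DAG Scheduler and triggers the actual computation.
val total = pairs.count()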
The DAG Scheduler divides the operators into stages of tasks. A stage contains tasks based on the partitions of the input data. The DAG Scheduler also pipelines operators together where possible; for example, several map operators can be scheduled in a single stage.
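As a rough illustration of this pipelining, again in the Spark shell with sc predefined and an illustrative file name: narrow transformations such as flatMap and map are pipelined into one stage, while a shuffle operation such as reduceByKey starts a new one. toDebugString prints the RDD lineage, and its indentation marks the stage boundary.

val counts = sc.textFile("data.txt")         // hypothetical input file
  .flatMap(_.split(" "))
  .map(word => (word, 1))                    // narrow: pipelined into the same stage
  .reduceByKey(_ + _)                        // wide: shuffle boundary, new stage

// The indented block in the output marks the shuffle boundary where
// the DAG Scheduler splits the job into two stages.
println(counts.toDebugString)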
The stages are then passed on to the Task Scheduler, which launches the tasks through the cluster manager. The Task Scheduler is unaware of the dependencies between stages.
The workers execute the tasks on the slave nodes.
More detailed information is available at: https://data-flair.training/blogs/dag-in-apache-spark/