Shared Variables in Spark
Learn how Spark makes data sharing and information gathering efficient.
Besides RDDs, Spark provides a second abstraction: distributed shared variables. We might want to send static data to all the workers (driver-to-workers information flow), or we might want to collect some state from all the workers (workers-to-driver information flow). Spark's shared variable abstraction supports both scenarios.
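The two information flows can be illustrated without a cluster. The following is a minimal plain-Python sketch, not Spark code: the lookup table, the function names `process_partition` and `run_job`, and the sample data are all hypothetical. It only models the idea of shipping read-only data out to workers and merging worker-side state back on the driver. In PySpark, these two roles are played by `SparkContext.broadcast` and `SparkContext.accumulator`, respectively.

```python
from collections import Counter

# Hypothetical static lookup table that every worker needs (driver -> workers).
COUNTRY_NAMES = {"US": "United States", "DE": "Germany", "IN": "India"}


def process_partition(rows, lookup):
    """Simulate one worker: read the shared lookup and gather local counts."""
    resolved = []
    stats = Counter()  # worker-local state, merged by the driver afterwards
    for code in rows:
        name = lookup.get(code)
        if name is None:
            stats["unknown"] += 1
        else:
            stats["resolved"] += 1
            resolved.append(name)
    return resolved, stats


def run_job(partitions, lookup):
    """Simulate the driver: hand the lookup to each worker, merge state back."""
    output = []
    totals = Counter()  # workers -> driver: combined counts
    for rows in partitions:
        resolved, stats = process_partition(rows, lookup)
        output.extend(resolved)
        totals.update(stats)
    return output, totals


# Two hypothetical partitions of country codes; "XX" has no entry in the lookup.
partitions = [["US", "DE"], ["XX", "IN", "US"]]
names, totals = run_job(partitions, COUNTRY_NAMES)
print(names)
print(dict(totals))
```

In this sketch the workers only read the lookup and only add to their counters, which mirrors how the two kinds of shared data are meant to be used: static data flows out from the driver, and aggregated state flows back to it.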
Shared variables
Some operations require setup work for each partition, such as creating a random number from a specific distribution. The user has to create it and send it to the workers holding the specific partitions every time ...