The Five Main Properties of an RDD

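Goals

  1. Summarize the concept of an RDD, its five main properties, and where each property appears in the source code
  2. Summarize SparkContext and SparkConf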

The RDD Concept, Its Five Main Properties, and Where They Appear in the Source

  1. a list of partitions.
    protected def getPartitions: Array[Partition]
  2. a function for computing each split/partition.
    def compute(split: Partition, context: TaskContext): Iterator[T]
  3. a list of dependencies on other RDDs.
    e.g. the lineage RDDA => RDDB => RDDC => RDDD, where each RDD records the parent RDDs it was derived from
    protected def getDependencies: Seq[Dependency[_]] = deps
  4. optionally, a partitioner for key-value RDDs.
    @transient val partitioner: Option[Partitioner] = None
  5. optionally, a list of preferred locations to compute each split on.
    preferred locations: the best place to run the computation for a split (data locality)
    protected def getPreferredLocations(split: Partition): Seq[String] = Nil
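
The five members above can be seen together in a small custom RDD. The following is only a minimal sketch, assuming a Spark dependency on the classpath and a local master. The names SlicePartition, SimpleRangeRDD, and RddPropertiesDemo, as well as the app name, are hypothetical and do not come from Spark itself. The sketch overrides getPartitions, compute, and getPreferredLocations, and leaves getDependencies and partitioner at their defaults, then reads all five properties back through the public accessors.

```scala
import org.apache.spark.{Partition, SparkConf, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical partition type: one contiguous slice [start, end) of 0 until n.
class SlicePartition(val index: Int, val start: Int, val end: Int) extends Partition

// Hypothetical RDD that yields the integers 0 until n, split into numSlices partitions.
// Passing Nil as the dependency list means this RDD has no parents,
// so the default getDependencies returns an empty Seq (property 3).
class SimpleRangeRDD(sc: SparkContext, n: Int, numSlices: Int)
    extends RDD[Int](sc, Nil) {

  // Property 1: a list of partitions.
  override protected def getPartitions: Array[Partition] =
    Array.tabulate[Partition](numSlices) { i =>
      val start = (i.toLong * n / numSlices).toInt
      val end = ((i + 1).toLong * n / numSlices).toInt
      new SlicePartition(i, start, end)
    }

  // Property 2: a function for computing each split/partition.
  override def compute(split: Partition, context: TaskContext): Iterator[Int] = {
    val p = split.asInstanceOf[SlicePartition]
    (p.start until p.end).iterator
  }

  // Property 4 is left at its default: partitioner is None,
  // because this is not a key-value RDD.

  // Property 5: preferred locations. The data is generated in place,
  // so no host is preferred and the default (Nil) is simply restated.
  override protected def getPreferredLocations(split: Partition): Seq[String] = Nil
}

object RddPropertiesDemo {
  def main(args: Array[String]): Unit = {
    // Assumed settings for a local run; adjust the master for a real cluster.
    val conf = new SparkConf().setAppName("rdd-properties-demo").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val base = new SimpleRangeRDD(sc, 10, 3)
      // reduceByKey introduces a shuffle, so the result has a parent and a partitioner.
      val reduced = base.map(x => (x % 2, x)).reduceByKey(_ + _)

      println(s"partitions:          ${base.partitions.length}")
      println(s"base dependencies:   ${base.dependencies}")
      println(s"child dependencies:  ${reduced.dependencies}")
      println(s"base partitioner:    ${base.partitioner}")
      println(s"child partitioner:   ${reduced.partitioner}")
      println(s"preferred locations: ${base.preferredLocations(base.partitions(0))}")
      println(s"data:                ${base.collect().mkString(",")}")
    } finally {
      sc.stop()
    }
  }
}
```

User code normally does not call the protected getXxx methods directly. It goes through the public accessors partitions, dependencies, partitioner, and preferredLocations, which is what the demo does.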

SparkContext and SparkConf in Detail

Author: Red
Link: http://yoursite.com/2018/11/03/archives/spark/spark-1/
Copyright Notice: All articles in this blog are licensed under CC BY-NC-SA 4.0 unless stated otherwise.