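Goals
- Summarize the RDD concept, the five main properties of an RDD, and where each one appears in the source code
- Summarize SparkContext and SparkConf
The RDD concept, its five main properties, and where they appear in the source code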
- a list of partitions.
protected def getPartitions: Array[Partition]
- a function for computing each split/partition.
def compute(split: Partition, context: TaskContext): Iterator[T]
- a list of dependencies on other RDDs.
RDDA => RDDB => RDDC => RDDD
protected def getDependencies: Seq[Dependency[_]] = deps
- optionally, a partitioner for key-value RDDs.
val partitioner: Option[Partitioner] = None
- optionally, a list of preferred locations to compute each split on.
Preferred locations are the best places to compute a split (data locality).
protected def getPreferredLocations(split: Partition): Seq[String] = Nil
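
The five properties listed above can be sketched as a toy model in plain Scala with no Spark dependency. This is not Spark's implementation. MiniPartition, MiniRDD, RangeRDD, MappedRDD and MiniRDDDemo are hypothetical names, the partitioner is reduced to a key-to-index function, TaskContext is left out of compute, and the host names are made up.

```scala
// Toy model of the five RDD properties. Not Spark code: every class name
// here is hypothetical and chosen only for illustration.
final case class MiniPartition(index: Int)

abstract class MiniRDD[T](deps: Seq[MiniRDD[_]]) {
  // 1. a list of partitions
  def getPartitions: Array[MiniPartition]
  // 2. a function for computing each split (TaskContext omitted in this sketch)
  def compute(split: MiniPartition): Iterator[T]
  // 3. a list of dependencies on other RDDs
  def getDependencies: Seq[MiniRDD[_]] = deps
  // 4. optionally, a partitioner (simplified here to key => partition index)
  val partitioner: Option[Any => Int] = None
  // 5. optionally, preferred locations for each split
  def getPreferredLocations(split: MiniPartition): Seq[String] = Nil

  // A transformation only records the parent; nothing is computed yet.
  def map[U](f: T => U): MiniRDD[U] = new MappedRDD[T, U](this, f)
  // An action walks every partition and calls compute on it.
  def collect(): List[T] = getPartitions.toList.flatMap(p => compute(p).toList)
}

// Source RDD: splits the range [0, n) into numSlices contiguous partitions.
class RangeRDD(n: Int, numSlices: Int) extends MiniRDD[Int](Nil) {
  def getPartitions: Array[MiniPartition] =
    Array.tabulate(numSlices)(i => MiniPartition(i))
  def compute(split: MiniPartition): Iterator[Int] = {
    val start = split.index * n / numSlices
    val end = (split.index + 1) * n / numSlices
    (start until end).iterator
  }
  // Made-up host names, standing in for the nodes that hold the data.
  override def getPreferredLocations(split: MiniPartition): Seq[String] =
    Seq(s"host-${split.index}")
}

// Derived RDD: same partitions as its parent, computed by applying f lazily.
class MappedRDD[T, U](parent: MiniRDD[T], f: T => U)
    extends MiniRDD[U](Seq(parent)) {
  def getPartitions: Array[MiniPartition] = parent.getPartitions
  def compute(split: MiniPartition): Iterator[U] = parent.compute(split).map(f)
  override def getPreferredLocations(split: MiniPartition): Seq[String] =
    parent.getPreferredLocations(split)
}

object MiniRDDDemo {
  def main(args: Array[String]): Unit = {
    val rddA = new RangeRDD(10, 3)   // RDDA
    val rddB = rddA.map(_ * 2)       // RDDA => RDDB
    println(rddB.getPartitions.length)                    // 3
    println(rddB.getDependencies.head eq rddA)            // true
    println(rddB.getPreferredLocations(MiniPartition(1))) // List(host-1)
    println(rddB.collect()) // List(0, 2, 4, 6, 8, 10, 12, 14, 16, 18)
  }
}
```

In this sketch the derived RDD reuses its parent's partitions and preferred locations and only adds a dependency plus a compute function, which is the same shape as the RDDA => RDDB chain above.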