Lecture 12: Spark Case Study

Preparation

Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing

FAQ

  1. Is Spark currently in use in any major applications?
  2. How common is it for PhD students to create something on the scale of Spark?
  3. Should we view Spark as being similar to MapReduce?
  4. Why are RDDs called immutable if they allow for transformations?
  5. Do distributed systems designers worry about energy efficiency?
  6. How do applications figure out the location of an RDD?
  7. How does Spark achieve fault tolerance?
  8. Why is Spark developed using Scala? What's special about the language?
  9. Does anybody still use MapReduce rather than Spark, since Spark seems to be strictly superior? If so, why do people still use MR?
  10. Is the RDD concept implemented in any systems other than Spark?

Lecture

Lecture notes

FAQ answers

Homework

Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. What applications can Spark support well that MapReduce/Hadoop cannot support?

LAB 4

[LAB 4 instructions](6.824 Lab 4_ Sharded Key_Value Service.html)