PySpark is the Python interface to Apache Spark, a fast, general-purpose, open source cluster computing framework. Spark provides programmers with an interface centered on the Resilient Distributed Dataset (RDD), a data structure that is distributed over a cluster of computers and maintained in a fault-tolerant way.

For more information about Spark and PySpark, see:

...