...
For more information about Spark and PySpark, see the following resources:
https://en.wikipedia.org/wiki/Apache_Spark
...
To access and start using PySpark, follow these steps:
- Connect to the Tufts High Performance Compute Cluster. See Connecting for a detailed guide.
- Load the Spark module with the following command:
      module load spark
  To see a list of all available modules (which may include several versions of Spark), type:
      module avail
  You can load a specific version of Spark with the module load command, or use the generic module name (spark) to load the latest version.
- Start a PySpark session by typing:
      pyspark
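The module-loading step above can be wrapped in a small shell function. This is only a sketch: load_spark is a hypothetical helper, not a command provided on the cluster, and it assumes that version-specific modules follow the common spark/<version> naming pattern. Check the output of module avail for the exact names before relying on it.

```shell
# Hypothetical helper, not part of the cluster's software.
# Usage: load_spark          -> loads the generic "spark" module (latest version)
#        load_spark VERSION  -> loads "spark/VERSION" (assumed naming pattern)
load_spark() {
  module_name="spark"
  if [ -n "${1:-}" ]; then
    # Assumption: versioned Spark modules are named spark/<version>
    module_name="spark/$1"
  fi
  echo "Loading module: ${module_name}"
  module load "${module_name}"
}
```

Called with no argument, load_spark does the same thing as module load spark. Because module load changes the environment of the shell it runs in, define the function in your interactive shell (for example in ~/.bashrc) rather than calling it from a separate script; changes made inside a child script do not carry back to your session.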
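Before starting PySpark, it can help to confirm that loading the Spark module actually put the pyspark launcher on your PATH. The function below is a hypothetical convenience wrapper (the name start_pyspark is illustrative, not something installed on the cluster); any arguments are passed through to pyspark unchanged.

```shell
# Hypothetical wrapper: start PySpark only if the launcher can be found.
start_pyspark() {
  # command -v fails when no pyspark command is available in this shell
  if ! command -v pyspark >/dev/null 2>&1; then
    echo "pyspark not found; load Spark first with: module load spark" >&2
    return 1
  fi
  # Forward any arguments unchanged to the pyspark launcher
  pyspark "$@"
}
```

After module load spark, running start_pyspark behaves like typing pyspark. If the module has not been loaded, it prints a reminder and returns a non-zero status instead of failing with a generic "command not found" error.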