PySpark: Eclipse Integration


This tutorial will guide you through configuring PySpark in Eclipse.

First, install Eclipse with Python support (typically the PyDev plugin).

Then add “pyspark.zip” and “py4j-0.10.7-src.zip”, found under the python/lib directory of your Spark installation, to the “Libraries” of the Python interpreter.
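As a sanity check before wiring the archives into Eclipse, a minimal sketch like the one below confirms they exist and are importable; the /opt/spark default is an assumption, so adjust it to your installation.

import os
import sys

# Assumed Spark installation directory; change this to match your machine.
spark_home = os.environ.get("SPARK_HOME", "/opt/spark")

# The same archives you add under "Libraries" in Eclipse.
archives = [
    os.path.join(spark_home, "python", "lib", "pyspark.zip"),
    os.path.join(spark_home, "python", "lib", "py4j-0.10.7-src.zip"),
]

for archive in archives:
    if not os.path.exists(archive):
        print("Missing:", archive)      # fix the path before adding it in Eclipse
    elif archive not in sys.path:
        sys.path.insert(0, archive)     # runtime equivalent of the Libraries entry

import pyspark                          # fails if the archives are not on the path
print("pyspark import OK")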

Next, configure the environment variables PySpark needs, such as SPARK_HOME, in your Eclipse run configuration.
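If you prefer to keep everything in one place, a minimal sketch like the following sets the same variables from Python before Spark starts; the paths shown are placeholders, so substitute your own.

import os

# Placeholder paths; point these at your own Spark and Python installations.
os.environ.setdefault("SPARK_HOME", "/opt/spark")
os.environ.setdefault("PYSPARK_PYTHON", "python3")         # Python used by the Spark workers
os.environ.setdefault("PYSPARK_DRIVER_PYTHON", "python3")  # Python used by the driver

In Eclipse itself, the same variables can be entered in the run configuration instead.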

Test that it works!

from pyspark.sql import SparkSession

def init_spark():
    # Create (or reuse) a SparkSession and expose its SparkContext.
    spark = SparkSession.builder.appName("HelloWorld").getOrCreate()
    sc = spark.sparkContext
    return spark, sc

if __name__ == '__main__':
    spark, sc = init_spark()
    # Distribute a small list, square each element, and collect the results.
    nums = sc.parallelize([1, 2, 3, 4])
    print(nums.map(lambda x: x * x).collect())
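If everything is configured correctly, the script prints [1, 4, 9, 16].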