Installation
Installing Spark on EC2
Spinning Up Miniconda on EC2
Installing JDK
sudo apt update
sudo apt install openjdk-11-jdk -y
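Before moving on, it can help to confirm that the JDK is actually visible on the PATH. The following is a minimal sketch, not part of the original walkthrough; the helper names `parse_java_major` and `installed_java_major` are hypothetical, and the regular expression assumes the usual `version "..."` banner that OpenJDK prints.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Java 8 and older report themselves as 1.x (for example "1.8.0_392").
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def installed_java_major() -> Optional[int]:
    """Return the major version of the `java` on PATH, or None if absent."""
    if shutil.which("java") is None:
        return None
    # `java -version` writes its banner to stderr, not stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr + result.stdout)


if __name__ == "__main__":
    print("Java major version:", installed_java_major())
```

After the `apt install` step above, this should report 11 on a machine where OpenJDK 11 is the default `java`.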
Installing Spark (be patient, this takes time)
wget https://dlcdn.apache.org/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3.tgz
tar xvf spark-3.5.4-bin-hadoop3.tgz
Changing the Path of Spark
sudo mv spark-3.5.4-bin-hadoop3 /opt/spark
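A quick sanity check after the move is to confirm that /opt/spark contains what an unpacked Spark binary distribution normally ships. This is a rough sketch under that assumption; `EXPECTED_ENTRIES` and `missing_spark_entries` are hypothetical names chosen for illustration.

```python
from pathlib import Path

# Entries assumed to be present in an unpacked Spark binary distribution.
EXPECTED_ENTRIES = ("bin/spark-submit", "jars", "python/lib")


def missing_spark_entries(spark_home: str) -> list:
    """Return the expected entries that are absent under spark_home."""
    root = Path(spark_home)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]


if __name__ == "__main__":
    missing = missing_spark_entries("/opt/spark")
    print("Spark layout looks complete" if not missing else f"Missing: {missing}")
```

An empty list suggests the archive was extracted and moved intact; any missing entry usually points to an interrupted download or a move into the wrong directory.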
Setting Path Variables
nano ~/.bashrc
export SPARK_HOME=/opt/spark
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.9.7-src.zip:$PYTHONPATH
In nano, press CTRL + O and then Enter to save the file, then CTRL + X to exit. Reload the file so the variables take effect in the current shell:
source ~/.bashrc
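The py4j version embedded in the PYTHONPATH entry changes between Spark releases, so a hardcoded file name can silently stop matching after an upgrade. As an alternative, the paths can be discovered from Python at runtime. This is a sketch only; `spark_python_paths` and `add_spark_to_sys_path` are hypothetical helpers, and it assumes the py4j archive lives under `python/lib` with a `py4j-*-src.zip` name.

```python
import glob
import os
import sys


def spark_python_paths(spark_home: str) -> list:
    """Return the sys.path entries PySpark needs from a Spark install."""
    python_dir = os.path.join(spark_home, "python")
    # The py4j version in the file name differs between Spark releases,
    # so match it with a wildcard instead of hardcoding it.
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    return [python_dir] + py4j_zips


def add_spark_to_sys_path(spark_home: str) -> None:
    """Prepend the Spark Python paths to sys.path if they are not there yet."""
    for entry in reversed(spark_python_paths(spark_home)):
        if entry not in sys.path:
            sys.path.insert(0, entry)


if __name__ == "__main__":
    print(spark_python_paths(os.environ.get("SPARK_HOME", "/opt/spark")))
```

Calling `add_spark_to_sys_path("/opt/spark")` before `import pyspark` has roughly the same effect as the PYTHONPATH export, but only for the current Python process.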
# Testing
import os
from pyspark.sql import SparkSession

# Point this process at the Spark installation (overrides any inherited value)
os.environ['SPARK_HOME'] = '/opt/spark'

# Initialize the Spark session: local mode on all cores, 2 GB of driver memory
spark = (
    SparkSession.builder
    .appName("Jupyter Spark Example")
    .master("local[*]")
    .config("spark.driver.memory", "2g")
    .getOrCreate()
)

# Verify that the Spark session was created
print(spark.version)
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
25/02/15 09:46:43 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
3.5.4
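The builder calls in the test cell are shorthand for ordinary Spark configuration keys: `appName` sets `spark.app.name` and `master` sets `spark.master`. The sketch below makes that mapping explicit by collecting the settings in a plain dictionary first. `local_spark_conf` and `build_session` are hypothetical helper names, and PySpark is imported lazily so the dictionary helper can be inspected without a Spark installation.

```python
def local_spark_conf(app_name: str, driver_memory: str = "2g", cores: str = "*") -> dict:
    """Collect the settings used in the test cell as plain config keys."""
    return {
        "spark.app.name": app_name,
        "spark.master": f"local[{cores}]",
        "spark.driver.memory": driver_memory,
    }


def build_session(conf: dict):
    """Create (or reuse) a SparkSession from a dictionary of config keys."""
    # Imported lazily so local_spark_conf works without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


if __name__ == "__main__":
    print(local_spark_conf("Jupyter Spark Example"))
```

Note that `getOrCreate()` returns the already running session when one exists, so settings such as driver memory only apply when the session is created fresh, for example after restarting the notebook kernel.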
data = [(1, "Alice"), (2, "Bob"), (3, "Charlie")]
columns = ["ID", "Name"]
# Create a DataFrame
df = spark.createDataFrame(data, columns)
df.show()
+---+-------+
| ID|   Name|
+---+-------+
|  1|  Alice|
|  2|    Bob|
|  3|Charlie|
+---+-------+