Apache Spark is an open-source, general-purpose framework for distributed cluster computing. It is designed with computational speed in mind, for workloads ranging from machine learning to stream processing to complex SQL queries, and it can easily process and distribute work on large datasets across multiple computers.
Further, it employs in-memory cluster computing to speed up applications by reducing the need to write to disk. Spark provides APIs for multiple programming languages, including Python, R, and Scala. These APIs abstract away the lower-level work that would otherwise be required to handle big data.
In this video, we demonstrate how to install Apache Spark on Ubuntu 18.04.
First, update the package lists and install the default JDK (Spark requires a Java runtime):

apt update -y
apt install default-jdk -y
Next, download the Spark 3.0.1 release built for Hadoop 2.7 (for example, from the Apache release archive), then extract the archive and move it into place:

wget https://archive.apache.org/dist/spark/spark-3.0.1/spark-3.0.1-bin-hadoop2.7.tgz
tar -xvzf spark-*
mv spark-3.0.1-bin-hadoop2.7/ /opt/spark
Next, add Spark's environment variables to ~/.profile. Note the escaped \$PATH in the second line: without the backslash, the shell would expand $PATH immediately and hard-code the current value into the file:

echo "export SPARK_HOME=/opt/spark" >> ~/.profile
echo "export PATH=\$PATH:/opt/spark/bin:/opt/spark/sbin" >> ~/.profile
echo "export PYSPARK_PYTHON=/usr/bin/python3" >> ~/.profile

Apply the changes to the current session:

source ~/.profile
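Shell quoting matters in the PATH line above: inside double quotes, $PATH is expanded by the shell before echo runs, while an escaped \$PATH is written literally and only expanded each time the profile is read at login. A minimal illustration of the difference (it only writes two throwaway files under /tmp, not your real profile):

```shell
# Double quotes: the shell substitutes $PATH before echo runs,
# so the file contains the current, expanded path.
echo "export PATH=$PATH" > /tmp/path_expanded.txt

# Escaped dollar sign: the literal string "$PATH" is written,
# leaving expansion to whoever sources the file later.
echo "export PATH=\$PATH" > /tmp/path_literal.txt

# Only the second file still contains the unexpanded variable name.
grep -F 'PATH=$PATH' /tmp/path_literal.txt
```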
Once Spark is running (its standalone start scripts live in /opt/spark/sbin), the master's web UI listens on port 8080 by default. If Spark is on a remote server, forward that port over SSH, substituting your own user and host for the placeholder address:

ssh -L 8080:localhost:8080 firstname.lastname@example.org

You can then open http://localhost:8080 in a local browser.
Video by: Justin Palmer