
Tag Archives: Apache Spark

Apache Kafka Integration With Spark

1. Introduction: This is an in-depth article on Apache Kafka and Spark integration. Apache Kafka is an open-source Apache project. It was initially created at LinkedIn. The Kafka framework is written in Java and Scala. It supports publish-subscribe messaging and is fault-tolerant. It is scalable and performs well for high-volume messaging. ZooKeeper is the basic component that manages the Apache Kafka ...


Apache Spark Streaming Example

1. Introduction: This article presents an Apache Spark Streaming example. Apache Spark was created at UC Berkeley’s AMPLab in 2009 by Matei Zaharia. It was open-sourced in 2010 under a BSD license. Spark was donated to the Apache Software Foundation in 2013 and became a top-level Apache project in 2014. Apache Spark is based on a cluster ...


Apache Spark Architecture Tutorial

In this tutorial, we will take a look at the Apache Spark architecture. 1. Introduction: Apache Spark was created at UC Berkeley’s AMPLab in 2009 by Matei Zaharia. It was open-sourced in 2010 under a BSD license. Spark was donated to the Apache Software Foundation in 2013 and became a top-level Apache project in 2014. Apache Spark is based ...


Apache Spark Tutorial for Beginners

In this post, we feature a comprehensive Apache Spark Tutorial for Beginners. We will look at Apache Spark in detail, how it differs from Hadoop, and which components are bundled with it. We will also look at RDDs, which are the heart of Spark, and a simple example of an RDD in Java. Table ...


Apache Spark Installation Guide

In this post, we feature a comprehensive Apache Spark Installation Guide. 1. Introduction: Apache Spark is an open-source cluster computing framework with an in-memory data processing engine. It provides APIs in Java, Scala, R, and Python. Apache Spark works with HDFS and can be up to 100 times faster than Hadoop MapReduce. It also supports other high-level tools like Spark SQL for ...


Apache Spark Machine Learning Tutorial

This article features a comprehensive tutorial on how to implement machine learning use cases with Apache Spark. Table Of Contents 1. What is Apache Spark? 1.1. Features of Apache Spark 1.2. Components of Spark 1.3. Data processing with Spark 2. Machine Learning With Spark 2.1. MLlib 2.2. Anomaly Detection with Apache Spark 2.2.1. Data Preparation 2.2.2. Execution 2.2.3. ...


The Hadoop Ecosystem Explained

In this article, we will go through the Hadoop Ecosystem and see what it consists of and what the different projects can do. 1. Introduction: Apache Hadoop is an open-source platform managed by the Apache Software Foundation. It is written in Java and is able to process large amounts of data (generally called Big Data) in distributed ...
