Streaming Big Data with Spark Streaming, Scala, and Spark 3!

Hands-on examples of processing massive streams of data - in real time, on a cluster - with Apache Spark Streaming.

All Levels | 4.4 (2,340 ratings) | 16,740 students enrolled
Created by Sundog Education | Last updated 12/2019 | English | English [Auto-generated]
What will I learn?
  • Process massive streams of real-time data using Spark Streaming
  • Integrate Spark Streaming with data sources, including Kafka, Flume, and Kinesis
  • Use Spark 2's Structured Streaming API
  • Create Spark applications using the Scala programming language
  • Output transformed real-time data to Cassandra or file systems
  • Integrate Spark Streaming with Spark SQL to query streaming data in real time
  • Train machine learning models with streaming data, and use those models for real-time predictions
  • Ingest Apache access log data and transform streams of it
  • Receive real-time streams of Twitter feeds
  • Maintain stateful data across a continuous stream of input data
  • Query streaming data across sliding windows of time

Curriculum for this course
38 Lectures 06:14:14
Getting Started
3 Lectures 00:33:48
  • Introduction, and Getting Set Up 00:17:27
  • [Activity] Stream Live Tweets with Spark Streaming! 00:14:11
  • Udemy 101: Getting the Most From This Course 00:02:10
  • Tip: Apply for a Twitter Developer Account now! 00:00:53
  • [Activity] Scala Basics: Part 1 00:11:26
  • [Exercise] Scala Basics: Part 2 00:09:41
  • [Exercise] Flow Control in Scala 00:07:18
  • [Exercise] Functions in Scala 00:08:47
  • [Exercise] Data Structures in Scala 00:16:38
  • Introduction to Spark 00:07:06
  • The Resilient Distributed Dataset (RDD) 00:10:40
  • [Activity] RDDs in action: simple word count application 00:08:17
  • Introduction to Spark Streaming 00:06:32
  • [Activity] Revisiting the PrintTweets application 00:05:10
  • Windowing: Aggregating data over longer time spans 00:05:00
  • Fault Tolerance in Spark Streaming 00:06:06
  • [Exercise] Saving Tweets to Disk 00:13:24
  • [Exercise] Tracking the Average Tweet Length 00:08:22
  • [Exercise] Tracking the Most Popular Hashtags 00:14:50
  • [Exercise] Tracking the Top URLs Requested 00:13:27
  • [Exercise] Alarming on Log Errors 00:11:56
  • [Exercise] Integrating Spark Streaming with Spark SQL 00:15:03
  • Intro to Structured Streaming in Spark 2 00:08:27
  • [Activity] Analyzing Apache Log files with Structured Streaming 00:11:24
  • Integrating with Apache Kafka 00:12:20
  • Integrating with Apache Flume 00:08:51
  • Integrating with Amazon Kinesis 00:05:29
  • [Activity] Writing Custom Data Receivers 00:06:55
  • Integrating with Cassandra 00:07:35
  • [Exercise] Stateful Information in Spark Streams 00:15:07
  • [Activity] Streaming K-Means Clustering 00:15:36
  • [Activity] Streaming Linear Regression 00:11:50
  • [Activity] Running with spark-submit 00:10:37
  • [Activity] Packaging your code with SBT 00:17:17
  • Running on a real Hadoop cluster with EMR 00:12:56
  • Troubleshooting and Tuning Spark Jobs 00:12:35
  • Learning More 00:03:44
  • Bonus Lecture: More courses to explore! 00:01:06
Description

New! Updated for Spark 3.0.0!

"Big Data" analysis is a hot and highly valuable skill. Thing is, "big data" never stops flowing! Spark Streaming is a new and quickly developing technology for processing massive data sets as they are created - why wait for some nightly analysis to run when you can constantly update your analysis in real time, all the time? Whether it's clickstream data from a big website, sensor data from a massive "Internet of Things" deployment, financial data, or something else - Spark Streaming is a powerful technology for transforming and analyzing that data right when it is created, all the time.

You'll be learning from an ex-engineer and senior manager from Amazon and IMDb.

This course gets your hands on some real live Twitter data, simulated streams of Apache access logs, and even data used to train machine learning models! You'll write and run real Spark Streaming jobs right at home on your own PC, and toward the end of the course, we'll show you how to take those jobs to a real Hadoop cluster and run them in a production environment too.

Across 38 lectures and more than 6 hours of video content, you'll:

  • Get a crash course in the Scala programming language
  • Learn how Apache Spark operates on a cluster
  • Set up discretized streams with Spark Streaming and transform them as data is received
  • Use structured streaming to stream into dataframes in real time
  • Analyze streaming data over sliding windows of time
  • Maintain stateful information across streams of data
  • Connect Spark Streaming with highly scalable sources of data, including Kafka, Flume, and Kinesis
  • Dump streams of data in real time to NoSQL databases such as Cassandra
  • Run SQL queries on streamed data in real time
  • Train machine learning models in real time with streaming data, and use them to make predictions that keep getting better over time
  • Package self-contained Spark Streaming code, deploy it to a real Hadoop cluster using Amazon Elastic MapReduce, and run it there

This course is very hands-on, filled with achievable activities and exercises to reinforce your learning. By the end of this course, you'll be confidently creating Spark Streaming scripts in Scala, and be prepared to tackle massive streams of data in a whole new way. You'll be surprised at how easy Spark Streaming makes it!

About the instructor
  • 45 Students
  • 3 Courses
Founder, Sundog Education. Machine Learning Pro

Sundog Education's mission is to make highly valuable career skills in big data, data science, and machine learning accessible to everyone in the world. Our consortium of expert instructors shares our knowledge in these emerging fields with you, at prices anyone can afford. 

Sundog Education is led by Frank Kane and owned by Frank's company, Sundog Software LLC. Frank spent 9 years at Amazon and IMDb, developing and managing the technology that automatically delivers product and movie recommendations to hundreds of millions of customers, all the time. Frank holds 17 issued patents in the fields of distributed computing, data mining, and machine learning. In 2012, Frank left to start his own successful company, Sundog Software, which focuses on virtual reality environment technology, and teaching others about big data analysis.

Due to our volume of students we are unable to respond to private messages; please post your questions within the Q&A of your course. Thanks for understanding.

Student feedback
4.4
Average rating
  • 1%
  • 2%
  • 16%
  • 67%
  • 112%
Free $149.99
Includes:
  • 06:14:14 On demand videos
  • 38 Lessons
  • Full lifetime access
  • Access on mobile and TV