Hadoop is one of the most commonly used Big Data frameworks, supporting the processing of large data sets in a distributed computing environment. This tool is becoming more and more essential to big business as the world becomes more data-driven. In this introduction, you’ll cover the individual components of Hadoop in detail and get a higher-level picture of how they interact with one another. It’s an excellent first step towards mastering Big Data processes.
- Access 30 lectures & 5 hours of content 24/7
- Install Hadoop in Standalone, Pseudo-Distributed, & Fully Distributed mode
- Set up a Hadoop cluster using Linux VMs
- Build a cloud Hadoop cluster on AWS w/ Cloudera Manager
- Understand HDFS, MapReduce, & YARN & their interactions
You see recommendation algorithms all the time, whether you realize it or not. Whether it’s Amazon recommending a product, Facebook recommending a friend, or Netflix recommending a new TV show, recommendation systems are a big part of internet life. One common technique behind them is collaborative filtering, which you can perform through MapReduce with data collected in Hadoop. In this course, you’ll learn how to do it.
- Access 4 lectures & 1 hour of content 24/7
- Master the art of “thinking parallel” to break tasks into MapReduce transformations
- Use Hadoop & MapReduce to implement a recommendations algorithm
- Recommend friends on a social networking site using a MapReduce collaborative filtering algorithm
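To make the friend-recommendation idea above concrete, here is a minimal sketch in plain Python, not the course's own code. It assumes one common formulation: two users who are not yet connected are scored by how many mutual friends they share. The map, shuffle, and reduce phases are simulated in memory, and the names `map_friend_pairs`, `reduce_pair`, `recommend`, and the sample `graph` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def map_friend_pairs(user, friends):
    """Map step: emit (pair, flag) records for one user's friend list."""
    # Existing friendships are flagged with 0 so the reducer can drop them.
    for friend in friends:
        yield tuple(sorted((user, friend))), 0
    # Any two friends of `user` have `user` as a mutual friend.
    for a, b in combinations(sorted(friends), 2):
        yield (a, b), 1


def reduce_pair(pair, values):
    """Reduce step: count mutual friends unless the pair is already connected."""
    values = list(values)
    if 0 in values:
        return None
    return pair, sum(values)


def recommend(graph):
    """Simulate map -> shuffle -> reduce over an in-memory friend graph."""
    grouped = defaultdict(list)  # shuffle: group mapper output by key
    for user, friends in graph.items():
        for key, value in map_friend_pairs(user, friends):
            grouped[key].append(value)
    results = {}
    for pair, values in grouped.items():
        reduced = reduce_pair(pair, values)
        if reduced is not None:
            results[reduced[0]] = reduced[1]
    return results


# Hypothetical sample data: each user maps to a list of friends.
graph = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "cat", "dan"],
    "cat": ["ann", "bob", "dan"],
    "dan": ["bob", "cat"],
}
suggestions = recommend(graph)
print(suggestions)
```

In this sample, ann and dan are the only unconnected pair, and they share two mutual friends, so they are the one suggestion produced. On a real cluster the grouping step would be handled by the framework rather than a dictionary.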
For Big Data engineers and data analysts, HBase is an extremely effective database tool for organizing and managing massive data sets. HBase allows an increased level of flexibility, providing column-oriented storage, no fixed schema, and low latency to accommodate the dynamically changing needs of applications. With the 25 examples contained in this course, you’ll get a complete grasp of HBase that you can leverage in interviews for Big Data positions.
- Access 41 lectures & 4.5 hours of content 24/7
- Set up a database for your application using HBase
- Integrate HBase w/ MapReduce for data processing tasks
- Create tables, insert, read & delete data from HBase
- Get a complete understanding of HBase & its role in the Hadoop ecosystem
- Explore CRUD operations in the shell, & with the Java API
The best way to learn is by example, and in this course you’ll get the lowdown on Scala with 65 comprehensive, hands-on examples. Scala is a general-purpose programming language that is highly scalable, making it incredibly useful in building programs. Over this immersive course, you’ll explore just how Scala can help your programming skill set, and how you can set yourself apart from other programmers by knowing this efficient tool.
- Access 67 lectures & 6.5 hours of content 24/7
- Use Scala w/ an intermediate level of proficiency
- Read & understand Scala programs, including those w/ highly functional forms
- Identify the similarities & differences between Java & Scala to use each to its advantage
The functional programming nature and the availability of a REPL environment make Scala particularly well suited for a distributed computing framework like Spark. Using these two technologies in tandem can allow you to effectively analyze and explore data in an interactive environment with extremely fast feedback. This course will teach you how to best combine Spark and Scala, making it perfect for aspiring data analysts and Big Data engineers.
- Access 51 lectures & 8.5 hours of content 24/7
- Use Spark for a variety of analytics & machine learning tasks
- Understand functional programming constructs in Scala
- Implement complex algorithms like PageRank & Music Recommendations
- Work w/ a variety of datasets from airline delays to Twitter, web graphs, & Product Ratings
- Use the different features & libraries of Spark, like RDDs, Dataframes, Spark SQL, MLlib, Spark Streaming, & GraphX
- Write code in Scala REPL environments & build Scala applications w/ an IDE
Linear Regression is a powerful method for quantifying the cause and effect relationships that affect different phenomena in the world around us. This course will teach you how to build robust linear models that will stand up to scrutiny when you apply them to real world situations. You’ll even put what you’ve learnt into practice by leveraging Excel, R, and Python to build a model for stock returns.
- Access 40 lectures & 5 hours of content 24/7
- Cover method of least squares, explaining variance, & forecasting an outcome
- Explore residuals & assumptions about residuals
- Implement simple & multiple regression in Excel, R, & Python
- Interpret regression results & avoid common pitfalls
- Introduce a categorical variable
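As a rough illustration of the method of least squares and residuals listed above, the following sketch fits a simple one-variable regression in plain Python. It is not the course's material: the function names and the `market` and `stock` return figures are made up for the example.

```python
def fit_simple_regression(xs, ys):
    """Return (slope, intercept) that minimise the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(ys, residuals):
    """Share of the variance in ys that the fitted line explains."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data: market returns (x) and a stock's returns (y), in percent.
market = [1.0, 2.0, 3.0, 4.0]
stock = [2.1, 3.9, 6.2, 7.8]

slope, intercept = fit_simple_regression(market, stock)
# Residual = observed value minus the value predicted by the fitted line.
residuals = [y - (intercept + slope * x) for x, y in zip(market, stock)]
fit_quality = r_squared(stock, residuals)
print(slope, intercept, fit_quality)
```

With an intercept in the model, the residuals of a least-squares fit sum to zero (up to floating-point error), which is a quick sanity check before looking at the other residual assumptions.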
Factor analysis helps to cut through the clutter when you have a lot of correlated variables to explain a single effect. This course will help you understand factor analysis and its link to linear regression. You’ll explore how Principal Components Analysis (PCA) is a cookie-cutter technique for factor extraction, and how it relates to machine learning.
- Access 19 lectures & 1.5 hours of content 24/7
- Understand principal components
- Discuss eigenvalues & eigenvectors
- Perform Eigenvalue decomposition
- Use principal components for dimensionality reduction & exploratory factor analysis
- Apply PCA to explain the returns of a technology stock like Apple
- Find the principal components & use them to build a regression model
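The link between principal components and eigenvalue decomposition mentioned above can be sketched for the two-variable case, where the eigenvalues of the covariance matrix have a closed form. This is an illustrative sketch only: the helper names and the two return series are hypothetical, and a real analysis would normally use a numerical library rather than hand-written formulas.

```python
import math


def covariance_matrix(xs, ys):
    """Sample covariance matrix [[a, b], [b, d]] of two series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    a = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    d = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    return a, b, d


def principal_components_2d(a, b, d):
    """Eigenvalues (largest first) and the unit eigenvector of the largest one."""
    half_trace = (a + d) / 2
    spread = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_trace + spread, half_trace - spread
    if b == 0:
        # Already diagonal: the principal axis is a coordinate axis.
        vec = (1.0, 0.0) if a >= d else (0.0, 1.0)
    else:
        # (b, lam1 - a) solves (A - lam1 * I) v = 0 for a symmetric 2x2 matrix.
        norm = math.hypot(b, lam1 - a)
        vec = (b / norm, (lam1 - a) / norm)
    return lam1, lam2, vec


# Hypothetical, perfectly correlated return series.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]

a, b, d = covariance_matrix(xs, ys)
lam1, lam2, first_component = principal_components_2d(a, b, d)
explained = lam1 / (lam1 + lam2)  # share of total variance on the first component
print(lam1, lam2, first_component, explained)
```

Because the two sample series move in lockstep, the first component carries all of the variance and the second eigenvalue is zero. That is the dimensionality-reduction case in its simplest form: one component can stand in for both variables.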
Big data is hot, and data management and analytics skills are your ticket to a fast-growing, lucrative career. This course will quickly teach you two technologies fundamental to big data: MapReduce and Hadoop. Learn and master the art of framing data analysis problems as MapReduce problems with over 10 hands-on examples. Write, analyze, and run real code along with the instructor, both on your own system and in the cloud using Amazon’s Elastic MapReduce service. By course’s end, you’ll have a solid grasp of data management concepts.
- Learn the concepts of MapReduce to analyze big sets of data w/ 56 lectures & 5.5 hours of content
- Run MapReduce jobs quickly using Python & MRJob
- Translate complex analysis problems into multi-stage MapReduce jobs
- Scale up to larger data sets using Amazon’s Elastic MapReduce service
- Understand how Hadoop distributes MapReduce across computing clusters
- Complete projects to get hands-on experience: analyze social media data, movie ratings & more
- Learn about other Hadoop technologies, like Hive, Pig & Spark
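As a hedged sketch of what a multi-stage MapReduce job over movie ratings might look like, the example below chains two map and reduce passes in plain Python: the first counts ratings per movie, the second picks the most-rated movie. It runs in memory and does not use MRJob or Hadoop. The `run_stage` helper, the mapper and reducer names, and the tab-separated `user, movie, rating` input format are all assumptions made for illustration.

```python
from collections import defaultdict


def run_stage(records, mapper, reducer):
    """Run one map -> shuffle -> reduce pass in memory."""
    grouped = defaultdict(list)  # shuffle: collect values under each key
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)
    output = []
    for key in sorted(grouped):
        output.extend(reducer(key, grouped[key]))
    return output


# Stage 1: count how many ratings each movie received.
def map_rating(line):
    user_id, movie_id, rating = line.split("\t")
    yield movie_id, 1


def reduce_count(movie_id, ones):
    yield movie_id, sum(ones)


# Stage 2: send every (count, movie) pair to a single key and keep the largest.
def map_to_single_key(pair):
    movie_id, count = pair
    yield None, (count, movie_id)


def reduce_top(_, count_movie_pairs):
    yield max(count_movie_pairs)


# Hypothetical input lines: user id, movie id, rating.
lines = [
    "u1\tm1\t5", "u2\tm1\t3", "u1\tm2\t4",
    "u3\tm1\t4", "u3\tm3\t2", "u2\tm3\t5",
]
counts = run_stage(lines, map_rating, reduce_count)
top = run_stage(counts, map_to_single_key, reduce_top)
print(counts, top)
```

The output of the first stage becomes the input of the second, which is the core idea behind chaining stages. Libraries such as MRJob let you declare similar mapper and reducer steps and run them locally or on a cluster, though the exact structure there differs from this sketch.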
Hadoop is perhaps the most important big data framework in existence, used by major data-driven companies around the globe. Hadoop and its associated technologies allow companies to manage huge amounts of data and make business decisions based on analytics surrounding that data. This course will take you from big data zero to hero, teaching you how to build Hadoop solutions that will solve real-world problems and qualify you for many high-paying jobs.
- Access 43 lectures & 10 hours of content 24/7
- Learn how technologies like MapReduce apply to clustering problems
- Parse a Twitter stream w/ Python, extract keywords w/ Apache Pig, visualize data w/ NodeJS, & more
- Set up a Kafka stream w/ Java code for producers & consumers
- Explore real-world applications by building a relational schema for a health care data dictionary used by the US Department of Veterans Affairs
- Perform log collection & analytics w/ the Hadoop Distributed File System using Apache Flume & Apache HCatalog
You are allowed to use this product only within the laws of your country/region. SharewareOnSale and its staff are not responsible for any illegal activity. We did not develop this product; if you have an issue with this product, contact the developer. This product is offered "as is" without express or implied or any other type of warranty. The description of this product on this page is not a recommendation, endorsement, or review; it is a marketing description, written by the developer. The quality and performance of this product is without guarantee. Download or use at your own risk. If you don't feel comfortable with this product, then don't download it.