Advanced Methods in Data Science and Big Data Analytics (EMC-AMDSBDA)

Course Overview

Course Duration: 5 Days

This course takes an open, technology-neutral approach and uses several open-source tools to address big data challenges. It builds on skills developed in the Data Science and Big Data Analytics course.

  • Develop and execute MapReduce functionality

  • Gain familiarity with NoSQL databases and Hadoop Ecosystem tools for analyzing large-scale, unstructured data sets

  • Develop a working knowledge of Natural Language Processing, Social Network Analysis, and Data Visualization concepts

  • Use advanced quantitative methods, and apply one of them in a Hadoop environment

  • Apply advanced techniques to real-world datasets in a final lab

Who Should Attend

This course benefits aspiring data scientists, data analysts who have completed the Associate-level Data Science and Big Data Analytics course, and computer scientists who want to learn MapReduce and methods for analyzing unstructured data.

Course Certifications

This course is part of the following Certifications:

Prerequisites

  • Completion of the Associate-level Data Science and Big Data Analytics course
  • Proficiency in at least one programming language such as Java or Python

Course Objectives

Learn Hadoop (including Pig, Hive, and HBase), Natural Language Processing, Social Network Analysis, Simulation, Random Forests, Multinomial Logistic Regression, and Data Visualization.

Course Content

Module 1: MapReduce and Hadoop

  • The MapReduce Framework

  • Apache Hadoop

  • Hadoop Distributed File System

  • YARN

Module 2: Hadoop Ecosystem and NoSQL

  • Hadoop Ecosystem

  • Pig

  • Hive

  • NoSQL - Not only SQL

  • HBase

  • Spark

Module 3: Natural Language Processing

  • Introduction to NLP

  • Text Preprocessing

  • TF-IDF

  • Beyond Bag of Words

  • Language Modeling

  • POS Tagging and HMM

  • Sentiment Analysis and Topic Modeling

Module 4: Social Network Analysis

  • Introduction to SNA and Graph Theory

  • Most Important Nodes

  • Communities and Small World

  • Network Problems and SNA Tools

Module 5: Data Science Theory and Methods

  • Simulation

  • Random Forests

  • Multinomial Logistic Regression

Module 6: Data Visualization

  • Perception and Visualization

  • Visualization of Multivariate Data

Course ID: EMC-AMDSBDA

