Introduction to Python
Two Day Course
Intro to Machine Learning
Three Day Course
WHAT IS THIS COURSE ABOUT?
This course focuses on Python for data science. You will learn about Python programming interfaces, how data is stored and referenced, and how to import various data formats into Python. You will learn data preparation and exploration tasks using the pandas and NumPy libraries. Finally, you will learn to visualize data using the Matplotlib library.
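As a small taste of the import, prepare, and explore workflow described above, here is a minimal sketch. It uses only the Python standard library so that it runs without any installation; the course itself works with pandas and NumPy. The data, the column names (`city`, `temp_c`), and the helper functions are made up for illustration.

```python
import csv
import io
import statistics

# Hypothetical in-memory CSV standing in for a flat file on disk.
RAW = """city,temp_c
Pune,31.5
Delhi,
Mumbai,29.0
Pune,33.5
"""

def load_rows(text):
    """Import: parse CSV text into a list of dicts, one per record."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Prepare: drop records with a missing temperature, convert the rest to float."""
    return [
        {"city": r["city"], "temp_c": float(r["temp_c"])}
        for r in rows
        if r["temp_c"].strip()
    ]

def mean_by_city(rows):
    """Explore: group temperatures by city and compute the mean of each group."""
    groups = {}
    for r in rows:
        groups.setdefault(r["city"], []).append(r["temp_c"])
    return {city: statistics.mean(vals) for city, vals in groups.items()}

rows = clean(load_rows(RAW))
summary = mean_by_city(rows)
print(summary)
```

With pandas, the same three steps would typically be written with a DataFrame instead of hand-written loops.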
WHO SHOULD ATTEND?
This is a level-one program. It is for anyone who wants to enter analytics or data science, and for R or SAS data science programmers who want to add one more programming language to their skills.
PREPARATION
You must bring your own 64-bit laptop with at least a Core i3 processor (i5 preferred).
Instructions for installing Anaconda for Python will be sent to you before the workshop. Follow those instructions to get your laptop ready before the class.
COURSE OUTLINE:
Getting started with Python
•Introduction to Python Programming Interfaces
•Understanding data types
•Understanding data structures
Importing data in Python
•Flat files
•Other files
•Relational databases
•Web
Data Preparation
•Foundation of pandas
•Reshape, rearrange, transform
•Cleaning
•Combining
•Data pre-processing
Data Exploration
•Numeric statistics with pandas and NumPy
Visualization in Python using Matplotlib
•Customizing plots
•Statistical plots
WHAT IS THIS COURSE ABOUT?
Machine learning is the science of making computers learn from past experience to improve the outcome of a task. There are many exciting real-life applications of machine learning, such as robotic vacuum cleaners, self-driving cars, face and speech recognition, effective web search, and many more. In this course, you will learn important machine learning concepts, the types of machine learning algorithms, the steps in model building, testing, and scoring, and the relevant packages and functionality in Python. You will be introduced to various supervised and unsupervised learning algorithms, and you will learn more complex linear, nonlinear, and ensemble machine learning techniques. Enough time is spent on understanding the concept behind each algorithm through case studies. What's more, this course also introduces deep learning.
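To make the steps of model building, testing, and scoring concrete, here is a minimal sketch using simple linear regression fitted by ordinary least squares. It uses only the Python standard library; the synthetic data, the 80/20 split, and the function names are assumptions chosen for illustration, not the course's own material.

```python
import random

def fit_simple_linear(xs, ys):
    """Build: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, xs):
    """Score new inputs with a fitted (slope, intercept) pair."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]

def mean_squared_error(y_true, y_pred):
    """Evaluation metric: average squared difference."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Synthetic data (made up): y = 2x + 1 plus a little uniform noise.
rng = random.Random(0)
xs = [float(i) for i in range(50)]
ys = [2.0 * x + 1.0 + rng.uniform(-0.5, 0.5) for x in xs]

# Step 1: split the records, holding out 20% for testing.
idx = list(range(len(xs)))
rng.shuffle(idx)
cut = int(0.8 * len(idx))
train, test = idx[:cut], idx[cut:]

# Step 2: build the model on the training records only.
model = fit_simple_linear([xs[i] for i in train], [ys[i] for i in train])

# Step 3: test the model on records it has not seen.
test_mse = mean_squared_error(
    [ys[i] for i in test], predict(model, [xs[i] for i in test])
)
print("slope, intercept:", model)
print("test MSE:", test_mse)
```

Holding out the test records before fitting is what makes the reported error an estimate of performance on new data rather than a measure of how well the model memorized its training set.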
WHO SHOULD ATTEND?
This is a level-two program. If you are already familiar with basic math concepts such as linear algebra and matrices, and with statistics concepts such as probability and estimation, this course will be easier for you to grasp. If you want to know more about what machine learning is, how to use it, the types of machine learning, and how to implement various algorithms to solve business problems, you should attend this course. This course expects some Python knowledge.
PREPARATION
You must bring your own 64-bit laptop with at least a Core i3 processor (i5 preferred) and an NVIDIA GPU. Instructions for installing Anaconda for Python will be sent to you before the workshop. Follow those instructions to get your laptop ready before the class.
COURSE OUTLINE:
Introduction to Machine Learning
•Intro to machine learning
•Types of machine learning algorithms
•Understanding the basics
•Steps in model building, testing and scoring
•Model Evaluation Metrics
Supervised learning algorithms
•Understanding linear and non-linear algorithms
•Understanding regression and classification
•Linear algorithms
  •Simple linear regression
  •Multiple linear regression
  •Logistic regression
•Non-linear algorithms
  •k-nearest neighbors
  •Classification and regression trees
  •Naïve Bayes
  •Neural networks
•Ensemble algorithms
  •Random forest
  •Gradient boosting
Unsupervised learning algorithms
•Understanding clustering and association
•k-means clustering
•Market basket analysis (MBA) using Word2Vec
Introduction to Deep Learning
Cloudera Data Analyst Training
Four Day Course
Apache Hadoop Fundamentals
- The Motivation for Hadoop
- Hadoop Overview
- Data Storage: HDFS
- Distributed Data Processing: YARN, MapReduce, and Spark
- Data Processing and Analysis: Pig, Hive, and Impala
- Database Integration: Sqoop
- Other Hadoop Data Tools
- Exercise Scenarios
Introduction to Apache Pig
- What is Pig?
- Pig’s Features
- Pig Use Cases
- Interacting with Pig
Basic Data Analysis with Apache Pig
- Pig Latin Syntax
- Loading Data
- Simple Data Types
- Field Definitions
- Data Output
- Viewing the Schema
- Filtering and Sorting Data
- Commonly Used Functions
Processing Complex Data with Apache Pig
- Storage Formats
- Complex/Nested Data Types
- Grouping
- Built-In Functions for Complex Data
- Iterating Grouped Data
Multi-Dataset Operations with Apache Pig
- Techniques for Combining Datasets
- Joining Datasets in Pig
- Set Operations
- Splitting Datasets
Apache Pig Troubleshooting and Optimization
- Troubleshooting Pig
- Logging
- Using Hadoop’s Web UI
- Data Sampling and Debugging
- Performance Overview
- Understanding the Execution Plan
- Tips for Improving the Performance of Pig Jobs
Introduction to Apache Hive and Impala
- What is Hive?
- What is Impala?
- Why Use Hive and Impala?
- Schema and Data Storage
- Comparing Hive and Impala to Traditional Databases
- Use Cases
Querying with Apache Hive and Impala
- Databases and Tables
- Basic Hive and Impala Query Language Syntax
- Data Types
- Using Hue to Execute Queries
- Using Beeline (Hive’s Shell)
- Using the Impala Shell
Apache Hive and Impala Data Management
- Data Storage
- Creating Databases and Tables
- Loading Data
- Altering Databases and Tables
- Simplifying Queries with Views
- Storing Query Results
Data Storage and Performance
- Partitioning Tables
- Loading Data into Partitioned Tables
- When to Use Partitioning
- Choosing a File Format
- Using Avro and Parquet File Formats
Relational Data Analysis with Apache Hive and Impala
- Joining Datasets
- Common Built-In Functions
- Aggregation and Windowing
Complex Data with Apache Hive and Impala
- Complex Data with Hive
- Complex Data with Impala
Analyzing Text with Apache Hive and Impala
- Using Regular Expressions with Hive and Impala
- Processing Text Data with SerDes in Hive
- Sentiment Analysis and n-grams in Hive
Apache Hive Optimization
- Understanding Query Performance
- Bucketing
- Indexing Data
- Hive on Spark
Apache Impala Optimization
- How Impala Executes Queries
- Improving Impala Performance
Extending Apache Hive and Impala
- Custom SerDes and File Formats in Hive
- Data Transformation with Custom Scripts in Hive
- User-Defined Functions
- Parameterized Queries
Choosing the Best Tool for the Job
- Comparing Pig, Hive, Impala, and Relational Databases
- Which to Choose?