Introduction to Python

Two Day Course

Intro to Machine Learning

Three Day Course



This course focuses on Python for data science. You will learn about Python programming interfaces, how data is stored and referenced, and how to import various data formats into Python. You will work through data preparation and exploration tasks using the pandas and NumPy libraries. Finally, you will learn to visualize data using the Matplotlib library.



This is a level-one program. It is intended for anyone who wants to enter analytics or data science, and for R or SAS data science programmers who want to add one more programming language to their skill set.


You must bring your own laptop: 64-bit, with at least a Core i3 processor (i5 preferred).

Instructions for installing Anaconda for Python will be sent to you before the workshop. Follow those instructions to prepare your laptop before the class.


Getting started with Python 
•Introduction to Python Programming Interfaces
•Understanding data types
•Understanding data structures
Importing data in Python
•Flat files
•Other files
•Relational databases
Data Preparation
•Foundation of pandas
•Reshape, rearrange, transform
•Data pre-processing 
Data Exploration  
•Numeric statistics with pandas and NumPy
Visualization in Python using Matplotlib 
•Customizing plots
•Statistical plots
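
As a rough taste of the "Importing data in Python" and "Data Exploration" topics above, here is a minimal sketch that reads a flat file, loads it into a relational database, and computes one numeric statistic. It deliberately uses only the Python standard library (csv, sqlite3, statistics) rather than pandas, so it runs without extra installs. The sample data, the table name `people`, and the helper names `load_rows` and `to_database` are illustrative assumptions, not course material.

```python
import csv
import io
import sqlite3
import statistics

# Hypothetical flat-file contents; a real exercise would read a file from disk.
CSV_TEXT = """name,age,city
Asha,34,Pune
Ben,28,Leeds
Chen,41,Taipei
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, converting the age column to int."""
    reader = csv.DictReader(io.StringIO(text))
    return [{**row, "age": int(row["age"])} for row in reader]


def to_database(rows):
    """Copy the rows into an in-memory SQLite table and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (name TEXT, age INTEGER, city TEXT)")
    # Named placeholders (:name, :age, :city) are filled from each dict.
    conn.executemany("INSERT INTO people VALUES (:name, :age, :city)", rows)
    conn.commit()
    return conn


rows = load_rows(CSV_TEXT)      # flat file -> list of dicts
conn = to_database(rows)        # list of dicts -> relational table

# A simple numeric statistic computed in Python.
mean_age = statistics.mean(r["age"] for r in rows)

# The same data queried through SQL.
oldest = conn.execute(
    "SELECT name FROM people ORDER BY age DESC LIMIT 1"
).fetchone()[0]

print(f"rows: {len(rows)}, mean age: {mean_age:.2f}, oldest: {oldest}")
```

In the course itself, pandas would typically replace the hand-written parsing, with a DataFrame standing in for the list of dicts.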


Machine learning is the science of enabling computers to learn from past experience in order to improve the outcome of a task. There are many exciting real-life applications of machine learning, such as robotic vacuum cleaners, self-driving cars, face and speech recognition, effective web search, and many more. In this course, you will learn important machine learning concepts, the types of machine learning algorithms, and the steps in model building, testing, and scoring, along with various Python packages and their functionality. You will be introduced to various supervised and unsupervised learning algorithms, including more complex linear, nonlinear, and ensemble machine learning techniques. Ample time is spent on understanding the concept behind each algorithm through case studies. This course also introduces deep learning.




This is a level-two program. If you are already familiar with basic math concepts such as linear algebra and matrices, and with statistics concepts such as probability and estimation, this course will be easier to grasp. If you want to know more about what machine learning is, how to use it, the types of machine learning, and how to implement various algorithms to solve business problems, you should attend this course. This course expects some Python knowledge.


You must bring your own laptop: 64-bit, with at least a Core i3 processor (i5 preferred) and an NVIDIA GPU. Instructions for installing Anaconda for Python will be sent to you before the workshop. Follow those instructions to prepare your laptop before the class.




Introduction to Machine Learning
•Intro to machine learning
•Types of machine learning algorithms
•Understanding the basics 
•Steps in model building, testing and scoring
•Model evaluation metrics
Supervised learning algorithms
•Understanding linear and non-linear algorithms
•Understanding regression and classification
Linear algorithms
•Simple linear regression
•Multiple linear regression
•Logistic regression
Non-linear algorithms
•k-nearest neighbors
•Classification and regression trees
•Naïve Bayes
•Neural networks
Ensemble algorithms
•Random forest
•Gradient boosting
Unsupervised learning algorithms
•Understanding clustering and association
•k-means clustering
•Market basket analysis (MBA) using Word2Vec
Introduction to Deep Learning
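
To make the "steps in model building, testing and scoring" item concrete, the following is a minimal sketch of simple linear regression fitted by ordinary least squares, evaluated on a held-out test split with R². It uses only the Python standard library instead of a machine learning package, and the synthetic data (true slope 3, intercept 2, Gaussian noise), the 80/20 split, and all function names are assumptions chosen for illustration.

```python
import random


def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) minimizing squared error for y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(ys, preds):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Synthetic data (an assumption for illustration): y = 3x + 2 plus noise.
rng = random.Random(0)
xs = [float(i) for i in range(50)]
ys = [3.0 * x + 2.0 + rng.gauss(0, 1) for x in xs]

# Step 1: split the data into training and test sets (80/20).
indices = list(range(len(xs)))
rng.shuffle(indices)
split = int(0.8 * len(indices))
train, test = indices[:split], indices[split:]

# Step 2: build the model on the training set only.
slope, intercept = fit_simple_linear_regression(
    [xs[i] for i in train], [ys[i] for i in train]
)

# Step 3: score the model on the unseen test set.
preds = [slope * xs[i] + intercept for i in test]
score = r_squared([ys[i] for i in test], preds)

print(f"slope={slope:.3f} intercept={intercept:.3f} test R^2={score:.4f}")
```

Keeping the test rows out of the fitting step is the point of the exercise: the score then reflects how the model behaves on data it has not seen.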


Cloudera Data Analyst Training
Four Day Course

Apache Hadoop Fundamentals

  • The Motivation for Hadoop

  • Hadoop Overview

  • Data Storage: HDFS

  • Distributed Data Processing: YARN, MapReduce, and Spark

  • Data Processing and Analysis: Pig, Hive, and Impala

  • Database Integration: Sqoop

  • Other Hadoop Data Tools

  • Exercise Scenarios

Introduction to Apache Pig

  • What is Pig?

  • Pig’s Features

  • Pig Use Cases

  • Interacting with Pig

Basic Data Analysis with Apache Pig

  • Pig Latin Syntax

  • Loading Data

  • Simple Data Types

  • Field Definitions

  • Data Output

  • Viewing the Schema

  • Filtering and Sorting Data

  • Commonly Used Functions

Processing Complex Data with Apache Pig

  • Storage Formats

  • Complex/Nested Data Types

  • Grouping

  • Built-In Functions for Complex Data

  • Iterating Grouped Data

Multi-Dataset Operations with Apache Pig

  • Techniques for Combining Datasets

  • Joining Datasets in Pig

  • Set Operations

  • Splitting Datasets

Apache Pig Troubleshooting and Optimization

  • Troubleshooting Pig

  • Logging

  • Using Hadoop’s Web UI

  • Data Sampling and Debugging

  • Performance Overview

  • Understanding the Execution Plan

  • Tips for Improving the Performance of Pig Jobs

Introduction to Apache Hive and Impala

  • What is Hive?

  • What is Impala?

  • Why Use Hive and Impala?

  • Schema and Data Storage

  • Comparing Hive and Impala to Traditional Databases

  • Use Cases

Querying with Apache Hive and Impala

  • Databases and Tables

  • Basic Hive and Impala Query Language Syntax

  • Data Types

  • Using Hue to Execute Queries

  • Using Beeline (Hive’s Shell)

  • Using the Impala Shell

Apache Hive and Impala Data Management

  • Data Storage

  • Creating Databases and Tables

  • Loading Data

  • Altering Databases and Tables

  • Simplifying Queries with Views

  • Storing Query Results

Data Storage and Performance

  • Partitioning Tables

  • Loading Data into Partitioned Tables

  • When to Use Partitioning

  • Choosing a File Format

  • Using Avro and Parquet File Formats

Relational Data Analysis with Apache Hive and Impala

  • Joining Datasets

  • Common Built-In Functions

  • Aggregation and Windowing

Complex Data with Apache Hive and Impala

  • Complex Data with Hive

  • Complex Data with Impala

Analyzing Text with Apache Hive and Impala

  • Using Regular Expressions with Hive and Impala

  • Processing Text Data with SerDes in Hive

  • Sentiment Analysis and n-grams in Hive

Apache Hive Optimization

  • Understanding Query Performance

  • Bucketing

  • Indexing Data

  • Hive on Spark

Apache Impala Optimization

  • How Impala Executes Queries

  • Improving Impala Performance

Extending Apache Hive and Impala

  • Custom SerDes and File Formats in Hive

  • Data Transformation with Custom Scripts in Hive

  • User-Defined Functions

  • Parameterized Queries

Choosing the Best Tool for the Job

  • Comparing Pig, Hive, Impala, and Relational Databases

  • Which to Choose?
