Data science is a field that uses scientific methods and processes to extract insights from both structured and unstructured data. Our experienced trainers teach recent graduates how to use R and Python to create algorithms and build effective machine learning models. Students will gain a foundation in building and running data pipelines, and can apply this knowledge to uncover insights hidden in data and address business challenges and goals.
Data science is often compared to data mining. In this Data Science with R Language and Python course, you will use both languages to create algorithms, build machine learning models, and apply scientific processes to data. You will also learn how to build and run data pipelines, along with other recommended practices.
On the R side, you will explore analytics tools through the R programming language, starting with installing R on different operating systems and progressing to advanced topics such as data visualization and variable identification. With Python, you will learn the main types of regression analysis (including interaction regression), hypothesis testing, and other techniques used in business analysis.
Duration: 80 hours
Cost: $750/course
(excluding any certification cost)
Curriculum
Data Science Overview
- Data Science
- Data Scientists
- Examples of Data Science in day-to-day life
- Python for Data Science
Data Analytics Overview
- Introduction to Data Visualization
- Processes in Data Science
- Data Wrangling, Data Exploration, and Model Selection
- Exploratory Data Analysis (EDA)
- Data Visualization
- Plotting
- Hypothesis Building and Testing
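As a rough illustration of how EDA leads into hypothesis building and testing, the sketch below summarizes a small dataset by group and then runs a two-sample t-test. The data, column names, and the 5% significance level are hypothetical choices for illustration, and Welch's t-test (`scipy.stats.ttest_ind` with `equal_var=False`) is one reasonable test among several, not necessarily the one used in class.

```python
import pandas as pd
from scipy import stats

# Hypothetical sample data: order values under two store layouts (illustrative only)
df = pd.DataFrame({
    "layout": ["A"] * 6 + ["B"] * 6,
    "order_value": [20.1, 22.4, 19.8, 21.0, 23.2, 20.7,
                    24.5, 25.1, 23.9, 26.2, 24.8, 25.5],
})

# Exploratory step: summary statistics per group
summary = df.groupby("layout")["order_value"].agg(["count", "mean", "std"])
print(summary)

# Hypothesis: H0 says both layouts have the same mean order value.
# Welch's t-test does not assume the two groups have equal variances.
a = df.loc[df["layout"] == "A", "order_value"]
b = df.loc[df["layout"] == "B", "order_value"]
t_stat, p_value = stats.ttest_ind(a, b, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

alpha = 0.05  # assumed significance level
if p_value < alpha:
    print("Reject H0 at the 5% level")
else:
    print("Fail to reject H0 at the 5% level")
```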
Statistical Analysis and Business Applications
- Introduction to Statistics
- Statistical and Non-Statistical Analysis
- Some Common Terms Used in Statistics
- Data Distribution
- Methods of Central Tendency
- Mean, Median, Mode
- Methods of Dispersion
- Percentiles, Dispersion
- Histogram
- Bell Curve
- Hypothesis Testing
- Chi-Square Test
- Correlation Matrix
- Inferential Statistics
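The sketch below touches several of the statistics topics above using NumPy, pandas, and SciPy: central tendency, dispersion, percentiles, a chi-square test of independence, and a correlation matrix. All numbers are made-up examples, and the library calls are one common way to compute these quantities rather than a prescribed course method.

```python
import numpy as np
import pandas as pd
from scipy import stats

values = np.array([2, 4, 4, 4, 5, 5, 7, 9])

# Central tendency
mean = values.mean()
median = np.median(values)
mode = pd.Series(values).mode().iloc[0]

# Dispersion and percentiles
variance = values.var(ddof=1)      # sample variance (divides by n - 1)
std_dev = values.std(ddof=1)       # sample standard deviation
p90 = np.percentile(values, 90)    # linear interpolation between ranks by default

# Chi-square test of independence on a hypothetical 2x2 contingency table
# (rows: region, columns: prefers product yes/no)
observed = np.array([[30, 10],
                     [20, 40]])
chi2, p_chi, dof, expected = stats.chi2_contingency(observed)

# Correlation matrix (Pearson by default)
df = pd.DataFrame({"x": [1, 2, 3, 4, 5],
                   "y": [2, 4, 6, 8, 10],
                   "z": [5, 3, 4, 1, 2]})
corr = df.corr()

print(mean, median, mode, round(variance, 3), round(p90, 2))
print(f"chi2 = {chi2:.2f}, p = {p_chi:.4f}, dof = {dof}")
print(corr.round(2))
```

Note that for 2x2 tables `chi2_contingency` applies Yates' continuity correction by default; pass `correction=False` to disable it.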
Python: Environment Setup and Essentials
- Introduction to Anaconda
- Installation of Anaconda Python Distribution – For Windows, Mac OS, and Linux
- Jupyter Notebook Installation
- Jupyter Notebook Introduction
- Variable Assignment
- Basic Data Types: Integer, Float, String, None, and Boolean; Typecasting
- Creating, accessing, and slicing tuples
- Creating, accessing, and slicing lists
- Creating, viewing, accessing, and modifying dicts
- Creating and using operations on sets
- Basic Operators: ‘in’, ‘+’, ‘*’
- Logical operators
- Functions
- Use of break and continue keywords
- Control Flow
- Classes
- Objects
- Object-oriented programming in Python (encapsulation, abstraction, inheritance, and polymorphism)
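A compact tour of several of the Python essentials listed above might look like the following. The variable names and class hierarchy (`Shape`, `Rectangle`, `Square`) are hypothetical examples chosen only to show slicing, dicts, sets, typecasting, `break`/`continue`, and the four object-oriented ideas.

```python
from abc import ABC, abstractmethod

# Tuples and lists: creating, accessing, slicing
point = (3, 4)
scores = [88, 92, 75, 64, 99]
top_two = sorted(scores, reverse=True)[:2]
every_other = scores[::2]

# Dicts: creating, accessing, modifying
inventory = {"apples": 5, "pears": 2}
inventory["pears"] += 3
inventory["plums"] = 7

# Sets: membership and set operations
seen = {"a", "b", "c"}
other = {"b", "c", "d"}
common = seen & other          # intersection

# Typecasting
total = int("42") + float("0.5")

# Control flow with continue and break
first_below_70 = None
for s in scores:
    if s >= 90:
        continue               # skip high scores
    if s < 70:
        first_below_70 = s
        break                  # stop at the first score below 70


class Shape(ABC):
    """Abstraction: every shape must provide area()."""

    @abstractmethod
    def area(self) -> float:
        ...


class Rectangle(Shape):
    def __init__(self, width: float, height: float):
        # Encapsulation by convention: underscore marks internal state
        self._width = width
        self._height = height

    def area(self) -> float:
        return self._width * self._height


class Square(Rectangle):       # inheritance: a square is a rectangle
    def __init__(self, side: float):
        super().__init__(side, side)


shapes = [Rectangle(2, 3), Square(4)]
areas = [s.area() for s in shapes]   # polymorphism: same call, different classes
print(top_two, every_other, inventory, common, total, first_below_70, areas)
```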
Mathematical Computing with Python (NumPy)
- NumPy Overview
- Properties, Purpose, and Types of ndarray
- Class and Attributes of ndarray Object
- Basic Operations: Concept and Examples
- Accessing Array Elements: Indexing, Slicing, Iteration, Indexing with Boolean Arrays
- Copy and Views
- Universal Functions (ufunc)
- Shape Manipulation
- Broadcasting
- Linear Algebra
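The NumPy topics above can be sketched in a few lines: reshaping, views versus copies, Boolean indexing, broadcasting, a universal function, and solving a linear system. The arrays are arbitrary examples.

```python
import numpy as np

# Shape manipulation: a 3x4 array holding 0..11
a = np.arange(12).reshape(3, 4)

# Basic slicing returns a view, so writes go through to the original array
view = a[0, :]
view[0] = 100
print(a[0, 0])

# .copy() returns an independent array; a is unaffected
c = a.copy()
c[0, 0] = -1

# Boolean indexing: select elements that satisfy a condition
evens = a[a % 2 == 0]

# Broadcasting: (3, 4) array minus a (4,) vector of column means
centered = a - a.mean(axis=0)

# Universal functions operate element-wise
roots = np.sqrt(np.abs(centered))

# Linear algebra: solve A @ x = b
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)
print(x)
```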
Scientific Computing with Python (SciPy)
- SciPy and its Characteristics
- SciPy sub-packages
- SciPy sub-packages – Integration
- SciPy sub-packages – Optimize
- Linear Algebra
- SciPy sub-packages – Statistics
- SciPy sub-packages – Weave
- SciPy sub-packages – I/O
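A minimal sketch of several SciPy sub-packages from the list above: `integrate`, `optimize`, `linalg`, `stats`, and `io`. The functions being integrated and minimized are arbitrary examples, and saving to a temporary `.mat` file is just one way to demonstrate `scipy.io`. The Weave sub-package is not shown.

```python
import os
import tempfile

import numpy as np
from scipy import integrate, optimize, linalg, stats
from scipy import io as sio

# Integration: area under x^2 from 0 to 3 (exact value is 9)
area, abs_err = integrate.quad(lambda x: x ** 2, 0, 3)

# Optimization: minimize f(v) = (v - 2)^2 + 1, whose minimum is at v = 2
res = optimize.minimize(lambda v: (v[0] - 2.0) ** 2 + 1.0, x0=[0.0])

# Linear algebra: determinant and inverse of a 2x2 matrix
M = np.array([[4.0, 7.0],
              [2.0, 6.0]])
det = linalg.det(M)
inv = linalg.inv(M)

# Statistics: P(Z < 1.96) for a standard normal variable
p_below = stats.norm.cdf(1.96)

# I/O: round-trip the matrix through a MATLAB .mat file
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "matrix.mat")
    sio.savemat(path, {"M": M})
    loaded = sio.loadmat(path)["M"]

print(round(area, 6), res.x, det, round(p_below, 4))
```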
Data Manipulation with Python (Pandas)
- Introduction to Pandas
- Data Structures
- Series
- DataFrame
- Missing Values
- Data Operations
- Data Standardization
- Pandas File Read and Write Support
- SQL Operation
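The pandas topics above can be tied together in a short sketch: reading CSV data, handling missing values, standardizing columns, writing output, and running a SQL query. The CSV contents, column names, and table name are hypothetical, and the SQL step assumes an in-memory SQLite database via Python's built-in `sqlite3` module.

```python
import sqlite3
from io import StringIO

import pandas as pd

# Hypothetical CSV data with two missing values
csv_text = """city,temp_c,sales
Austin,31,120
Boston,,95
Chicago,24,
Denver,27,110
"""
df = pd.read_csv(StringIO(csv_text))

# Missing values: count per column, then fill with each column's mean
missing = df.isna().sum()
df_filled = df.fillna({"temp_c": df["temp_c"].mean(),
                       "sales": df["sales"].mean()})

# Standardization: z-scores (mean 0, standard deviation 1) per numeric column
numeric = df_filled[["temp_c", "sales"]]
standardized = (numeric - numeric.mean()) / numeric.std()

# File write support: serialize back to CSV text (pass a path to write a file)
out_csv = df_filled.to_csv(index=False)

# SQL operation: load into an in-memory SQLite table and query it
conn = sqlite3.connect(":memory:")
df_filled.to_sql("weather_sales", conn, index=False)
hot = pd.read_sql(
    "SELECT city, sales FROM weather_sales WHERE temp_c > 25 ORDER BY city",
    conn,
)
conn.close()

print(missing)
print(hot)
```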
Machine Learning with Python (Scikit-Learn)
- Introduction to Machine Learning
- Machine Learning Approach
- How Supervised and Unsupervised Learning Models Work
- Scikit-Learn
- Supervised Learning Models – Linear Regression
- Supervised Learning Models – Logistic Regression
- K Nearest Neighbours (K-NN) Model
- Unsupervised Learning Models – Clustering
- Unsupervised Learning Models – Dimensionality Reduction
- Pipeline
- Model Persistence
- Model Evaluation – Metric Functions
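The following sketch shows how several of the scikit-learn topics above fit together: a `Pipeline` with scaling and logistic regression, evaluation with metric functions, model persistence with `joblib`, and an unsupervised PCA plus k-means step. It assumes the Iris dataset bundled with scikit-learn; the split ratio, `random_state` values, and hyperparameters are illustrative choices, not tuned settings.

```python
import os
import tempfile

import joblib
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Supervised model inside a pipeline: scaling is fit on training data only
model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

# Model evaluation with metric functions
pred = model.predict(X_test)
acc = accuracy_score(y_test, pred)
print(f"accuracy = {acc:.3f}")
print(classification_report(y_test, pred))

# Model persistence: save, reload, and confirm identical predictions
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "iris_model.joblib")
    joblib.dump(model, path)
    restored = joblib.load(path)
    same_predictions = bool((restored.predict(X_test) == pred).all())

# Unsupervised: reduce to 2 dimensions, then cluster into 3 groups
X_2d = PCA(n_components=2).fit_transform(X)
cluster_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_2d)
```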
Natural Language Processing with Scikit-Learn
- NLP Overview
- NLP Approach for Text Data
- NLP Environment Setup
- NLP Sentence analysis
- NLP Applications
- Major NLP Libraries
- Scikit-Learn Approach
- Scikit-Learn Approach Built-in Modules
- Scikit-Learn Approach Feature Extraction
- Bag of Words
- Extraction Considerations
- Scikit-Learn Approach Model Training
- Scikit-Learn Grid Search and Multiple Parameters
- Pipeline
Data Visualization in Python using Matplotlib
- Introduction to Data Visualization
- Python Libraries
- Plots
- Matplotlib Features:
- Line Properties Plot with (x, y)
- Controlling Line Patterns and Colors
- Set Axis, Labels, and Legend Properties
- Alpha and Annotation
- Multiple Plots
- Subplots
- Types of Plots and Seaborn
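A minimal Matplotlib sketch of the features above: two subplots, line patterns and colors, axis labels and a legend, an annotation, and a semi-transparent histogram using `alpha`. The data are synthetic, the non-interactive `Agg` backend is assumed so the script runs without a display, and the figure is saved to an in-memory buffer (pass a filename such as `"plots.png"` to write a file instead). Seaborn is not used here.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no display required
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Line properties: color and line style per series
ax1.plot(x, np.sin(x), color="tab:blue", linestyle="-", label="sin(x)")
ax1.plot(x, np.cos(x), color="tab:orange", linestyle="--", label="cos(x)")

# Axis labels, title, and legend
ax1.set_xlabel("x (radians)")
ax1.set_ylabel("value")
ax1.set_title("Line patterns and colors")
ax1.legend()

# Annotation with an arrow pointing at the sine peak
ax1.annotate("peak", xy=(np.pi / 2, 1.0), xytext=(2.5, 0.6),
             arrowprops={"arrowstyle": "->"})

# Second subplot: histogram with partial transparency (alpha)
rng = np.random.default_rng(0)
ax2.hist(rng.normal(size=500), bins=30, alpha=0.5, color="tab:green")
ax2.set_title("Histogram with alpha=0.5")

fig.tight_layout()
buf = io.BytesIO()
fig.savefig(buf, format="png")
```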
Web Scraping with Python for Data Science
- Web Scraping
- Common Data/Page Formats on The Web
- The Parser
- Importance of Objects
- Understanding the Tree
- Searching the Tree
- Navigating options
- Modifying the Tree
- Parsing Only Part of the Document
- Printing and Formatting
- Encoding
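The topics above (parser, tree search and navigation, modifying the tree, parsing only part of a document, printing, encoding) closely match the features of the BeautifulSoup library, which the outline does not name; the sketch below assumes BeautifulSoup 4 (`bs4`) with Python's built-in `html.parser`. The HTML snippet is a hypothetical page, and no network request is made.

```python
from bs4 import BeautifulSoup, SoupStrainer

# Hypothetical page content (in practice this would come from an HTTP response)
html = """<html><head><title>Course List</title></head>
<body>
  <div id="courses">
    <a class="course" href="/python">Python</a>
    <a class="course" href="/r">R</a>
    <a class="other" href="/about">About</a>
  </div>
</body></html>"""

# The parser: build a tree using the standard-library HTML parser
soup = BeautifulSoup(html, "html.parser")

# Navigating the tree
title = soup.title.string
container = soup.find("div", id="courses")
first = container.find("a")
parent_id = first.parent["id"]

# Searching the tree
links = [a["href"] for a in soup.find_all("a", class_="course")]

# Modifying the tree: change text and append a new tag
first.string = "Python 3"
new_tag = soup.new_tag("a", href="/sql")
new_tag.string = "SQL"
container.append(new_tag)

# Parsing only part of the document: keep just the <a> tags
only_links = BeautifulSoup(html, "html.parser", parse_only=SoupStrainer("a"))
link_count = len(only_links.find_all("a"))

# Printing/formatting and encoding
print(soup.prettify())
encoded = soup.encode("utf-8")
```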
Python integration with Hadoop, MapReduce and Spark
- Need for Integrating Python with Hadoop
- Big Data Hadoop Architecture
- MapReduce
- Apache Spark
- Resilient Distributed Datasets (RDD)
- PySpark
- Spark Tools
- PySpark Integration with Jupyter Notebook
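To illustrate the MapReduce model that Hadoop and Spark build on, the sketch below implements a word count in plain Python with explicit map, shuffle, and reduce phases. It does not use the Hadoop or PySpark APIs, and the documents and function names (`map_phase`, `shuffle_phase`, `reduce_phase`) are hypothetical; a real cluster runs each phase in parallel across many machines.

```python
from collections import defaultdict
from functools import reduce
from itertools import chain

documents = [
    "spark makes big data simple",
    "hadoop stores big data",
    "spark and hadoop work together",
]


def map_phase(doc):
    """Mapper: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in doc.split()]


def shuffle_phase(pairs):
    """Shuffle: group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reducer: combine the values for each key by summing them."""
    return {word: reduce(lambda a, b: a + b, counts) for word, counts in groups.items()}


mapped = chain.from_iterable(map_phase(d) for d in documents)
word_counts = reduce_phase(shuffle_phase(mapped))

# Most frequent words first, ties broken alphabetically
print(sorted(word_counts.items(), key=lambda kv: (-kv[1], kv[0]))[:3])
```

In PySpark, the same computation is typically expressed on an RDD with `flatMap`, `map`, and `reduceByKey`, and Spark distributes the phases across the cluster.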