MSBI Online Training in Hyderabad
Glory Technologies is the best MSBI online/classroom training institute in Hyderabad. We provide MSBI course certification. Learn how to use Microsoft's MSBI tools: SSIS, SSRS, and SSAS.
Microsoft SQL Server
i) Microsoft SQL Server Overview
- What are DBMS and RDBMS?
- Types of normalization (normal forms)
- Data Models
- Constraints in SQL Server
- Relationships in SQL Server
ii) SQL Statements
- DDL
- DML
- DCL
- TCL
- String Datatypes and Functions
- Aggregation Functions
- Joins and Subqueries
- Views in SQL Server
- Index in SQL Server
- Triggers in SQL
- User-Defined Functions
- Stored Procedures
Microsoft SQL Server 2005 Integration Services
i) Microsoft SSIS Overview
- SSIS: SQL Server Integration Services
- DTS & its Limitation
- SSIS Overview
- Architecture
- Flow Types
- Tools
- Utilities
- Typical Users
ii) Getting Familiar with the Business Intelligence Development Studio (BIDS) Environment
- Launch Business Intelligence Development Studio (BIDS)
- Use the Package Designer
- Use the Toolbox
- Use Solution Explorer
- Use the Properties Window
- Use the Variables Window
iii) Creating Packages
- Using the SSIS Import and Export Wizard
- Using the SSIS Designer
iv) Introduction to Control Flow
- Control Flow Overview
- Precedence Constraints
- Execute SQL Task
- Bulk Insert Task
- File System Task
- FTP Task
- Send Mail Task
v) Introduction to Data Flow
- Data Flow Overview
- The Data Sources
- The Data Destinations
- The Data Transformations
- The Copy Column Transformation
- The Derived Column Transformation
- The Data Conversion Transformation
- The Conditional Split Transformation
- The Aggregate Transformation
- The Merge Transformation
- The Merge Join Transformations
- The Union All Transformation
- The Lookup Transformation
- The Slowly Changing Dimension (SCD) Transformation
- The Pivot and Unpivot Transformations
vi) Adding Looping; Using Breakpoints, Checkpoints, and Transactions
- Add Looping
- Use Breakpoints
- Use Checkpoints
- Use Transactions
vii) Error Handling and Logging
- Event Handlers
- Handling errors in Data
- Configuring Package Logging
- Built-in Log Providers
SQL Server Reporting Services (SSRS)
i) Introduction
- Overview
- Reporting Services Scenarios
- Reporting Services Features
- Reporting Services Concepts
- Architecture
- Life Cycle
ii) Reporting Services Tools
- Reporting Services Configuration Tool
- Report Manager
- Report Server
- Report Designer and Model Designer
iii) Reports
- Creating reports
- Table, Matrix, and Tablix reports
- Filters in Reports
- Creating cascading reports, subreports, drill-down and drill-through reports, lookups, and page breaks
- Adding Row and Column Grouping
- SSRS parameters: report parameters, drop-down list parameters, multi-value and multiple parameters, and cascading parameters
- Chart reports: bar, area, data bar, funnel, line, pie, stacked bar, and scatter charts
iv) Report Server Administration
- Snapshot Reports
- Cached Reports
- Managing reports and security using the web-based Report Manager
- Creating standard and data-driven subscriptions, and managing subscriptions
- Deploying reports to the server
v) Publishing Reports
SQL Server Analysis Services (SSAS)
i) Introduction to Data Warehousing and SQL Server Analysis Services 2008 R2
- DW Concepts
- OLTP and OLAP
- SQL Server Analysis Services 2008 R2 (SSAS)
ii) First Look at Analysis Services 2008 R2
- Business Intelligence Development Studio
- Creating a Project using BIDS
- SQL Server Management Studio
iii) Introduction to MDX
- MDX Fundamentals
- MDX Expressions
- MDX Operations
- MDX Functions
- Manipulating data with MDX
iv) Working with Data Source and Data Source View
- Data Source
- Data Source View
- Working with DSV
- Adding/Removing Tables from a DSV
- Create / Delete Relationship
- Adding Named Queries
- Multiple Data Sources within a DSV
v) Dimension Design
- Dimensions and Dimension Properties
- Creating Dimensions with the Dimension Wizard
- Creating a Time Dimension
- Translations in Dimensions
- Defining Linked Dimensions
- Defining Write-Enabled Dimensions
vi) Cube Design
- Unified Dimensional Model (UDM)
- Cubes
- Cube Creation
- Cube Structure
- Cube Relationships
- Calculations
- Perspectives
- Translations
Azure Data Engineer Training Content:
1. Azure Data Factory
2. Azure Databricks
3. PySpark
4. Python
5. SparkSQL
Duration: 60 Days
Module 1: Introduction to Azure Data Factory
- Overview of ADF: What is Azure Data Factory? Features and Use Cases.
- Components of ADF: Pipelines, Activities, Datasets, Linked Services, and Triggers.
- ADF Architecture: Integration Runtime, Data Flows, and Monitoring.
- Hands-On: Create your first ADF pipeline with a Copy Data Activity.
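To make the pipeline, activity, and dataset vocabulary above concrete, here is a minimal sketch of a Copy Data pipeline as it appears in ADF's JSON code view, written out as a Python dict. The pipeline and dataset names are hypothetical placeholders, and the exact typeProperties depend on the connectors you choose.

```python
# Minimal ADF pipeline definition, mirroring the JSON shown in the ADF
# authoring UI ("Code" view). All names here are hypothetical placeholders.
copy_pipeline = {
    "name": "CopyCsvToSqlPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopyCsvToSql",
                "type": "Copy",  # the Copy Data activity
                "inputs": [
                    {"referenceName": "SourceCsvDataset", "type": "DatasetReference"}
                ],
                "outputs": [
                    {"referenceName": "SinkSqlDataset", "type": "DatasetReference"}
                ],
                "typeProperties": {
                    "source": {"type": "DelimitedTextSource"},
                    "sink": {"type": "AzureSqlSink"},
                },
            }
        ]
    },
}
```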
Module 2: Linked Services and Datasets
- Understanding Linked Services: Connecting to Azure Storage, SQL Databases, and more.
- Creating Datasets: Defining input and output data structures.
- Parameterization: Using parameters to make pipelines reusable.
- Hands-On: Connect to ADLS and SQL Database using Linked Services.
Module 3: Pipelines and Activities
- Understanding Pipelines: Building end-to-end workflows.
- Types of Activities: Copy, Lookup, ForEach, Filter, Web, and Stored Procedure Activities.
- Data Movement Activities: Copying data from source to destination.
- Hands-On: Create a pipeline to copy data from ADLS to Azure Synapse.
Module 4: Control Flow and Pipeline Parameters
- Control Flow Activities: ForEach, Until, If Condition, Switch.
- Using Parameters: Passing values dynamically to pipelines.
- Variables and Expressions: Storing and manipulating values within pipelines.
- Hands-On: Build a pipeline with a ForEach loop to process multiple files.
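As a sketch of the hands-on step above, this is roughly what a ForEach activity looks like in the pipeline JSON, again written as a Python dict. "@pipeline().parameters.FileList" and "@item()" are standard ADF expression forms; the activity and parameter names are hypothetical.

```python
# Sketch of a ForEach activity from the pipeline JSON. The loop iterates
# over a pipeline parameter holding a list of file names; @item() refers
# to the current element inside the loop. Names are hypothetical.
foreach_activity = {
    "name": "ProcessEachFile",
    "type": "ForEach",
    "typeProperties": {
        "items": {
            "value": "@pipeline().parameters.FileList",
            "type": "Expression",
        },
        "activities": [
            {
                "name": "CopyOneFile",
                "type": "Copy",
                # Inner source/sink definitions omitted; they would reference
                # the current file via the @item() expression.
            }
        ],
    },
}
```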
Module 5: Mapping Data Flows
- Introduction to Data Flows: What are Mapping Data Flows?
- Transformations: Aggregate, Join, Union, Lookup, Pivot, and Derived Column.
- Data Flow Optimizations: Partitioning and Performance Tuning.
- Hands-On: Create a Data Flow to clean and transform customer data.
Module 6: Triggers and Scheduling Pipelines
- Types of Triggers: Manual, Scheduled, Tumbling Window, and Event-Based Triggers.
- Trigger Dependencies: Setting up pipeline dependencies.
- Monitoring and Alerts: Using ADF Monitoring Dashboard.
- Hands-On: Schedule a pipeline to run every day at midnight.
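A sketch of the trigger definition behind the hands-on step above: a schedule trigger that fires daily at midnight, mirroring the JSON ADF generates. The trigger and pipeline names are hypothetical placeholders.

```python
# Schedule trigger: run the referenced pipeline every day at midnight UTC.
daily_trigger = {
    "name": "DailyMidnightTrigger",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",
                "interval": 1,
                "startTime": "2024-01-01T00:00:00Z",
                "timeZone": "UTC",
            }
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "CopyCsvToSqlPipeline",
                    "type": "PipelineReference",
                }
            }
        ],
    },
}
```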
Module 7: Integration with Other Azure Services
- Integrating with Azure Databricks: Running notebooks from ADF.
- Loading Data into Azure Synapse: Using Copy and Stored Procedure Activities.
- Integration with Azure Key Vault: Securely managing credentials.
- Hands-On: Build a pipeline to run a Databricks notebook and save output to Synapse.
Module 8: Error Handling and Monitoring
- Error Handling Mechanisms: Using Try-Catch with Fail, Continue, or Retry policies.
- Monitoring Pipelines: Using ADF Monitoring Hub and Azure Monitor.
- Setting Alerts: Configuring alerts for pipeline failures.
- Hands-On: Add error-handling logic to a pipeline and test it.
Module 9: CI/CD for ADF Pipelines
- Introduction to CI/CD in ADF: What is CI/CD?
- Integrating with Azure DevOps: Connecting ADF to a Git repository.
- Building and Deploying Pipelines: Using DevOps release pipelines.
- Hands-On: Implement a CI/CD pipeline for deploying ADF from Dev to Prod.
Module 10: Real-World Project – End-to-End ETL Pipeline
- Project Overview: Build a Data Integration Pipeline for Sales Analytics.
- Step 1: Ingest Raw Data from SFTP to ADLS (Bronze Layer).
- Step 2: Transform Data using Azure Databricks (Silver Layer).
- Step 3: Load Aggregated Data into Azure Synapse (Gold Layer).
- Step 4: Schedule Pipelines and Monitor Performance.
- Step 5: Implement CI/CD for the pipeline using Azure DevOps.
- Final Outcome: A fully automated and production-ready ETL pipeline.
Databricks with PySpark:
Module 1: Introduction to Databricks and PySpark
- Overview of Databricks Platform: What is Databricks? Features and Advantages.
- Introduction to PySpark: Basics of Spark, Spark Architecture, and Components.
- Databricks Workspace: Notebooks, Clusters, Jobs, and Workflows.
- Hands-On: Creating a Databricks Notebook and running basic PySpark commands.
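A minimal first cell for the hands-on step above. In a Databricks notebook, `spark` (a SparkSession) is predefined, so no session setup is needed; the sample rows are made up.

```python
from pyspark.sql import Row

# `spark` (a SparkSession) is predefined in every Databricks notebook.
df = spark.createDataFrame([
    Row(order_id=1, region="East", amount=120.0),
    Row(order_id=2, region="West", amount=75.5),
])

df.printSchema()  # inspect the inferred schema
df.show()         # print the rows as a small table
```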
Module 2: PySpark DataFrames and Transformations
- Creating DataFrames: From CSV, Parquet, JSON, and databases.
- DataFrame Operations: select(), filter(), groupBy(), agg(), withColumn(), join()
- Data Types and Schemas: Defining and Managing Schemas.
- Hands-On: Reading a CSV file from ADLS and performing transformations.
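A sketch of the Module 2 operations listed above, assuming a hypothetical sales CSV in ADLS with region_code and amount columns, and that storage access is already configured.

```python
from pyspark.sql import functions as F

# Hypothetical raw file in ADLS; assumes storage access is already configured.
sales = (spark.read
         .option("header", True)
         .option("inferSchema", True)
         .csv("abfss://raw@youraccount.dfs.core.windows.net/sales/sales.csv"))

# Small hypothetical dimension table to demonstrate join().
regions = spark.createDataFrame(
    [("E", "East"), ("W", "West")], ["region_code", "region_name"])

result = (sales
          .filter(F.col("amount") > 0)                            # drop bad rows
          .withColumn("amount_with_tax", F.col("amount") * 1.18)  # derived column
          .join(regions, on="region_code", how="left")            # enrich with names
          .groupBy("region_name")
          .agg(F.sum("amount_with_tax").alias("total_amount"))
          .select("region_name", "total_amount"))

result.show()
```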
Module 3: Working with Azure Data Lake Storage (ADLS)
- Connecting Databricks to ADLS: Using dbutils and Mounting ADLS.
- Reading and Writing Data: Parquet, Delta, and JSON formats.
- Managing Partitions: Optimizing data storage and queries.
- Hands-On: Load data from ADLS, transform it, and save it as Delta tables.
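A sketch of the ADLS round trip above. The storage account, container names, and the country partition column are hypothetical, and the example assumes ADLS credentials are already set up in the cluster or Spark configuration.

```python
# Hypothetical account/container names; assumes ADLS auth is already set up
# (e.g. an account key or service principal in the cluster's Spark conf).
raw_path   = "abfss://raw@youraccount.dfs.core.windows.net/customers/"
delta_path = "abfss://curated@youraccount.dfs.core.windows.net/customers_delta/"

dbutils.fs.ls(raw_path)   # dbutils is predefined in Databricks notebooks

customers = spark.read.parquet(raw_path)   # assumes Parquet input files

# Write as Delta, partitioned by a hypothetical country column so queries
# that filter on country can skip irrelevant files.
(customers.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("country")
    .save(delta_path))

spark.read.format("delta").load(delta_path).show(5)   # verify the round trip
```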
Module 4: PySpark SQL and Querying Data
- Introduction to Spark SQL: Writing SQL queries in Databricks.
- Creating Temp Views: Using createOrReplaceTempView() for querying.
- Joins and Aggregations: Inner, Outer, Left, and Right Joins.
- Hands-On: Create a SQL view from a DataFrame and run analytical queries.
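A small self-contained sketch of the temp-view workflow above, combining createOrReplaceTempView() with a SQL join and aggregation.

```python
# Small in-memory tables so the example is self-contained.
sales = spark.createDataFrame(
    [(1, "E", 120.0), (2, "W", 75.5), (3, "E", 200.0)],
    ["order_id", "region_code", "amount"])
regions = spark.createDataFrame(
    [("E", "East"), ("W", "West")], ["region_code", "region_name"])

# Register both DataFrames as temporary views, then query them with SQL.
sales.createOrReplaceTempView("sales")
regions.createOrReplaceTempView("regions")

summary = spark.sql("""
    SELECT r.region_name, SUM(s.amount) AS total_amount
    FROM sales s
    LEFT JOIN regions r ON s.region_code = r.region_code
    GROUP BY r.region_name
""")
summary.show()
```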
Module 5: Managing Incremental Loads
- Understanding Incremental Loads: Full vs. Incremental data loads.
- Using Delta Tables for Incremental Updates: merge, update, delete.
- Handling Late Arriving Data: Using upsert operations.
- Hands-On: Build an incremental data pipeline using PySpark and Delta Lake.
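A sketch of a Delta Lake upsert, the core of the incremental pipeline above. The table path and customer columns are hypothetical; the merge API shown is the standard delta.tables interface.

```python
from delta.tables import DeltaTable

# Hypothetical existing Delta table (the "target" of the incremental load).
target = DeltaTable.forPath(
    spark, "abfss://curated@youraccount.dfs.core.windows.net/customers_delta/")

# Hypothetical batch of new and changed rows, including late-arriving data.
updates = spark.createDataFrame(
    [(101, "Asha", "Hyderabad"), (999, "New Customer", "Pune")],
    ["customer_id", "name", "city"])

# Upsert: update rows that match on the key, insert the rest.
(target.alias("t")
    .merge(updates.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())
```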
Module 6: Optimizing PySpark Jobs in Databricks
- Performance Tuning Techniques: Partitioning, Caching, and Broadcast Joins.
- Understanding Job Execution Plans: Spark UI and DAG Visualization.
- Handling Large Datasets: Bucketing and Shuffling.
- Hands-On: Optimize a slow PySpark job and compare execution times.
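Two of the tuning techniques above in miniature: a broadcast join for a small dimension table, and caching a reused DataFrame. The table path and columns are hypothetical.

```python
from pyspark.sql import functions as F

# Hypothetical large fact table and a small dimension table.
facts = spark.read.format("delta").load("/mnt/curated/sales_facts")
dims  = spark.createDataFrame(
    [("E", "East"), ("W", "West")], ["region_code", "region_name"])

# Broadcasting the small table ships it to every executor, avoiding a
# shuffle of the large fact table.
joined = facts.join(F.broadcast(dims), "region_code")

# Cache a DataFrame that several downstream queries will reuse.
joined.cache()
joined.count()     # an action, which materializes the cache

joined.explain()   # look for BroadcastHashJoin in the physical plan
```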
Module 7: Scheduling and Automation in Databricks
- Creating and Scheduling Jobs: Using Databricks Jobs and Task Dependencies.
- Integrating with Azure Data Factory (ADF): Automating pipeline execution.
- Alerting and Monitoring: Setting up alerts for job failures.
- Hands-On: Schedule a daily job and monitor its execution.
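One way to script the scheduling step above is through the Databricks Jobs API 2.1; a hedged sketch follows. The workspace URL, token, cluster ID, and notebook path are all hypothetical placeholders.

```python
import requests

# Create a scheduled job through the Databricks Jobs API 2.1.
host  = "https://adb-1234567890123456.7.azuredatabricks.net"  # hypothetical
token = "dapiXXXXXXXXXXXXXXXX"                                 # hypothetical

job_spec = {
    "name": "daily-sales-refresh",
    "tasks": [{
        "task_key": "refresh",
        "notebook_task": {"notebook_path": "/Repos/etl/daily_refresh"},
        "existing_cluster_id": "0101-123456-abcdefgh",
    }],
    # Quartz cron: run every day at 00:00 in the given time zone.
    "schedule": {
        "quartz_cron_expression": "0 0 0 * * ?",
        "timezone_id": "Asia/Kolkata",
    },
}

resp = requests.post(f"{host}/api/2.1/jobs/create",
                     headers={"Authorization": f"Bearer {token}"},
                     json=job_spec)
print(resp.json())   # returns the new job_id on success
```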
Module 8: Real-World Project – End-to-End Data Pipeline
- Project Overview: Build a Sales Analytics Dashboard.
- Step 1: Load Raw Data from ADLS into Bronze Layer.
- Step 2: Clean and Transform Data in Silver Layer.
- Step 3: Aggregate and Store Data in Gold Layer.
- Step 4: Visualize Results in Power BI.
- Step 5: Schedule the pipeline and monitor performance.
- Final Outcome: A fully automated and optimized data pipeline.
About the Data Engineer Role
A Data Engineer is a professional responsible for designing, building, and maintaining systems that collect, store, and process large volumes of data. Their primary goal is to ensure that data is accessible, reliable, and ready for analysis by data scientists, analysts, and business teams.
Topics Covered:
Azure Data Factory
Azure Databricks
Python
PySpark
SparkSQL
Key Responsibilities of a Data Engineer:
- Data Pipeline Development: Build and maintain ETL (Extract, Transform, Load) pipelines to move data from various sources to data warehouses or data lakes.
- Data Integration: Combine data from different sources (e.g., databases, APIs, cloud storage) into a unified format.
- Data Storage: Design and manage databases, data lakes, and data warehouses to store structured and unstructured data.
- Data Transformation: Clean, normalize, and enrich raw data to make it suitable for analysis.
- Performance Optimization: Improve query performance and ensure systems handle large-scale data efficiently.
- Collaboration: Work with data scientists, analysts, and business teams to understand data needs and deliver solutions.
- Data Governance: Implement security, privacy, and compliance measures to protect sensitive data.
- Automation: Develop scripts and workflows for data ingestion, transformation, and reporting.
Common Tools and Technologies Used by Data Engineers:
- Programming Languages: Python, SQL, Scala, Java
- Big Data Technologies: Apache Spark, Hadoop
- Data Storage: Azure Data Lake, Amazon S3, Google BigQuery, Snowflake
- Databases: SQL Server, PostgreSQL, MySQL, Cassandra
- ETL Tools: Azure Data Factory, Apache NiFi, Talend
- Workflow Orchestration: Apache Airflow, Luigi
- Cloud Platforms: Azure, AWS, GCP
- CI/CD: Azure DevOps, Jenkins
- Reporting Tools: Power BI, Tableau
Difference Between a Data Engineer and a Data Scientist:
| Aspect | Data Engineer | Data Scientist |
|--------|---------------|----------------|
| Focus | Building pipelines, managing infrastructure | Analyzing data, building models |
| Skills | Programming, ETL, database management | Statistics, machine learning, data visualization |
| Tools | Spark, Hadoop, Azure Data Factory | Jupyter, TensorFlow, Scikit-learn |
| Outcome | Reliable, accessible data for analysis | Insights, predictions, and business strategies |