MSBI Online Training in Hyderabad
Glory Technologies is the best MSBI online/classroom training institute in Hyderabad. We provide MSBI course certification. Learn how to use Microsoft's MSBI tools: SSIS, SSRS, and SSAS.
Microsoft SQL Server
i) Microsoft SQL Server Overview
- What are DBMS and RDBMS?
- Types of normalization (normal forms)
- Data Models
- Constraints in SQL Server
- Relationships in SQL Server
ii) SQL Statements
- DDL
- DML
- DCL
- TCL
- String Datatypes and Functions
- Aggregation Functions
- Joins and Subqueries
- Views in SQL Server
- Index in SQL Server
- Triggers in SQL
- User-Defined Functions
- Stored Procedures
Microsoft SQL Server 2005 Integration Services
i) Microsoft SSIS Overview
- SSIS: SQL Server Integration Services
- DTS & its Limitation
- SSIS Overview
- Architecture
- Flow Types
- Tools
- Utilities
- Typical Users
ii) Getting Familiar with the Business Intelligence Development Studio (BIDS) Environment
- Launch Business Intelligence Development Studio (BIDS)
- Use the Package Designer
- Use the Toolbox
- Use Solution Explorer
- Use the Properties Window
- Use the Variables Window
iii) Creating Packages
- Using the SSIS Import and Export Wizard
- Using the SSIS Designer
iv) Introduction to Control Flow
- Control Flow Overview
- Precedence Constraints
- Execute SQL Task
- Bulk Insert Task
- File System Task
- FTP Task
- Send Mail Task
v) Introduction to Data Flow
- Data Flow Overview
- The Data Sources
- The Data Destinations
- The Data Transformations
- The Copy Column Transformation
- The Derived Column Transformation
- The Data Conversion Transformation
- The Conditional Split Transformation
- The Aggregate Transformation
- The Merge Transformation
- The Merge Join Transformations
- The Union All Transformation
- The Lookup Transformation
- The Slowly Changing Dimension (SCD) Transformation
- The Pivot and Unpivot Transformations
vi) Adding Looping; Using Breakpoints, Checkpoints, and Transactions
- Add Looping
- Use Breakpoints
- Use Checkpoints
- Use Transactions
vii) Error Handling and Logging
- Event Handlers
- Handling errors in Data
- Configuring Package Logging
- Built-in Log Providers
SQL Server Reporting Services (SSRS)
i) Introduction
- Overview
- Reporting Services Scenarios
- Reporting Services Features
- Reporting Services Concepts
- Architecture
- Life Cycle
ii) Reporting Services Tools
- Reporting Services Configuration Tool
- Report Manager
- Report Server
- Report Designer and Model Designer
iii) Reports
- Creating reports
- Table, Matrix, and Tablix reports
- Filters in Reports
- Creating cascading reports, subreports, drill-down and drill-through reports, lookups, and page breaks
- Adding Row and Column Grouping
- SSRS parameters: report parameters, drop-down list parameters, multi-value and multiple parameters, and cascading parameters
- Chart reports: bar, area, data bar, funnel, line, pie, stacked bar, and scatter charts
iv) Report Server Administration
- Snapshot Reports
- Cached Reports
- Managing reports and security using the web-based Report Manager
- Creating standard and data-driven subscriptions, and managing subscriptions
- Deploying reports to the server
v) Publishing Reports
SQL Server Analysis Services (SSAS)
i) Introduction to Data Warehousing and SQL Server Analysis Services 2008 R2
- DW Concepts
- OLTP and OLAP
- SQL Server Analysis Services 2008 R2 (SSAS)
ii) First Look at Analysis Services 2008 R2
- Business Intelligence Development Studio
- Creating a Project using BIDS
- SQL Server Management Studio
iii) Introduction to MDX
- MDX Fundamentals
- MDX Expressions
- MDX Operations
- MDX Functions
- Manipulating data with MDX
iv) Working with Data Source and Data Source View
- Data Source
- Data Source View
- Working with DSV
- Adding/Removing Tables from a DSV
- Create / Delete Relationship
- Adding Named Queries
- Multiple Data Sources within a DSV
v) Dimension Design
- Dimensions and Dimension Properties
- Creating Dimensions with the Dimension Wizard
- Creating a Time Dimension
- Translations in Dimensions
- Defining Linked Dimensions
- Defining Write-Enabled Dimensions
vi) Cube Design
- Unified Dimensional Model (UDM)
- Cubes
- Cube Creation
- Cube Structure
- Cube Relationships
- Calculations
- Perspectives
- Translations
Azure Data Engineer Training Content:
1. Azure Data Factory
2. Azure Databricks
3. PySpark
4. Python
5. SparkSQL
Duration: 60 Days
Module 1: Introduction to Azure Data Factory
- Overview of ADF: What is Azure Data Factory? Features and Use Cases.
- Components of ADF: Pipelines, Activities, Datasets, Linked Services, and Triggers.
- ADF Architecture: Integration Runtime, Data Flows, and Monitoring.
- Hands-On: Create your first ADF pipeline with a Copy Data Activity.
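To make the pipeline, activity, and dataset vocabulary above concrete, here is a minimal sketch of a Copy Data pipeline as it appears in ADF's JSON code view, written out as a Python dict. The pipeline and dataset names are hypothetical placeholders, and the exact typeProperties depend on the connectors you choose.

```python
# Minimal ADF pipeline definition, mirroring the JSON shown in the ADF
# authoring UI ("Code" view). All names here are hypothetical placeholders.
copy_pipeline = {
    "name": "CopyCsvToSqlPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopyCsvToSql",
                "type": "Copy",  # the Copy Data activity
                "inputs": [
                    {"referenceName": "SourceCsvDataset", "type": "DatasetReference"}
                ],
                "outputs": [
                    {"referenceName": "SinkSqlDataset", "type": "DatasetReference"}
                ],
                "typeProperties": {
                    "source": {"type": "DelimitedTextSource"},
                    "sink": {"type": "AzureSqlSink"},
                },
            }
        ]
    },
}
```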
Module 2: Linked Services and Datasets
- Understanding Linked Services: Connecting to Azure Storage, SQL Databases, and more.
- Creating Datasets: Defining input and output data structures.
- Parameterization: Using parameters to make pipelines reusable.
- Hands-On: Connect to ADLS and SQL Database using Linked Services.
Module 3: Pipelines and Activities
- Understanding Pipelines: Building end-to-end workflows.
- Types of Activities: Copy, Lookup, ForEach, Filter, Web, and Stored Procedure Activities.
- Data Movement Activities: Copying data from source to destination.
- Hands-On: Create a pipeline to copy data from ADLS to Azure Synapse.
Module 4: Control Flow and Pipeline Parameters
- Control Flow Activities: ForEach, Until, If Condition, Switch.
- Using Parameters: Passing values dynamically to pipelines.
- Variables and Expressions: Storing and manipulating values within pipelines.
- Hands-On: Build a pipeline with a ForEach loop to process multiple files.
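As a sketch of the hands-on step above, this is roughly what a ForEach activity looks like in the pipeline JSON, again written as a Python dict. "@pipeline().parameters.FileList" and "@item()" are standard ADF expression forms; the activity and parameter names are hypothetical.

```python
# Sketch of a ForEach activity from the pipeline JSON. The loop iterates
# over a pipeline parameter holding a list of file names; @item() refers
# to the current element inside the loop. Names are hypothetical.
foreach_activity = {
    "name": "ProcessEachFile",
    "type": "ForEach",
    "typeProperties": {
        "items": {
            "value": "@pipeline().parameters.FileList",
            "type": "Expression",
        },
        "activities": [
            {
                "name": "CopyOneFile",
                "type": "Copy",
                # Inner source/sink definitions omitted; they would reference
                # the current file via the @item() expression.
            }
        ],
    },
}
```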
Module 5: Mapping Data Flows
- Introduction to Data Flows: What are Mapping Data Flows?
- Transformations: Aggregate, Join, Union, Lookup, Pivot, and Derived Column.
- Data Flow Optimizations: Partitioning and Performance Tuning.
- Hands-On: Create a Data Flow to clean and transform customer data.
Module 6: Triggers and Scheduling Pipelines
- Types of Triggers: Manual, Scheduled, Tumbling Window, and Event-Based Triggers.
- Trigger Dependencies: Setting up pipeline dependencies.
- Monitoring and Alerts: Using ADF Monitoring Dashboard.
- Hands-On: Schedule a pipeline to run every day at midnight.
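A sketch of the trigger definition behind the hands-on step above: a schedule trigger that fires daily at midnight, mirroring the JSON ADF generates. The trigger and pipeline names are hypothetical placeholders.

```python
# Schedule trigger: run the referenced pipeline every day at midnight UTC.
daily_trigger = {
    "name": "DailyMidnightTrigger",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",
                "interval": 1,
                "startTime": "2024-01-01T00:00:00Z",
                "timeZone": "UTC",
            }
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "CopyCsvToSqlPipeline",
                    "type": "PipelineReference",
                }
            }
        ],
    },
}
```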
Module 7: Integration with Other Azure Services
- Integrating with Azure Databricks: Running notebooks from ADF.
- Loading Data into Azure Synapse: Using Copy and Stored Procedure Activities.
- Integration with Azure Key Vault: Securely managing credentials.
- Hands-On: Build a pipeline to run a Databricks notebook and save output to Synapse.
Module 8: Error Handling and Monitoring
- Error Handling Mechanisms: Using Try-Catch with Fail, Continue, or Retry policies.
- Monitoring Pipelines: Using ADF Monitoring Hub and Azure Monitor.
- Setting Alerts: Configuring alerts for pipeline failures.
- Hands-On: Add error-handling logic to a pipeline and test it.
Module 9: CI/CD for ADF Pipelines
- Introduction to CI/CD in ADF: What is CI/CD?
- Integrating with Azure DevOps: Connecting ADF to a Git repository.
- Building and Deploying Pipelines: Using DevOps release pipelines.
- Hands-On: Implement a CI/CD pipeline for deploying ADF from Dev to Prod.
Module 10: Real-World Project – End-to-End ETL Pipeline
- Project Overview: Build a Data Integration Pipeline for Sales Analytics.
- Step 1: Ingest Raw Data from SFTP to ADLS (Bronze Layer).
- Step 2: Transform Data using Azure Databricks (Silver Layer).
- Step 3: Load Aggregated Data into Azure Synapse (Gold Layer).
- Step 4: Schedule Pipelines and Monitor Performance.
- Step 5: Implement CI/CD for the pipeline using Azure DevOps.
- Final Outcome: A fully automated and production-ready ETL pipeline.
Databricks with PySpark:
Module 1: Introduction to Databricks and PySpark
- Overview of Databricks Platform: What is Databricks? Features and Advantages.
- Introduction to PySpark: Basics of Spark, Spark Architecture, and Components.
- Databricks Workspace: Notebooks, Clusters, Jobs, and Workflows.
- Hands-On: Creating a Databricks Notebook and running basic PySpark commands.
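A minimal first cell for the hands-on step above. In a Databricks notebook, `spark` (a SparkSession) is predefined, so no session setup is needed; the sample rows are made up.

```python
from pyspark.sql import Row

# `spark` (a SparkSession) is predefined in every Databricks notebook.
df = spark.createDataFrame([
    Row(order_id=1, region="East", amount=120.0),
    Row(order_id=2, region="West", amount=75.5),
])

df.printSchema()  # inspect the inferred schema
df.show()         # print the rows as a small table
```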
Module 2: PySpark DataFrames and Transformations
- Creating DataFrames: From CSV, Parquet, JSON, and databases.
- DataFrame Operations: select(), filter(), groupBy(), agg(), withColumn(), join()
- Data Types and Schemas: Defining and Managing Schemas.
- Hands-On: Reading a CSV file from ADLS and performing transformations.
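A sketch of the Module 2 operations listed above, assuming a hypothetical sales CSV in ADLS with region_code and amount columns, and that storage access is already configured.

```python
from pyspark.sql import functions as F

# Hypothetical raw file in ADLS; assumes storage access is already configured.
sales = (spark.read
         .option("header", True)
         .option("inferSchema", True)
         .csv("abfss://raw@youraccount.dfs.core.windows.net/sales/sales.csv"))

# Small hypothetical dimension table to demonstrate join().
regions = spark.createDataFrame(
    [("E", "East"), ("W", "West")], ["region_code", "region_name"])

result = (sales
          .filter(F.col("amount") > 0)                            # drop bad rows
          .withColumn("amount_with_tax", F.col("amount") * 1.18)  # derived column
          .join(regions, on="region_code", how="left")            # enrich with names
          .groupBy("region_name")
          .agg(F.sum("amount_with_tax").alias("total_amount"))
          .select("region_name", "total_amount"))

result.show()
```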
Module 3: Working with Azure Data Lake Storage (ADLS)
- Connecting Databricks to ADLS: Using dbutils and Mounting ADLS.
- Reading and Writing Data: Parquet, Delta, and JSON formats.
- Managing Partitions: Optimizing data storage and queries.
- Hands-On: Load data from ADLS, transform it, and save it as Delta tables.
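A sketch of the ADLS round trip above. The storage account, container names, and the country partition column are hypothetical, and the example assumes ADLS credentials are already set up in the cluster or Spark configuration.

```python
# Hypothetical account/container names; assumes ADLS auth is already set up
# (e.g. an account key or service principal in the cluster's Spark conf).
raw_path   = "abfss://raw@youraccount.dfs.core.windows.net/customers/"
delta_path = "abfss://curated@youraccount.dfs.core.windows.net/customers_delta/"

dbutils.fs.ls(raw_path)   # dbutils is predefined in Databricks notebooks

customers = spark.read.parquet(raw_path)   # assumes Parquet input files

# Write as Delta, partitioned by a hypothetical country column so queries
# that filter on country can skip irrelevant files.
(customers.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("country")
    .save(delta_path))

spark.read.format("delta").load(delta_path).show(5)   # verify the round trip
```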
Module 4: PySpark SQL and Querying Data
- Introduction to Spark SQL: Writing SQL queries in Databricks.
- Creating Temp Views: Using createOrReplaceTempView() for querying.
- Joins and Aggregations: Inner, Outer, Left, and Right Joins.
- Hands-On: Create a SQL view from a DataFrame and run analytical queries.
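A small self-contained sketch of the temp-view workflow above, combining createOrReplaceTempView() with a SQL join and aggregation.

```python
# Small in-memory tables so the example is self-contained.
sales = spark.createDataFrame(
    [(1, "E", 120.0), (2, "W", 75.5), (3, "E", 200.0)],
    ["order_id", "region_code", "amount"])
regions = spark.createDataFrame(
    [("E", "East"), ("W", "West")], ["region_code", "region_name"])

# Register both DataFrames as temporary views, then query them with SQL.
sales.createOrReplaceTempView("sales")
regions.createOrReplaceTempView("regions")

summary = spark.sql("""
    SELECT r.region_name, SUM(s.amount) AS total_amount
    FROM sales s
    LEFT JOIN regions r ON s.region_code = r.region_code
    GROUP BY r.region_name
""")
summary.show()
```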
Module 5: Managing Incremental Loads
- Understanding Incremental Loads: Full vs. Incremental data loads.
- Using Delta Tables for Incremental Updates: merge, update, delete.
- Handling Late Arriving Data: Using upsert operations.
- Hands-On: Build an incremental data pipeline using PySpark and Delta Lake.
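A sketch of a Delta Lake upsert, the core of the incremental pipeline above. The table path and customer columns are hypothetical; the merge API shown is the standard delta.tables interface.

```python
from delta.tables import DeltaTable

# Hypothetical existing Delta table (the "target" of the incremental load).
target = DeltaTable.forPath(
    spark, "abfss://curated@youraccount.dfs.core.windows.net/customers_delta/")

# Hypothetical batch of new and changed rows, including late-arriving data.
updates = spark.createDataFrame(
    [(101, "Asha", "Hyderabad"), (999, "New Customer", "Pune")],
    ["customer_id", "name", "city"])

# Upsert: update rows that match on the key, insert the rest.
(target.alias("t")
    .merge(updates.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())
```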
Module 6: Optimizing PySpark Jobs in Databricks
- Performance Tuning Techniques: Partitioning, Caching, and Broadcast Joins.
- Understanding Job Execution Plans: Spark UI and DAG Visualization.
- Handling Large Datasets: Bucketing and Shuffling.
- Hands-On: Optimize a slow PySpark job and compare execution times.
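Two of the tuning techniques above in miniature: a broadcast join for a small dimension table, and caching a reused DataFrame. The table path and columns are hypothetical.

```python
from pyspark.sql import functions as F

# Hypothetical large fact table and a small dimension table.
facts = spark.read.format("delta").load("/mnt/curated/sales_facts")
dims  = spark.createDataFrame(
    [("E", "East"), ("W", "West")], ["region_code", "region_name"])

# Broadcasting the small table ships it to every executor, avoiding a
# shuffle of the large fact table.
joined = facts.join(F.broadcast(dims), "region_code")

# Cache a DataFrame that several downstream queries will reuse.
joined.cache()
joined.count()     # an action, which materializes the cache

joined.explain()   # look for BroadcastHashJoin in the physical plan
```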
Module 7: Scheduling and Automation in Databricks
- Creating and Scheduling Jobs: Using Databricks Jobs and Task Dependencies.
- Integrating with Azure Data Factory (ADF): Automating pipeline execution.
- Alerting and Monitoring: Setting up alerts for job failures.
- Hands-On: Schedule a daily job and monitor its execution.
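One way to script the scheduling step above is through the Databricks Jobs API 2.1; a hedged sketch follows. The workspace URL, token, cluster ID, and notebook path are all hypothetical placeholders.

```python
import requests

# Create a scheduled job through the Databricks Jobs API 2.1.
host  = "https://adb-1234567890123456.7.azuredatabricks.net"  # hypothetical
token = "dapiXXXXXXXXXXXXXXXX"                                 # hypothetical

job_spec = {
    "name": "daily-sales-refresh",
    "tasks": [{
        "task_key": "refresh",
        "notebook_task": {"notebook_path": "/Repos/etl/daily_refresh"},
        "existing_cluster_id": "0101-123456-abcdefgh",
    }],
    # Quartz cron: run every day at 00:00 in the given time zone.
    "schedule": {
        "quartz_cron_expression": "0 0 0 * * ?",
        "timezone_id": "Asia/Kolkata",
    },
}

resp = requests.post(f"{host}/api/2.1/jobs/create",
                     headers={"Authorization": f"Bearer {token}"},
                     json=job_spec)
print(resp.json())   # returns the new job_id on success
```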
Module 8: Real-World Project – End-to-End Data Pipeline
- Project Overview: Build a Sales Analytics Dashboard.
- Step 1: Load Raw Data from ADLS into Bronze Layer.
- Step 2: Clean and Transform Data in Silver Layer.
- Step 3: Aggregate and Store Data in Gold Layer.
- Step 4: Visualize Results in Power BI.
- Step 5: Schedule the pipeline and monitor performance.
- Final Outcome: A fully automated and optimized data pipeline.
About the Data Engineer Role
A Data Engineer is a professional responsible for designing, building, and maintaining systems that collect, store, and process large volumes of data. Their primary goal is to ensure that data is accessible, reliable, and ready for analysis by data scientists, analysts, and business teams.
Topics Covered:
Azure Data Factory
Azure Databricks
Python
PySpark
SparkSQL
Key Responsibilities of a Data Engineer:
- Data Pipeline Development: Build and maintain ETL (Extract, Transform, Load) pipelines to move data from various sources to data warehouses or data lakes.
- Data Integration: Combine data from different sources (e.g., databases, APIs, cloud storage) into a unified format.
- Data Storage: Design and manage databases, data lakes, and data warehouses to store structured and unstructured data.
- Data Transformation: Clean, normalize, and enrich raw data to make it suitable for analysis.
- Performance Optimization: Improve query performance and ensure systems handle large-scale data efficiently.
- Collaboration: Work with data scientists, analysts, and business teams to understand data needs and deliver solutions.
- Data Governance: Implement security, privacy, and compliance measures to protect sensitive data.
- Automation: Develop scripts and workflows for data ingestion, transformation, and reporting.
Common Tools and Technologies Used by Data Engineers:
- Programming Languages: Python, SQL, Scala, Java
- Big Data Technologies: Apache Spark, Hadoop
- Data Storage: Azure Data Lake, Amazon S3, Google BigQuery, Snowflake
- Databases: SQL Server, PostgreSQL, MySQL, Cassandra
- ETL Tools: Azure Data Factory, Apache NiFi, Talend
- Workflow Orchestration: Apache Airflow, Luigi
- Cloud Platforms: Azure, AWS, GCP
- CI/CD: Azure DevOps, Jenkins
- Reporting Tools: Power BI, Tableau
Difference Between a Data Engineer and a Data Scientist:
| Aspect | Data Engineer | Data Scientist |
|--------|---------------|----------------|
| Focus | Building pipelines, managing infrastructure | Analyzing data, building models |
| Skills | Programming, ETL, database management | Statistics, machine learning, data visualization |
| Tools | Spark, Hadoop, Azure Data Factory | Jupyter, TensorFlow, Scikit-learn |
| Outcome | Reliable, accessible data for analysis | Insights, predictions, and business strategies |