This course applies to software version 10.2.2. Learn to accelerate Big Data integration through mass ingestion, incremental loads, transformations, processing of complex files, and integrating data science using Python. Optimize Big Data system performance through monitoring, troubleshooting, and best practices, and learn how to reuse application logic for Big Data use cases.
Module 1: Big Data Integration Course Introduction
- Course Agenda
- Accessing the lab environment
- Related Courses
Module 2: Big Data Basics
- What is Big Data?
- Hadoop concepts
- Hadoop Architecture Components
- The Hadoop Distributed File System (HDFS)
- Purposes of a Name Node & Secondary Name Node
- “Yet Another Resource Negotiator” (YARN) (MapReduce Version 2)
Module 3: Data Warehouse Offloading
- Challenges with traditional Data Warehousing
- The requirements of an optimal Data Warehouse
- The Data Warehouse Offloading Process
Module 4: Ingestion and Offload
- PowerCenter Reuse Reports
- Importing PowerCenter Mappings to Developer
- SQL to Mapping capability
- Partitioning and parallelism
Module 5: Big Data Management Architecture
- The Big Data world
- Build once, deploy anywhere
- The Informatica abstraction layer
- Polyglot computing
- The Smart Executor
- Open source and innovation
- Connection architecture
- Connections to third-party applications
Module 6: Informatica Polyglot Computing in Hadoop
- Hive MR/Tez
- The Smart Executor
Module 7: Mappings, Monitoring, and Troubleshooting
- Configuring and running a mapping in Native and Hadoop environments
- Execution Plans
- Monitor mappings
- Troubleshoot mappings
- Viewing mapping results
Module 8: Hadoop Data Integration Challenges and Performance Tuning
- Describe challenges with executing mappings in Hadoop
- Big Data Management Performance Tuning
- Hive Environment Optimization
Module 9: Data Quality on Hadoop
- The Data Quality process
- Discover insights into your data
- Collaborate and Create Data Improvement Assets
- Modify, Manage, and Monitor Data Quality
- Self Service Data Quality
- Executing Data Quality mappings on Hadoop
Module 10: Complex File Parsing
- The Complex file reader
- The Data Processor transformation
- The Complex file writer
- Performance Considerations: Partitioning
- Parsing and processing Avro, Parquet, JSON, and XML files
- Data Processor Transformation Considerations
Module 11: Accessing NoSQL Databases
- CAP Theorem
Prerequisite: Completion of the Developer Tool for Big Data Developers training
After completing this course, you should be able to:
- Define 'Big Data'
- Identify and prioritise resource-intensive Data Warehouse processes for offloading to Hadoop
- Migrate PowerCenter mappings to Big Data Management and ingest data into Hadoop
- Migrate and ingest data into Hadoop using Sqoop and SQL to Mapping
- Describe the Informatica on Hadoop architecture
- Transform data on Hadoop using Informatica polyglot computing
- Differentiate the capabilities of the Informatica engines on Hadoop, including the Hive (MR/Tez), Blaze, and Spark engines
- Leverage the Informatica Smart Executor
- Utilise Informatica and Hadoop monitoring and troubleshooting
- Parse and transform complex data such as JSON, Avro, and Parquet
- Describe how Informatica parses, reads, and writes NoSQL data collections
What do I need to bring with me to my public class?
All required learning materials and equipment are provided in the classroom.
When do public training course fees have to be paid?
For public training classes, payment must be received no later than three business days before the first day of class in order to confirm your seat and remain in the class. Failure to provide payment by this date may result in removal from the class and/or late cancellation fees. You can submit payment by Purchase Order or credit card.
On-site (private) Course Pricing:
To find out more about on-site training, e-mail us at email@example.com or call one of our offices.
What is the cancellation policy?
Requests for cancellations or date transfers must be received at least ten (10) business days before the event start date in order to receive a full refund. If a cancellation or reschedule request is received less than ten (10) business days before the start date, a penalty of 100% of the course cost will be applied and no part of the fee will be refunded. Refunds will not be given for “no-shows” in our public training or IVA courses. This cancellation policy is strictly enforced.
What happens if Agile Solutions needs to cancel or reschedule a course?
Agile Solutions reserves the right to cancel events for any reason at any time. If Agile Solutions cancels the course, its cancellation liability is limited to the return of the course payment ONLY. Agile Solutions will not reimburse registrants for any other costs, including but not limited to travel cancellation fees or penalties such as airfare and hotel costs. PLEASE NOTE: If your registration status is either “Approved” or “Pending Payment”, you have not been confirmed for the class. We recommend that you do not make any travel arrangements until you have received a confirmation e-mail letting you know that the class and your registration are confirmed.
How will I know if my course has been rescheduled?
Agile Solutions reserves the right to reschedule or cancel a course due to low enrollment or if necessitated by other circumstances. Agile Solutions will contact you via e-mail or phone to inform you of the change of schedule. Once you have been notified, you may reschedule or receive a full credit. Agile Solutions shall not be liable for any other costs, including but not limited to non-refundable travel arrangements, if a course is rescheduled or cancelled.