Offered by Indian Statistical Institute and IDEAS

IDEAS, Indian Statistical Institute Certificate Programme

Data Engineering: The Foundation of AI

₹ 29,500/- (Inclusive of all applicable taxes)

About the programme

This online certificate programme has been created by IDEAS - Technology Innovation Hub (TIH) of ISI Kolkata in collaboration with TCS iON. IDEAS - Technology Innovation Hub was established by the Indian Statistical Institute, Kolkata, under the aegis of the National Mission on Interdisciplinary Cyber-Physical Systems (NM-ICPS), Department of Science and Technology (DST), Government of India.

Data Engineering provides the foundation of AI where the raw potential of data is harnessed, transformed, and deployed to fuel intelligent systems. Data Engineers are vital members of any enterprise AI/Data Analytics team, responsible for managing, optimising, overseeing, and monitoring data retrieval, storage, processing, and distribution throughout the organisation.

The course is designed for both engineering students and working professionals, with an emphasis on hands-on experience and practical exposure to cutting-edge tools and technologies. Through live lectures by industry and academic experts and hands-on work with tools used in industry, learners will gain a comprehensive understanding of the essential building blocks and working principles of Data Engineering technologies.

Key highlights

80 hours of overall experiential learning

Live lectures from TCS Experts and ISI Kolkata faculty

Opportunity to appear for 3 corporate interviews*
*Eligibility based on clearance of TCS iON NPT

Industry projects and use cases

Live doubt clearing sessions by the experts

The participants are expected to have developed the following skills by the end of the programme:

  • Master the steps of Data Engineering Processes and Data Management Functions with business cases
  • Be proficient in cutting-edge data storage technologies and their usage
  • Be industry-ready with hands-on knowledge of DataOps, Containerization, and Orchestration of Distributed Data Management Processes
  • Be proficient in Data Ingestion Technologies, Managing Data Pipelines, Quality, and Security
  • Learn fundamentals of ML automation processes with MLOps platforms and pipelines
  • Achieve hands-on proficiency in diverse technologies involving Spark, Python-centric Libraries, Cloud Services and Platforms, Big Data Technologies, Visualisation and Reporting using Tableau/Power BI and much more

Learning Outcomes

What you learn

Develop Hands-on expertise in:

Python and SQL (through refresher sessions)

Kafka, Flume, Kinesis

MinIO (open source S3)

Spark (via PySpark)

Hadoop, Hive, Presto, Iceberg

Apache Airflow

Docker

Kubernetes

Databricks

Cloud (AWS Glue)

TensorFlow, Spark MLlib

MLflow

Power BI, Apache Superset

The field generates a variety of job roles, including but not limited to:

Data Engineer

Data Architect

MLOps Engineer

BI Engineer

Big Data Analyst

DataOps Engineer

AI Engineer

Containerization Engineer

Programme pedagogy

Expert-led live sessions

Engage in dynamic training sessions conducted by distinguished faculty from ISI Kolkata along with seasoned industry professionals.

Recorded session videos

Access educational content on-demand, available for learning on any device, at any time, ensuring flexibility and convenience for revisiting the material.

Industry use cases and simulations

Deepen your understanding of complex business situations through simulations and case studies that mirror real business environments.

Peer networking and expert connect

Expand your professional network and engage with experts through our interactive community platforms, facilitating enhanced learning and problem-solving.

Hands-on learning experiences

Participate in practical sessions where you will work with real-world data and public datasets, fostering a deep, experiential understanding of the subject.

Live doubt solving sessions

Address your queries in real-time with direct access to experts during live sessions, ensuring clarity and immediate assistance.

Dedicated learning management team

Receive continual guidance and support from our committed learning management team, tailored to meet your educational needs and enhance your learning journey.

Programme Syllabus Overview

Data Engineering: ISI
This syllabus covers current industry practices and techniques in Data Engineering. It is structured into eleven modules spanning 20 weeks, followed by an optional one-month capstone project. The course guides learners through the concepts and the tools and technologies used in industry, with hands-on implementation on case studies and scenarios. Each module builds on the previous one, progressing from basic to advanced topics.

  • Overview of Data Engineering: Role, Importance, Implementation
  • Understanding Data Engineering, Data Analysis and Data Science
  • Data Engineering for (i) Data Analytics (ii) Machine Learning
  • Different Types of Data - 5Vs
  • Data Lifecycle Details: Ingestion, Storage, Processing, Analysis, and Visualization
  • Landscape of Tools and Technologies
  • Role/Job Areas of Data Engineering
  • Modular Case Study: 1
  • Formative Assessment: 1
  • Understanding Business Data and its requirements
  • KPIs and Metrics
  • Analysis of Data from different domains like Health Care, Education, Human Resource, Retail Business Chain and FMCG, Media, Hospitality Industry and more
  • Modular Case Study: 2
  • Formative Assessment: 2
  • Python Programming: Programming Essentials, Data Types, Control Structures, Functions, Modules, OO Concepts
  • Data Operations using Pandas: Data Cleaning, Munging/Wrangling, Manipulation, EDA using Pandas, Working with Different Data Sources and Structures
  • Programming using Scala: Programming Fundamentals using Scala 2, Concepts of Parallel Programming using Scala
  • Modular Case Study: 3
  • Formative Assessment: 3
  • Relational Databases/Object Relational Databases and their implementation
  • NoSQL Databases: Features and Characteristics. Types of NoSQL data models like Key Value, Column-oriented, Document, Graph
  • Implementation of NoSQL Databases
  • Data Warehouses: Concepts, Star and Snowflake Schemas, OLAP, and Data Marts. Implementation and Design
  • Distributed File Systems: Hadoop HDFS, Google Cloud Storage, Amazon S3. Implementation of Hadoop Cluster and AWS S3
  • Data Processing using Hive. Implementation of Hive in Hadoop Cluster
  • Working with open Data Lakehouse with Presto and Apache Iceberg
  • Scalable Query Handling using Presto
  • Modular Case Study: 4
  • Formative Assessment: 4
  • Different types of Data Sources and their destinations. Architecture of Data Ingestion Mechanism
  • Types of Ingestion Process: (i) Batch (ii) Streaming
  • Data Ingestion Pipeline
  • Data Ingestion Technologies like Apache Kafka, Flume, Amazon Kinesis and more
  • Dynamic Pipeline Generation, ETL, Developing, Scheduling, and Monitoring Batch-oriented Workflows, DAGs using Apache Airflow
  • Working with No-code ETL/ELT Tools
  • Introduction to the concepts of Data Lakehouse and its implementation
  • Modular Case Study: 5
  • Formative Assessment: 5
  • Understanding DataOps Vs. DevOps
  • Review of Operating System Concepts
  • Basic Principles of Containerization
  • Understanding the concepts of DataOps
  • Applying CI/CD concepts in Data Pipelines
  • Introduction to Docker
  • Dockerfiles, Images, and Containers
  • Docker Networking
  • Docker Compose
  • Orchestration
  • Introduction to Kubernetes
  • Running Data Orchestration Tools on Kubernetes
  • Modular Case Study: 6
  • Formative Assessment: 6
  • Essentials of Apache Spark: Spark Architecture, Spark Resilient Distributed Datasets (RDDs) and their operations, Dataframe Basics, Dataframe Transformation and Execution, Dataframe Joining, Implementation using PySpark/ScalaSpark
  • Ingesting Data into Spark: Spark SQL, Spark Data and Stream Processing, Implementation using PySpark/ScalaSpark
  • Working with Databricks
  • Working with Ray - Ray with Databricks, Spark and more
  • No Code/Low Code Big Data Processing Platforms: Introduction to Google Cloud BigQuery, Working with Amazon EMR
  • Modular Case Study: 7
  • Formative Assessment: 7
  • Introduction to Reports for Data Analysis: Descriptive Analysis and its Reports: Key Performance Indicator (KPI) Dashboards and Periodic Reports, Diagnostic Analysis and Detailed Drill Down Reports, Predictive Analysis and Reports based on Predictive Models, Prescriptive Analysis and Reports based on AI/ML Models
  • Implementation of Reports and Dashboards using Apache Superset, Microsoft Power BI, and Salesforce Tableau
  • Modular Case Study: 8
  • Formative Assessment: 8
  • Introduction to Different Cloud Platforms - AWS, Azure and Google Cloud Architecture and Comparative Analysis
  • Working with Amazon Redshift, BigQuery
  • Managed Data Services: AWS Glue, Google Dataflow, Azure Data Factory
  • Modular Case Study: 9
  • Formative Assessment: 9
  • Predictive Modeling: Regression and Classification Algorithms, Supervised and Unsupervised Algorithms, Performance Measures and Metrics
  • Introduction to Deep Learning, NLP and RL: Introduction to Convolutional Neural Networks (CNN), RNN and LSTM, Introduction to Natural Language Processing (NLP) and Toolkit Application in NLP, Introduction to RL terminologies and concepts
  • Introduction to LLMs: Understanding LLM (Large Language Models) and use cases, Prompt Engineering and tuning to enhance LLM performance, Tools to implement LLMs and GenAI
  • Implementation of ML/DL Algorithms using Python-centric libraries and Spark ML
  • Modular Case Study: 10
  • Formative Assessment: 10
  • Introduction to MLOps, Working with MLOps Platforms - AWS SageMaker, MLflow, Pipelines for Model Building, Distributed Model Training, LLM Training Pipelines, Pipelines for Real-time ML Inference, Pipeline for RAG, RLHF Training Pipeline for LLMs using Python-centric libraries
  • Modular Case Study: 11
  • Formative Assessment: 11

Capstone Project

This syllabus is designed not only to impart essential knowledge of advanced Data Engineering concepts, but also to let learners apply them through case studies, simulations, and hands-on projects, preparing participants for in-demand industry roles.

Programme Structure

The programme covers current industry practices and techniques in Data Engineering and is structured into eleven modules spanning 20 weeks, followed by an optional one-month Capstone Project.

Digital Certificate

Learners will be awarded a co-branded digital certificate upon successful completion of the programme.

Meet the Mentor(s)

FAQs

Please take a look at the most frequently asked questions; you might have your query answered here.

Freshers and individuals pursuing a Bachelor's or Master's degree who aspire to a career in Data Engineering, as well as junior and mid-career professionals looking for accelerated career growth and a salary hike, can apply for this programme.
  • Click on the "Buy Now" button.
  • Login with your TCS iON Digital Learning Hub credentials or sign up as a new user.
  • After login/sign up, you will be asked to share your details required to complete your purchase. This includes your name, email ID, phone number and other details.
  • In case of Buy Now: On successful submission of the form, you need to proceed to make the payment by clicking on "Click to Pay/Activate Code".
  • In case of Activate Now: A pop-up window will open. Enter the Licence Code and click on the "Activate" button. You will be asked to share your details required to complete the application form. This includes your name, email ID, phone number and other details. Once the activation code and your details are successfully submitted, click on "Get Started/Launch" and you can view the purchased variant in "My Dashboard".
  • You will receive a successful purchase message on your registered email ID/mobile number.
Note: "Activate Now"/"Activate Code" is only applicable for institutional or bulk purchase.
    The course is designed for both engineering students and working professionals, with an emphasis on hands-on experience and practical exposure to cutting-edge tools and technologies. Through live lectures by industry and academic experts and hands-on work with tools used in industry, learners will gain a comprehensive understanding of the essential building blocks and working principles of Data Engineering technologies.
    If a learner misses a live lecture, they will be given access, under the Live Lecture learning component, to the recorded version of the session for a specified time frame.
The learners would need the following infrastructure to access the learning platform:
    1. Device: A standard Desktop/Laptop/Tablet/Smartphone with camera and mic
    2. Internet: A regular broadband/Wi-Fi connection or a mobile 4G connection
    Learners can connect with their group members at an agreed time for group work outside the live classes. Learners can also use the platform 24x7 to engage with and learn from their peers in the digital discussion rooms available under the course. The platform also provides group-specific discussion forums for offline discussion among group members.
Yes, all eligible learners will get an opportunity to appear for 3 job interviews based on their TCS iON National Proficiency Test (TCS iON NPT) score.
    1. Learners can appear for TCS iON NPT as per the course topic and TCS iON NPT schedule.
    2. TCS iON NPT has to be completed within 3 months from completion of the course.
    There are minimum attendance requirements for the lectures. Also, all the participants will have to secure passing marks in the course to obtain a Certificate of Completion. If not, a Certificate of Participation will be awarded to the candidates.
Sr. No.   Component                 Minimum Criteria          Certificate Type
1         Live Lecture Attendance   Greater than 70%          Certificate of Completion
2         Course Evaluation         Greater than 50% score    Certificate of Completion

    If either of the above minimum criteria is not met, then a Certificate of Participation will be awarded.
    Please note that the course fee is not refundable under any circumstances. Also, the course fee is not transferable to any other course on the TCS iON platform or for any other purpose. We will be with you at every step for your upskilling and professional growth needs.