Metadata-Driven ETL Framework for Automated Schema Evolution and Impact Analysis
Abstract
Enterprise data systems face significant challenges in maintaining Extract, Transform, and Load (ETL) operations across diverse data sources. Schema changes are a persistent problem in modern data engineering, because evolving organizational requirements continually alter the structure of the data. Traditional schema change management relies heavily on manual processes, which create bottlenecks that delay critical business operations. This article introduces a metadata-driven ETL framework that addresses these problems through automated detection of schema evolution and automated impact analysis. The framework uses schema registries and version tracking to maintain detailed metadata catalogs, enabling prompt identification of structural changes across data sources. The architecture consists of four components: Schema Registry Service, Change Detection Engine, Impact Analysis Module, and Pipeline Orchestration Layer. The implementation follows microservices design patterns and runs on Microsoft Azure Kubernetes Service, with Apache Spark for scalable data processing and Delta Lake for reliable data storage. Testing in enterprise environments shows that the framework resolves schema changes automatically, handling a high proportion of schema drift scenarios without manual intervention. Its distributed architecture supports horizontal scaling across multiple processing nodes while maintaining sub-second response times for schema change detection and impact analysis.
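The change detection and impact analysis steps summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `SchemaChange`, `diff_schemas`, and `impacted_pipelines` are hypothetical, schemas are assumed to be plain column-to-type mappings, and the rule that only removed or retyped columns count as breaking is an assumption made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SchemaChange:
    """One structural difference between two versions of a schema (hypothetical type)."""
    kind: str                 # "added", "removed", or "type_changed"
    column: str
    old_type: Optional[str]
    new_type: Optional[str]

    @property
    def breaking(self) -> bool:
        # Assumption for this sketch: added columns are backward compatible,
        # while removed or retyped columns may break downstream consumers.
        return self.kind != "added"


def diff_schemas(old: dict, new: dict) -> list:
    """Compare two {column: type} mappings and list the changes, ordered by column name."""
    changes = []
    for column in sorted(old.keys() | new.keys()):
        old_type, new_type = old.get(column), new.get(column)
        if old_type is None:
            changes.append(SchemaChange("added", column, None, new_type))
        elif new_type is None:
            changes.append(SchemaChange("removed", column, old_type, None))
        elif old_type != new_type:
            changes.append(SchemaChange("type_changed", column, old_type, new_type))
    return changes


def impacted_pipelines(changes: list, column_usage: dict) -> set:
    """Return the pipelines that read any column affected by a breaking change.

    column_usage maps a column name to the set of pipeline names that read it
    (a stand-in for the lineage metadata a catalog would hold).
    """
    impacted = set()
    for change in changes:
        if change.breaking:
            impacted |= column_usage.get(change.column, set())
    return impacted
```

For example, comparing `{"id": "bigint", "amount": "double", "region": "string"}` with `{"id": "bigint", "amount": "decimal(18,2)", "currency": "string"}` yields one retyped column, one added column, and one removed column; only pipelines that read `amount` or `region` would be flagged for review under the assumed rule.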
Article information
Journal
Journal of Computer Science and Technology Studies
Volume (Issue)
7 (7)
Pages
846-852
Published
Copyright
Open access

This work is licensed under a Creative Commons Attribution 4.0 International License.