Intelligent Automation – Workflow Monitoring for Data Pipelines
Automating with simple scripts is relatively easy, but complexity creeps in when solving real-world, production-grade problems. A compelling use case was shared with us by a large financial asset management customer. They work with a client that provides a large number of property and financial data feeds in multiple formats, arriving at different frequencies: daily, weekly, monthly, and ad hoc. The customer's business model is driven by processing these feeds and building “data pipelines” for ingestion, cleansing, aggregation, analysis, and decision-making from their Enterprise Data Lake.
The customer's current ecosystem comprises multiple ETL jobs that connect to various internal and external systems and load the data into a Data Lake for further processing. The complexity was enormous: the high data volume led to a high chance of failures and required continuous human intervention and monitoring of these jobs. Support teams receive an email notification only when a job completes successfully or fails. Thus, the legacy system makes job monitoring and exception handling quite difficult. The following simple pictorial representation explains a typical daily Data Pipeline and associated challenges:
The legacy solution has multiple custom scripts, implemented in Shell, Python, and PowerShell, that call the Azure Data Factory API to run a pipeline. Each independent task had its own complexities, and there was no end-to-end view with real-time monitoring and error diagnostics.
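To make the legacy pattern concrete, here is a minimal Python sketch of the kind of script described above. It only builds (and does not send) a request to the Azure Data Factory `createRun` REST endpoint. The subscription, resource group, factory, and pipeline values are placeholders, and the helper names `build_create_run_url` and `build_create_run_request` are hypothetical; they are not taken from the customer's scripts.

```python
import json
import urllib.request
from urllib.parse import quote

ADF_API_VERSION = "2018-06-01"  # version of the Data Factory REST API


def build_create_run_url(subscription_id, resource_group, factory, pipeline):
    """Return the Azure management URL that starts a pipeline run."""
    return (
        "https://management.azure.com"
        f"/subscriptions/{quote(subscription_id)}"
        f"/resourceGroups/{quote(resource_group)}"
        f"/providers/Microsoft.DataFactory/factories/{quote(factory)}"
        f"/pipelines/{quote(pipeline)}/createRun"
        f"?api-version={ADF_API_VERSION}"
    )


def build_create_run_request(url, bearer_token, parameters=None):
    """Build (but do not send) the POST request for a pipeline run.

    `parameters` is an optional dict of pipeline parameters; the token is
    assumed to be an Azure AD access token obtained elsewhere.
    """
    body = json.dumps(parameters or {}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the request to `urllib.request.urlopen` would start the run, and the response body should carry a `runId` for later status checks. A script like this knows nothing about the steps before or after it, which is why a set of them gives no end-to-end view.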
A new workflow model was developed using the RLCatalyst workflow monitoring component (using YAML definitions), and the existing customer scripts were converted to RLCatalyst BOTs using a simple migration designer. Once loaded into RLCatalyst Command Centre, the solution provides a real-time and historical view, notifies support teams of anomalies, and can take auto-remediation steps based on configured rules.
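The idea of rule-driven notification and auto-remediation can be sketched in a few lines of Python. This is an illustrative model only and not the RLCatalyst rule format or API: the `JobRun` and `Rule` classes, the action names `retry` and `notify`, and the thresholds in `RULES` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JobRun:
    name: str
    status: str               # "running", "succeeded", or "failed"
    elapsed_minutes: float
    attempts: int = 1


@dataclass
class Rule:
    status: str                         # job status this rule applies to
    action: str                         # "retry" or "notify" (illustrative)
    min_elapsed_minutes: float = 0.0    # match only once a run is this old
    max_attempts: Optional[int] = None  # match only while attempts <= this


def decide_action(run: JobRun, rules: List[Rule]) -> str:
    """Return the action of the first matching rule, or "none"."""
    for rule in rules:
        if run.status != rule.status:
            continue
        if run.elapsed_minutes < rule.min_elapsed_minutes:
            continue
        if rule.max_attempts is not None and run.attempts > rule.max_attempts:
            continue
        return rule.action
    return "none"


# Hypothetical rule set: retry a failed job up to two attempts, then
# notify support; also notify when a job runs longer than 90 minutes.
RULES = [
    Rule(status="failed", action="retry", max_attempts=2),
    Rule(status="failed", action="notify"),
    Rule(status="running", action="notify", min_elapsed_minutes=90),
]
```

With these rules, a job that has failed on its first attempt is retried, a job that has failed three times triggers a notification, and a job still running after 120 minutes is flagged as an anomaly. Keeping the rules as data rather than code is what allows them to be configured per workflow.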
We deployed the entire solution in the customer’s Azure environment in just three weeks, including migration of the existing scripts.
RLCatalyst Workflow Monitoring provides a simple and effective solution that differs considerably from standard RPA tools. RPA deals mainly with end-user processing workflows, while RLCatalyst Workflow Monitoring is more relevant for machine data processing workflows and jobs.
For more information, feel free to contact marketing@relevancelab.com.