
Insurance
Modernizing product data ingestion and distribution

Client
A large NY-based life insurance and investment company
Goal
Create a secure, automated solution for data ingestion and a robust framework for distribution across channels
Tools and Technologies
Python, PySpark, AWS Glue/Redshift/Lambda/S3/Aurora, Stonebranch, Jira, GitHub
Business Challenge
The client relied on a legacy product data infrastructure (PACE) and adjacent systems that neither provided fully secure access nor supported efficient quality checks, hampering system integration as well as data ingestion and distribution.
Workflows and checks were only partially automated, and there was no reusable framework for generating and delivering outbound data files aligned with business requirements.

Solution
- Created reusable and scalable ETL/ELT pipelines using Python and AWS services
- Integrated Stonebranch for orchestration and automated job scheduling, with monitoring mechanisms and alerts
- Tuned Redshift queries and optimized data ingestion processes to reduce latency and improve throughput
- Defined data specifications and output formats as per business needs
- Built a configurable pipeline to create dynamic CSV/Excel files from Redshift views
- Automated file delivery via email/SFTP, monitored and orchestrated by Stonebranch
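The reusable pipeline pattern above can be illustrated with a minimal pure-Python sketch. The production pipelines ran on PySpark and AWS Glue; the function and field names here are hypothetical, chosen only to show the shape of an automated pre-load quality gate:

```python
def run_quality_checks(rows, required_fields):
    """Split records into valid rows and rejects.

    Hypothetical illustration of the kind of automated check applied
    before loading data downstream: a record is rejected if any
    required field is missing or empty.
    """
    valid, rejects = [], []
    for row in rows:
        missing = [f for f in required_fields if not row.get(f)]
        (rejects if missing else valid).append(row)
    return valid, rejects
```

Keeping the check as a small, stateless function is what makes the step reusable: the same gate can be dropped into any pipeline by passing a different `required_fields` list per product.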
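The configurable CSV generation step can be sketched similarly. This is a simplified stand-in, assuming a per-file spec dictionary (a hypothetical config shape, not the client's actual schema); in production each spec would describe one outbound file built from a Redshift view:

```python
import csv
import io

def render_extract(rows, spec):
    """Render query results as delimited text per a file spec.

    `spec` is a hypothetical config entry, e.g.
    {"columns": ["policy_id", "premium"], "delimiter": "|"}.
    Columns absent from a row are emitted as empty strings, so one
    renderer serves extracts with differing column lists.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=spec.get("delimiter", ","))
    writer.writerow(spec["columns"])
    for row in rows:
        writer.writerow([row.get(c, "") for c in spec["columns"]])
    return buf.getvalue()
```

Driving the output from configuration rather than code is what lets new outbound files be added without a new pipeline.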

Outcomes
- Improved data distribution through a reusable framework for ingesting and distributing data across existing and new products
- Streamlined operations and improved data accessibility
- Enhanced performance and scalability
- Ensured better data quality and governance with automation and structured reusability
