Editorial Summary:

Machine learning (ML) has evolved from a fashionable trend emerging from academic environments and innovation departments into a key means of delivering value across businesses in every industry. At Cepsa, a global energy company, we use ML to tackle complex problems across our lines of business. We didn't have a reference architecture for ML, so each project followed a different implementation path, performing ad hoc model training and deployment. Without a common method to handle project code and parameters, and without an ML model registry or versioning system, we lost traceability among datasets, code, and models.

At Cepsa, we use a series of data lakes to cover diverse business needs. The data lakes share a common data consumption model that makes it easier for data engineers and data scientists to find and consume the data they need. Data lake environments are completely separated from data producer and consumer applications and are deployed in different AWS accounts belonging to a common AWS Organization.

The training process is independent for each model and is handled by a Step Functions standard workflow, which gives us the flexibility to model processes based on different project requirements. The YET Dragon project aims to improve the production performance of Cepsa's petrochemical plant in Shanghai. The resulting model is stored in Amazon S3, a reference is added to the model registry, and all the collected information and metrics are saved in the experiments catalog. Storing every experiment and model version in a single location, together with a centralized code repository, enables us to decouple model training and deployment. The architecture is flexible and allows both automatic and manual deployments of the trained models.

YET Dragon was our first ML optimization project to feature a model registry, full reproducibility of the experiments, and a fully managed, automated training process. We also built an optimizer to process the results of the four GAM models and find the best optimization to apply in the plant. We plan to keep launching new YET projects based on this architecture, which has reduced mean project duration by 25% thanks to shorter bootstrapping time and the automation of ML pipelines. The short-term evolution of this MLOps architecture is toward model monitoring and automated testing.

Guillermo Ribeiro Jiménez is a Sr. Data Scientist at Cepsa with a PhD in Nuclear Physics and six years of experience in data science projects, mainly in the telco and energy industries. Guillermos Menéndez Corral is a Solutions Architect at AWS Energy and Utilities; he has over 15 years of experience designing and building software applications and currently provides architectural guidance to AWS customers in the energy industry, with a focus on analytics and machine learning. For more information about MLOps on SageMaker, visit Amazon SageMaker for MLOps and check out other customer use cases in the AWS Machine Learning Blog.
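The article does not include implementation details of the per-model training workflows. As a minimal sketch only, the snippet below shows how one training run could be launched as a Step Functions standard workflow execution with boto3; the state machine ARN, execution naming, and hyperparameter payload are assumptions for illustration, not Cepsa's actual setup.

```python
import json
import boto3

sfn = boto3.client("stepfunctions")

# Hypothetical ARN: each model has its own standard workflow describing its training process.
STATE_MACHINE_ARN = "arn:aws:states:eu-west-1:123456789012:stateMachine:yet-dragon-training"

def start_training_run(experiment_name: str, hyperparameters: dict) -> str:
    """Start one training execution; parameters are passed as the execution input."""
    response = sfn.start_execution(
        stateMachineArn=STATE_MACHINE_ARN,
        name=experiment_name,  # execution names must be unique within the account/region
        input=json.dumps({
            "experiment": experiment_name,
            "hyperparameters": hyperparameters,
        }),
    )
    return response["executionArn"]

if __name__ == "__main__":
    arn = start_training_run("gam-ethylene-2024-06-01", {"n_splines": 20, "lam": 0.6})
    print(f"Started training execution: {arn}")
```

Because standard workflows are asynchronous, a caller would typically poll the execution or rely on workflow-emitted events rather than wait inline for training to finish.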
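Likewise, the post says trained models are stored in Amazon S3, referenced in a model registry, and tracked in an experiments catalog, but it does not describe how those stores are implemented. The sketch below assumes, purely for illustration, an S3 bucket and a DynamoDB table standing in for the registry and catalog; recording the Git commit alongside the artifact is one way to keep the dataset-code-model traceability the article highlights.

```python
import json
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")
dynamodb = boto3.resource("dynamodb")

# Hypothetical resource names; the actual backing stores are not described in the article.
MODEL_BUCKET = "yet-dragon-models"
REGISTRY_TABLE = dynamodb.Table("ml-model-registry")

def register_model(model_name: str, version: str, artifact_path: str,
                   metrics: dict, git_commit: str) -> str:
    """Upload the trained artifact to S3 and record the experiment in the registry table."""
    s3_key = f"{model_name}/{version}/model.tar.gz"
    s3.upload_file(artifact_path, MODEL_BUCKET, s3_key)

    REGISTRY_TABLE.put_item(Item={
        "model_name": model_name,            # partition key
        "version": version,                  # sort key
        "artifact_uri": f"s3://{MODEL_BUCKET}/{s3_key}",
        "git_commit": git_commit,            # ties the model back to the exact code version
        "metrics": json.dumps(metrics),
        "created_at": datetime.now(timezone.utc).isoformat(),
    })
    return f"s3://{MODEL_BUCKET}/{s3_key}"
```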
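Finally, the optimizer that combines the four GAM models is only mentioned at a high level. The following sketch uses scipy.optimize to search for plant setpoints that maximize a weighted combination of the models' predictions; the objective, bounds, weights, and model interface (a pyGAM/scikit-learn style `.predict`) are assumptions, not Cepsa's actual formulation.

```python
import numpy as np
from scipy.optimize import minimize

def find_best_setpoints(models, bounds, x0, weights=None):
    """Search for the setpoint vector that maximizes a weighted sum of the
    predictions of several fitted models (e.g. four GAMs).

    models  -- list of fitted models, each exposing .predict(X) on a 2D batch
    bounds  -- list of (low, high) tuples, one per setpoint
    x0      -- initial guess, array of shape (n_setpoints,)
    weights -- optional weight per model; defaults to equal weights
    """
    weights = np.ones(len(models)) if weights is None else np.asarray(weights)

    def objective(x):
        x = x.reshape(1, -1)  # models expect a 2D batch of one sample
        preds = np.array([float(m.predict(x)[0]) for m in models])
        return -np.dot(weights, preds)  # negate: minimizing this maximizes the predictions

    result = minimize(objective, x0=np.asarray(x0, dtype=float),
                      bounds=bounds, method="L-BFGS-B")
    return result.x, -result.fun  # best setpoints and the predicted combined value
```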

Key Highlights:

  • Machine learning (ML) has evolved from a fashionable trend emerging from academic environments and innovation departments into a key means of delivering value across businesses in every industry.
  • Cepsa uses a series of data lakes to cover diverse business needs.
  • Data lakes are separated from data producer and consumer applications.
  • Data is made available from the different data lakes through a set of well-defined APIs.
  • The YET Dragon project aims to improve the production performance of Cepsa's petrochemical plant in Shanghai.
  • Cepsa has developed a standardized MLOps architecture that has been adopted by different businesses across the company.
  • Guillermo Ribeiro Jiménez is a Sr. Data Scientist at Cepsa with a PhD in Nuclear Physics.
  • Guillermos Menéndez Corral is a Solutions Architect at AWS Energy and Utilities.

This editorial is based on content sourced from aws.amazon.com.

