This post is co-written by Dr. Leonard Heilig and Meliena Zlotos from EUROGATE.
For container terminal operators, data-driven decision-making and efficient data sharing are vital to optimizing operations and boosting supply chain efficiency. Internally, making data accessible and fostering cross-departmental processing through advanced analytics and data science enhances information use and decision-making, leading to better resource allocation, reduced bottlenecks, and improved operational performance. Externally, sharing real-time data with partners such as shipping lines, trucking companies, and customs agencies fosters better coordination, visibility, and faster decision-making across the logistics chain. Together, these capabilities enable terminal operators to enhance efficiency and competitiveness in an industry that is increasingly data driven.
EUROGATE is a leading independent container terminal operator in Europe, known for its reliable and professional container handling services. Every day, EUROGATE handles thousands of freight containers moving in and out of ports as part of global supply chains. Their terminal operations rely heavily on seamless data flows and the management of vast volumes of data. Recently, EUROGATE has developed a digital twin for its Container Terminal Hamburg (CTH), generating millions of data points every second from Internet of Things (IoT) devices attached to its container handling equipment (CHE).
In this post, we show you how EUROGATE uses AWS services, including Amazon DataZone, to make data discoverable by data consumers across different business units so that they can innovate faster. Two use cases illustrate how this can be applied for business intelligence (BI) and data science applications, using AWS services such as Amazon Redshift and Amazon SageMaker. We encourage you to read Amazon DataZone concepts and terminology to become familiar with the terms used in this post.
Data landscape in EUROGATE and current challenges faced in data governance
The EUROGATE Group is a conglomerate of container terminals and service providers, offering container handling, intermodal transports, maintenance and repair, and seaworthy packaging services. In recent years, EUROGATE has made significant investments in modern cloud applications to enhance its operations and services along the logistics chains. With the addition of these technologies alongside existing systems like terminal operating systems (TOS) and SAP, the number of data producers has grown substantially. However, much of this data remains siloed, and making it accessible to other departments and for different purposes remains complex. Thus, managing data at scale and establishing data-driven decision support across different companies and departments within the EUROGATE Group remains a challenge.
Need for a data mesh architecture
Because entities in the EUROGATE Group generate vast amounts of data from various sources—across departments, locations, and technologies—the traditional centralized data architecture struggles to keep up with the demands for real-time insights, agility, and scalability. The following requirements were essential in the decision to adopt a modern data mesh architecture:
- Domain-oriented ownership and data-as-a-product: EUROGATE aims to:
  - Enable scalable and straightforward data sharing across organizational boundaries.
  - Enhance agility by localizing changes within business domains and establishing clear data contracts.
  - Improve accuracy and resiliency of analytics and machine learning by fostering data standards and high-quality data products.
  - Eliminate centralized bottlenecks and complex data pipelines.
- Self-service and data governance: EUROGATE wants the discovery, access, and use of data by consumers to be as direct as possible through a data portal where information about shared data sets can be published. At the same time, data governance should be streamlined through automated policy enforcement, ensuring compliance during key stages such as data discovery, access, and deployment.
- Plug-and-play integration: A seamless, plug-and-play integration between data producers and consumers should facilitate rapid use of new data sets and enable quick proofs of concept, such as those run by the data science teams.
How Amazon DataZone helped EUROGATE address those challenges
In the first phase of establishing a data mesh, EUROGATE focused on standardized processes to allow data producers to share data in Amazon DataZone and to allow data consumers to discover and access data. The vision, as shown in the following figure, is that data from digital services, such as from the terminal operating system (TOS) and TwinSim (a project to create a digital twin of real-world operations), can be shared with Amazon DataZone and used by BI dashboards and data science teams, among others, while those digital services and other domain users can also consume subscribed data from Amazon DataZone.
In the following sections, two use cases demonstrate how the data mesh is established with Amazon DataZone to facilitate machine learning for an IoT-based digital twin and BI dashboards and reporting using Tableau.
Use case 1: Machine learning for IoT-based digital twin
Through the TwinSim project, EUROGATE has developed a digital twin using AWS services that gathers real-time data (for example, positions, machinery, and pick/deck events) from CHE (including straddle carriers and quay cranes), integrates it with planning data from the TOS, and enhances it with additional sources such as weather information. In addition to real-time analytics and visualization, the data needs to be shared for long-term data analytics and machine learning applications. EUROGATE’s data science team aims to create machine learning models that integrate key data sources from various AWS accounts, allowing for training and deployment across different container terminals. To achieve this, EUROGATE designed an architecture that uses Amazon DataZone to publish specific digital twin data sets, enabling access to them with SageMaker in a separate AWS account.
As part of the required data, CHE data is shared using Amazon DataZone. The data originates in Amazon Kinesis Data Streams, from which it is copied to a dedicated Amazon Simple Storage Service (Amazon S3) bucket by using Amazon Data Firehose in combination with an AWS Lambda function for data filtering. An extract, transform, and load (ETL) process using AWS Glue is triggered once a day to extract the required data and transform it into the required format and quality, following the data product principle of data mesh architectures. From here, the metadata is published to Amazon DataZone by using AWS Glue Data Catalog. This process is shown in the following figure.
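The post doesn't show EUROGATE's actual filter logic, but a Firehose transformation Lambda follows a fixed record contract. The following is a minimal sketch, assuming CHE records are JSON events with a hypothetical `event_type` field; the event and response shape (`recordId`, `result`, base64-encoded `data`) is the standard Amazon Data Firehose transformation contract.

```python
import base64
import json

# Hypothetical event types to keep; the real filter criteria depend
# on EUROGATE's CHE data schema.
RELEVANT_EVENT_TYPES = {"position", "pick", "deck"}


def lambda_handler(event, context):
    """Firehose transformation: keep relevant CHE events, drop the rest."""
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        if payload.get("event_type") in RELEVANT_EVENT_TYPES:
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                # Re-encode with a trailing newline so records land in
                # Amazon S3 as newline-delimited JSON.
                "data": base64.b64encode(
                    (json.dumps(payload) + "\n").encode()
                ).decode(),
            })
        else:
            # Dropped records are acknowledged but not delivered to S3.
            output.append({
                "recordId": record["recordId"],
                "result": "Dropped",
                "data": record["data"],
            })
    return {"records": output}
```

Firehose buffers the filtered records and writes them to the dedicated S3 bucket, from which the daily AWS Glue ETL job picks them up.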
To work with the shared data, the data science and AI teams subscribe to the data and query it with Amazon Athena through Amazon SageMaker Data Wrangler. The following is an example query.
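As a hedged sketch of such a query from a SageMaker notebook: the database, table, and column names below are illustrative only, since the actual names come from the subscribed Amazon DataZone asset. The query runs through Athena via the AWS SDK for pandas (`awswrangler`), which is commonly available in SageMaker environments.

```python
# Illustrative names; the real database and table come from the
# subscribed Amazon DataZone asset in the Glue Data Catalog.
CHE_POSITIONS_QUERY = """
SELECT che_id,
       event_time,
       longitude,
       latitude,
       speed
FROM "che_digital_twin_db"."che_positions"
WHERE event_time >= date_add('day', -7, current_date)
ORDER BY che_id, event_time
"""


def query_che_positions(workgroup: str = "primary"):
    """Run the query with Athena and return a pandas DataFrame."""
    import awswrangler as wr  # AWS SDK for pandas, preinstalled in SageMaker

    return wr.athena.read_sql_query(
        sql=CHE_POSITIONS_QUERY,
        database="che_digital_twin_db",
        workgroup=workgroup,
    )
```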
A similar approach is used to connect to shared data from Amazon Redshift, which is also shared using Amazon DataZone.
With this, as the data lands in the curated data lake (Amazon S3 in Parquet format) in the producer account, the data science and AI teams gain instant access to the source data, eliminating traditional delays in data availability. The data science and AI teams are able to explore and use new data sources as they become available through Amazon DataZone. Because Amazon DataZone integrates data quality results, teams subscribing to the data through Amazon DataZone can verify that the data product meets consistent quality standards.
After experimentation, the data science teams can share their assets and publish their models to an Amazon DataZone business catalog using the integration between Amazon SageMaker and Amazon DataZone. This will be the future use case of EUROGATE where the ability to publish trained machine learning (ML) models back to an Amazon DataZone catalog promotes reusability, allowing models to be discovered by other teams and projects. This approach fosters knowledge sharing across the ML lifecycle.
Use case 2: BI for cloud applications
In recent years, EUROGATE has developed several cloud applications supporting key container logistics processes and services, such as specialized container terminal and container depot applications and digital platforms for organizing container transports by rail and truck. The applications are hosted in dedicated AWS accounts and require BI dashboard and reporting services based on Tableau. In the past, one-to-one connections were established between Tableau and the respective applications. This led to complex and slow computations. In this use case, EUROGATE implemented a hybrid data mesh architecture using Amazon Redshift as a centralized data platform. This approach transformed their fragmented Tableau connections into a scalable, efficient analytics ecosystem.
By centralizing container and logistics application data through Amazon Redshift and establishing a governance framework with Amazon DataZone, EUROGATE achieved both performance optimization and cost efficiency. The hybrid data mesh enables batch processing at scale while maintaining data access controls, security, and governance, effectively balancing distributed ownership with centralized analytics capabilities.
The data is shared from on premises to an Amazon Relational Database Service (Amazon RDS) database in the AWS Cloud. AWS Database Migration Service (AWS DMS) is used to securely transfer the relevant data to a central Amazon Redshift cluster. The AWS DMS tasks are orchestrated using AWS Step Functions, and the Step Functions state machine runs on a daily schedule using Amazon EventBridge Scheduler. The data in the central data warehouse in Amazon Redshift is then processed for analytical needs, and the metadata is shared with consumers through Amazon DataZone. The consumers subscribe to the data product from Amazon DataZone and consume the data with their own Amazon Redshift instance. This is further integrated into Tableau dashboards. The architecture is depicted in the following figure.
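The daily trigger described above can be sketched with the EventBridge Scheduler API. The ARNs, names, and the 02:00 UTC run time below are illustrative assumptions, not EUROGATE's actual configuration; the point is the shape of a `create_schedule` call that invokes a Step Functions state machine once a day.

```python
import json

# Illustrative ARNs; real values come from the producer account.
STATE_MACHINE_ARN = (
    "arn:aws:states:eu-central-1:123456789012:stateMachine:dms-to-redshift"
)
SCHEDULER_ROLE_ARN = "arn:aws:iam::123456789012:role/scheduler-invoke-sfn"


def daily_schedule_request(name: str, hour_utc: int) -> dict:
    """Build create_schedule parameters for a once-daily trigger."""
    return {
        "Name": name,
        # cron(minute hour day-of-month month day-of-week year)
        "ScheduleExpression": f"cron(0 {hour_utc} * * ? *)",
        "FlexibleTimeWindow": {"Mode": "OFF"},
        "Target": {
            "Arn": STATE_MACHINE_ARN,
            "RoleArn": SCHEDULER_ROLE_ARN,
            # Payload passed to the state machine execution.
            "Input": json.dumps({"trigger": "daily-dms-load"}),
        },
    }


def create_daily_schedule():
    """Register the schedule (requires AWS credentials and permissions)."""
    import boto3

    scheduler = boto3.client("scheduler")
    return scheduler.create_schedule(**daily_schedule_request("daily-dms-load", 2))
```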
Implementation benefits
As we continue to scale, efficient and seamless data sharing across services and applications becomes increasingly important. By using Amazon DataZone and other AWS services including Amazon Redshift and Amazon SageMaker, we can achieve a secure, streamlined, and scalable solution for data and ML model management, fostering effective collaboration and generating valuable insights. This approach supports both the immediate needs of visualization tools such as Tableau and the long-term demands of digital twin and IoT data analytics.
- Centralized, scalable data sharing and native integration
Amazon DataZone facilitates integration with applications such as Tableau, enabling data to flow seamlessly within the AWS ecosystem. Those integrations reduce the need for complex, manual configurations, allowing EUROGATE to share data across the organization efficiently. The architecture centralizes key data, such as CHE data, for analytics and ML, ensuring that teams across the organization have access to consistent, up-to-date information, enhancing collaboration and decision-making at all levels. Insights from ML models can be channeled through Amazon DataZone to inform key decision makers internally and external partners.
- Reduced complexity, greater scalability, and cost efficiency
The Amazon DataZone architecture reduces unnecessary complexity and scales with EUROGATE’s growing needs, whether through new data sources or increased user demand. In parallel, using Amazon Data Firehose to stream data into an S3 bucket and AWS Glue for daily ETL transformations provides an automated pipeline that prepares the data for long-term analytics. This batch-oriented approach reduces computational overhead and associated costs, allowing resources to be allocated efficiently. While real-time data is processed by other applications, this setup maintains high-performance analytics without the expense of continuous processing.
- Faster and easier data integration for Tableau and enhanced data preparation for ML
Amazon DataZone streamlines data integration for tools such as Tableau, enabling BI teams to quickly add and visualize data without building complex pipelines. This agility accelerates EUROGATE’s insight generation, keeping decision-making aligned with current data. Additionally, daily ETL transformations through AWS Glue ensure high-quality, structured data for ML, enabling efficient model training and predictive analytics. This combination of ease and depth in data management equips EUROGATE to support both rapid BI needs and robust analytical processing for IoT and digital twin projects.
- Faster onboarding and data sharing of data assets between organizational units
Amazon DataZone helps teams autonomously discover data assets created in the organization and onboard data assets across AWS accounts within minutes through metadata synchronization. EUROGATE has already onboarded 500 data assets from different organizational units using Amazon DataZone. The new process of onboarding data assets is 15 times faster, leading to immediate visibility of data assets while simplifying data sharing and discovery through an intuitive point-and-click interface that removes traditional barriers to data access.
Conclusion
The implementation of Amazon DataZone marks a transformative step for EUROGATE’s data management by providing a scalable and efficient solution for data sharing, machine learning, and analytics. By integrating various data producers and connecting them to data consumers such as Amazon SageMaker and Tableau, Amazon DataZone functions as a digital library to streamline data sharing and integration across EUROGATE’s operations. In the first phase of production, Amazon DataZone has already demonstrated measurable benefits, including streamlined access to data and ML assets and the ability to incorporate a wider range of datasets into its unified catalog. By centralizing metadata with Amazon DataZone, EUROGATE is setting a solid foundation for efficient operations and improved data and ML governance, because teams can now discover, govern, and analyze data with greater confidence and speed. This capability supports rapid responses to business needs, helping EUROGATE maintain agility and stay ahead of the curve. With this, EUROGATE is better positioned to onboard new data sources, integrate additional terminals, and expand machine learning applications across our container terminals.
Amazon DataZone empowers EUROGATE by setting the stage for long-term operational excellence and scalability. With a unified catalog, enhanced analytics capabilities, and efficient data transformation processes, we’re laying the groundwork for future growth. This infrastructure enables EUROGATE to extract predictive insights, drive smarter business decisions, and scale operations efficiently, ultimately supporting our goal of sustained innovation and competitive advantage.
Future vision and next steps
As EUROGATE continues to advance its digital transformation, the integration of Amazon DataZone and EUROGATE’s architecture lays the groundwork for a more data-driven and intelligent future. In the upcoming phases, the vision is to further expand the role of Amazon DataZone as the central platform for all data management, enabling seamless integration across an even broader set of data sources and consumers. This will include additional data from more container terminals and logistics service providers, enhanced operational metrics, IoT sensor data, and advanced third-party sources such as global supply chain data and maritime analytics.
The continued focus on secure data sharing and governance will also foster better collaboration with partners, suppliers, and customers, leading to improved service levels and a more resilient supply chain. This future vision will help EUROGATE maintain its position as a leader in container terminal operations while continuously adapting to technological advancements and market dynamics.
Ultimately, EUROGATE’s investment in this architecture ensures that the organization is well-positioned to scale and innovate in a dynamic industry through a future of smarter, more connected, and highly efficient container terminal operations.
To learn more about Amazon DataZone and how to get started, see the Getting started guide. See the YouTube playlist for some of the latest demos of Amazon DataZone and short descriptions of the capabilities available.
About the Authors
Dr. Leonard Heilig is CTO at driveMybox and drives digitalization and AI initiatives at EUROGATE, bringing over 10 years of research and industry experience in cloud-based platform development, data management, and AI. Combining a deep understanding of advanced technologies with a passion for innovation, Leonard is dedicated to transforming logistics processes through digitalization and AI-driven solutions.
Meliena Zlotos is a DevOps Engineer at EUROGATE with a background in Industrial Engineering. She has been heavily involved in the Data Sharing Project, focusing on the implementation of Amazon DataZone into EUROGATE’s IT environment. Through this project, Meliena has gained valuable experience and insights into DataZone and Data Engineering, contributing to the successful integration and optimization of data management solutions within the organization.
Lakshmi Nair is a Senior Specialist Solutions Architect for Data Analytics at AWS. She focuses on architecting solutions for organizations across their end-to-end data analytics estate, including batch and real-time streaming, data governance, big data, data warehousing, and data lake workloads. She can be reached via LinkedIn.
Siamak Nariman is a Senior Product Manager at AWS. He is focused on AI/ML technology, ML model management, and ML governance to improve overall organizational efficiency and productivity. He has extensive experience automating processes and deploying various technologies.