Top 10 Data Catalogues

from | 22 September 2022 | Basics

If you have data, you also have data about data. This so-called metadata should also be collected, structured and stored. But why? Anyone who operates several databases, or even several database systems, knows how easy it is to lose track of the actual meaning, calculation or use of column X in table Y in database Z. This is where a data catalogue comes into play: it stores and documents metadata, i.e. information about the meaning, calculation, administration, access rights, origin (data lineage) and status of the data. As in a real catalogue, the various attributes of the data (analogous to products in a catalogue) can be browsed, and in some cases users can even go "data shopping" to request the actual data. We present 10 data catalogues that companies can use to collect, manage and effectively redistribute this metadata as part of a data strategy.
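To make the idea more concrete, here is a minimal sketch of what a single catalogue entry for "column X in table Y in database Z" could look like. The field names, the `Catalogue` class and all sample values are assumptions made for illustration; they do not reflect the data model of any product listed below.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnEntry:
    """One catalogue record describing a single column (hypothetical schema)."""
    database: str
    table: str
    column: str
    meaning: str              # business definition
    calculation: str          # how the value is derived
    owner: str                # who administers the data
    allowed_roles: frozenset  # access rights
    lineage: tuple            # upstream sources the column is derived from
    status: str               # e.g. "draft", "approved", "deprecated"


class Catalogue:
    """In-memory stand-in for a data catalogue (illustrative only)."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # A column is identified by database, table and column name.
        self._entries[(entry.database, entry.table, entry.column)] = entry

    def lookup(self, database, table, column):
        # Returns None if the column has not been documented yet.
        return self._entries.get((database, table, column))

    def search(self, term):
        # Naive case-insensitive search over column names and meanings.
        term = term.lower()
        return [
            e for e in self._entries.values()
            if term in e.column.lower() or term in e.meaning.lower()
        ]


catalogue = Catalogue()
catalogue.register(ColumnEntry(
    database="Z", table="Y", column="X",
    meaning="Net revenue per order in EUR",
    calculation="gross_amount - discounts - vat",
    owner="finance-data-team",
    allowed_roles=frozenset({"analyst", "controller"}),
    lineage=("erp.orders.gross_amount", "erp.orders.discounts"),
    status="approved",
))

entry = catalogue.lookup("Z", "Y", "X")
print(entry.meaning, "|", entry.calculation)
```

Real catalogues add persistence, automated harvesting of this metadata and a user interface on top, but the core question they answer is the same lookup shown here.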

Collibra Data Governance Center

Collibra is one of the biggest players in the data governance market and offers a platform for data lineage, governance and privacy. Thanks to extensive native connectivity, a wide range of data sources can be registered and third-party tools for data science, reporting or BI can be integrated. The Collibra platform also includes a data marketplace that makes master data available to users. Collibra's rule and policy management makes it possible to establish specific user roles and responsibilities, such as data stewardship, to ensure complete, high-quality metadata.

Informatica Enterprise Data Catalog

Informatica Enterprise Data Catalog provides a machine learning-based discovery engine that captures data assets across the enterprise and improves the understanding of those assets through a graph-based catalogue. Built on Informatica's metadata services engine, it enables business users as well as experts to find data across the enterprise and identify relationships between data assets. Data can also be enriched with business glossary terms and crowdsourced annotations that document its meaning and calculation. In addition, data quality and lineage features help users understand the provenance, quality and use of their data. Out of the box, Informatica Enterprise Data Catalog connects to many popular cloud, on-premises and middleware applications for automated analysis.

Alation Data Catalog

Alation takes a somewhat different approach with its Data Catalog: it focuses on usage statistics to encourage activity in the catalogue and keep its content up to date. Alation also follows a "best of breed" approach, which is why a large number of connectors are available for almost any database, cloud service and analytics application. Thanks to a high degree of customisability and an intuitive interface, business users can also easily use and adapt the catalogue.

Azure Data Catalog

With Azure Data Catalog, Microsoft offers an enterprise-wide metadata catalogue that makes searching for data assets simple and direct. It is a managed service that enables users such as analysts, data scientists and data engineers to register, enrich, discover, understand and consume data sources. Azure Data Catalog integrates with existing tools via open REST APIs, making it technology agnostic. It sheds light on the company's "dark data" so that less time is spent searching for data and more time is spent using it.

Oracle Enterprise Metadata Management

The Oracle Enterprise Metadata Management (OEMM) platform addresses the increasing demand for lifecycle change management, data standardisation, compliance and data governance across applications in communications, health sciences, the public sector, retail, utilities and financial services. It enables interactive searching and browsing of metadata. It also makes it possible to trace data provenance, run impact analyses and analyse the semantic definition and semantic usage of each metadata asset within the catalogue.
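Data provenance and impact analysis are two directions of the same lineage graph: provenance walks upstream to the sources, impact analysis walks downstream to everything that depends on an asset. The following is a generic sketch of that idea using a breadth-first traversal. It is not OEMM's API, and the asset names and the `LINEAGE` structure are invented for illustration.

```python
from collections import deque

# Hypothetical lineage edges: source asset -> assets derived from it.
LINEAGE = {
    "crm.customers": ["dwh.dim_customer"],
    "erp.orders": ["dwh.fact_sales"],
    "dwh.dim_customer": ["dwh.fact_sales"],
    "dwh.fact_sales": ["report.revenue_dashboard", "ml.churn_features"],
}


def downstream(asset, edges):
    """Impact analysis: every asset reachable from `asset`, in BFS order."""
    seen = set()
    order = []
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for child in edges.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def upstream(asset, edges):
    """Provenance: invert the edges, then reuse the same traversal."""
    reverse = {}
    for source, targets in edges.items():
        for target in targets:
            reverse.setdefault(target, []).append(source)
    return downstream(asset, reverse)


# What breaks if crm.customers changes?
impacted = downstream("crm.customers", LINEAGE)
# Where does dwh.fact_sales come from?
sources = upstream("dwh.fact_sales", LINEAGE)
print(impacted)
print(sources)
```

The `seen` set keeps the traversal from looping if the lineage graph contains cycles, which can happen when assets feed back into each other.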

IBM Watson Knowledge Catalog

The IBM Watson Knowledge Catalog can be used to leverage business-ready data in combination with intelligent cataloguing and interactive policy management. Organisations can create a common foundation for a business governance glossary and further customise the catalogue to meet individual needs for better understanding of metadata. In addition, role-based access control, active policy monitoring techniques and sensitive data masking protocols help users protect their data and promote compliance policies. Intelligent recommendations offered by IBM Watson facilitate advanced discovery of critical assets as needed. Self-service insights can also be used to create customised dashboards for data quality and policy compliance analysis.
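The combination of role-based access control and sensitive data masking can be illustrated with a small, generic sketch. It does not use IBM's API; the role names, the set of sensitive columns and the masking rule are all assumptions chosen for the example.

```python
# Hypothetical policy: which columns are sensitive, and who may see them unmasked.
SENSITIVE_COLUMNS = {"email", "iban"}
UNMASKED_ROLES = {"data_steward"}


def mask(value):
    """Replace a value with asterisks, keeping the last two characters
    only when the value is longer than four characters."""
    text = str(value)
    visible = 2 if len(text) > 4 else 0
    return "*" * (len(text) - visible) + text[len(text) - visible:]


def apply_policy(row, role):
    """Return a copy of `row` with sensitive columns masked for `role`."""
    if role in UNMASKED_ROLES:
        return dict(row)
    return {
        column: mask(value) if column in SENSITIVE_COLUMNS else value
        for column, value in row.items()
    }


row = {"customer_id": 42, "email": "ada@example.com", "iban": "DE00123456"}
analyst_view = apply_policy(row, "analyst")
steward_view = apply_policy(row, "data_steward")
print(analyst_view)
print(steward_view)
```

In this sketch the policy lives next to the data access code; a catalogue typically stores such rules centrally so that every consuming tool applies the same masking.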

Google Dataplex Data Catalog

Google Dataplex combines the creation and central management of a data mesh structure with the associated collection and management of metadata. Built-in data intelligence discovers data assets and automatically adds them to the catalogue together with their origin. Data quality and the data life cycle can be managed with little effort, and data can be divided into logical, domain-specific zones, for example. A search function also lets external and business users quickly find the data they need.

Synabi D-Quantum

Synabi's D-Quantum technology provides an open platform for cataloguing data and can be compared to a Swiss army knife. For those familiar with Confluence and Wikipedia, the tool's similar look and feel makes it easy to get started. Synabi can also be used to display version histories of metadata: By directly comparing versions, changes can be visually highlighted and are easier to recognise. Furthermore, lineage functions enable the technical and business data origin to be displayed and allow the user to view further contextual information along the lineage, such as the data owner.
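Comparing two versions of a metadata record comes down to an attribute-by-attribute diff. The sketch below shows one simple way to compute such a diff; it is a generic illustration and not Synabi's implementation. The function name `diff_metadata` and the sample records are invented.

```python
def diff_metadata(old, new):
    """Return {attribute: (before, after)} for every attribute that differs.

    Attributes missing from one version are reported as None, so this
    sketch cannot distinguish "missing" from "explicitly set to None".
    """
    changes = {}
    for key in sorted(old.keys() | new.keys()):
        before = old.get(key)
        after = new.get(key)
        if before != after:
            changes[key] = (before, after)
    return changes


version_1 = {
    "meaning": "Revenue per order",
    "owner": "finance",
    "status": "draft",
}
version_2 = {
    "meaning": "Net revenue per order in EUR",
    "owner": "finance",
    "status": "approved",
    "unit": "EUR",
}

changes = diff_metadata(version_1, version_2)
for attribute, (before, after) in changes.items():
    print(f"{attribute}: {before!r} -> {after!r}")
```

A user interface would then highlight exactly these changed attributes, while unchanged ones such as `owner` stay in the background.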

Dataspot

Dataspot, a newcomer, has upgraded its product and, in addition to the well-known KPI catalogue, DQ catalogue and others, has now also integrated a Data Product Catalog. The tool's information can be accessed via dataspot Anywhere, an API that enables real-time access to metadata. A key focus of dataspot is the Business Lineage, which can also be used to visualise further contextual information. To facilitate maintenance, metadata can also be imported from and exported to other formats (e.g. Excel).

Zeenea Data Catalog

Zeenea helps companies accelerate their data initiatives: The cloud-based platform provides a reliable and understandable database with maximum simplicity and automation. With just a few clicks, information and metadata can be found, discovered, managed and changed within the company. Zeenea differentiates between data users and administrators of the catalogue with two user experiences: Zeenea Studio, for data management teams; Zeenea Explorer, to provide catalogue users with a simplified search and browsing experience.

Author

Lukas Lux

Lukas Lux is a working student in the Customer & Strategy department at Alexander Thamm GmbH. In addition to his studies in Sales Engineering & Product Management with a focus on IT Engineering, he is concerned with the latest trends and technologies in the field of Data & AI and compiles them for you in cooperation with our [at]experts.
