Anyone who has data also has data about data. This metadata should be collected, structured, and stored as well. But why? Anyone who operates multiple databases, or even multiple database systems, knows how easy it is to lose track of the meaning, calculation, or use of column X in table Y in database Z. This is where a data catalog comes into play: it stores and documents metadata, i.e., information about the meaning, calculation, administration and access rights, origin (data lineage), and status of the data. As in a real catalog, the attributes of the data (analogous to products in a catalog) can be browsed and, in some cases, the actual data can even be requested directly through "data shopping". We present 10 data catalogs that can be used to collect, manage, and effectively distribute this metadata within a company as part of a data strategy.
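To make the kinds of metadata listed above more concrete, here is a minimal sketch of what a single catalog entry could look like. It is not modeled on any of the products below: the class `CatalogEntry`, its field names, and the `search` helper are assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CatalogEntry:
    """Metadata about one column. All field names are illustrative assumptions."""
    database: str
    table: str
    column: str
    meaning: str                 # business definition
    calculation: str             # how the value is derived
    owner: str                   # who administers the data
    allowed_roles: List[str] = field(default_factory=list)  # access rights
    upstream: List[str] = field(default_factory=list)       # origin (data lineage)
    status: str = "active"       # e.g. active, deprecated

    @property
    def qualified_name(self) -> str:
        """Return the 'database.table.column' identifier of this entry."""
        return f"{self.database}.{self.table}.{self.column}"


def search(entries: List[CatalogEntry], keyword: str) -> List[CatalogEntry]:
    """Case-insensitive keyword search over entry names and meanings."""
    needle = keyword.lower()
    return [
        entry for entry in entries
        if needle in entry.qualified_name.lower() or needle in entry.meaning.lower()
    ]


if __name__ == "__main__":
    catalog = [
        CatalogEntry(
            database="sales_db", table="orders", column="net_revenue",
            meaning="Revenue after discounts",
            calculation="gross_revenue - discounts",
            owner="finance-team",
            allowed_roles=["analyst", "controller"],
            upstream=["erp_db.invoices.amount"],
        ),
    ]
    for hit in search(catalog, "revenue"):
        print(hit.qualified_name, "owned by", hit.owner)
```

Even a structure this small answers the "column X in table Y in database Z" question: what the column means, how it is calculated, who owns it, and where it comes from.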
The Informatica Enterprise Information Catalog provides a machine learning-based discovery engine that captures data assets across the enterprise and makes them easier to understand through a graph-based catalog. It builds on Informatica's own metadata services engine and enables business users and experts to find data across the enterprise and identify the relationships between data assets. Data can also be enriched with business glossary terms and crowdsourced annotations that document its meaning and calculation. In addition, data quality and lineage features show where data comes from, how good it is, and how it is used. Out of the box, the Informatica Enterprise Information Catalog can connect to many popular cloud applications, on-premises systems, and middleware applications and scan them automatically.
Alation takes a slightly different approach with its Data Catalog: it relies on usage statistics to keep the catalog active and its content up to date. Alation also pursues a “best of breed” approach, offering a large number of connectors for databases, cloud services, and analytics applications. With a high degree of customizability and an intuitive interface, the catalog is easy for business users to use and adapt.
With Azure Data Catalog, Microsoft offers an enterprise-wide metadata catalog that makes searching for data sets straightforward. It is a managed service that enables users such as analysts, data scientists, and data engineers to register, enrich, discover, understand, and consume data sources. Azure Data Catalog can be integrated into existing tools via open REST APIs and can therefore be used independently of the underlying technology. It sheds light on the company's “dark data” so that less time is spent searching for data and more time is spent using it.
The Oracle Enterprise Metadata Management (OEMM) platform addresses the growing demand for lifecycle change management, data standardization, compliance, and data governance across applications in the communications, health sciences, public sector, retail, utilities, and financial services industries. It enables interactive search and browsing of metadata. In addition, it can trace data provenance and perform impact analysis, semantic definition, and semantic usage analysis for each metadata asset in the catalog.
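Provenance and impact analysis can both be understood as traversals of a lineage graph: provenance follows the edges upstream, impact analysis follows them downstream. The following is a generic sketch of that idea and does not use any OEMM interface. The asset names in `LINEAGE` and the helper functions are hypothetical.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the assets listed as its value.
LINEAGE = {
    "crm.orders": ["staging.orders"],
    "erp.invoices": ["staging.invoices"],
    "staging.orders": ["dwh.revenue"],
    "staging.invoices": ["dwh.revenue"],
    "dwh.revenue": ["report.monthly_kpi"],
}


def reachable(graph, start):
    """Breadth-first search: every node that can be reached from `start`."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def invert(graph):
    """Reverse all edges so that targets point back to their sources."""
    inverted = {}
    for source, targets in graph.items():
        for target in targets:
            inverted.setdefault(target, []).append(source)
    return inverted


def impact(asset):
    """Downstream assets affected by a change to `asset`."""
    return reachable(LINEAGE, asset)


def provenance(asset):
    """Upstream assets that `asset` is derived from."""
    return reachable(invert(LINEAGE), asset)


if __name__ == "__main__":
    print("Impact of staging.orders:", sorted(impact("staging.orders")))
    print("Provenance of dwh.revenue:", sorted(provenance("dwh.revenue")))
```

Real catalogs usually harvest these edges automatically from ETL jobs and SQL, but the two questions they answer are the same.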
IBM Watson Knowledge Catalog combines intelligent cataloging with interactive policy management to make business-ready data available. Companies can establish a common business glossary for governance and customize the catalog to their individual requirements for a better understanding of metadata. In addition, role-based access control, active policy monitoring, and masking of sensitive data help users protect their data and support compliance. Intelligent recommendations provided by IBM Watson help users discover important assets when they need them. Self-service insights can also be used to create custom dashboards for analyzing data quality and policy compliance.
Google Dataplex combines the creation and centralization of a data mesh structure with the associated collection and management of metadata. With integrated data intelligence, data assets can be identified and automatically added to the catalog according to their origin. Data quality and the data lifecycle are comparatively easy to manage, and data can be divided into logical, domain-specific zones, for example. A search function also allows external and business users to quickly find the data they need.
Synabi's D-Quantum offers an open platform for cataloging data and can be compared to a Swiss Army knife. For those familiar with Confluence and Wikipedia, the tool's similar look and feel makes it easy to get started. D-Quantum can also display metadata version histories: when two versions are compared directly, the changes are highlighted visually and are easier to identify. Furthermore, lineage functions display both the technical and the business origin of data and let users view additional contextual information along the lineage, such as the data owner.
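The idea behind comparing metadata versions can be sketched in a few lines. This is only an illustration of the principle, not D-Quantum's implementation. The function `diff_versions` and the sample attributes are assumptions.

```python
def diff_versions(old, new):
    """Return {attribute: (before, after)} for every attribute that changed.

    Attributes missing from one version are reported with None on that side.
    """
    changes = {}
    for key in sorted(set(old) | set(new)):
        before, after = old.get(key), new.get(key)
        if before != after:
            changes[key] = (before, after)
    return changes


if __name__ == "__main__":
    # Two hypothetical versions of the metadata for one column.
    version_1 = {
        "meaning": "Revenue after discounts",
        "owner": "finance-team",
        "status": "active",
    }
    version_2 = {
        "meaning": "Revenue after discounts and returns",
        "owner": "finance-team",
        "status": "active",
        "calculation": "gross_revenue - discounts - returns",
    }
    for attribute, (before, after) in diff_versions(version_1, version_2).items():
        print(f"{attribute}: {before!r} -> {after!r}")
```

A catalog with version histories stores each such snapshot and renders the differences visually instead of printing them.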
Dataspot, a newcomer to the market, has upgraded its product and now integrates a Data Product Catalog in addition to the familiar KPI Catalog, DQ Catalog, and others. The information in the tool can be accessed via dataspot Anywhere, an API that enables real-time access to metadata. One of Dataspot's main focuses is business lineage, which can also be used to visualize additional contextual information. To make metadata easier to maintain, it can be imported from and exported to other data formats (e.g., Excel).
Zeenea helps companies accelerate their data initiatives: the cloud-based platform offers a reliable and understandable data foundation with a high degree of simplicity and automation. With just a few clicks, information and metadata can be discovered, managed, and modified within the company. Zeenea distinguishes between catalog administrators and data users and offers a separate user experience for each: Zeenea Studio for data management teams, and Zeenea Explorer, which gives catalog users a simplified search and browsing experience.