Considerable effort has gone into software development practices and how they relate to data, but have we gone far enough in how we think about and use data? Data fabrics can help.
One way to make better use of data is the data fabric, a data management approach that arranges data in a single “fabric” spanning multiple systems and endpoints. The goal of the fabric is to link all data so that it can be accessed easily.
“DataOps and data fabric are two different but related things,” said Ed Thompson, CTO at Matillion, which provides a cloud data integration platform. “DataOps is about taking practices which are common in modern software development and applying them to data projects. Data fabric is about the type of data landscape that you create and how the tools that you use work together.”
Thompson said the key to moving data from an exercise in interaction to a malleable fabric that can meet the needs of any application is the development of metadata, which is knowledge about the data itself. Each tool must be able to pass this knowledge on to the next.
“In a typical data stack, all the tools involved have additive knowledge about the core data,” Thompson said. “For example, a pipeline tool knows where the data came from. A data integration tool knows how it has been transformed. A quality tool can assess how trustworthy it is. An analytics tool knows how valuable it is to business users… and so on. If you want to get an end-to-end picture of the lifecycle of that data, you need to be able to see all that information from right across your data estate. This is what a data fabric gives you: a unified view of your consumption layer that can coordinate around the performance of data.”
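To make the idea of additive knowledge concrete, here is a minimal Python sketch of merging what each tool knows into one record per dataset. It is an illustration only, not how Matillion or any particular product works: the `unified_view` function, the tool names, the `orders` dataset and all the values are hypothetical.

```python
from collections import defaultdict


def unified_view(tool_reports):
    """Merge per-tool metadata into one record per dataset.

    tool_reports maps a tool name to {dataset name: facts}.
    All names and fields here are hypothetical.
    """
    view = defaultdict(dict)
    for tool, report in tool_reports.items():
        for dataset, facts in report.items():
            # Keep each tool's contribution under its own key so
            # nothing is overwritten and the origin of a fact stays visible.
            view[dataset][tool] = facts
    return dict(view)


# Made-up reports, one per tool mentioned in the quote above.
reports = {
    "pipeline": {"orders": {"source": "crm_export"}},
    "integration": {"orders": {"transforms": ["dedupe", "normalize_currency"]}},
    "quality": {"orders": {"trust_score": 0.97}},
    "analytics": {"orders": {"monthly_queries": 412}},
}

fabric_view = unified_view(reports)
```

In this sketch, `fabric_view["orders"]` holds the source, the transformations, the quality score and the usage figure side by side, which is the kind of end-to-end picture Thompson describes. Real tools would have to agree on a shared metadata format before a merge like this is possible.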
Unfortunately, commercial data tools don’t always work together, and the industry still has a way to go to realize the full potential of this concept. Companies also tend not to do much with data fabrics unless their IT is already at an advanced state of technical maturity. That makes it easy for corporate IT departments to forgo data fabric work for now, but it doesn’t mean that IT shouldn’t be thinking about it as a future direction.
3 steps for building a data fabric
If your company would like to move forward with building a data fabric, here are the steps to take:
- Start with DevOps. Most organizations are already using DevOps methodologies. While companies work with DevOps, they can begin building in data fabric thinking by choosing data ingestion tools that maximize the capture of metadata at every point, whether that metadata comes from structured or unstructured data. “You want to do this so you don’t lose the context of the original source,” Thompson said. Understanding the journey of data, as well as its transformations and its reliability, is crucial for applications and for security monitoring and observability as well.
- Start small and plan to grow. This includes getting tools in place that can address data ingestion; extract, transform and load functions; analytics; and end-to-end data quality testing.
- Start your data fabric build. “Add tools to help with cataloging data, centralizing the sharing of metadata, reporting on lineage, governance and handling of PII [personally identifiable information] data,” Thompson said. “Once this is in place, you are ready to democratize access to data across the organization. It also gives you the high-quality datasets you need to power machine learning technologies such as robotic process automation tools that can act on data automatically.”
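The first and third steps both depend on metadata that travels with the data. A minimal Python sketch of that idea follows, assuming a hypothetical `Dataset` wrapper that records the original source at ingestion and appends a lineage entry for every transformation. The `ingest` and `transform` helpers, the `crm_export` source name and the sample row are invented for illustration and do not correspond to any specific tool's API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Dataset:
    """Rows plus the metadata that should travel with them (hypothetical)."""
    rows: list
    source: str
    ingested_at: str
    lineage: list = field(default_factory=list)


def ingest(rows, source):
    """Capture source context at the point of ingestion."""
    return Dataset(
        rows=list(rows),
        source=source,
        ingested_at=datetime.now(timezone.utc).isoformat(),
        lineage=[f"ingest:{source}"],
    )


def transform(ds, name, fn):
    """Apply fn to every row and record the step in the lineage."""
    return Dataset(
        rows=[fn(row) for row in ds.rows],
        source=ds.source,            # original source is never lost
        ingested_at=ds.ingested_at,
        lineage=ds.lineage + [f"transform:{name}"],
    )


raw = ingest([{"email": "A@Example.com"}], source="crm_export")
clean = transform(
    raw,
    "lowercase_email",
    lambda row: {**row, "email": row["email"].lower()},
)
```

Here `clean.lineage` lists the ingestion step followed by the transformation, while `raw` is left unchanged. A catalog or governance tool could read the same fields to report on lineage or to flag columns such as `email` as PII.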