3 data quality management tools to prevent “garbage in, garbage out”
by Nick Jordan, on March 4, 2020
Data quality is a hot topic with data practitioners. Here's how three companies are tackling the problem and what it means to you.
Anyone who uses data in their job has heard the saying “garbage in, garbage out.” Put another way, your results are only as good as the data you use to generate them. This is true whether you’re using Excel for business intelligence or building machine learning and artificial intelligence models.
The “garbage in, garbage out” mantra has led to an increased focus on data quality within the enterprise. But data quality is a complex topic, and there is no silver bullet for ensuring that you have access to high-quality data.
A number of companies now focus on data quality management. We’ll explore three of them here to better understand their perspectives and their solutions.
Claravine
Claravine is a startup based in Utah that is taking a proactive approach to data quality. Their solution is focused on making sure that data is structured, standardized, and usable from the get-go by creating a standard taxonomy for marketing teams.
Much of the data collected within the enterprise is generated by different systems. Unfortunately, there is often no built-in way to enforce a common set of “tags” across different channels, teams, and digital experiences.
A simple example of this is Google Analytics (GA). The standard GA tag collects only basic data about the page view (URL, timestamp, IP address, etc.). On its own, GA knows little about the page’s structure or about which campaign brought the user there.
To overcome this, companies employ a tagging strategy to pass additional details to Google Analytics. As a real-world example, a retailer might pass a product SKU or product category to GA, and a paid marketing team might tag its landing page URLs to let GA know which campaign a visitor came from.
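One common way to implement this kind of campaign tagging is with Google Analytics’ UTM query parameters (`utm_source`, `utm_medium`, `utm_campaign`). The Python sketch below appends them to a landing page URL using only the standard library. The domain, SKU, and campaign values are hypothetical placeholders, and a real tagging strategy would likely carry more fields.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def tag_landing_page(url: str, source: str, medium: str, campaign: str) -> str:
    """Append Google Analytics campaign (UTM) parameters to a landing page URL."""
    parts = urlparse(url)
    # Preserve any query parameters the URL already carries (e.g. a product SKU).
    query = dict(parse_qsl(parts.query))
    query.update({
        "utm_source": source,      # which site or platform sent the visitor
        "utm_medium": medium,      # the channel type, e.g. cost-per-click search
        "utm_campaign": campaign,  # the marketing campaign name
    })
    return urlunparse(parts._replace(query=urlencode(query)))


# Hypothetical example: a search ad pointing at a Memorial Day sale page.
print(tag_landing_page("https://shop.example.com/sale?sku=123",
                       "google", "cpc", "memorial_day_2020"))
# prints https://shop.example.com/sale?sku=123&utm_source=google&utm_medium=cpc&utm_campaign=memorial_day_2020
```

Because the parameters are merged into a dictionary, re-tagging a URL that already has a `utm_campaign` value overwrites it instead of appending a conflicting duplicate.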
Problems of data consistency and data governance arise when tagging is spread across an organization of hundreds of people and dozens of teams.
Consider a retailer running a Memorial Day campaign that needs to track the results across paid marketing channels. It may run search programs in-house, use a media agency for display campaigns, and use a different agency to manage paid social. The retailer now has at least three teams deploying tags, each a potential source of data quality issues.
Without standards and business processes, the data across all of those sources is unlikely to line up. When the data isn’t consistent, we’ve created “garbage in.”
Claravine’s solution is elegant because it acts as the platform of record for tagging standards. In our example, members of all three teams can use Claravine’s product to generate and deploy the proper tags. With a centralized source of truth, consistency and data integrity become far easier to maintain.
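To illustrate why a shared platform of record helps, here is a minimal Python sketch of validating tags against one agreed taxonomy. It is not Claravine’s product or API: the `TAXONOMY` values and the `validate_tags` function are hypothetical, and the sketch only checks that each required key is present and uses an approved value.

```python
# Hypothetical shared taxonomy: the one place where approved tag values live.
TAXONOMY: dict[str, set[str]] = {
    "utm_source": {"google", "facebook", "display_network"},
    "utm_medium": {"cpc", "paid_social", "display"},
    "utm_campaign": {"memorial_day_2020"},
}


def validate_tags(tags: dict[str, str]) -> list[str]:
    """Return the problems found; an empty list means the tags follow the standard."""
    problems = []
    for key, allowed in TAXONOMY.items():
        value = tags.get(key)
        if value is None:
            problems.append(f"missing {key}")
        elif value not in allowed:
            problems.append(f"{key}={value!r} is not in the taxonomy")
    return problems


# Hypothetical tags submitted by two of the retailer's three teams.
in_house_search = {"utm_source": "google", "utm_medium": "cpc",
                   "utm_campaign": "memorial_day_2020"}
social_agency = {"utm_source": "Facebook", "utm_medium": "social",
                 "utm_campaign": "MemorialDay"}

print(validate_tags(in_house_search))  # []
print(validate_tags(social_agency))    # three problems, one per non-standard value
```

Differences like `Facebook` versus `facebook` look harmless, but in a report they split one campaign into separate rows. Rejecting them when the tag is created is what keeps the downstream data consistent.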
Neutronian
Neutronian is a San Francisco-based company focused on bringing trust and transparency to marketing through data quality and compliance verification. Their solution includes a data certification and scoring process that can be thought of as the FICO Score for data quality.
Neutronian takes a comprehensive approach to their definition of data quality. They know that accuracy and utility alone don’t define data quality. Neutronian believes that data quality includes understanding collection methodology, privacy/consent, and the techniques used in modeling the data.
Their solution includes a deep dive into all aspects of the data supply chain. Neutronian works with suppliers to document a number of different factors including:
- Compliance and regulatory processes
- Data sourcing and provenance
- Data quality control
- Modeling methodology
- Performance characteristics
By understanding each of these dimensions, Neutronian can score suppliers on the quality of their data and offer them certification. In addition to their comprehensive audit, they also provide ongoing monitoring to detect any variance from certified standards. Data buyers then have a precise understanding of what they are getting and confidence that it meets their quality standards.
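Ongoing monitoring of this kind comes down to comparing fresh measurements against the values recorded at certification time. The Python sketch below shows one simple way to flag drift, assuming each certified metric has a baseline and an allowed absolute deviation. The metric names, numbers, and tolerances are hypothetical, and Neutronian’s actual checks are not described in this post.

```python
from dataclasses import dataclass


@dataclass
class CertifiedMetric:
    name: str
    baseline: float   # value measured when the data set was certified
    tolerance: float  # largest absolute deviation still considered compliant


def detect_variance(certified: list[CertifiedMetric],
                    current: dict[str, float]) -> dict[str, float]:
    """Return {metric name: drift} for every metric outside its tolerance.

    A metric missing from `current` is reported with a drift of NaN so that
    gaps in monitoring are surfaced rather than silently ignored.
    """
    drifted = {}
    for metric in certified:
        if metric.name not in current:
            drifted[metric.name] = float("nan")
            continue
        drift = current[metric.name] - metric.baseline
        if abs(drift) > metric.tolerance:
            drifted[metric.name] = drift
    return drifted


# Hypothetical metrics captured at certification time.
certified = [
    CertifiedMetric("email_fill_rate", baseline=0.92, tolerance=0.05),
    CertifiedMetric("match_rate", baseline=0.60, tolerance=0.05),
]

# Only email_fill_rate is flagged: it dropped by about 0.11, more than its 0.05 tolerance.
print(detect_variance(certified, {"email_fill_rate": 0.81, "match_rate": 0.62}))
```

An absolute tolerance is the simplest choice; a relative or statistical threshold may suit metrics whose natural scale varies widely.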
Truth{set}
Truth{set} is a data validation platform focused exclusively on scoring the accuracy and compliance of people-based data.
Truth{set} helps data buyers discover, evaluate, and activate data that meets accuracy and compliance thresholds. They also allow data sellers to evaluate, optimize, and monetize their data by focusing on quality. Together, these two sides of the platform enable transparency and trust in data, powering efficient data marketplaces.
Truth{set} establishes “Truthscores,” numerical scores between 0.00 and 1.00 that denote the accuracy of identity pairs and people-based attributes.
Truthscores are produced by wisdom-of-the-crowd algorithms that run across what Truth{set} calls its Truth Partner Network. Truth{set} benefits from access to a diverse array of data owners, and it also uses independent truth sets to train and validate Truthscore outputs.
This leads to higher-quality data and to increased trust and compliance in the data supply chain. Truth{set} helps power use cases including marketing, commerce, compliance, and fraud.
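Truth{set}’s algorithms aren’t detailed here, but the general wisdom-of-the-crowd idea can be sketched: score a claim by how much of the reporting crowd agrees with it. The Python example below uses a weighted agreement share, where each partner’s weight stands in for how reliable that partner has proven against an independent truth set. The partner names, the weights, and the weighting scheme itself are assumptions for illustration only.

```python
from typing import Optional


def crowd_score(claimed_value: str,
                partner_reports: dict[str, str],
                partner_weights: dict[str, float]) -> Optional[float]:
    """Weighted share of reporting partners that agree with a claimed value.

    Returns a score between 0.00 and 1.00, or None when no partner reports
    on the attribute. Partners without a weight default to 1.0.
    """
    total = 0.0
    agreeing = 0.0
    for partner, reported_value in partner_reports.items():
        weight = partner_weights.get(partner, 1.0)
        total += weight
        if reported_value == claimed_value:
            agreeing += weight
    if total == 0:
        return None
    return round(agreeing / total, 2)


# Hypothetical: three partners report an age range for the same identity.
reports = {"partner_a": "35-44", "partner_b": "35-44", "partner_c": "25-34"}
# Hypothetical weights, e.g. each partner's measured accuracy on a truth set.
weights = {"partner_a": 0.9, "partner_b": 0.6, "partner_c": 0.5}

print(crowd_score("35-44", reports, weights))  # 0.75
print(crowd_score("35-44", reports, {}))       # 0.67 (unweighted: 2 of 3 agree)
```

Returning `None` rather than 0.00 keeps “no partner has seen this attribute” distinct from “partners disagree with it.”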
Narrative’s take
At Narrative, we believe that data quality is table stakes for a well-executed data strategy. Data analysts, data scientists, and data engineers are only as good as their data.
Our goal from the start has been to create an open and transparent ecosystem, and we’re excited to have companies with unique approaches to data quality be part of it. We welcome solutions like those being built by Claravine, Neutronian, and Truth{set} to ensure the vitality of the data economy.
Summary
Data quality will continue to be top of mind for data practitioners. A number of solutions can help them make sure they aren’t putting “garbage in,” so that they avoid getting “garbage out.” We’ve explored a variety of approaches to the problem, including solutions that:
- Focus on standardizing data collection
- Offer data benchmarking for existing data sets
- Offer data certification programs
There isn’t a single solution for avoiding data quality problems. We feel that if data practitioners are well informed, they can choose the tools that are right for their specific needs.
Interested in learning more about how Narrative enables high-quality data acquisition? Schedule a chat with a member of our team.