Where to Start with Cleaning Data

The Vice President of Customer Success Offers Advice for Clean Data in Retail

Ahead of the recent IHI Conference, 4R’s Vice President of Customer Success, Nina Chiavaroli, participated in an Orgill-sponsored panel discussion, “Where to Start with Cleaning Data.” The hardware and home improvement sector is going through a digital transformation that is generating large volumes of data. Retailers can leverage this data to increase sales, create loyal customers, and ultimately build a successful business model for the future. Clean data is critical for data quality and accuracy, especially when it informs decision making, strategic planning, and KPI measurement. This post recaps the important points from the Orgill panel discussion.

Key Takeaways: 
  • What clean data means in retail  
  • Benefits of cleaning data 
  • Common data quality issues 
  • Steps for getting started with cleaning data and how to maintain it 
  • Data cleaning pitfalls to avoid 

What clean data means in retail  

Cleaning data is the process of identifying, correcting, and/or removing duplicate, incomplete, corrupted, irrelevant, and incorrectly formatted data within a dataset so it’s consistent across different data sources. Retail data takes different forms, such as external market data, customer-centric data, supply chain data, operations data, and merchandising data. All of this data can be used to make informed decisions about products, pricing, and promotions, and to optimize the supply chain, inventory management, and merchandising strategies.

Retail data comes from many sources and systems critical to a retailer’s business, including point-of-sale systems, ERP technologies, e-commerce platforms, and external market data sources. However, the data in these systems is not always “clean,” especially once it is pulled together from multiple sources for analysis. Removing errors and inconsistencies is critical for accuracy and for avoiding incorrect analysis, insights, and decisions.

Benefits of cleaning data 

Data insights are only as good as the data quality. Cleaning data creates high-quality data sets that drive accurate analysis. Clean data leads to: 

  • Valuable insights: Predictive analytics, fed by clean data, can be used to anticipate changes in customer demand, helping retailers identify new opportunities for growth. Insights into customer behavior and future trends drive decisions about product selection, pricing, and marketing strategies.
  • Enhanced decision-making: When accurate and relevant information is available, staff can do their jobs well. Accurate data builds trust, so staff are more confident in their decisions and free to create innovative strategies that drive the business.
  • Increased productivity: Organizing data so that it’s readily available to the staff that needs it most is one output of the clean data process. Clean data makes looking for information, collecting data, and manually producing reports unnecessary. Eliminating these time-consuming activities increases productivity. 
  • Operational efficiency: The process of cleansing data and maintaining it eliminates data silos and redundancies that can slow down operations. Clean data also supports AI and automation, which further streamline operations.
  • Enhanced inventory management: Clean data helps retailers anticipate demand for certain products and order accordingly with greater accuracy. This helps to prevent overstocking or understocking inventory, which can lead to lost sales or excess inventory costs.

Data cleaning and standardization improve the quality and reliability of data, leading to better outcomes and more efficient operations.

Common data quality issues  

To go from mess to success, there are tools, like the 4R data validation engine, to identify data problems, and services to assist in rectifying them. Common errors and issues that cause messy data include:

  • Data attribute inconsistencies (for example, size attributes: Medium, M, MD, Med, M 6-8)
  • Duplicate data (from overlaps in data sources) 
  • Data errors due to human, manual action 
  • Missing or incomplete data, which leads to inaccurate reports
  • Outdated or obsolete data 
  • Outliers 
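
As a rough illustration of how two of these issues, duplicates and missing values, might be surfaced before any fixing begins, here is a minimal Python sketch. The field names, sample records, and the `profile_records` helper are hypothetical and are not part of any 4R product.

```python
from collections import Counter

# Hypothetical required fields for a product record (an assumption for this sketch).
REQUIRED_FIELDS = ("sku", "description", "size", "price")


def profile_records(records, key="sku", required=REQUIRED_FIELDS):
    """Report keys that appear more than once and rows with empty required fields."""
    key_counts = Counter(r.get(key) for r in records)
    duplicates = sorted(k for k, n in key_counts.items() if k is not None and n > 1)

    missing = {}
    for index, record in enumerate(records):
        gaps = [f for f in required if record.get(f) in (None, "")]
        if gaps:
            missing[index] = gaps  # row position -> list of empty fields

    return {"duplicate_keys": duplicates, "missing_fields": missing}


# Made-up records, as if merged from a POS export and an e-commerce export.
records = [
    {"sku": "H-100", "description": "Claw hammer", "size": "M", "price": 12.99},
    {"sku": "H-100", "description": "Claw hammer", "size": "Med", "price": 12.99},
    {"sku": "G-200", "description": "Work gloves", "size": "", "price": 8.50},
    {"sku": "P-300", "description": "Paint brush", "size": "MD", "price": None},
]

report = profile_records(records)
print(report)
```

A report like this only lists where the problems are; deciding which duplicate to keep or how to fill a gap still depends on the business context.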

There are data cleaning tools available that aim to reduce or eliminate data issues, offering high-quality data sets and accurate reporting.  Data cleaning should always be a top priority for retailers to prevent errors that will negatively impact business activities. 

Steps for getting started with cleaning data and how to maintain it 

Data cleaning is an active and continuous process to maintain accurate, reliable and useful data.  

1. Business Discovery and Requirements

Before beginning the actual data cleaning process, retailers need to determine how they want to use the data. What are the most important business questions data can answer? What are the key metrics? Clean data can drive invaluable insights on different aspects of the retailer’s business, including customers, products, supply chain, third-party vendors, store operations, and more.

This business discovery and data requirements step is put in place to identify and prioritize data sets for cleaning. The business discovery documentation helps everyone understand what is normal and what to look out for in a system, like supply chain management and inventory optimization solutions.  

2. Identifying the Data Errors and Issues

Once the required data sets are identified, the next step is to identify the errors in them. There are data validation tools that put the data from different sources through a series of standard checks and bounds. A validation engine, like the one provided by 4R, points out issues, outliers, general trends, minimums and maximums, and so on. Comparing this information to the business discovery and requirements documentation ensures the data meets the expectations of the end user.
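
To make the idea of checks and bounds concrete, the sketch below summarizes each field and lists values that fall outside an expected range. It is a simplified illustration, not a description of how the 4R validation engine works; the `EXPECTED_BOUNDS` values and the `validate` function are assumptions standing in for what a business discovery document might specify.

```python
from statistics import mean

# Hypothetical expectations taken from a business discovery document.
EXPECTED_BOUNDS = {"price": (0.01, 500.0), "units_sold": (0, 1000)}


def validate(rows, bounds=EXPECTED_BOUNDS):
    """Summarize each bounded field and list (row index, value) pairs out of range."""
    report = {}
    for field, (low, high) in bounds.items():
        present = [(i, row[field]) for i, row in enumerate(rows)
                   if row.get(field) is not None]
        if not present:
            # Nothing to summarize; record the gap instead of failing.
            report[field] = {"min": None, "max": None, "mean": None, "violations": []}
            continue
        values = [value for _, value in present]
        report[field] = {
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
            "violations": [(i, v) for i, v in present if not low <= v <= high],
        }
    return report


# Made-up sales rows with one negative price and one implausible quantity.
rows = [
    {"price": 12.99, "units_sold": 40},
    {"price": -3.00, "units_sold": 35},
    {"price": 9.49, "units_sold": 12000},
]

result = validate(rows)
print(result["price"]["violations"])
print(result["units_sold"]["violations"])
```

The bounds themselves are the important design choice here: they encode what the business considers normal, which is why the discovery step comes first.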

3. Fixing Data Errors and Issues

Detected data errors need to be corrected. One example is fixing attribute inconsistencies, like the size medium, which can be labeled as M, MED, MD, or spelled out. The inconsistency will cause issues when it comes to assortment optimization, because an inventory management system needs to know that M, MED, MD, and Medium are all the same so that the attribute clustering has a robust set of data. Rectifying such data errors creates an accurate and reliable data set that optimizes planning and forecasting.
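
The size example above can be sketched as a simple lookup that maps known variants to one canonical label and flags anything it does not recognize. The alias table and the `normalize_size` helper are hypothetical; a real implementation would be driven by the retailer’s own attribute catalog.

```python
# Hypothetical alias table: lowercase variant -> canonical label.
SIZE_ALIASES = {
    "m": "Medium",
    "md": "Medium",
    "med": "Medium",
    "medium": "Medium",
}


def normalize_size(raw, aliases=SIZE_ALIASES):
    """Return (label, matched). Unrecognized values are returned unchanged."""
    cleaned = raw.strip().lower()
    if cleaned in aliases:
        return aliases[cleaned], True
    return raw, False  # leave as-is so a person can review it


for value in ["M", " MED ", "md", "Medium", "M 6-8"]:
    print(repr(value), "->", normalize_size(value))
```

A value like "M 6-8" is deliberately left unmatched rather than guessed at, since it may be a different attribute (a size range) that someone with product knowledge should classify.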

4. Continuously Monitor to Maintain Quality

Clean data, along with AI and Machine Learning, is the future. Together they will accelerate the turnaround of validation processes and data insights. This does not mean the human element will be eliminated. AI and Machine Learning are only as good as the data going in, so humans are still needed to monitor the systems for bad data, errors, and issues.
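
One simple way to support that human monitoring is an automated check that flags new values that stray far from recent history, so a person knows where to look. The sketch below uses a basic z-score threshold, which is an assumption chosen for illustration and not a technique attributed to the panel or to 4R; the `deviates` function and the sample figures are hypothetical.

```python
from statistics import mean, stdev


def deviates(baseline, new_value, threshold=3.0):
    """Return True if new_value is more than `threshold` standard deviations
    away from the mean of the baseline values."""
    if len(baseline) < 2:
        raise ValueError("baseline needs at least two values")
    center = mean(baseline)
    spread = stdev(baseline)
    if spread == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return new_value != center
    return abs(new_value - center) / spread > threshold


# Made-up daily unit sales for one product over the past week.
daily_units = [102, 98, 110, 95, 105, 99, 101]

print(deviates(daily_units, 104))  # a typical day
print(deviates(daily_units, 400))  # a suspicious spike worth reviewing
```

A flagged value is not necessarily an error, since it could be a real promotion or a seasonal surge, which is why the final judgment stays with people who know the business.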

Data Cleaning Pitfalls to Avoid 

A manual data cleaning process is not the most effective or error-preventative method, especially for large volumes of data and multiple sources of data.

Consider a data cleansing platform to drive accurate and complete datasets ready for data analysis, reporting or predictive analytics. 

A big misconception some retailers have about data cleaning is that it takes too long to do before it is useful. That couldn’t be further from the truth. The clean data process is an iterative one. Data does not have to be perfect before it is beneficial. The iterative process helps teams learn more about the business and where to focus. Clients often see the benefits immediately after the first iteration.

When it comes to clean data, collaboration is key. It is especially important at larger companies, where roles can be very specific and siloed. Smaller companies typically have people wearing multiple hats. Regardless of company size, retail leaders and staff should know the business and understand how data will be used, where it comes from, and how to source it.

Collaborating ensures that the data you are asking for or reporting on will accomplish its intended goal. Without collaboration, there is the chance that inconsistencies, issues, and anomalies could go unnoticed. Teams that collaborate among themselves or with a data cleansing service to address those issues will iterate and create an even better data management process.

4R is here to help you start the clean data process.  

As Nina communicated to the “Where to Start with Cleaning Data” panel audience, 4R recommends beginning the data cleaning process by understanding how retailers want to use the data to measure performance against business goals. Once the data sets are identified and their purposes are defined, retailers can begin the cleansing process so that the data meets optimal standards. The 4R One Predictive Engine sets a standard of what the data must look like and can validate the data to identify errors and issues.

Our Planning-as-a-Service team collaborates with clients to make sure the data meets their expectations. During normal production, checks and bounds are in place to flag if the data begins to deviate significantly from expectations. During regular check-ins with our clients, we review and summarize the data and KPIs to ensure no key information is missed.

If you’re about to embark on a data cleansing journey as part of an inventory and supply chain management solution implementation, contact us. We can help you take that first step towards success and partner with you to continuously monitor and clean your data for more accurate demand forecasts, clearer supply chain visibility, and optimized inventory planning. Let us help you get from mess to success.