Unfortunately, as is the case with most data sets, an important part of customer activity profiling is data cleansing (or data scrubbing). More often than not, customers are referenced by many names and aliases. For example, a common set of representations for Wal-Mart is WM, WMart, W-Mart, W-M, W_Mart, Wal_Mart, WMrt, etc. In addition, bill-to and ship-to locations and customers must be carefully distinguished to ensure that customer activity is represented accurately.
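To make the alias problem concrete, here is a minimal Python sketch of one common approach: reduce each raw name to a normalized key (uppercase, letters and digits only), then look that key up in a curated alias table that maps to a single canonical customer name. The alias table, `normalize_key`, and `canonical_customer` are hypothetical illustrations, not part of any particular system; a real alias table would be maintained by data stewards rather than hard-coded.

```python
import re
from typing import Optional

# Hypothetical alias table: normalized alias key -> canonical customer name.
# Assumption: in practice this mapping is curated and stored, not hard-coded.
ALIAS_TABLE = {
    "WALMART": "Wal-Mart",
    "WMART": "Wal-Mart",
    "WM": "Wal-Mart",
    "WMRT": "Wal-Mart",
}


def normalize_key(raw_name: str) -> str:
    """Uppercase the name and drop everything except letters and digits,
    so 'W-Mart', 'W_Mart', and 'WMart' all reduce to 'WMART'."""
    return re.sub(r"[^A-Z0-9]", "", raw_name.upper())


def canonical_customer(raw_name: str) -> Optional[str]:
    """Return the canonical customer name for a raw alias,
    or None if the alias is not in the table (needs manual review)."""
    return ALIAS_TABLE.get(normalize_key(raw_name))


if __name__ == "__main__":
    raw_names = ["WM", "WMart", "W-Mart", "W-M ", "W_Mart", "Wal_Mart", "WMrt", "Target"]
    for name in raw_names:
        print(repr(name), "->", canonical_customer(name))
```

Normalization collapses purely cosmetic variants (punctuation, case, stray spaces), while the curated table handles genuine abbreviations such as WMrt. Short keys like WM can be ambiguous across customers, which is one reason a reviewed table is often preferred over automatic fuzzy matching.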


The ultimate goal of any data cleansing initiative is high data integrity, or high data quality. High-quality data sets are characterized by completeness, accuracy, consistency, de-duplication, and uniformity. Clean data stands in contrast to dirty data, which is incomplete, inaccurate, inconsistent, duplicated, or non-uniform.


Customer data is continuously cleansed to ensure that truly unique, consistent, and accurate references to customers, customer bill-tos, and customer ship-tos are maintained.
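One way to picture maintaining unique bill-to and ship-to references is a de-duplication pass keyed on (customer, role, address). The Python sketch below assumes the customer names have already been resolved to canonical form and that trivial address variants differ only in case and whitespace; the `CustomerSite` type, `dedupe_sites` function, and the sample addresses are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class CustomerSite:
    customer: str  # canonical customer name (assumed already alias-resolved)
    role: str      # "bill-to" or "ship-to"
    address: str   # address after simple normalization


def normalize_text(text: str) -> str:
    """Uppercase and collapse runs of whitespace so trivial variants compare equal."""
    return " ".join(text.upper().split())


def dedupe_sites(records: Iterable[Tuple[str, str, str]]) -> List[CustomerSite]:
    """Keep one CustomerSite per (customer, role, address), preserving first-seen order.
    The same address is kept twice if it appears once as a bill-to and once as a ship-to."""
    seen = set()
    unique = []
    for customer, role, address in records:
        site = CustomerSite(customer, role.lower(), normalize_text(address))
        if site not in seen:
            seen.add(site)
            unique.append(site)
    return unique


if __name__ == "__main__":
    # Hypothetical sample records.
    records = [
        ("Wal-Mart", "bill-to", "1 Example Rd, Anytown"),
        ("Wal-Mart", "Bill-To", "1 Example Rd,  Anytown"),  # duplicate bill-to
        ("Wal-Mart", "ship-to", "1 Example Rd, Anytown"),   # same address, different role
        ("Wal-Mart", "ship-to", "2 Sample Ave, Othertown"),
    ]
    for site in dedupe_sites(records):
        print(site)
```

Keeping the role in the key is the design choice that reflects the bill-to versus ship-to distinction: collapsing on address alone would merge a billing location with a shipping location and distort the picture of customer activity.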