CRM is the lifeline of an organization. It is a treasure map to the total addressable market size that lies waiting to be grabbed or has already been explored. Mergers, acquisitions or for that matter even rapidly expanding companies end up having multiple CRMs with duplicate and outdated records. Duplicate records are misleading and provide an erroneous picture of the total addressable market. Outdated records prohibit sales team to realize its potential and meet targets.
OceanFrogs embarked on one such challenging mission recently. No one likes airing their dirty CRMs in public. So one day one of our illustrious clients
approached us with one burning question.
We gladly said “yes” only to realize that we had walked ourselves into not one but four CRM systems with outdated records. These CRMs were not the ideal world for data scientists where one can plug and play ML algorithms. This was a classic problem statement for data enrichment experts.
We hypothesized that a combination of customer name and street address must be unique for us to label it as a unique customer.
The smug Data Scientists in us said – so what? We can nail down any problem. However, things did not go that way.
Our analytical minds cried, were frustrated, threw tantrums for many days. None of that solved the problem. Finally, we got back to the basics.
The first rule of thumb is to get rid of assumptions or hypothesis.
First hypothesis: Customer Name has to be similar. We said: forget it.
Second hypothesis: Customer and Street Name is a good combination to find unique records. Forget that as well. It did not work either.
Third hypothesis: Domain name should be the same.
Fourth hypothesis: Telephone numbers must be the same.
We threw various matching algorithms through SQL and Python packages. Finally, we saw the light at the end of the tunnel.
Similar email addresses indicate chances of customers being the same. This hypothesis observed accuracy. It encouraged us to try another hypothesis. Exact phone numbers indicate the chances of two customers being the same. We felt AI confident venturing into one more hypothesis. We observed a high accuracy of customers being the same when the exact street address and house number matched. The stage was set. Confidence was restored. Data warriors had come back. Common sense had prevailed. We needed to respect the first rule of the game.
We compared records by email address, telephone, street address but not by name. Only thinking outside the box came to our rescue. This project shook the core belief system that one needs to attempt solving a problem in an organized manner. Going disruptive was our only way to find potential matches and restore sanity back to the CRMs.