If only we lived in a perfectly harmonious world of CRMs…

Let’s Take It from the Top!

CRM is the lifeline of an organization. It is a treasure map to the total addressable market, whether that market is still waiting to be grabbed or has already been explored. Mergers, acquisitions, or even rapid expansion often leave a company with multiple CRMs full of duplicate and outdated records. Duplicate records are misleading and paint an erroneous picture of the total addressable market. Outdated records prevent the sales team from realizing its potential and meeting its targets.

Bringing a CRM to an ideal state is called CRM harmonization.

Every CRM must go through continuous or periodic harmonization to maintain a single version of the truth.

The Challenge

OceanFrogs recently embarked on one such challenging mission. No one likes airing their dirty CRMs in public, so one day one of our illustrious clients approached us with a single burning question:

"Can you restore sanity to our CRM systems?"

We gladly said “yes,” only to realize that we had walked into not one but four CRM systems full of outdated records. These CRMs were far from the ideal world where data scientists can plug and play ML algorithms. This was a classic problem for data enrichment experts.


In an outdated CRM, everything can go wrong:
  1. Names don't match across systems.
  2. Designations, locations, and organizations are not up to date.
  3. Email addresses and phone numbers are not valid.
  4. The customer expects data scientists to wave their new-age magic wand (aka NLP) and make it all nice and perfect.
We had all of those problems and then some more.

We started from the hypothesis that the combination of customer name and street address would be unique, and could therefore identify a unique customer.

Did we mention the linguistic challenge?

The data was not in English… TADAA!!!

The smug data scientists in us said: so what? We can nail any problem. However, things did not go that way.
Our analytical minds cried, got frustrated, and threw tantrums for days. None of that solved the problem. Finally, we went back to basics.

Our Eureka Moment!

The first rule of thumb is to get rid of assumptions and hypotheses.

First hypothesis: customer names have to be similar. We said: forget it.
Second hypothesis: customer name plus street name is a good combination for finding unique records. Forget that as well; it did not work either.
Third hypothesis: Domain name should be the same.
Fourth hypothesis: Telephone numbers must be the same.


We threw various matching algorithms at the data, using SQL and Python packages. Finally, we saw the light at the end of the tunnel.

Our hypotheses:

Similar email addresses indicate that two records likely belong to the same customer. This hypothesis held up with good accuracy, which encouraged us to try another: exact phone numbers indicate that two records likely belong to the same customer. Feeling more confident, we ventured one more hypothesis and observed high accuracy of two records belonging to the same customer when the exact street address and house number matched. The stage was set. Confidence was restored. The data warriors were back. Common sense had prevailed. We needed to respect the first rule of the game.

Never assume that any data is clean.
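Taking that rule literally, a sensible first step is to measure how dirty each field is before trusting it. The sketch below is a minimal, hypothetical audit, not OceanFrogs’ tooling: for each field it reports how often the field is filled and what share of the filled values pass a deliberately loose validity check. The field names, the e-mail regex, and the 7-digit phone minimum are assumptions; 15 digits is the E.164 maximum length of a phone number.

```python
import re

# Deliberately loose, hypothetical validity checks; real rules depend on the CRM.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def valid_phone(value: str) -> bool:
    """Accept numbers with 7 to 15 digits (15 is the E.164 maximum; 7 is an assumption)."""
    return 7 <= len(re.sub(r"\D", "", value)) <= 15


# Fields without an entry here are only checked for being filled.
CHECKS = {
    "email": lambda v: bool(EMAIL_RE.match(v)),
    "phone": valid_phone,
}


def field_health(records: list[dict], fields: list[str]) -> dict[str, dict[str, float]]:
    """Per field: share of records where it is filled, and share of filled values that pass the check."""
    total = len(records)
    report = {}
    for field in fields:
        # Treat None, empty strings, and whitespace-only strings as "not filled".
        values = [str(r.get(field) or "").strip() for r in records]
        filled = [v for v in values if v]
        check = CHECKS.get(field, lambda v: True)
        valid = sum(1 for v in filled if check(v))
        report[field] = {
            "fill_rate": len(filled) / total if total else 0.0,
            "valid_rate": valid / len(filled) if filled else 0.0,
        }
    return report


if __name__ == "__main__":
    sample = [
        {"email": "info@example.com", "phone": "+49 30 1234567"},
        {"email": "not-an-email", "phone": ""},
        {"email": "", "phone": "123"},
        {"email": "sales@example.org", "phone": None},
    ]
    print(field_health(sample, ["email", "phone"]))
```

Fields with a low fill rate or a low valid rate are poor candidates for matching, however promising they look on paper.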


We compared records by email address, telephone number, and street address, but not by name. Only thinking outside the box came to our rescue. This project shook our core belief that a problem must be attacked in an orderly, predefined manner. Going disruptive was the only way to find potential matches and restore sanity to the CRMs.
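The three matching signals (similar e-mail, exact phone number, exact street address plus house number) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not OceanFrogs’ actual matcher: the record fields (`email`, `phone`, `street`, `house_no`), the use of `difflib.SequenceMatcher` to measure e-mail similarity, and the 0.9 cut-off are all hypothetical. The customer name is carried along but deliberately never compared.

```python
import re
from difflib import SequenceMatcher

# Hypothetical cut-off for treating two e-mail addresses as "similar".
EMAIL_SIMILARITY = 0.9


def email_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two trimmed, lower-cased e-mail addresses."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a or not b:
        return 0.0  # a missing e-mail is no evidence either way
    return SequenceMatcher(None, a, b).ratio()


def same_phone(a: str, b: str) -> bool:
    """Exact match on the digits of two phone numbers (spaces, dashes, brackets ignored)."""
    da, db = re.sub(r"\D", "", a), re.sub(r"\D", "", b)
    return bool(da) and da == db


def same_address(rec_a: dict, rec_b: dict) -> bool:
    """Exact match on street and house number, ignoring case and extra whitespace."""
    def key(r: dict) -> tuple[str, str]:
        return (" ".join(r["street"].lower().split()), r["house_no"].strip().lower())

    ka, kb = key(rec_a), key(rec_b)
    return all(ka) and ka == kb


def match_signals(rec_a: dict, rec_b: dict) -> list[str]:
    """List the signals suggesting both records are the same customer (name is not used)."""
    signals = []
    if email_similarity(rec_a["email"], rec_b["email"]) >= EMAIL_SIMILARITY:
        signals.append("email")
    if same_phone(rec_a["phone"], rec_b["phone"]):
        signals.append("phone")
    if same_address(rec_a, rec_b):
        signals.append("address")
    return signals


if __name__ == "__main__":
    a = {"name": "Müller GmbH", "email": "info@mueller-gmbh.de",
         "phone": "+49 30 1234567", "street": "Hauptstraße", "house_no": "12"}
    b = {"name": "Mueller", "email": "Info@Mueller-GmbH.de ",
         "phone": "+49 (30) 123-4567", "street": "hauptstraße", "house_no": "12"}
    print(match_signals(a, b))  # ['email', 'phone', 'address']
```

The two names in the example would fail a naive name comparison, yet every other signal agrees, which is exactly the situation that made ignoring names pay off.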

The Technicality of it all…

  • Records from all four systems were translated into one common language (English) using OceanFrogs’ internal language translator.
  • Incoming data from all the CRM systems were aggregated, and each system was assigned a unique ID.
  • Exploratory data analysis was the next organic step; it helped us assess the health of the available fields. Feature engineering was used extensively to gauge how much value each field provides.
  • Duplicates within each system were identified by comparing multiple attributes. Duplicates across systems were given accuracy scores with the help of OceanFrogs’ Natural Programming Modules.
  • OceanFrogs’ algorithms were trained for exact as well as partial matches. The search functions benefited heavily from natural language processing and pattern recognition.
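The aggregation and scoring steps above can be pictured with a small Python sketch. It is an assumption-laden illustration, not OceanFrogs’ pipeline: records are tagged with their source system ID, every pair is scored field by field (1.0 for an exact match, a `difflib` similarity ratio for a partial match), and each candidate pair is labelled as a duplicate "within" one system or "across" systems. The equal field weights, the 0.6 threshold, and the field names are hypothetical, and the pairwise loop is O(n²), which suits a demo rather than millions of records.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations


@dataclass
class Record:
    system_id: str  # which source CRM the record came from, e.g. "CRM_A" (hypothetical)
    record_id: str  # the record's ID inside that CRM
    email: str
    phone: str
    address: str    # street + house number, assumed already translated and normalized


def field_score(a: str, b: str) -> float:
    """1.0 for an exact match, a similarity ratio for a partial match, 0.0 if either is empty."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a or not b:
        return 0.0
    if a == b:
        return 1.0
    return SequenceMatcher(None, a, b).ratio()


def score_pair(r1: Record, r2: Record) -> float:
    """Average of the field scores; equal weights are an assumption."""
    fields = ("email", "phone", "address")
    return sum(field_score(getattr(r1, f), getattr(r2, f)) for f in fields) / len(fields)


def candidate_duplicates(systems: dict[str, list[dict]], min_score: float = 0.6):
    """Aggregate all systems, score every pair, and keep pairs scoring at least min_score.

    Returns (score, scope, key1, key2) tuples sorted by descending score,
    where scope is "within" (same system) or "across" (different systems).
    """
    records = [Record(system_id=sid, **row) for sid, rows in systems.items() for row in rows]
    results = []
    for r1, r2 in combinations(records, 2):
        s = score_pair(r1, r2)
        if s >= min_score:
            scope = "within" if r1.system_id == r2.system_id else "across"
            results.append((round(s, 2), scope,
                            (r1.system_id, r1.record_id), (r2.system_id, r2.record_id)))
    return sorted(results, reverse=True)


if __name__ == "__main__":
    systems = {
        "CRM_A": [
            {"record_id": "1", "email": "info@acme.de", "phone": "4930123", "address": "hauptstrasse 12"},
            {"record_id": "2", "email": "info@acme.de", "phone": "4930123", "address": "hauptstr 12"},
        ],
        "CRM_B": [
            {"record_id": "77", "email": "info@acme.de", "phone": "4930123", "address": "hauptstrasse 12"},
        ],
    }
    for row in candidate_duplicates(systems):
        print(row)
```

Keeping the system ID on every record is what makes the within/across split possible after the data has been pooled.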

And CRM harmonization… we tamed that monster!!!

  • 20% of the customer records were categorized as duplicates with high accuracy.
  • 7% of the customer records were identified as duplicates with medium accuracy.
  • Deployed an automated AI system that continuously checks inbound CRM data to keep it harmonized.
  • Assessed the total addressable market.
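Results like these come from sorting scored pairs into confidence tiers and expressing them as a share of all records. The Python sketch below shows one hypothetical way to do that arithmetic; the 0.9 and 0.75 cut-offs are assumptions (the project’s actual thresholds are not stated), and counting both members of a matched pair is a simplifying choice. Each record is placed in the tier of its best-scoring pair, so no record is counted twice.

```python
# Hypothetical cut-offs; the project's real thresholds are not given in the text.
HIGH, MEDIUM = 0.9, 0.75


def tier_summary(pairs, total_records: int) -> dict[str, float]:
    """Percentage of records whose best pair score falls in the high or medium tier.

    pairs: iterable of (score, record_key_1, record_key_2).
    Both members of a pair are counted (a simplifying assumption).
    """
    if total_records <= 0:
        raise ValueError("total_records must be positive")
    best: dict[str, float] = {}
    for score, k1, k2 in pairs:
        for k in (k1, k2):
            best[k] = max(score, best.get(k, 0.0))  # keep each record's best match only
    high = sum(1 for s in best.values() if s >= HIGH)
    medium = sum(1 for s in best.values() if MEDIUM <= s < HIGH)
    return {
        "high_pct": 100 * high / total_records,
        "medium_pct": 100 * medium / total_records,
    }


if __name__ == "__main__":
    pairs = [(0.95, "a", "b"), (0.8, "c", "d"), (0.5, "e", "f")]
    print(tier_summary(pairs, total_records=10))  # {'high_pct': 20.0, 'medium_pct': 20.0}
```

The toy numbers are illustrative only and are unrelated to the client results listed above.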