Regtech has developed at an increasing rate in recent years to keep up with the ever-growing number of regulations being implemented. However, one technology that has been woefully underused in the financial industry, including in the regtech space, is voice tech.
Nigel Cannings is the CTO at Intelligent Voice. The company’s technology is designed to take telephone calls, email, IM and other unstructured data and make it smart-searchable, to identify unknown callers, and to show hidden links between data, whatever the source. Intelligent Voice has focused on developing voice technology to solve problems currently found in the financial industry. Cannings explains what can be done and why it is so relevant:

How voice technology could provide the answer to some of the financial industry’s biggest problems
The financial industry has been under intense scrutiny for most of the last decade. Regulations have tightened – necessarily so. And communications monitoring has become a crucial element of day-to-day compliance processes. It takes time, effort, and money to ensure that protocols are adhered to, and it’s easy for mistakes to be overlooked: in 2020, banks were fined over $14.2 billion globally for non-compliance. Voice technology holds the potential to provide a solution to some of the most pressing compliance issues.
Know your customer (KYC). Anti-money laundering (AML). The General Data Protection Regulation (GDPR). The Payment Card Industry Data Security Standard (PCI DSS). The Markets in Financial Instruments Directive II (MiFID II). The alphabet soup of regulations that financial services organisations have to adhere to has expanded dramatically in recent years, and the degree of governance and compliance monitoring relating to each has grown correspondingly. Those operating within the sector are under enormous pressure to understand, execute, and observe the requisite policies, and it’s a vast undertaking that can be difficult to manage. That’s why regulatory technology (RegTech) has become such big business, and why voice recording has become a crucial component of the ecosystem, especially as the requirements for long-term storage of voice data have expanded.
What is RegTech?
RegTech is the umbrella term used to describe any technology designed to help businesses stay compliant with their regulatory obligations. The available tools range from data gathering and analysis to process automation and risk management, including fraud prevention. It is used not only to enhance compliance but also to reduce the costs associated with effective governance practices.
The key to RegTech is that it goes beyond many of the standard tools available today, such as archiving and text indexing, instead allowing for deeper insight into data, often using machine learning techniques.
Why is voice data so important?
Historically, voice data has been siloed in financial services: it is expensive to store, very difficult to index, and almost impossible to analyse successfully. “Monitoring” has mainly been a case of dip-sampling audio files to listen for wrongdoing, or using phonetic search techniques to try to identify key words or phrases.
But the AI tools that are being built to monitor the wealth of IM and email traffic that passes through organisations rely on text, so voice traffic gets left behind.
And as the world moves more and more to video interactions, many ad hoc communications are now recorded and stored as audio files, so compliance tools designed to track email and IM traffic cannot keep up.
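The limits of dip-sampling mentioned above can be put into rough numbers. The sketch below is an illustration only: it assumes calls are picked for review purely at random, and the call volumes in the example are made up rather than taken from any real firm. It uses the standard hypergeometric calculation for the chance that a random sample contains at least one problem call.

```python
from math import comb


def detection_probability(total_calls: int, problem_calls: int, sample_size: int) -> float:
    """Chance that a random sample contains at least one problem call.

    Uses the hypergeometric "no hits" probability:
    P(miss everything) = C(total - problem, sample) / C(total, sample).
    """
    if not 0 <= problem_calls <= total_calls:
        raise ValueError("problem_calls must be between 0 and total_calls")
    if not 0 <= sample_size <= total_calls:
        raise ValueError("sample_size must be between 0 and total_calls")
    clean_calls = total_calls - problem_calls
    # math.comb returns 0 when sample_size > clean_calls, so the miss chance is 0.
    p_miss = comb(clean_calls, sample_size) / comb(total_calls, sample_size)
    return 1.0 - p_miss


if __name__ == "__main__":
    # Illustrative numbers only: 10,000 recorded calls, 5 of them problematic.
    for sample in (100, 500, 2000):
        p = detection_probability(10_000, 5, sample)
        print(f"listen to {sample:>5} calls -> {p:.1%} chance of hearing a problem call")
```

Running the script with different sample sizes shows how much listening time is needed before random sampling becomes likely to surface even one of a handful of problem calls.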
How modern speech tech can help the financial industry avoid compliance-related fines
Modern speech technology has come a long way in the AI age and can bring great benefits to compliance and monitoring workflows, speeding up detection and review across an exploding base of data.
However, most people think the “obvious” solution is just to plug in speech-to-text technology to feed the results into the existing text-based tools for analysis.
There are a number of issues with this approach. Modern speech-to-text technology has come on greatly in the last few years and can do a good job with a wide variety of data types, including Zoom and Teams meetings. But it is by no means perfect, and when it comes to telephony, especially trader telephony where jargon and overtalk are frequent, accuracy drops further.
This means that specialist techniques need to be applied to this data to get the best out of it, particularly where there is unusual vocabulary, high background noise, or heavy accents. Even then, the output will not be a perfect transcript, but an approximation of what has been said.
Even if you did have a perfect transcript, the AI tools used for text analysis are trained on grammatically accurate and properly constructed sentences. And anyone who has sat in a meeting knows that people rarely speak in a completely comprehensible way!
So treating speech-to-text as a technology plugin, like “OCR for voice”, is likely to lead to poor outcomes unless the use case is properly understood.
Further issues arise around different languages. Not all of the world speaks English, and those who do don’t speak it all of the time. Within a modern multinational organisation, you simply cannot predict which language will be used in the next communication, or even whether it will switch partway through.
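One way to respect the point that a transcript is an approximation is to keep the recogniser’s uncertainty attached to the text instead of passing bare words downstream. The sketch below is a minimal illustration, not any vendor’s method: the `Word` structure, the confidence scores and the 0.6 threshold are all assumptions made for the example. It groups consecutive low-confidence words into time spans that a reviewer could listen to directly.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float       # seconds from the start of the recording
    end: float
    confidence: float  # 0.0 to 1.0, as reported by a hypothetical recogniser


def _close_span(words):
    """Turn a run of doubtful words into a (start, end, text) tuple."""
    return (words[0].start, words[-1].end, " ".join(w.text for w in words))


def low_confidence_spans(words, threshold=0.6):
    """Group consecutive words scoring below `threshold` into reviewable spans."""
    spans = []
    current = []
    for word in words:
        if word.confidence < threshold:
            current.append(word)
        elif current:
            # A confident word ends the current doubtful run.
            spans.append(_close_span(current))
            current = []
    if current:
        spans.append(_close_span(current))
    return spans


if __name__ == "__main__":
    # Made-up recogniser output for a short trader utterance.
    transcript = [
        Word("buy", 0.0, 0.3, 0.95),
        Word("fifty", 0.3, 0.6, 0.40),
        Word("lots", 0.6, 0.9, 0.50),
        Word("of", 0.9, 1.0, 0.90),
        Word("bunds", 1.0, 1.4, 0.30),
    ]
    for start, end, text in low_confidence_spans(transcript):
        print(f"review audio {start:.1f}s to {end:.1f}s: heard something like '{text}'")
```

The design choice here is to send the reviewer back to the audio for doubtful passages, rather than asking a text-only tool to interpret words the recogniser itself was unsure about.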
Performance, security and price are also key questions to look at.
What should you consider when looking for audio communication surveillance solutions?
- Think holistic: Look for providers who have integrated their text communications workflows with their audio communications so that the experience is seamless.
- Quick train: Can the system be quickly updated with new and unusual terminology to help properly capture the context of conversations (e.g. names, instrument types and trader jargon)?
- Language support: Does your solution support the languages you need, and can it detect which language is being spoken and when?
- Review capability: How smart is the review? Are you just given a transcript, or does the system direct you to parts of the conversation that look interesting?
- On-prem or cloud: Can your solution support your security requirements? If you are going to the cloud, does your provider control its own voice processing, or does it outsource it to third parties?
- Audio-specific AI: Has the surveillance solution been trained on real voice data, or does it rely on “standard” natural language processing toolsets trained only on text?
- Speed: Especially if you are working on-premise, can the solution cope with the sheer volume of audio and video data coming through?
- Connection: Can your solution interface with standard capture solutions?
- Costs: Are you being charged per monitored person, or for every hour that is processed (which becomes expensive and unpredictable)?
- Future-proofing: What is the roadmap? New languages, emotion (not just sentiment) detection, search by voiceprint, self-teaching speech recognition, confidential search. Audio surveillance is a lot more than just turning audio into text.
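The cost question in the checklist above lends itself to a quick back-of-the-envelope comparison. The sketch below is illustrative only: every price, headcount and monthly volume in it is a hypothetical figure chosen for the example, not a quote from any provider. It compares a flat per-person charge with a per-hour charge across months with different call volumes.

```python
def per_person_monthly_cost(monitored_people: int, price_per_person: float) -> float:
    """Flat pricing: the bill depends only on how many people are monitored."""
    return monitored_people * price_per_person


def per_hour_monthly_cost(hours_processed: float, price_per_hour: float) -> float:
    """Usage pricing: the bill moves with the volume of audio processed."""
    return hours_processed * price_per_hour


def break_even_hours_per_person(price_per_person: float, price_per_hour: float) -> float:
    """Hours of audio per person at which both pricing models cost the same."""
    return price_per_person / price_per_hour


if __name__ == "__main__":
    # All figures below are hypothetical, chosen only to illustrate the comparison.
    people = 200
    seat_price = 50.0   # per monitored person, per month
    hour_price = 2.0    # per hour of audio processed
    monthly_hours = {"quiet month": 3_000, "typical month": 5_000, "volatile month": 9_000}

    flat = per_person_monthly_cost(people, seat_price)
    for label, hours in monthly_hours.items():
        usage = per_hour_monthly_cost(hours, hour_price)
        print(f"{label:>14}: per-person {flat:>9,.2f} | per-hour {usage:>9,.2f}")

    limit = break_even_hours_per_person(seat_price, hour_price)
    print(f"per-hour pricing is cheaper only below {limit:.0f} hours per person per month")
```

The per-person figure stays constant across months, while the per-hour figure tracks trading activity, which is the unpredictability the checklist warns about.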
Compliance monitoring requirements are not likely to be relaxed any time soon. The impact of the global economic crisis is still being felt. And with Covid-19 necessitating further measures (how can compliance be accurately monitored when teams are working from home?), compliance tools and procedures need to become more sophisticated to protect customers and to ensure the safety and security of both the financial system and individual financial institutions. The relevant technology is evolving at a rapid pace, with few areas moving as swiftly as voice tech.