(by Eoin Watts – Aigent’s Data Acquisition Officer)
At Aigent, we are using artificial intelligence to provide real-time support to call center agents. This in-call assistance decreases overall call-handling time, reduces agent training, improves customer satisfaction and ultimately increases company profits.
To create the features that make this possible, we needed to train our own Automatic Speech Recognition system. As anyone in the field will testify, these features and products require a lot of data. This data comes in the form of transcriptions (in our case, the literal verbatim transcription of speech into text) and annotations (i.e. the labelling of both audio and text).
This blog post presents the reader with a comprehensive overview of Aigent’s evolving data acquisition process, detailing how we got to where we are today and offering a glimpse of the steps we expect to introduce next.
Conveniently, I find myself writing this blog post on my way home from Bacolod, Philippines, after visiting Aigent’s main data acquisition source. This is the third time I have visited the aptly nicknamed “City of Smiles”, located on Negros Island, in the Western Pacific. A lot has changed since my first visit in 2018. In terms of numbers, the program has grown almost fivefold: from the eleven employees (ten Transcribers and one Team Lead) that we had in November 2018 to forty-eight Annotators, three Team Leads and one Operations Manager today.
To accommodate this growth, Aigent has relocated to a bigger office in a new building, with plenty of room for further expansion. In this large space, the team’s identity can really be observed. Aigent posters line the walls, and Annotators who have recently been rewarded for their performance show off their new Aigent t-shirts and coffee mugs.
However, before we established our own Data Acquisition Team, we pursued a number of other data sourcing methods.
Method 1: Online Datasets
Our search for data began with online datasets, such as the extensive Switchboard Corpus. Because their transcriptions have a high word accuracy, these datasets have certainly been pivotal in the evaluation of our speech models.
But these datasets are not biased towards our domain, which limits their use when it comes to training a model for our specific business context.
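Word accuracy is usually reported through its complement, the word error rate (WER): the word-level edit distance between a reference transcription and the model's output, divided by the number of words in the reference. The sketch below is a minimal illustration of that calculation, not the evaluation code used at Aigent; the function name and the example sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical call-center utterance with one substituted word.
example_wer = word_error_rate("thank you for calling", "thank you four calling")
```

A clean reference set matters here because any mistake in the reference transcription is counted against the model.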
Method 2: Limited Internal Drive
When it comes to raw data, we found ourselves in the unique and advantageous position of having access to an unlimited pool of domain-specific audio files. However, we spent the best part of a year attempting to find a reliable, effective and cost-efficient way of converting this raw audio into usable transcriptions and annotations.
Our initial efforts to solve this problem led us to employ a small number of part-time students. Aigent’s prime location in Amsterdam makes it relatively straightforward to attract native English-speaking students looking for part-time work to support their studies and gain work experience. These students certainly helped us create and define our own data acquisition process. Because they worked in the office, developers could quickly test out new transcription tools and get instant feedback. Nevertheless, the nature of part-time work meant this approach couldn’t provide us with the quantity of data needed to train our models. Relative to the return, it was also an expensive data acquisition method.
Method 3: Outsourcing
The third method we explored involved outsourcing the transcription task to a third party. This multinational company was contracted to provide us with 100 hours of high-quality transcription, using our client’s audio. Undoubtedly, these transcriptions will prove to be very useful in the development of our domain-biased speech model.
But yielding those 100 hours of transcription required a lot of work on our own end. For instance, pre-processing the initial datasets to align with the third party’s particular format took weeks of work.
When the transcriptions were returned, we also found that the quality did not meet the expected standard and that our guidelines had only been loosely adhered to. These transcriptions therefore required an internal quality check on our side. Due to this quality lapse, some data was even returned to the third party to be transcribed again.
Essentially, the outsourcing of these transcriptions involved a significant amount of insourcing. Yet it was this insourcing that motivated us to seek alternative solutions. The work logged here laid the groundwork and developed the internal infrastructure that allowed Aigent to realistically pursue the following method: establishing our own permanent Data Acquisition Team.
Method 4: Large-scale Internal
The above experiences, as well as our insider access to the resources of Ubiquity (Aigent’s parent company and a large US call center), led us to make the leap of establishing the Data Acquisition Team in Bacolod in November 2018. Thus far, having our own full-time team has proved by some distance the most successful way of satisfying our data deficiency. Having our own large team means we can quickly gather domain-specific transcriptions and annotations without sacrificing direct control over the process.
In the first twelve months, roughly 3000 hours of audio were transcribed by the team in Bacolod. To put this in perspective, this figure dwarfs not only what our part-time students produced, but also what the third-party company delivered. In half the time it took us to receive 100 hours of transcriptions from the third party, our Data Acquisition Team in Bacolod produced over ten times as many transcriptions.
This is not to say that having our own team has magically solved all of our data sourcing problems. The balance between transcription quality and transcription quantity is one which still defines our process. However, the direct control we have over a large team allows us to quickly implement changes, adapt our processes accordingly and witness instant results. As mentioned above, this flexibility is absent when the work is outsourced to a third party.
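The comparison can be restated as a rate. The short sketch below uses only the figures quoted above; the function name is hypothetical, and the monthly figure is a plain average that ignores how much the team grew during the year.

```python
def relative_throughput(volume_ratio, time_ratio):
    """How many times faster one source is, given how much more it
    produced (volume_ratio) in what fraction of the time (time_ratio)."""
    return volume_ratio / time_ratio


# "Over ten times as many transcriptions" in "half the time" means the
# in-house team worked at more than twenty times the third party's rate.
rate_ratio = relative_throughput(volume_ratio=10, time_ratio=0.5)

# Roughly 3000 hours in twelve months is about 250 hours per month on average.
average_hours_per_month = 3000 / 12
```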
It would be disingenuous not to mention the logistical obstacles in running an annotation center on the other side of the world. Unlike many jobs in the Philippines, those at Aigent come with a day-time shift. Because Bacolod is six to seven hours ahead of Amsterdam, depending on the time of the year, there is just a two to three-hour window in which the Amsterdam team and the Bacolod team are both working. A technical hiccup outside that window can therefore result in a few hours of lost work.
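The size of that window follows from simple interval arithmetic. The sketch below assumes, purely for illustration, that both offices work from 09:00 to 18:00 local time; the real schedules are not stated here, and the function name is hypothetical.

```python
def overlap_hours(offset_hours, start=9, end=18):
    """Hours per day during which both offices are at work.

    offset_hours: how many hours Bacolod is ahead of Amsterdam.
    start, end:   local working hours, assumed identical at both sites.
    """
    # Express the Bacolod working day on the Amsterdam clock.
    bacolod_start = start - offset_hours
    bacolod_end = end - offset_hours
    # The shared window is the intersection of the two intervals.
    return max(0, min(end, bacolod_end) - max(start, bacolod_start))


winter_overlap = overlap_hours(7)  # Amsterdam on winter time
summer_overlap = overlap_hours(6)  # Amsterdam on summer time
```

Under these assumed hours, the shared window is two hours in winter and three in summer, in line with the figures above.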
Regular formal and informal communication channels help to address many of the above challenges. In fact, these logistical problems have been noticeably less damaging than our initial estimates suggested.
There are also a number of benefits associated with the time difference. The time delay allows us to separate our processes, meaning our team in Bacolod can work independently of our Amsterdam office and vice versa. This space facilitates positive change, as the Amsterdam office can analyse the work of the annotators and have clear, concrete feedback ready for when the Bacolod team begins working again the following day.
A good example of this time difference working to our benefit is a pilot we are currently running with one of our US-based clients. This client takes calls during the US daytime, and by the time they finish receiving calls, our Bacolod team arrives at the office to begin the daytime shift. A sub-section of the team then annotates all the calls from the previous day, usually 1,200 recordings, which have been automatically placed in a purpose-built tool using a script developed by one of the developers here in Amsterdam. By the time the Amsterdam team arrives at the office, they can observe these annotations and, if needed, have the necessary time to make adjustments to the model or process. These changes are implemented before the client starts taking calls. This process is repeated daily and allows for clear and accurate analysis, which results in positive change.
Similarly, the threat of outages has led to improvements in internal troubleshooting steps, the creation of monitoring logs and general tool enhancements. Without the threat of outages and lost working time, it is arguable that these permanent solutions would have been sidelined in favour of more ad-hoc workarounds.
Finally, these sociable working hours help to maintain a positive working environment in Bacolod and ensure that Aigent remains an attractive employment option. These unseen benefits are by their nature difficult to measure, but their importance shouldn’t be underestimated.
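The daily hand-off can be pictured as two small steps: pick out the previous day's calls, then spread them across the annotators on shift. The sketch below is only an illustration of that idea and not the script used in the pilot; the `Recording` class, both function names, the annotator names and the round-robin assignment are all assumptions.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta
from itertools import cycle


@dataclass(frozen=True)
class Recording:
    call_id: str
    started_at: datetime  # client-local time at which the call began


def previous_day_calls(recordings, today):
    """Keep only the recordings that started on the day before `today`."""
    target = today - timedelta(days=1)
    return [r for r in recordings if r.started_at.date() == target]


def assign_round_robin(recordings, annotators):
    """Deal the recordings out evenly, one annotator at a time."""
    queues = {name: [] for name in annotators}
    for recording, name in zip(recordings, cycle(annotators)):
        queues[name].append(recording.call_id)
    return queues


# Hypothetical data: three calls from yesterday and one from today.
calls = [
    Recording("call-001", datetime(2020, 1, 14, 9, 30)),
    Recording("call-002", datetime(2020, 1, 14, 13, 5)),
    Recording("call-003", datetime(2020, 1, 14, 16, 45)),
    Recording("call-004", datetime(2020, 1, 15, 8, 10)),
]
batch = previous_day_calls(calls, today=date(2020, 1, 15))
queues = assign_round_robin(batch, ["annotator_a", "annotator_b"])
```

With around 1,200 recordings a day, an even split like this keeps each annotator's queue predictable, although a real tool would likely also account for call length and annotator availability.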
The first fourteen months of the program have been a success. The target of establishing a sizeable, skilled and motivated workforce has been met. As a result, Aigent now has a stable, efficient and scalable way of converting raw audio into usable transcriptions and annotations.
Looking ahead, 2020 will be a big year for Aigent and the Data Acquisition Team. The reality of onboarding and integrating multiple new clients will test the speed at which raw data can be transformed into usable data.
Yet, we’re confident that Aigent’s infrastructure and personnel can handle this pressure. Bring on 2020.