I was delighted to be approached by Take Note for a guest blog spot. The post below examines the pros and cons of speech to text tools, and why you may want to use a person for transcription.
Lately, speech to text software, also known as automatic speech recognition (ASR), has grown in popularity as the technology behind it has improved. ASR is an inexpensive, quick alternative to human transcription, but it doesn’t offer many of the benefits of traditional human-based transcription services. There are a few key differences between the two, and it helps to be aware of the services available and their pros and cons before you make a decision.
Speech to Text Software
ASR software is reasonably priced and in some cases it’s available for free. For paid services, the two most common pricing models are per-minute rates and subscriptions. Realistically, you are probably going to wind up paying around £0.10 per minute of audio or video whether you pay as you go or subscribe.
The paid services tend to produce a marginally higher quality transcript and offer additional benefits such as encrypted storage. When it comes to speed, ASR is exceptionally quick, turning around a recording in a fraction of its running time. Some software spits the words out live as you speak or play audio into a microphone (e.g. Google), whereas others process the whole recording and output the transcript in one go (e.g. Speechmatics). The latter tends to work about three times faster than real time, i.e. 60 minutes of audio takes about 20 minutes to transcribe with ASR.
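As a rough illustration of that speed figure, here is a minimal Python sketch. The function name and the default factor of 3 are assumptions taken from the approximate figure above, not a guarantee from any particular provider.

```python
def asr_processing_minutes(audio_minutes: float, speed_factor: float = 3.0) -> float:
    """Estimate how long batch ASR takes to process a recording.

    speed_factor is how many times faster than real time the software runs.
    The default of 3.0 is an assumption based on the rough figure in the post.
    """
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return audio_minutes / speed_factor


# Print estimates for a few typical recording lengths.
for length in (30, 60, 90):
    estimate = asr_processing_minutes(length)
    print(f"{length} min of audio: about {estimate:.0f} min to process")
```

Real turnaround will vary with the provider, the file and how busy the service is, so treat the result as a ballpark only.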
Transcript quality is usually where speech to text software trips up. There is no quality control or proofreading with ASR, so the output can be inconsistent. The result depends heavily on the quality and clarity of the audio, and it’s still common for ASR to produce transcripts that are incomprehensible.
A few factors can influence the quality of the transcript: background noise, accents, and multiple speakers, particularly if they talk over one another. Even with a high-quality, ‘clean’ audio recording, speech to text software often struggles to hit 80% accuracy. If using ASR, you should expect to spend a considerable amount of time cleaning up the transcript; often this is not much less than the time it would take to transcribe the recording in full.
Human Transcription Services
Human transcription services are more expensive than speech to text software. While prices may start from as little as £0.40 per minute, any service priced below £0.75 per minute almost always comes with conditions, such as turnaround times in excess of seven days, or pushes the bounds of what is ethical in outsourcing to offshore workers.
Prices that start off low can often be misleading: there will likely be extra charges for additional speakers, poor audio or strong accents. By and large, you can expect to pay around £2 per minute. However, you may be able to get a lower rate with a bulk discount or longer turnaround times.
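To see what those per-minute rates mean for a whole recording, here is a small Python sketch. It uses only the ballpark figures quoted in this post (around £0.10 per minute for ASR and around £2 per minute for a human transcriptionist); the function and constant names are made up for illustration, and real quotes will differ.

```python
# Ballpark rates from this post, stored in pence to avoid rounding problems.
ASR_PENCE_PER_MINUTE = 10      # about £0.10 per minute
HUMAN_PENCE_PER_MINUTE = 200   # about £2 per minute


def cost_in_pounds(audio_minutes: int, pence_per_minute: int) -> float:
    """Return the cost in pounds of transcribing a recording at a flat rate."""
    return audio_minutes * pence_per_minute / 100


def compare_costs(audio_minutes: int) -> dict:
    """Return the estimated ASR and human costs for one recording."""
    return {
        "asr": cost_in_pounds(audio_minutes, ASR_PENCE_PER_MINUTE),
        "human": cost_in_pounds(audio_minutes, HUMAN_PENCE_PER_MINUTE),
    }


costs = compare_costs(60)
print(f"ASR: £{costs['asr']:.2f}, human: £{costs['human']:.2f}")
```

The sketch leaves out surcharges, bulk discounts and the value of your own editing time, all of which can shift the comparison.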
Even though it’s likely to cost more, human transcription gives you the quality control and options that a human transcriptionist can offer. Human transcription services commonly guarantee 99% accuracy. Customers looking for complete reliability benefit from proofreading, which often comes as standard.
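To put the 80% and 99% accuracy figures in perspective, the Python sketch below estimates how many words you might need to correct in a recording. It rests on two assumptions that are not from this post: that accuracy is counted per word, and that speakers average about 150 words per minute. The function name is hypothetical.

```python
WORDS_PER_MINUTE = 150  # assumed average speaking rate, not a measured figure


def words_to_correct(audio_minutes: int, accuracy_percent: int,
                     words_per_minute: int = WORDS_PER_MINUTE) -> int:
    """Estimate the number of wrong words, assuming accuracy is measured per word."""
    if not 0 <= accuracy_percent <= 100:
        raise ValueError("accuracy_percent must be between 0 and 100")
    total_words = audio_minutes * words_per_minute
    # Integer arithmetic keeps the estimate free of floating-point noise.
    return total_words * (100 - accuracy_percent) // 100


for label, accuracy in (("ASR at 80%", 80), ("Human at 99%", 99)):
    errors = words_to_correct(60, accuracy)
    print(f"{label}: roughly {errors} words to fix in a 60-minute recording")
```

Under these assumptions, an hour of audio leaves roughly twenty times more corrections at 80% than at 99%, which is why clean-up time matters so much when weighing up ASR.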
You can also choose from a range of transcript options. You can order a verbatim transcript that captures every little detail of a recording, including notes about tone, laughter and pauses. The industry standard, however, is ‘intelligent verbatim’. This is an edited text with any ‘erms’, ‘ums’ or repetitions removed. These types of transcripts are much easier to read and understand. They are also easier to transcribe, often bringing a price break compared to the verbatim option.
The cheapest transcripts are notes/summary transcripts. These often just paraphrase the meaning, providing you with a quick way to digest information if the details aren’t that important. Alternatively, you might use this type of transcript to figure out which recordings you should actually listen to yourself.
Which Transcription Service is Right for You?
When it comes to security, automatic services might seem like the safer bet, but it is easy to be lured into a false sense of security. When using automatic services, you may never really know what happens to your data, or whether a human will actually come into contact with it.
These services might use your recording for test or control purposes, meaning a human might listen to your file anyway. If no security promises are made, you simply don’t know. Regardless, encryption is just as big a concern with automatic services as it is with human-based ones. You also run into all of the quality control issues detailed above: in many circumstances, ASR software is simply not up to the task of producing a usable transcript.
The type of transcript you want may also affect your decision. ASR produces only verbatim accounts (if it succeeds at all), which may not be suitable for you. With human services, you can choose from a broader spectrum of products, including summaries and intelligent verbatim.
To make the decision process easier, it’s a good idea to ask yourself the following questions:
- How much do I have to spend?
- How quickly do I need the transcript?
- Is accuracy essential?
- How important are security and data protection?
- What kind of recording needs transcribing?
- How much time do I want to spend editing and cleaning up the final version?
- Am I happy with a verbatim transcript, or would it be better to have a summary or an intelligent verbatim account?
- Is the recording good quality?
Once you’ve answered these questions, you’ll have a much better idea of which option suits you best.
You can read my blog about the benefits of audio-typing here!