Linda Cabral and Laura Sefton on Using Voice Recognition Software for Transcription

6 Comments / Health Evaluation, Qualitative Methods / By Sheila Robinson / March 27, 2013

Hello, we are Linda Cabral and Laura Sefton from the Center for Health Policy and Research at UMass Medical School. We often collect qualitative data from interviews and focus groups. One challenge we frequently face is how to transcribe audio data quickly and efficiently. We have experimented with voice recognition software (VRS), and we’d like to share our approach.

You will need headphones, a microphone (stand-alone or attached to a headset), and a computer with audio playback software and VRS installed on it. We use Dragon Naturally Speaking Premium Version 11.5; however, other VRS products are available. Audio playback software lets you control the playback speed, so you can slow down, pause, fast forward, and rewind as needed.
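As a rough planning aid, the effect of slowing the playback can be estimated with simple arithmetic. The sketch below is only an illustration: the function name and the example speed are assumptions, not settings from any particular playback program, and it ignores time spent pausing and rewinding.

```python
def listening_minutes(recording_minutes: float, playback_speed: float) -> float:
    """Minutes needed for one uninterrupted pass through a recording.

    playback_speed is a multiplier: 1.0 is normal speed, 0.75 is slowed down.
    """
    if playback_speed <= 0:
        raise ValueError("playback_speed must be positive")
    return recording_minutes / playback_speed


# Hypothetical example: a 60-minute interview played at 0.75x speed.
print(listening_minutes(60, 0.75))  # 80.0
```

Pauses and rewinds add to this figure, so it is best treated as a lower bound when scheduling a transcription session.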

Open the audio file in the playback software and open a new document in the VRS. While listening to the audio via the headphones, repeat what you hear into the microphone. During this step, you can format the document to indicate who is speaking and to add punctuation. Because VRS works best when trained to understand a single voice, a designated team member should repeat all spoken content, regardless of how many voices are in the audio file.

This process will generate a document in the VRS that can be saved to your computer as a Word file. As a final review, read through the Word file while listening to the audio file and make needed corrections. This could be done by another member of the project team as a double check of the document’s accuracy.

Hot Tips:

Spend time training the VRS to recognize your voice. A few practice sessions with the software may be needed where you can read dummy data into the software in order for it to learn your voice. This will improve the transcription quality, minimizing the time spent editing.
Train the VRS to recognize project-specific acronyms or terminology prior to starting transcription.
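One way to prepare a list of project-specific acronyms before training is to scan existing project documents for all-caps terms. The following is a minimal sketch, assuming the documents are available as plain text; the function name and the sample sentence are made up for illustration, and the resulting list would still need to be entered into the VRS by whatever vocabulary feature your product offers.

```python
import re
from collections import Counter


def candidate_terms(text: str, min_count: int = 1) -> list[str]:
    """Return all-caps tokens of two or more characters, sorted alphabetically.

    Only tokens that appear at least min_count times are kept.
    """
    # An uppercase letter followed by one or more uppercase letters or digits.
    tokens = re.findall(r"\b[A-Z][A-Z0-9]{1,}\b", text)
    counts = Counter(tokens)
    return sorted(term for term, n in counts.items() if n >= min_count)


# Hypothetical sample text; replace with text from your own project files.
sample = "The VRS draft was reviewed by the QA team. VRS training covered PCMH and ACO terms."
print(candidate_terms(sample))  # ['ACO', 'PCMH', 'QA', 'VRS']
```

A simple pattern like this will miss mixed-case terminology and names, so a quick manual pass over the list is still worthwhile.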

Lessons Learned:

Often, financial resources for evaluation projects are limited. In an effort to keep the transcription process in-house, our administrative staff transcribed the audio files. By using the VRS, with a project team member who is familiar with the data as the designated recorder, we have saved time and gained efficiency.
No transcription has yet captured 100% of the content accurately the first time. Therefore, build in time to listen to the recording and make manual edits.
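To get a sense of how much manual editing a VRS draft needed, the draft can be compared with the corrected transcript using a word error rate (word-level edit distance divided by the number of words in the corrected text). This is a common speech recognition measure rather than something described in this post; the sketch below assumes both versions are available as plain text, and the function name and sample sentences are hypothetical.

```python
def word_error_rate(reference: str, draft: str) -> float:
    """Word-level edit distance between draft and reference, divided by reference length.

    Comparison is case-insensitive and splits on whitespace only,
    so punctuation attached to a word counts as part of that word.
    """
    ref = reference.lower().split()
    hyp = draft.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j draft words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from draft
            insertion = curr[j - 1] + 1   # extra word in draft
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sentences standing in for a corrected transcript and a VRS draft.
corrected = "we collect qualitative data from focus groups"
vrs_draft = "we collect quality data from focus group"
print(round(word_error_rate(corrected, vrs_draft), 3))
```

Tracking this figure across a few sessions is one way to check whether additional voice training is actually reducing the editing burden.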

Rad Resources:

These resources may be helpful as you explore whether VRS is right for you.

“The Voice Transcription Technique: Use of Voice Recognition Software to Transcribe Digital Interview Data in Qualitative Research” by Jennifer Matheson

VRS product reviews by ConsumerSearch: “In reviews, it’s generally Dragon vs. Dragon”

Do you have questions, concerns, kudos, or content to extend this aea365 contribution? Please add them in the comments section for this post on the aea365 webpage so that we may enrich our community of practice. Would you like to submit an aea365 Tip? Please send a note of interest to aea365@eval.org . aea365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators.

6 thoughts on “Linda Cabral and Laura Sefton on Using Voice Recognition Software for Transcription”

Medical Transcription
February 16, 2015 at 3:19 am

Dictation results in better structured and more creative writing of letters, essays etc. than either handwriting or typing out your own thoughts for most people. Learn the principles of excellent dictation from somebody who’s dictated documents professionally for nearly fifty years.

Ashley Price
January 28, 2014 at 3:13 am

Since my comment (above) the producers for Dragon Naturally Speaking have brought out a version called “Dragon Dictate” which will now work directly with audio files.

However, this is still only for dictation; Dragon specifically state that it is for recordings of “the user’s voice”.

Luis Pelayo
August 27, 2013 at 4:47 pm

This new platform indexes and transcribes your video and audio assets. Engineered with industry leading speech recognition technology, uSubtitle automates the process of timing subtitles to video and audio by generating the text and timing. uSubtitle offers the flexibility of an edit application, which gives users the ability to edit and perfect the returned transcription and timing.
This version of uSubtitle follows a software-as-a-service (SaaS) business model. We currently support the English and Spanish languages, along with the following file formats: .mp4, .m4v, and .mp3. Our current output offering includes timed text files in the TTML, SRT, and WebVTT formats, which are YouTube compatible.

1. Luis Pelayo
  August 27, 2013 at 4:57 pm
  
  You can find uSubtitle at http://www.uSubtitle.tv
  
Ashley Price
April 3, 2013 at 7:10 am

As a transcription service that has been transcribing qualitative research recordings for researchers at universities since 1998, I find this article interesting.

As was mentioned above, VRS won’t work on an audio file directly. As well as the number of participants confusing the software, any background noise (e.g. the interview was held in a busy coffee shop) will make it impossible.

As stated in the article, you need to train the VRS to your voice. This means you then have to use the same voice every time you do this method. What happens if you’ve a cold or illness that temporarily changes how you pronounce words? (We all know how people sound with a blocked nose.)

You also need to make sure you are going to be in a quiet room and not be continuously disturbed if you are going to try the above method. Again if you have background noise while you’re speaking then the VRS may struggle to pick up your voice over the other noise. (Remember, your brain filters out background noise – a microphone doesn’t.)

Realistically, though, I think the method of re-saying what is said on the recording will only work for short, 1-2-1 interviews. Repeating everything that has been said in a 2-hour focus group with eight participants (one of the transcripts we’ve just finished) will probably drive you mad.

I think that physically typing the recording is still the best way to go. You don’t have any of the above problems. You can work in a noisy atmosphere (use headphones to hear the recordings), you won’t go insane repeating everything from the recording, and it won’t matter if you’ve a blocked nose.

Of course, the one downside with typing is the time it takes if you’re not a typist. But then why not outsource it (I would say that, wouldn’t I)? Then you get the typing done for you and you can concentrate on other stuff (even if it’s just taking a few days off).

