How to Transcribe an Interview Fast (And Get It Right)

Transcription eats hours if you let it. Here's how to pick the right method, handle bad audio, and produce a transcript people can actually use.

Written By

Cara Mu

Reviewed by

Michelle Xu

Published on

May 15, 2026

Updated on

May 15, 2026

Deciding how to transcribe an interview before you start saves hours. Most people don't realise how time-consuming transcription is until they're an hour into a five-hour job, staring at a recording that sounds like it was made inside a washing machine.

Whether you're a journalist working through source tape, a researcher coding qualitative data, or a candidate reviewing your own interview performance to sharpen for the next round, the decisions that follow are the same. This guide covers the ones that actually matter: manual versus automated, which style of transcript to produce, how to handle unclear audio, and how to format the output so it's useful to whoever receives it.

QUICK ANSWER

What's the best way to transcribe an interview?

Record in a quiet space, choose your method (manual, AI, or professional service), then clean up the output before sharing. Manual transcription takes roughly 4-6 hours per hour of audio. AI tools like Cook'd.ai cut that to minutes. Either way, you'll need to proofread. No method is perfect straight out of the box.

What interview transcription actually involves

Transcription is the process of converting spoken audio into written text. For an interview, that means capturing what the interviewer asked, what the respondent said, and enough context for someone reading it to follow the conversation without hearing the recording.

The output can range from a rough word-for-word dump to a clean, publication-ready document. Which one you need depends on what the transcript is for, and that decision shapes everything else.

Here's what most guides don't tell you upfront: even with the best AI tools available today, transcription is never fully automated. Every method produces a first draft that needs editing. The question is how much editing.

Manual, AI, or professional: choosing the right method

The best way to transcribe an interview depends on three things: how much time you have, how accurate it needs to be, and whether the content is sensitive.

Manual transcription

You listen to the recording and type everything yourself. It gives you complete control and catches things automated tools miss: a meaningful pause, a tone shift, a word the speaker clearly stressed. The catch is time. A one-hour interview takes the average typist four to six hours to transcribe accurately. Double that if the audio quality is poor or multiple speakers are talking fast.

Manual is the right call for legal proceedings, confidential HR interviews, or any situation where being wrong matters. It's not the right call when you've got six interviews to process before Thursday.

AI transcription tools

Upload a file, wait a few minutes, get a transcript. Modern AI tools hit 95% accuracy or better on clean audio, which sounds good until you realise that a 5% error rate in a 10,000-word transcript means 500 mistakes to find and fix. Proofreading against the recording is not optional.
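
That arithmetic is worth running for your own transcript length. Here's a minimal Python sketch, assuming the accuracy figure is a per-word rate. Tools measure accuracy in different ways, so treat the result as a ballpark. The function name is made up for illustration.

```python
def expected_errors(word_count: int, accuracy: float) -> int:
    """Estimate how many words an automated transcript gets wrong."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    # Round the result because floating-point subtraction (1 - 0.95) is not exact.
    return round(word_count * (1.0 - accuracy))


for accuracy in (0.95, 0.98, 0.99):
    errors = expected_errors(10_000, accuracy)
    print(f"{accuracy:.0%} accurate: about {errors} errors in 10,000 words")
```

Even at 99%, a 10,000-word transcript still leaves around 100 words to check.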

That said, the time savings are real. A 45-minute interview that would take four hours manually takes under 10 minutes with AI. The gap between raw output and usable transcript is usually 20-40 minutes of cleanup, not hours.

Cook'd.ai is built specifically for this: upload your interview recording, get accurate text back fast, and spend your time on the content rather than the mechanics. It handles speaker identification, produces clean output, and doesn't require a monthly subscription to access the core functionality.

Professional transcription services

Human transcriptionists through agencies like Rev or Scribie deliver near-perfect transcripts, usually within a few hours. They're the right choice when accuracy is non-negotiable (broadcast media, legal depositions, academic research) and the $1-$3 per minute cost is justifiable. For everyday use, AI tools have largely made this option redundant.

Method	Time	Cost	Accuracy	Best for
Manual	4-6x audio length	Free	High if careful	Legal, confidential
AI tool	Minutes	Free-$30/mo	95%+ (needs proofread)	Most everyday use cases
Professional service	Hours-1 day	$1-$3/min	99%+	Broadcast, legal, research
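
To see what the table means for a real workload, you can plug in your own numbers. This is a rough Python sketch that uses only the figures above (4-6x audio length for manual work, $1-$3 per audio minute for a professional service). The six 45-minute interviews are a hypothetical workload, and the function names are made up for illustration.

```python
def manual_hours(audio_minutes: float) -> tuple[float, float]:
    """Typing time at 4-6x the audio length, in hours (low, high)."""
    audio_hours = audio_minutes / 60
    return 4 * audio_hours, 6 * audio_hours


def professional_cost(audio_minutes: float) -> tuple[float, float]:
    """Agency cost at $1-$3 per audio minute, in dollars (low, high)."""
    return 1.0 * audio_minutes, 3.0 * audio_minutes


# Hypothetical workload: six 45-minute interviews.
total_minutes = 6 * 45
hours_low, hours_high = manual_hours(total_minutes)
cost_low, cost_high = professional_cost(total_minutes)

print(f"Manual: {hours_low:.0f}-{hours_high:.0f} hours of typing")
print(f"Professional: ${cost_low:.0f}-${cost_high:.0f}")
# Manual: 18-27 hours of typing
# Professional: $270-$810
```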

How to transcribe an interview: step by step

Whether you're doing it manually or using a tool, the process is the same. Skipping steps is how you end up with a transcript that's technically complete but practically useless.

1	Record in the right conditions. Quiet room, quality mic, speakers who aren't talking over each other. Garbage in, garbage out. No tool fixes bad audio.
2	Choose your transcription method. Manual if accuracy is critical and budget is zero. AI tool if you want it done in minutes. Professional service if it's going in a courtroom or publication.
3	Pick your transcription style. Verbatim (every 'um' and pause) for legal or research. Intelligent verbatim (cleaned up but not rewritten) for most situations. Edited for published content.
4	Upload or start typing. If using an AI tool, upload the file and wait. If manual, use transcription software (oTranscribe, Express Scribe) with keyboard shortcuts or a foot pedal, so you can control playback without constantly switching windows.
5	Label speakers clearly. Use 'Interviewer:' and 'Respondent:' or real names. Add timestamps every few minutes. This makes the transcript usable, not just readable.
6	Proofread against the recording. Always play the audio while reading the transcript. You'll catch misheard words, dropped sentences, and wrong speaker attribution that look fine on screen.
7	Format for how it will be used. Research: timestamps every paragraph. Published interview: cut the filler, tighten the answers. Legal: strict verbatim with line numbers. Format matters.

Verbatim, intelligent verbatim, or edited: which style do you need?

This is the decision most people get wrong. They assume transcription means typing every word. It doesn't always. There are three distinct styles, and the wrong choice creates extra work.

Verbatim: Every word, every filler, every false start. 'I was, uh, I was thinking that, you know…' becomes exactly that on the page. Required for legal proceedings and research where the precise phrasing is part of the data.
Intelligent verbatim: Filler words removed, false starts cleaned up, but nothing is rewritten or reordered. This is the default for most professional uses — journalism, HR, qualitative research, business interviews. Readable without being editorialised.
Edited: The transcript is polished for publication. Run-on answers are tightened, questions may be paraphrased. Used when the interview is going into a magazine, blog post, or marketing asset. Only appropriate when the interviewee has approved it or when you’re the author of both sides.

When in doubt, use intelligent verbatim. It’s accurate enough for almost any purpose and far more readable than a fully verbatim dump.
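
If you start from a verbatim draft, a first pass toward intelligent verbatim can be scripted. The Python sketch below is a heuristic, not a standard: the filler list and the repeated-phrase rule are assumptions you should tune for your speakers, and the output still needs a human read. It will, for example, also collapse a legitimate "that that".

```python
import re

# Hesitation sounds are removed wherever they appear.
_HESITATION = r"\b(?:um+|uh+|erm?)\b"
# Discourse phrases are removed only when set off by a leading comma,
# so a real question like "Do you know him?" is left alone.
_PHRASE = r"(?:you know|i mean)\b"
_FILLER_RE = re.compile(rf",?\s*{_HESITATION},?|,\s*{_PHRASE},?", re.IGNORECASE)
# An immediately repeated run of one to three words, e.g. "I was I was".
_REPEAT_RE = re.compile(r"\b(\w+(?:\s+\w+){0,2})\s+\1\b", re.IGNORECASE)


def to_intelligent_verbatim(text: str) -> str:
    """Strip fillers and simple false starts without rewording anything."""
    cleaned = _FILLER_RE.sub("", text)
    cleaned = _REPEAT_RE.sub(r"\1", cleaned)
    return re.sub(r"\s{2,}", " ", cleaned).strip()


print(to_intelligent_verbatim("I was, uh, I was thinking that, you know, we should wait."))
# I was thinking that we should wait.
```

Keep the verbatim original alongside the cleaned version so you can check any line against it.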

Handling unclear audio: what to do when you can't make it out

Bad audio is where most transcription projects fall apart. Understanding how to transcribe interviews that have audio problems, rather than just guessing or leaving gaps, is the skill that separates a useful transcript from a liability.

Problem	What to do
Overlapping speech	Slow playback to 60-70% speed. Mark anything still unclear as '[inaudible]' or '[crosstalk]' with a timestamp. Don't guess.
Heavy accent or regional dialect	Use a transcriber or service that supports your specific variant. Generic AI models struggle here.
Background noise (traffic, AC)	Try noise-reduction software (Adobe Podcast Enhance, Auphonic) on the audio file before transcribing.
Low volume or muffled audio	Boost the level in a free editor like Audacity, or with a converter such as dBpoweramp, before uploading. Most AI tools can't compensate for gain issues.
Technical jargon / brand names	Build a custom glossary before you start. Most AI tools let you add word hints or a vocabulary list.

The golden rule: never guess. If a word is unclear, mark it as '[inaudible]' with a timestamp. Readers can return to the recording. A wrong word in a legal document or a misquoted source in an article can have real consequences.
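
Markers are easier to keep consistent if you generate them rather than type them. Here's a small Python sketch, assuming you note the playback position in seconds. The helper names and the exact marker wording are illustrative choices, so match whatever convention your project already uses.

```python
def hms(seconds: float) -> str:
    """Format a playback position in seconds as HH:MM:SS."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def unclear_marker(seconds: float, kind: str = "inaudible") -> str:
    """Build a bracketed marker such as '[inaudible at 00:14:22]'."""
    return f"[{kind} at {hms(seconds)}]"


print(unclear_marker(862))               # [inaudible at 00:14:22]
print(unclear_marker(862, "crosstalk"))  # [crosstalk at 00:14:22]
```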

Stop typing what you already heard.

Cook'd AI transcribes your interviews so you can focus on the editorial work that actually matters.

Try Cook'd AI free →

Formatting your transcript so it's actually readable

A wall of dialogue with no structure is technically a transcript. It's also nearly useless. Formatting is what makes the difference between something you can skim and something you have to read start to finish to find anything.

Label speakers consistently on every turn. 'Interviewer:' and 'Sarah Chen:' every time, not alternating between 'Q:', 'Interviewer:', and 'Her:'.
Add timestamps at regular intervals — every 2–3 minutes for long interviews, every paragraph for shorter ones. Timestamps let you jump back to the exact moment in the recording.
Use paragraph breaks to separate distinct topics or answers, not just speaker turns. Long monologue answers are unreadable as one block.
Include a header: interview date, subject name (or pseudonym), interviewer name, total duration. This turns a raw transcript into a retrievable document.
For research or legal use, add line numbers. Makes referencing specific passages straightforward in reports or proceedings.
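
The conventions above can be sketched as a small formatter. This Python example is one possible layout, not a required one: the `Turn` class, the header fields, and the sample values are all hypothetical. Each turn sits on one line here, so the optional numbers double as line numbers.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str        # use the same label on every turn
    start_seconds: int  # position in the recording
    text: str


def timestamp(seconds: int) -> str:
    """Format seconds as HH:MM:SS."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_transcript(header: dict[str, str], turns: list[Turn],
                      line_numbers: bool = False) -> str:
    """Render a header block, then one timestamped, labelled line per turn."""
    lines = [f"{key}: {value}" for key, value in header.items()]
    for number, turn in enumerate(turns, start=1):
        prefix = f"{number:>3}  " if line_numbers else ""
        lines.append("")  # blank line between turns for readability
        lines.append(f"{prefix}[{timestamp(turn.start_seconds)}] {turn.speaker}: {turn.text}")
    return "\n".join(lines)


header = {"Date": "2026-05-15", "Subject": "Sarah Chen", "Duration": "00:45:00"}
turns = [
    Turn("Interviewer", 0, "Thanks for joining."),
    Turn("Sarah Chen", 125, "Happy to be here."),
]
print(format_transcript(header, turns, line_numbers=True))
```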

The best way to transcribe an interview isn't just about accuracy. It's about producing something the next person can actually use without having to reorganise it first.

How Cook'd AI takes the grind out of transcription

Transcription is one of those tasks where the work isn't hard, it's just relentless. You're not solving anything. You're typing what you already heard, rewinding the same fifteen seconds, fixing the same speaker label you keep flipping. By hour three, you're not transcribing an interview, you're hostage to one.

Cook'd AI does the typing so you can do the thinking. Upload the recording, get back a structured transcript with speakers labeled and timestamps in place, then spend your time on the parts that actually need a human: catching the misheard word, deciding what to cut, shaping the quotes that matter. The mechanical work disappears. The editorial work, which is the only part worth your hours, stays with you.

Your next interview doesn't have to eat your afternoon. Try Cook'd AI.

Frequently Asked Questions

How long does it take to transcribe a one-hour interview?

Manual transcription takes four to six hours per hour of audio for an average typist. AI tools produce a first draft in five to ten minutes, and cleanup typically adds another 20-40 minutes, or longer if you proofread the whole recording in real time. For a clean, usable transcript of a one-hour interview, budget one to two hours of total effort when using AI.

Do I need permission to transcribe an interview?

If you recorded the interview, you generally have the right to transcribe it, but rules around consent for recording vary by location. In two-party (also called all-party) consent states or countries, everyone present must agree to being recorded before you start. Always disclose that you're recording. When the transcript will be published or used in legal proceedings, get written confirmation from the interviewee.

What is the most accurate free way to transcribe interview recordings?

OpenAI's Whisper model is the most accurate free option and runs locally, which matters if the content is sensitive. Otter.ai's free tier (300 minutes per month) is the most accessible browser-based option. Microsoft Word's built-in Transcribe feature is included with a Microsoft 365 subscription at no extra cost. All three require proofreading.
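
If you want to try Whisper locally, here's a minimal Python sketch. It assumes you've installed the open-source `openai-whisper` package and have ffmpeg available. The file name and helper names are hypothetical. Whisper returns timestamped segments but does not label speakers, so you'll still add those by hand.

```python
def timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_lines(segments: list[dict]) -> list[str]:
    """Turn Whisper-style segments (with 'start' and 'text' keys) into lines."""
    return [f"[{timestamp(seg['start'])}] {seg['text'].strip()}" for seg in segments]


def transcribe_locally(audio_path: str, model_name: str = "base") -> list[str]:
    """Transcribe a file on your own machine; nothing is uploaded."""
    import whisper  # imported here so the helpers above work without it

    model = whisper.load_model(model_name)  # larger models are slower but more accurate
    result = model.transcribe(audio_path)
    return segments_to_lines(result["segments"])


# Example (hypothetical file name):
# for line in transcribe_locally("interview.mp3"):
#     print(line)
```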

How do you handle two speakers talking at the same time?

Slow the playback to 60-70% speed and listen through headphones. If you still can't separate the words, mark the section as '[crosstalk at 00:14:22]' and move on. Guessing what was said is worse than acknowledging the gap. If the overlap is critical, consider having the audio professionally cleaned before transcribing.

Is there a different process for transcribing research interviews?

Research transcription usually calls for verbatim style with timestamps and speaker labels, and many research ethics boards specify the format. You'll also need to consider anonymisation (replacing names, locations, and identifying details with codes) before the transcript is stored or shared. Check your institution's data management guidelines before you start.
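
Swapping known identifiers for codes is easy to script. The Python sketch below assumes you maintain the name-to-code mapping yourself. The codes and sample names are hypothetical. It only catches exact matches, so nicknames, misspellings, and indirect identifiers still need a manual pass.

```python
import re


def anonymise(text: str, replacements: dict[str, str]) -> str:
    """Replace each listed name, place, or organisation with its code."""
    # Longest entries first, so "Sarah Chen" is handled before "Sarah".
    for name in sorted(replacements, key=len, reverse=True):
        code = replacements[name]
        pattern = re.compile(rf"\b{re.escape(name)}\b", re.IGNORECASE)
        # A function replacement keeps the code literal, whatever characters it contains.
        text = pattern.sub(lambda _match, code=code: code, text)
    return text


codes = {"Sarah Chen": "[P01]", "Sarah": "[P01]", "Acme": "[ORG1]", "Leeds": "[CITY1]"}
print(anonymise("Sarah Chen joined Acme in Leeds. Sarah left in 2020.", codes))
# [P01] joined [ORG1] in [CITY1]. [P01] left in 2020.
```

Store the mapping separately from the transcripts, under whatever access rules your institution requires.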
