A quick Google search on how to transcribe a YouTube video will show either paid audio transcription services like Fiverr or Rev, or blogs suggesting transcription tools where you still have to type the whole thing manually. Thankfully, there is a better way to do it.
Thanks to machine learning, computers (in this case, Google’s voice-to-text feature) can now auto-generate text from any video or audio. By default, it listens to your voice through the device’s microphone, but with a small adjustment, we can use it to convert any video or audio to text. (Or watch the video tutorial at the end of this article.)
This workaround is free and works on both Windows and Mac (it doesn’t support mobile devices yet). And the best part is, it also supports many foreign languages. To be honest, it’s still not 100% accurate, but if the audio is clear, you can easily get 80-90% accuracy.
Sounds interesting? So, let’s see how to do it.
Why Transcribe YouTube Videos
1. SEO Benefits: Unlike a blog post, YouTube cannot read your videos. Yes, there are things like the title and tags that tell YouTube what your video is about. But adding subtitles to all your videos will tell it more about the content, and it might even boost your videos in search results.
2. Accent: People come to YouTube from all parts of the world, and accents can become a big problem. For instance, the English (US) accent is quite different from the English spoken in India. So, having captions comes in handy.
3. Transcribe other videos: If you have a foreign movie clip whose subtitles are not on the internet, you can generate them yourself.
4. Transcribe videos for money: If you are someone who makes money by transcribing videos on Fiverr or Rev, then this workaround will help you automate 80% of your work.
5. Repurpose the video on your blog: You may have uploaded videos with unique content and would like to republish them on your blog, or found a video lecture online that you want to transcribe for academic purposes.
If you fall under any of these scenarios, then this method will help.
Download the transcript if a YouTube video already has one
Before you do the hard work of creating subtitles for a YouTube video, it’s better to check whether it already has them. To check, look for the CC button in the video player, or open the player’s settings and look for subtitles there.
Usually, every video uploaded to YouTube after 2014 has automatic English subtitles by default, which are pretty decent if you are a native speaker. Many professional YouTubers also add captions. If you can see the captions, it’s pretty simple to download them.
In the video description, click More > Caption > select a language, and you’ll see the subtitles; just copy and paste them. However, if you want to download the .srt file with timestamps, or want to do it for videos in bulk, then use Ccsubs or Down subs. There is also a Chrome extension on GitHub that does the same thing.
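If you download the .srt file but only want the plain text, you can strip the cue numbers and timestamps with a few lines of Python. This is a minimal sketch, not part of the original workflow: the function name srt_to_text is made up for illustration, and it assumes a standard .srt layout (a cue number, a line containing "-->", then the caption text, with blank lines between cues).

```python
import re


def srt_to_text(srt_content: str) -> str:
    """Return only the spoken text from SRT content, joined into one string."""
    # Normalize Windows line endings so blank-line splitting works everywhere.
    normalized = srt_content.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    cues = re.split(r"\n\s*\n", normalized)
    spoken = []
    for cue in cues:
        lines = [line.strip() for line in cue.split("\n") if line.strip()]
        # Everything after the timestamp line ("start --> end") is caption text.
        for i, line in enumerate(lines):
            if "-->" in line:
                spoken.extend(lines[i + 1:])
                break
    return " ".join(spoken)


if __name__ == "__main__":
    sample = (
        "1\n00:00:00,000 --> 00:00:02,000\nHello there\n\n"
        "2\n00:00:02,000 --> 00:00:04,500\nwelcome to\nthe channel\n"
    )
    print(srt_to_text(sample))
```

To use it on a real file, read the file with open(path, encoding="utf-8") and pass its contents to the function.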
Transcribe Video/Audio to Text with Google Docs
There are many video-to-text converters, both online and offline, but I found Google’s voice-to-text feature to be the best. A few years back, it was not very efficient, but thanks to AI, it has improved a lot.
Google’s voice-to-text feature converts your audio to text in real time. But if you play a video on one device and capture it on another through the microphone, you unfortunately won’t get much accuracy, as most of the words will be lost in the noise.
So, the trick here is to make your computer record the system audio instead of the microphone, then play the audio or video you want to transcribe and capture it with Google Docs voice typing. The processing is done on Google’s cloud servers, so you will also need an active internet connection for this to work.
Now, let’s see how to do it.
1. Transcribe Video/Audio to Text on macOS
Most computers don’t let you record the system audio, maybe to avoid piracy (like people using it to record Spotify songs).
1. Download a third-party app called Soundflower; this will help us record system audio. Once done, unzip it and install it.
2. Next, you need to tell macOS to use the output audio as input. To do that, go to Sound settings and set Soundflower (2ch) for both input and output.
3. Now, fire up Google Chrome (yes, it only works in Chrome). Open Google Docs > right-click and select Create a New Document > Tools > Voice typing.
4. In another Chrome window, open YouTube and play any video.
5. Now come back to Google Docs, click the voice typing microphone icon, select your accent or language from the list, and then start recording.
And that’s it; you should now see the transcribed text appear on your screen.
2. Transcribe Video/Audio to Text on Windows PC
Now, let’s try this on Windows.
1. Go to your Windows Sound settings > select Recording devices > select Stereo Mix and set it as default. If you don’t see the Stereo Mix option, right-click and turn on Show Disabled Devices.
2. Next, do the same thing you did for macOS, i.e. open Google Docs > right-click and select Create a New Document > Tools > Voice typing.
3. Play a video and start recording. It should work.
What if Stereo Mix is not available?
On many new computers, the sound card does not support the Stereo Mix option. In that case, you can check this article on how to record system audio without Stereo Mix. Though I’ve not tested this method, so I’m not sure whether it’s going to work or not.
If the Stereo Mix option is lost after updating your PC to Windows 10, you can install the Realtek audio driver and enable it from Windows Device Manager. Restart your system, and you’ll see the Stereo Mix option back in the sound settings.
How do I upload a transcript to YouTube?
Now that you have the subtitles in a text file, you are ready to upload them to YouTube. Here is how to do it.
1. Go to your YouTube dashboard and click the Edit button next to your video > Subtitles/CC > Add new subtitles > select a language > Transcribe and auto-sync, then paste the text there. It takes 10-15 minutes to sync, so remember to come back after 10-15 minutes and publish it. Also, disable the automatic captions.
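Voice typing tends to produce one long run of text with uneven spacing. Before pasting it, you may want to tidy it into short lines. The snippet below is only a sketch of that cleanup step: the function name to_caption_lines and the 42-character width are my own assumptions, not anything YouTube requires.

```python
import textwrap
from typing import List


def to_caption_lines(raw_text: str, width: int = 42) -> List[str]:
    """Collapse stray whitespace and wrap the transcript into short lines."""
    # split() with no arguments drops newlines, tabs, and repeated spaces.
    normalized = " ".join(raw_text.split())
    # wrap() breaks on word boundaries so no line exceeds `width` characters.
    return textwrap.wrap(normalized, width=width)


if __name__ == "__main__":
    dictated = "hello   world\nthis is a   test"
    for line in to_caption_lines(dictated, width=11):
        print(line)
```

You can then join the lines with newlines, save them to a UTF-8 text file, and paste the result into the transcript box.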
I used this method to transcribe a couple of old videos, and the accuracy was more than 80% every time.
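If you want to put a number on the accuracy for your own videos, one rough way is to correct a short passage by hand and compare it with the raw output word by word. The sketch below uses Python’s standard difflib module; the function name word_accuracy is hypothetical, and this is a simple matched-words ratio, not the formal word error rate used in speech research.

```python
from difflib import SequenceMatcher


def word_accuracy(reference: str, transcript: str) -> float:
    """Fraction of reference words that also appear, in order, in the transcript."""
    ref_words = reference.lower().split()
    hyp_words = transcript.lower().split()
    if not ref_words:
        return 0.0
    # autojunk=False keeps common words like "the" from being ignored on long texts.
    matcher = SequenceMatcher(None, ref_words, hyp_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref_words)


if __name__ == "__main__":
    corrected = "the quick brown fox jumps over the lazy dog today"
    raw_output = "the quick brown box jumps over the lazy dog today"
    print(f"{word_accuracy(corrected, raw_output):.0%}")
```

In this example, one word out of ten was misheard, so the score comes out at 90%.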