How To Use Gemini AI To Summarize YouTube Videos

The big AI companies are continually promising that their tech will save us time and boost our productivity—albeit with big questions about copyright abuse, illegal content, and skyrocketing energy use hanging rather inconveniently around in the background. But if you’re looking to put more time back into your busy schedule, AI can be a useful tool, perhaps in ways you haven’t even thought about.

One of those might be summarizing YouTube videos. AI has already shown it can be a fairly reliable summarizer (although not always), and if you just need to extract a few salient points from a series of videos that are 15 or 30 minutes long, the time saved can quickly add up.

Google Gemini has a new AI model, Gemini 2.0 Flash Thinking Experimental, which can plug into Google apps including Google Search, Google Maps, and YouTube. The model is available to all Gemini users, paying or not, and we tested it out on a selection of clips using Gemini’s web interface.

How to Find the Feature

The new model is available to all Gemini users.

Photograph: David Nield

If you open up Gemini on the web, start a new chat, and go to the model picker in the top left corner, you should see one labeled 2.0 Flash Thinking (experimental). This is the one with the Google app connections built in, though most of the time you need to specify which app you want to use (when looking up a place on Google Maps, for instance).

The model isn’t difficult to find in the Gemini apps for Android or iOS either: If you tap the drop-down menu at the top of a new conversation (which should be labeled with the model you’re currently using), you’ll see the 2.0 Flash Thinking (experimental) option available for selection.

You’ll probably find the feature a little easier to use on the web, where you can drag YouTube URLs between browser tabs for analysis, but you can get to it on mobile too. Besides analyzing YouTube videos, you can search for new content: Try asking for YouTube videos about baseball highlights or science explainers, for example.
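
Google also offers Gemini models through a separate developer API, and that API can take YouTube URLs as input too. If you’d rather script your summaries than use the chat interface, here’s a minimal sketch using the google-genai Python SDK. Note that the URL is a placeholder, the client expects an API key in your environment, and the model names offered through the API differ from the ones in the consumer app’s picker:

    # pip install google-genai
    from google import genai
    from google.genai import types

    # The client reads an API key from the GOOGLE_API_KEY environment variable.
    client = genai.Client()

    # Placeholder URL: swap in any public YouTube video.
    video = types.Part(
        file_data=types.FileData(file_uri="https://www.youtube.com/watch?v=VIDEO_ID")
    )

    response = client.models.generate_content(
        model="gemini-2.0-flash",  # API model names differ from the app's picker
        contents=types.Content(parts=[
            video,
            types.Part(text="Summarize this video in five bullet points."),
        ]),
    )
    print(response.text)

Asking for timestamps or a different level of detail works the same way: just change the prompt text.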

Summarize Match Highlights

Gemini didn’t get everything right about Super Bowl LIX.

Photograph: David Nield

To begin with, we put Gemini to work on a highlights package from last year’s Super Bowl LIX—almost 20 minutes of action—to see what the AI would make of it. We started by simply asking “What’s happening in this game?” and in a few seconds we had details of the teams and who won (which the AI got right), plus some key highlights.

A follow-up question about the final score was answered correctly, but Gemini got the name of the first touchdown scorer wrong: The AI suggested it was Jahan Dotson. Dotson was shown catching a touchdown in the highlights with the score at 0-0, but it was ruled out—an example of the nuances that AI doesn’t necessarily pick up on.

Gemini did successfully identify when the Kansas City Chiefs got their first points, and even included a timestamp linking straight to the touchdown in the YouTube clip. It also got the name of the scorer right. It seems Gemini is heavily reliant on the commentary for sports clips, which isn’t surprising.
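
Scripted follow-up questions work much the same way. This sketch (again a rough one, assuming the google-genai SDK and a placeholder URL) uses a chat session so a second question can refer back to the same clip without re-attaching it:

    from google import genai
    from google.genai import types

    client = genai.Client()  # expects GOOGLE_API_KEY in the environment

    chat = client.chats.create(model="gemini-2.0-flash")

    # First message attaches the clip (placeholder URL) and asks for an overview.
    first = chat.send_message([
        types.Part(
            file_data=types.FileData(file_uri="https://www.youtube.com/watch?v=VIDEO_ID")
        ),
        "What's happening in this game?",
    ])
    print(first.text)

    # Follow-ups reuse the conversation history, so the video doesn't
    # need to be sent again.
    follow_up = chat.send_message("What was the final score?")
    print(follow_up.text)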

Summarize Video Contents

The AI can pick out video details—if they’re mentioned in the audio.

Photograph: David Nield

Next, we put Gemini up against a behind-the-scenes featurette for The Grand Budapest Hotel, directed by Wes Anderson. The clip runs to four-and-a-half minutes, and Gemini fired back replies almost instantly: It identified the film being discussed and the main beats of the clip’s narrative.

However, it’s all reliant on the audio (or the transcript) again—there doesn’t seem to be any analysis of the actual video contents. The AI couldn’t say who the talking heads were in the video, even though their names were shown on screen, and wasn’t able to say who the director was (even though this was also mentioned in the video description).

On the plus side, Gemini did do an impressive job of summing up the audio of the video. It correctly identified some of the filmmaking challenges mentioned throughout, and provided timestamps for them, from finding a set to represent the Grand Budapest to filling it with extras.

Summarize Interviews

Gemini can provide timestamps for the specified video.

Photograph: David Nield

Finally, we tried Google Gemini on an interview: Channel 4 in the UK speaking to Charlie Brooker and Siena Kelly about the latest series of Black Mirror (perhaps appropriate for an article on AI). Gemini proved very capable of picking out the talking points and adding timestamps, though of course the whole video is mostly talking.

Again though, there’s no context about anything outside of the audio or the transcript. Gemini AI couldn’t say where the interview took place, or how the participants were acting, or anything else about the visuals of the video—which is worth bearing in mind if you use it yourself.

When the answers you want are in a YouTube video’s audio and its associated transcript, Gemini works really well, summarizing and providing accurate answers (provided the commentators mention when a touchdown is ruled out as well as when one is scored). For any kind of visual information, you’re still going to have to watch the video yourself.
