As you’ve probably seen, I’ve been dabbling in AI video. Well, I did my longest one yet, and it was for The Babylon Bee.
It's over two minutes long, has five characters, and is dialogue-driven; it really felt like I was pushing AI video to its current limits (and maybe a little beyond).
Getting the initial image to animate from was pretty easy: I just gave ChatGPT the five mascots and told it to put them in a support group. I also used the new Gemini Nano Banana to get an image of the Cracker Barrel Guy’s chair being empty for a later scene.
The hard part was getting some wide shots with dialogue, as the AI got very confused with that many distinct characters in frame. Once I had a few wide shots and a few good close-ups to work with (I simply prompted Veo 3 for those and then reused them), doing all the dialogue was decently straightforward.
Veo 3 also did not keep all the voices consistent (maybe because it’s animated rather than live action), so I had to use ElevenLabs to keep the voices of Uncle Ben and the Cracker Barrel Guy consistent.
Not all the line deliveries were exactly what I wanted, either (I wonder if, for something like this, I should maybe do performance-driven AI, but I don’t have a good way to do that with that many characters on screen). Veo 3 can do some good performances, but you really have to prompt in a way that it understands what you’re going for, which can be a challenge when doing some out-there stuff.
Also, on top of all this, they uncanceled the Cracker Barrel Guy while I was making it, so I had to adjust the ending.
Still, I think it’s pretty impressive for something that can be made by one person in a few hours. And only a couple of months ago, I don’t think this would have been possible.
Pretty awesome!
That was unexpected.