Multimodal Media Generation Workflow
Turning manual news analysis and podcast production into a fully autonomous agentic AI pipeline — from web discovery to published audio and video.
The challenge
High-effort, human-driven content production
Producing regular news-based podcast content takes sustained human effort across multiple stages: monitoring sources, selecting topics, researching and synthesising content, writing scripts, recording or generating audio, editing video, and publishing. Each stage calls for different skills and tools, is time-consuming when done manually, and is bottlenecked by human availability. The goal was to automate the pipeline end to end, from autonomous content discovery through to published multimodal media artifacts, driven by agentic AI and itself built using AI.
Before — Manual content production
After — Agentic media generation pipeline
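As a rough illustration of how the stages named above could be chained, the following minimal Python sketch passes one episode record through a list of stage functions. Everything in it is an assumption made for illustration: the `Episode` structure, the stage names, the example feed URL, and the file names are hypothetical stubs, not the actual implementation, and no real model, TTS, or GitHub calls are made.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Episode:
    """State handed from stage to stage (hypothetical structure)."""
    sources: List[str] = field(default_factory=list)
    topic: str = ""
    script: str = ""
    artifacts: Dict[str, str] = field(default_factory=dict)
    published: bool = False


Stage = Callable[[Episode], Episode]


def discover_sources(ep: Episode) -> Episode:
    # Stub: a real stage would poll RSS feeds or search the web.
    ep.sources = ["https://example.com/feed.xml"]
    return ep


def select_topic(ep: Episode) -> Episode:
    # Stub: a real stage would rank candidate stories.
    ep.topic = "Example headline"
    return ep


def write_script(ep: Episode) -> Episode:
    # Stub: a real stage would call one or more language models.
    ep.script = f"Today we look at: {ep.topic}."
    return ep


def synthesize_audio(ep: Episode) -> Episode:
    # Stub: a real stage would call a TTS service and write an MP3.
    ep.artifacts["audio"] = "episode.mp3"
    return ep


def render_video(ep: Episode) -> Episode:
    # Stub: a real stage would combine audio and visuals into an MP4.
    ep.artifacts["video"] = "episode.mp4"
    return ep


def publish(ep: Episode) -> Episode:
    # Stub: a real stage would push the artifacts to a repository.
    ep.published = True
    return ep


STAGES: List[Stage] = [
    discover_sources,
    select_topic,
    write_script,
    synthesize_audio,
    render_video,
    publish,
]


def run_pipeline(stages: List[Stage]) -> Episode:
    """Run every stage in order, with no human step in between."""
    ep = Episode()
    for stage in stages:
        ep = stage(ep)
    return ep


if __name__ == "__main__":
    episode = run_pipeline(STAGES)
    print(episode.artifacts, episode.published)
```

Keeping each stage as a plain function over a shared record means a stage can be swapped or tested in isolation without touching the rest of the chain.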
Outcomes
What the transition delivered
Zero human effort per episode
End-to-end pipeline runs autonomously — from web discovery to published audio and video on GitHub.
Consensus-driven accuracy
Multi-model script generation with reasoning consolidation eliminates speculation and improves content quality.
Multimodal output at scale
Produces MP3 audio, MP4 video, and Veo-3 prompts in a single pipeline run, formats that would otherwise require multiple manual tools.
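To make the consensus idea behind the multi-model script generation more concrete, here is a minimal Python sketch. The page does not describe how drafts are consolidated, so this example substitutes a simple majority vote over claims for the reasoning-model consolidation step. The drafter functions, the claim strings, and the `min_support` threshold are all hypothetical, and matching claims by exact string is a deliberate simplification.

```python
from collections import Counter
from typing import Callable, List

# A drafter takes a topic and returns the claims it would put in a script.
Drafter = Callable[[str], List[str]]


def model_a(topic: str) -> List[str]:
    # Hypothetical stand-in for one model's draft.
    return ["Company X released product Y.", "The launch is on Tuesday."]


def model_b(topic: str) -> List[str]:
    # Hypothetical stand-in; the last claim is unsupported speculation.
    return [
        "Company X released product Y.",
        "The launch is on Tuesday.",
        "Sales will double.",
    ]


def model_c(topic: str) -> List[str]:
    # Hypothetical stand-in; the last claim appears in no other draft.
    return ["Company X released product Y.", "Analysts expect a recall."]


def consolidate(drafts: List[List[str]], min_support: int) -> List[str]:
    """Keep claims that appear in at least `min_support` drafts.

    Claims are returned in first-seen order. This is a majority-vote
    stand-in, not the reasoning-based consolidation the page mentions.
    """
    counts: Counter = Counter()
    for draft in drafts:
        counts.update(set(draft))  # count each claim once per draft

    kept: List[str] = []
    for draft in drafts:
        for claim in draft:
            if counts[claim] >= min_support and claim not in kept:
                kept.append(claim)
    return kept


def consensus_script(topic: str, drafters: List[Drafter], min_support: int) -> List[str]:
    """Collect one draft per drafter and consolidate them."""
    drafts = [drafter(topic) for drafter in drafters]
    return consolidate(drafts, min_support)


if __name__ == "__main__":
    for line in consensus_script("product Y", [model_a, model_b, model_c], 2):
        print(line)
```

With a threshold of two, the claims made by only one drafter are dropped, which is the effect the consensus step aims for: statements that only a single model asserts do not reach the final script.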
Tech stack
OpenAI Agents SDK
Claude Code
GPT-4o
Gemini
Anthropic Claude
GPT-OSS (reasoning)
Veo-3
TTS
GitHub MCP
RSS feeds
Research paper
A Practical Guide for Designing, Developing, and Deploying Production-Grade Agentic AI Workflows