Media Content 2025

Multimodal Media Generation Workflow

Turning manual news analysis and podcast production into a fully autonomous agentic AI pipeline — from web discovery to published audio and video.

The challenge

High-effort, human-driven content production

Producing regular news-based podcast content requires sustained human effort across multiple stages: monitoring sources, selecting topics, researching and synthesising content, writing scripts, recording or generating audio, editing video, and publishing. Each stage demands different skills and tools, is time-consuming when done manually, and bottlenecks on human availability. The goal was to automate the whole pipeline end to end, from autonomous content discovery to published multimodal media artifacts, with every stage driven by agentic AI and the pipeline itself built using AI-assisted development.

Before — Manual content production
Diagram: a user manually searches svt.com and skysports.com for the latest updates → filters content from the findings → refines it → writes the podcast script → records the voice-over → manually creates the video. All steps are performed manually, and each stage bottlenecks on human availability.
After — Agentic media generation pipeline
Diagram: Media Content, agentic AI workflow
Source: Internet (RSS feeds, web content); the latest updates are fetched via RSS feeds
Agent 01 · Web Search Agent: reads RSS feeds and fetches the latest web updates → <rss feed content>
Agent 02 · Topic Filtering Agent: filters updates by topic relevance → <web urls>
Agent 03 · Web Scrape Agent: scrapes the URLs and generates markdown content → <web search content>
Agents 04 (multi-agent) · Podcast Script Generation Agents: GPT-4o, Gemini, and Claude generate scripts in parallel → <podcast scripts> from multiple agents
Agent 05 · Reasoning Agent: reconciles the scripts through reasoning synthesis → <reasoning content>
Agents 06 (multi-agent) · Audio/Video Script Generation Agents: generate audio and video production scripts → <scripts> for the Veo 3 / TTS Agent (video generation, audio synthesis)
Agent 07 · PR Agent: creates a PR and publishes the generated content to SharePoint, the content publishing destination
Models and tools: OpenAI GPT-4o · Gemini · Claude · Veo 3 · TTS · RSS MCP · OpenAI Agents SDK
Human role: supervise and steer, not execute
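The workflow above is a linear hand-off: each agent's output (<rss feed content>, <web urls>, <web search content>, and so on) becomes the next agent's input. The Python sketch below illustrates that chaining under stated assumptions: every agent is a plain function rather than an OpenAI Agents SDK agent, the feed is an inline sample with placeholder example.com URLs, topic filtering is a simple keyword match, and the scrape, script, media, and publishing stages are hypothetical stubs. It is an illustration, not the project's implementation.

```python
"""Minimal sketch of the agent chain in the Media Content workflow.

Assumptions (not from the source): each agent is a plain Python function,
the RSS feed is an inline sample string, topic filtering is a keyword
match, and the later agents are hypothetical stubs standing in for real
scraping, LLM, TTS, video, and publishing calls.
"""
import xml.etree.ElementTree as ET
from typing import Callable

SAMPLE_RSS = """<rss version="2.0"><channel>
  <item><title>Transfer window opens</title><link>https://example.com/a</link></item>
  <item><title>Weather update</title><link>https://example.com/b</link></item>
  <item><title>Cup final preview</title><link>https://example.com/c</link></item>
</channel></rss>"""


def web_search_agent(state: dict) -> dict:
    # Agent 01: parse the RSS feed into (title, link) pairs.
    root = ET.fromstring(state["rss_xml"])
    state["items"] = [
        (item.findtext("title", ""), item.findtext("link", ""))
        for item in root.iter("item")
    ]
    return state


def topic_filtering_agent(state: dict) -> dict:
    # Agent 02: keep links whose title mentions any topic keyword.
    keywords = [k.lower() for k in state["topics"]]
    state["urls"] = [
        link for title, link in state["items"]
        if any(k in title.lower() for k in keywords)
    ]
    return state


def web_scrape_agent(state: dict) -> dict:
    # Agent 03 (hypothetical stub): a real agent would fetch each URL
    # and convert the page to markdown.
    state["markdown"] = [f"# Article\nSource: {url}" for url in state["urls"]]
    return state


def script_agents(state: dict) -> dict:
    # Agents 04-05 (hypothetical stub): parallel drafts plus reconciliation.
    state["script"] = "\n\n".join(state["markdown"])
    return state


def publish_agent(state: dict) -> dict:
    # Agents 06-07 (hypothetical stub): media generation and publishing.
    state["published"] = bool(state["script"])
    return state


PIPELINE: list[Callable[[dict], dict]] = [
    web_search_agent,
    topic_filtering_agent,
    web_scrape_agent,
    script_agents,
    publish_agent,
]


def run_pipeline(rss_xml: str, topics: list[str]) -> dict:
    state = {"rss_xml": rss_xml, "topics": topics}
    for stage in PIPELINE:
        state = stage(state)  # each agent's output feeds the next agent
    return state


if __name__ == "__main__":
    result = run_pipeline(SAMPLE_RSS, ["transfer", "cup"])
    print(result["urls"])
    # prints ['https://example.com/a', 'https://example.com/c']
```

Passing one shared state dict keeps each stage independent, so a stub can be swapped for a real model or tool call without touching its neighbours.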
Outcomes

What the transition delivered

Zero manual effort per episode
The pipeline runs end to end without manual steps, from web discovery to audio and video published on GitHub.
Consensus-driven accuracy
Scripts are drafted by several models and then consolidated by a reasoning agent, which reduces speculation and improves content quality.
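Multi-model generation with consolidation can be pictured as a fan-out followed by a reconciliation step. The sketch below assumes three hypothetical stub functions in place of GPT-4o, Gemini, and Claude calls, runs them concurrently with a thread pool, and keeps only the sentences a majority of drafts agree on. That majority vote is a deliberately simple stand-in for the reasoning-model synthesis the pipeline uses, chosen only to show how cross-model agreement filters out claims that a single model makes.

```python
"""Sketch: parallel multi-model drafting plus a consensus step.

Assumptions (not from the source): draft_a/draft_b/draft_c are hypothetical
stubs for GPT-4o, Gemini, and Claude calls, and reconcile() is a simple
majority vote over sentences, not the pipeline's reasoning-model synthesis.
"""
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def draft_a(source: str) -> str:  # hypothetical stand-in for GPT-4o
    return "Team A won 2-1. The coach resigned. Fans celebrated."


def draft_b(source: str) -> str:  # hypothetical stand-in for Gemini
    return "Team A won 2-1. Fans celebrated."


def draft_c(source: str) -> str:  # hypothetical stand-in for Claude
    return "Team A won 2-1. The coach resigned. Fans celebrated. A record crowd attended."


def split_sentences(text: str) -> list[str]:
    # Naive split on full stops; adequate for the stub drafts above.
    return [part.strip() + "." for part in text.split(".") if part.strip()]


def generate_in_parallel(models: list[Callable[[str], str]], source: str) -> list[str]:
    # Fan out: call every model concurrently; map() keeps results in model order.
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        return list(pool.map(lambda model: model(source), models))


def reconcile(drafts: list[str], min_votes: int) -> str:
    # Count each sentence at most once per draft.
    votes: Counter = Counter()
    for draft in drafts:
        votes.update(set(split_sentences(draft)))
    # Keep sufficiently supported sentences in first-appearance order.
    kept: list[str] = []
    for draft in drafts:
        for sentence in split_sentences(draft):
            if votes[sentence] >= min_votes and sentence not in kept:
                kept.append(sentence)
    return " ".join(kept)


if __name__ == "__main__":
    drafts = generate_in_parallel([draft_a, draft_b, draft_c], "match report")
    majority = len(drafts) // 2 + 1
    print(reconcile(drafts, majority))
    # prints Team A won 2-1. The coach resigned. Fans celebrated.
```

Here "A record crowd attended." appears in only one draft, so it is dropped; a reasoning agent can make the same kind of call on meaning rather than on exact string matches.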
Multimodal output at scale
A single pipeline run produces MP3 audio, MP4 video, and Veo 3 prompts, formats that would otherwise require several separate manual tools.
Tech stack
OpenAI Agents SDK · Claude Code · GPT-4o · Gemini · Anthropic Claude · GPT-OSS (reasoning) · Veo 3 · TTS · GitHub · MCP · RSS feeds
Research paper
A Practical Guide for Designing, Developing, and Deploying Production-Grade Agentic AI Workflows