Media Content 2025

Multimodal Media Generation Workflow

Turning manual news analysis and podcast production into a fully autonomous agentic AI pipeline — from web discovery to published audio and video.

The challenge

High-effort, human-driven content production

Producing regular news-based podcast content takes sustained human effort across multiple stages: monitoring sources, selecting topics, researching and synthesising content, writing scripts, recording or generating audio, editing video, and publishing. Each stage calls for different skills and tools, is slow when done manually, and is bottlenecked by human availability. The goal was to automate the pipeline end to end, from autonomous content discovery through to published multimodal media artifacts, with every stage driven by agentic AI and the pipeline itself built using AI.
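As a rough illustration of the stages listed above, the pipeline can be pictured as an ordered chain of functions that pass a shared episode record along. This is a minimal sketch, not the project's code: the `Episode` fields, the stage names, the example feed URL, and the file names are all hypothetical placeholders, and each stage body stands in for what would be an agent or model call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Episode:
    """State handed from one stage to the next (hypothetical structure)."""
    sources: List[str] = field(default_factory=list)
    topic: str = ""
    script: str = ""
    artifacts: List[str] = field(default_factory=list)
    completed: List[str] = field(default_factory=list)


Stage = Callable[[Episode], Episode]


def discover(ep: Episode) -> Episode:
    # Placeholder: a real stage would poll RSS feeds or search the web.
    ep.sources = ["https://example.com/feed.xml"]
    return ep


def select_topic(ep: Episode) -> Episode:
    # Placeholder: a real stage would rank candidate stories.
    ep.topic = f"Top story from {len(ep.sources)} source(s)"
    return ep


def write_script(ep: Episode) -> Episode:
    # Placeholder: a real stage would call one or more language models.
    ep.script = f"Welcome to the show. Today: {ep.topic}."
    return ep


def render_media(ep: Episode) -> Episode:
    # Placeholder: a real stage would run TTS and video assembly.
    ep.artifacts = ["episode.mp3", "episode.mp4"]
    return ep


def publish(ep: Episode) -> Episode:
    # Placeholder: a real stage would push the artifacts to a repository.
    return ep


STAGES: List[Stage] = [discover, select_topic, write_script, render_media, publish]


def run_pipeline(stages: List[Stage], ep: Optional[Episode] = None) -> Episode:
    """Run each stage in order, recording which ones completed."""
    if ep is None:
        ep = Episode()
    for stage in stages:
        ep = stage(ep)
        ep.completed.append(stage.__name__)
    return ep
```

Calling `run_pipeline(STAGES)` returns the final `Episode`. Keeping every stage behind the same signature is one way to let individual steps be swapped or retried without touching the rest of the chain.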

Before — Manual content production

After — Agentic media generation pipeline

Outcomes

What the transition delivered

Zero human effort per episode

The end-to-end pipeline runs autonomously, from web discovery to audio and video published on GitHub.

Consensus-driven accuracy

Scripts are drafted by multiple models and merged in a reasoning consolidation step, which removes speculation and improves content quality.

Multimodal output at scale

Produces MP3 audio, MP4 video, and Veo-3 prompts in a single pipeline run, formats that would otherwise require several separate manual tools.
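To make the consensus idea concrete, here is a deliberately simple stand-in: a quorum vote over the claims each model produced. The source describes consolidation by reasoning and does not say how it is implemented, so this is a swapped-in technique for illustration only. The `consolidate` function, the `quorum` parameter, the model labels, and the sample claims are all hypothetical.

```python
from collections import Counter
from typing import Dict, List


def _normalise(claim: str) -> str:
    # Crude normalisation so trivially different phrasings compare equal.
    return claim.strip().lower()


def consolidate(drafts: Dict[str, List[str]], quorum: int = 2) -> List[str]:
    """Keep only claims that appear in at least `quorum` drafts.

    `drafts` maps a model label to the list of claims in its draft.
    Claims are returned in order of first appearance, normalised.
    """
    counts: Counter = Counter()
    for claims in drafts.values():
        # Count each claim at most once per draft.
        for key in {_normalise(c) for c in claims}:
            counts[key] += 1

    kept: List[str] = []
    for claims in drafts.values():
        for claim in claims:
            key = _normalise(claim)
            if counts[key] >= quorum and key not in kept:
                kept.append(key)
    return kept


drafts = {
    "model_a": ["Rates held steady", "Merger rumoured"],
    "model_b": ["rates held steady"],
    "model_c": ["Rates held steady ", "Launch delayed"],
}
agreed = consolidate(drafts)
```

With the sample drafts, only the claim that all three drafts share survives the default quorum of two. Exact string matching is a crude proxy for agreement; a reasoning model can compare claims semantically, which this sketch does not attempt.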

Tech stack

OpenAI Agents SDK, Claude Code, GPT-4o, Gemini, Anthropic Claude, GPT-OSS (reasoning), Veo-3, TTS, GitHub MCP, RSS feeds

Research paper

A Practical Guide for Designing, Developing, and Deploying Production-Grade Agentic AI Workflows