
Multilingual Video Source Normalization Pipeline

Design of a hybrid system (web tool + automations) to clean, match, and normalize multilingual video sources at scale, for industrial exploitation.


Client: Boxoffice

Period: Dec 2024 → Sep 2025

Tags: Media Pipeline, Multilingual, Automation, ffmpeg, Airtable

Related domains: Production, Post-Production, Distribution, Data Analysis

Context

Boxoffice (a media group subsidiary) distributes full-length films and adapts them into YouTube clips, working from catalogs provided by major producers.

The received sources are multilingual (up to 8 languages) and extremely heterogeneous: multiple video files, separate audio tracks, distinct subtitles, variable formats and encodings depending on suppliers.

Historically, source preparation relied on entirely manual work, performed by editors in Premiere, making industrialization difficult.


Challenges

  • Repetitive and time-consuming work for teams
  • High risk of human error (languages, synchronization, alignment)
  • Heterogeneity of formats, durations, encodings, and tracks
  • Difficulty absorbing hundreds of films and dozens of new releases each month
  • Need to produce clean and consistent sources for downstream multilingual use

Intervention

I worked as part-time CTO / Head of Systems, with a role focused on structuring, design, and project oversight for Cleaner (also called Baymax).

My role covered:

  • Framing functional and technical requirements
  • Designing a hybrid pipeline combining human input and automation
  • Defining multilingual normalization rules
  • Coordination between technical and operational teams
  • Monitoring production deployment and adoption

The goal was to drastically reduce human workload while maintaining high reliability.


System Implemented

  • Web tool used by teams to explicitly match:
    • video files
    • audio tracks
    • corresponding languages
    • subtitles
  • Automations to:
    • detect audio tracks and their languages
    • verify timecode consistency
    • normalize audio and subtitle formats
  • Video processing pipeline based on Python + ffmpeg
  • Management and tracking interface built with Airtable
  • Generation of standardized outputs, usable by downstream systems
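
To make the automation steps above more concrete, here is a minimal Python sketch of how audio-track detection, a duration consistency check, and audio normalization could be wired around ffprobe and ffmpeg. It is an illustration under stated assumptions, not the actual Cleaner code: the function names, the 0.5 second tolerance, and the 48 kHz 16-bit stereo PCM target are all hypothetical, and the language is read from the container's `language` tag, which suppliers may omit or set incorrectly.

```python
import json
import subprocess
from dataclasses import dataclass


@dataclass
class AudioTrack:
    index: int      # stream index inside the container
    codec: str      # codec name reported by ffprobe
    language: str   # container language tag, or "und" when missing


def probe(path: str) -> dict:
    """Run ffprobe and return its JSON report (requires ffprobe on PATH)."""
    cmd = ["ffprobe", "-v", "error", "-print_format", "json",
           "-show_streams", "-show_format", path]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(out.stdout)


def audio_tracks(report: dict) -> list[AudioTrack]:
    """Extract audio streams and their declared languages from a report."""
    tracks = []
    for stream in report.get("streams", []):
        if stream.get("codec_type") != "audio":
            continue
        language = stream.get("tags", {}).get("language", "und")
        tracks.append(
            AudioTrack(stream["index"], stream.get("codec_name", "unknown"), language)
        )
    return tracks


def duration_seconds(report: dict) -> float:
    """Container duration in seconds, as reported by ffprobe."""
    return float(report["format"]["duration"])


def durations_consistent(durations: dict[str, float], tolerance: float = 0.5) -> bool:
    """True when all files of one film agree on duration within the tolerance.

    The tolerance is an assumed value; a real rule would be set per catalog.
    """
    values = list(durations.values())
    if not values:
        return True
    return max(values) - min(values) <= tolerance


def normalize_audio_cmd(src: str, audio_position: int, dst: str) -> list[str]:
    """Build an ffmpeg command extracting one audio track to an assumed target format.

    audio_position is the rank among audio streams (0 = first audio stream).
    """
    return ["ffmpeg", "-y", "-i", src,
            "-map", f"0:a:{audio_position}", "-vn",
            "-c:a", "pcm_s16le", "-ar", "48000", "-ac", "2",
            dst]
```

In a setup like this, tracks reported as "und" or files failing the duration check would be the natural candidates to send back to the web tool for explicit human matching, while consistent files go straight through normalization.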

Results

  • Near-automatic cleaning and preparation of multilingual sources
  • Massive reduction in time spent per film
  • Strong decrease in language and synchronization errors
  • Ability to absorb industrial content volumes
  • Reliable and normalized outputs

The produced files are now used directly by EdiThor, without manual reprocessing.


What This Project Illustrates

  • Design of industrial media pipelines
  • A pragmatic approach combining automation and human control
  • Handling complex multilingual challenges
  • The role of a part-time CTO / Head of Systems on critical value chains

Next Steps

Cleaner is a foundational building block of the multilingual production chain. The system is designed to accommodate new formats, new languages, and new use cases without requiring changes to the existing architecture.
