Podcast Interactivity Spec Proposal

February 23, 2015 13:14 EST • Alexandre Vallières-Lagacé • 6 minute read

I’ve had this on my mind for a long while and I think it’s been in my thoughts long enough that I should lay it down in words.

Premise

Podcasts are great. I love them, I listen to a bunch of them, I enjoy them every day. Some have great show notes that put the information discussed in context; others have nothing. Most have sponsors or some means of supporting the show (like an Amazon affiliate code). None of them offer an easy way to help out while listening. You always need to go online and follow some acrobatic steps to, eventually, chip in a bit of your money towards something you love.

We need a way to enhance the interactivity of podcasts without spoiling the experience of those who just want to listen passively. This interactivity should give podcasters an easier path to earning money and provide a way to include relevant media that listeners can consult if they want more information while listening.

Isn’t that what show notes do today?

Yes and no. Yes, you can always flick open the show notes, find the link on the page and consult whatever information is included. But it’s a manual scavenging task. There should be a better way to do this. Plus, it does not necessarily work well, as external media sometimes kicks you out of the app and you need to get back in manually.

I don’t expect everyone to jump on this and implement these specs. At the moment it’s more of a “here’s an idea, let’s think about it.” Chapters came and are now gone, a sign that they were perhaps too cumbersome to add to podcasts and that too few apps supported them.

Today the podcast world is much different: serious enterprises are built around this medium, and mobile apps are much better, with much more dynamic developers.

There is hope.

Proposal

We need a system that automatically shows relevant content at certain timestamps, similar to how subtitles appear in a movie. It could be a simple link to the sponsor’s website while the podcaster goes through the ad read, or a funny animated GIF when discussing related news. All of this would of course be optional, and you could still listen as you currently do in your car.
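To make the subtitle analogy concrete, here is a minimal sketch of how a player might decide what to show at a given playback position. The `Cue` class and the `active_cues` function are hypothetical names of my own for illustration, not part of any existing podcast app or spec.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One timed piece of content (hypothetical structure)."""
    start_ms: int  # when the content appears, in milliseconds
    end_ms: int    # when it disappears, in milliseconds
    content: str   # what to show (a link, an image, ...)


def active_cues(cues, position_ms):
    """Return the cues whose time window contains the playback position."""
    return [c for c in cues if c.start_ms <= position_ms < c.end_ms]


cues = [
    Cue(20_000, 60_000, "sponsor link"),
    Cue(120_600, 180_800, "funny GIF"),
]

# 30 seconds in, the sponsor link is on screen; 90 seconds in, nothing is.
print([c.content for c in active_cues(cues, 30_000)])
print(active_cues(cues, 90_000))
```

A listener who never looks at the screen is unaffected: the audio plays exactly as before, and the lookup only matters if the app is in the foreground.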

Content types

There is a ton of content out there, so I’m just going to focus on the most important types: revenue-generating content (sponsors and links) and photo/video content.

For sponsors and links, the goal is to increase sponsor visibility, and it can be as simple as opening an in-app browser (a ChildBrowser) that loads the sponsor’s website. Generally, sponsors want the listener to purchase a product or service, so any link should be accompanied by a share button that opens a share sheet and helps you find the link again later or on your computer (email, iMessage, Instapaper, Read Later, Evernote, etc).

Sponsor and affiliate example screenshot

** Now, don’t judge my awful Photoshop skills, I’m no designer!

Images

Perhaps the simplest media type, this one simply displays a small thumbnail that can be clicked to get a fullscreen version. For GIFs (or JIFs), we could have a small badge that says it’s a GIF and only animate it on full size view.

Image preview and fullscreen screenshot

Videos

This one is a bit trickier as there are many video services out there, so the simplest approach would be to use the service’s video embed. The behaviour could be similar to images, but instead of showing a full-size image, a click would start video playback.

Video

Now the technical stuff!

There is much to consider and there are limitations. The first point is the origin of all podcasts on the Internet: the RSS feed. We already have the RSS 2.0 feed specification and ideally we would not venture too far from it. Or we could go all in on a feed specification made specifically for podcasts; more on this later.

We also have the way subtitles are handled in video players: a simple text-based solution with timestamps and text. I used this as the basis for the technical representation of the interactivity specs.

A new SRT-style document

For those unfamiliar with SRT documents, you should check them out. We can use them as an inspiration for our new functionality. Here is a standard subtitle file:

1
00:00:10,500 --> 00:00:13,000
Elephant's Dream

2
00:00:15,000 --> 00:00:18,000
At the left we can see...

And here is how I imagine the new dataset would look:

1
00:00:20,000 --> 00:01:00,000
[SponsorName](https://example.net/refcode) An exceptional trick
at a fraction of the price. Use code XXXXX for $5 off your
first purchase.

2
00:02:00,600 --> 00:03:00,800
![This funny cat picture](https://site.com/image.gif)

3
00:04:00,600 --> 00:07:00,800
<iframe width="960" height="720"
src="https://www.youtube-nocookie.com/embed/s_BX8UjVUvA?rel=0&controls=0&showinfo=0"
frameborder="0" allowfullscreen></iframe>

The content could be simple HTML or, as I prefer, Markdown text. There are already a bunch of tools to generate SRT documents; I’m sure we could easily reuse them, or simply make a small app that records a timestamp when a button is pressed and outputs an empty SRT-style document (free app idea right here) for you to fill in.
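As a rough sketch of how an app could read such a document, here is a small Python parser. It assumes the layout shown above (an index line, a timing line, then one or more content lines, with blank lines between entries). The names `parse_cues` and `classify`, and the idea of guessing the content type from the first characters, are my own illustration rather than a settled part of the proposal.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:00:20,000 --> 00:01:00,000".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    index: int
    start_ms: int
    end_ms: int
    content: str
    kind: str  # "link", "image", "video" or "text"


def _millis(h, m, s, ms):
    """Convert the four timestamp fields to integer milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def classify(content):
    """Guess the content type from how the entry starts (a naive heuristic)."""
    text = content.lstrip()
    if text.startswith("!["):
        return "image"
    if text.lower().startswith("<iframe"):
        return "video"
    if text.startswith("["):
        return "link"
    return "text"


def parse_cues(document):
    """Parse an SRT-style document into a list of cues, skipping bad blocks."""
    cues = []
    for block in re.split(r"\n\s*\n", document.strip()):
        lines = block.splitlines()
        if len(lines) < 3 or not lines[0].strip().isdigit():
            continue
        match = TIMING.fullmatch(lines[1].strip())
        if match is None:
            continue
        g = match.groups()
        content = "\n".join(lines[2:])
        cues.append(Cue(int(lines[0]), _millis(*g[:4]), _millis(*g[4:]),
                        content, classify(content)))
    return cues


SAMPLE = """\
1
00:00:20,000 --> 00:01:00,000
[SponsorName](https://example.net/refcode) An exceptional trick

2
00:02:00,600 --> 00:03:00,800
![This funny cat picture](https://site.com/image.gif)
"""

cues = parse_cues(SAMPLE)
for cue in cues:
    print(cue.index, cue.start_ms, cue.end_ms, cue.kind)
```

Malformed blocks are skipped rather than raising an error, on the assumption that a podcast app should keep playing audio even if the interactivity file is broken.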

RSS 2.0 with something extra

We already have feeds, and they use the <enclosure> tag for the audio file. Here is an example.

<enclosure url="http://website.com/show/episode101.mp3" length="98504422" type="audio/mpeg"/>

We could use a second <enclosure> for the interactivity data, like so:

<enclosure url="http://website.com/show/episode101.pmd" length="1024" type="text/x-markdown" />

The specs do not say whether an item may have only one enclosure, so I ran a feed with this second Markdown-type enclosure through the validator and it is still a valid feed.
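On the app side, picking out the interactivity file could look something like the sketch below, which uses Python’s standard `xml.etree.ElementTree` module. The `enclosure_urls` helper is a hypothetical name, and the item is a made-up example built from the two enclosures above. How existing podcast apps behave when they meet a second enclosure is not something I have tested.

```python
import xml.etree.ElementTree as ET

# A made-up feed item carrying both the audio and the interactivity file.
ITEM_XML = """\
<item>
  <title>Episode 101</title>
  <enclosure url="http://website.com/show/episode101.mp3" length="98504422" type="audio/mpeg"/>
  <enclosure url="http://website.com/show/episode101.pmd" length="1024" type="text/x-markdown"/>
</item>
"""


def enclosure_urls(item, mime_type):
    """Return the URLs of all enclosures in the item with the given MIME type."""
    return [
        enclosure.get("url")
        for enclosure in item.findall("enclosure")
        if enclosure.get("type") == mime_type
    ]


item = ET.fromstring(ITEM_XML)
audio = enclosure_urls(item, "audio/mpeg")
interactive = enclosure_urls(item, "text/x-markdown")
print(audio)
print(interactive)
```

An app that does not know about the second enclosure can keep filtering on the audio MIME type and carry on as it does today.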

Limitations

Here comes the part where you can chip in your ideas. This plan is pretty dandy but has some serious limitations. Like any tech endeavour, there have to be some challenges, no?

  • One more step. First, we would need to host the file alongside the podcast audio file, which adds an extra step to publication.

  • Offline. We need access to that file, or we need to change podcast apps to cache it. Since most podcast apps already cache the show artwork, another file to cache might not be too complex.

  • RSS specs limitation. Ideally, this new data would live inside the RSS feed itself, like the show notes. Perhaps introduce a separator so both the show notes and the interactivity information reside in the same tag?
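To illustrate the separator idea from the last point, here is a tiny sketch. The marker string is entirely made up for the example (nothing in RSS defines one), and `split_description` is a hypothetical helper.

```python
# Hypothetical marker line; not part of RSS or any existing spec.
SEPARATOR = "\n---interactivity---\n"


def split_description(description):
    """Split a description into (show notes, interactivity data or None)."""
    notes, found, cues = description.partition(SEPARATOR)
    return notes.strip(), (cues.strip() if found else None)


DESCRIPTION = (
    "Show notes here."
    "\n---interactivity---\n"
    "1\n00:00:20,000 --> 00:01:00,000\n[SponsorName](https://example.net/refcode)\n"
)

notes, cue_text = split_description(DESCRIPTION)
print(notes)
print(cue_text)
```

One drawback worth weighing: apps that know nothing about the marker would display the raw timing data as part of the show notes, which the separate-file approach avoids.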

What’s next?

Well, the point of this post is to discuss the ideas. I will also create a specs document hosted on GitHub, but first I want to get your opinion on this. You can use the comments below or post something on your own blog, Twitter or favorite channel; just tell me about it so I can read it!