On Waasabi IPFS Live Streaming
2021-11-01, originally published on Open Collective
I have spent the last two months (and, on and off, the last two years) on video technology, in no small part motivated by improving the status quo of online events (especially for RustFest, but really for anyone who cares). These two months, however, were explicitly dedicated to researching the feasibility and specifics of one task: building a peer-to-peer live streaming component for Waasabi, the framework underlying RustFest Global's online infrastructure, using IPFS / libp2p.
Technology primer: segmented video
Note: there might be some big simplifications here; the goal is to set a baseline understanding of video streaming technology that is useful for the rest of this post.
Generally speaking, there are two fundamentally different ways to stream live video:
Real-time media streams — two peers establish a persistent connection, and a continuous stream of real-time video is piped through this connection from one machine to another. Latencies are practically around wire speed, that is, the delay roughly equals the transmission delay of the network. RTMP is one frequent example.
Segmented video — live video is chopped up into short video segments, which are deployed on a server (often a plain dumb HTTP server) alongside a manifest file that describes the properties of the video. VOD manifests already contain a list of all video segments, start to finish, while live manifests update continuously as new segments are published for an ongoing broadcast.
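To make the manifest idea a bit more concrete, here is a minimal sketch of how a server could render an HLS media playlist for a live stream (sliding window, no end marker) versus a finished VOD (every segment listed, plus an end marker). The `Segment` type, the function name and the window size are assumptions made for illustration; they are not taken from Waasabi.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    uri: str
    duration: float  # seconds


def render_playlist(segments, first_sequence=0, window=3, ended=False):
    """Render an HLS media playlist as text.

    window=None lists every segment (VOD style); an integer keeps only
    the newest `window` segments (live style).
    """
    if not segments:
        raise ValueError("need at least one segment")
    listed = segments if window is None else segments[-window:]
    # Sequence number of the first segment that is still listed.
    media_sequence = first_sequence + len(segments) - len(listed)
    # The target duration is an integer upper bound on segment length.
    target = max(1, max(math.ceil(s.duration) for s in listed))
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target}",
        f"#EXT-X-MEDIA-SEQUENCE:{media_sequence}",
    ]
    for seg in listed:
        lines.append(f"#EXTINF:{seg.duration:.3f},")
        lines.append(seg.uri)
    if ended:
        # A finished broadcast tells players that no more segments follow.
        lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    segs = [Segment(f"seg{i}.ts", 4.0) for i in range(5)]
    print(render_playlist(segs))                            # live view
    print(render_playlist(segs, window=None, ended=True))   # VOD view
```

A live client re-fetches this text every few seconds and picks up whatever segments are new; a VOD client reads it once.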
Real-time streams rely on a direct connection to efficiently stream video at minimal latency, but in the baseline case they are not a great fit for 1:N broadcast scenarios (having to maintain direct connections to all N clients), and the stream itself is sensitive to network speed fluctuations. Segmented video has many favorable properties for broadcasting, at the expense of encoding efficiency and increased latency. Segmented streams can respond to changes in network conditions, (ideally) seamlessly switching to lower-bandwidth renditions, and make much more effective use of resources by publishing video segments on low-maintenance infrastructure (usually distributing these via global CDNs). This comes at the expense of having to simulate liveness: clients must be able to buffer segments of the broadcast and continuously fetch newly published segments to maintain a semblance of live video, all of which increases the glass-to-glass latency of the broadcast. The biggest technical challenge in segmented video broadcasts is making the right trade-offs between video quality, stream latency and distribution efficiency.
The what and why of IPFS
So what is IPFS and what makes it a compelling choice for this particular workload?
IPFS is a set of standards around content-addressed, p2p storage. The bundle of technologies later evolved into libp2p, a complete modular peer-to-peer networking stack with support for many popular languages, from Rust to JS, Python and Go.
There are two aspects in which this technology is interesting for us, and we are going to be talking about both today: video recordings of content (talks, artist performances and more) and video broadcasts, or live streams. These two are, of course, tightly interwoven (at least conceptually), particularly in a world of online and hybrid events, so when we set out back in 2020 to re-imagine RustFest in the online space, we were baffled by the lack of tooling and innovation in this space.
I won't recount our whole journey here (I've done it once before for those interested), but suffice it to say that some of IPFS's properties seem well-suited not only to solving some of the technical challenges we have been facing in the past years, but also to improving the dynamics of online communities.
Before I go into these properties, let's address the Alphabet in the room.
Whose tube is it anyway?
Pretty much the entire video archive of the Rust community is on YouTube. While the 40K+ followers of that account have certainly contributed to the reach and visibility of the content, not a day goes by that I don't think of the drawbacks of such enormous centralization. We have seen it before: entire decades' worth of data and content disappearing in a puff of smoke due to T&S violations, IP takedowns, mishaps in automation, simple misunderstandings, and even seemingly completely random, out-of-the-blue, unwarranted suspensions.
While I don't think the account is in danger of vanishing overnight, one can never be sure. But that's not all: with YouTube being so ubiquitous to the western eye, it's easy to forget that it is not, in fact, accessible everywhere. Whether it's sanctions, local politics or authoritarian governments, there are many significant regions of the world deprived of the freely available knowledge these videos are supposed to represent. And on that note, just how free is free in this case? People are constantly subjected to invasive tracking, surveillance and rampant advertising in the middle of the content they are viewing. Of course, infrastructure and bandwidth have costs, content production too, but is it truly the YouTubes and Twitches that are best equipped to provide the infrastructure for these events and this content? Can we do better?
Now you see me
The internet lives in the quantum superposition of two seemingly contradictory states: all content is ephemeral and may disappear at a moment's notice; while, conversely, it seems that the internet also never forgets, and anything that was ever published may resurface or come back to haunt you at any point.
IPFS itself is a bit like this, also.
P2P sharing, content addressing and mesh networking mean that anything that ever gets shared on the network is identified by the hash of the package's contents and is openly accessible to anyone who is in possession of this identifier -- as long as the network maintains a copy, somewhere. Which it probably does, as everything published on the network may be instantly disseminated around the world through the peer-to-peer connections making up the global network. But none of these machines are storing this content indefinitely, and they may, at their own discretion, purge it from existence. So here we see the above-mentioned duality represented: anything that is published may be immediately available to anyone, for any timespan long into the future, but one cannot depend on this being the case and, unless acted upon, content will routinely be swallowed by the abyss.
Unless acted upon is a load-bearing phrase in the above, which brings us to...
A global CDN where everyone gets to be a POP
Content published on the IPFS network may be accessed by anyone, and the package itself may pass through arbitrarily many nodes. Integrity and privacy of the information contained within can be secured by encrypting the package (and transmitting the decryption key through separate channels) and hashing the resulting container's contents, so the client on the receiving end may verify that it received exactly what was published.
This property already enables some interesting use cases, but it's pinning that makes it all truly fascinating.
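As a rough illustration of why it doesn't matter which node hands you the data, here is a minimal sketch of content addressing. It is deliberately simplified: the identifier here is just a hex SHA-256 digest, whereas real IPFS Content IDs wrap the digest in additional self-describing encoding, and the function names are made up for this example.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Derive an identifier purely from the bytes themselves."""
    return hashlib.sha256(data).hexdigest()


def verify(expected_id: str, data: bytes) -> bool:
    """Check that bytes received from an untrusted peer match the ID."""
    return content_id(data) == expected_id


if __name__ == "__main__":
    published = b"video segment bytes"
    cid = content_id(published)
    # Any peer may serve the bytes; the receiver re-hashes to check them.
    print(verify(cid, published))
    print(verify(cid, b"tampered bytes"))
```

The receiver only needs to trust the identifier, not the peer that delivered the bytes.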
Nodes on the network may pin certain data, meaning that they volunteer not to garbage-collect it, and keep it around to serve it to users interested in it (=requesting its Content ID). Pinning across nodes may also be orchestrated (by some agreement between node owners or simply by spinning up a cluster of nodes all maintained by the same actors).
This makes IPFS networks really powerful content distribution channels, as the users may ensure the content is seeded (in BitTorrent terminology) by multiple dedicated owners while others may cache it and pass it around, ephemerally.
What does that imply for RustFest and video content? We intend to publish the entire historical RustFest archive of recordings (hundreds of gigabytes of videos) on the IPFS network and host them for the foreseeable future. Others may choose to take these datasets and pin them on their own nodes to improve redundancy, or make geographically distributed copies available to everyone, everywhere.
How is this different from, say, the Internet Archive or some other organization or non-profit making a backup of Rust videos on YouTube? Let's say YouTube goes down or suspends the account. People who would normally be accessing these videos through YouTube's system will need to look for and find the redundant copies on the IA website. With IPFS, by contrast, the hosted data is the same regardless of who hosts it. As long as the user has the pointers (the CID listing associated with the videos), they will be able to access the content without issue, and without having to think about where to look for it.
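The relationship between caching, pinning and garbage collection can be sketched with a toy node. This is an illustrative model only: the `ToyNode` class and its methods are hypothetical and do not mirror the API of any IPFS implementation.

```python
import hashlib


class ToyNode:
    """A toy node: caches blocks, but only promises to keep pinned ones."""

    def __init__(self):
        self.blocks = {}   # content id -> bytes
        self.pins = set()  # ids this node has volunteered to keep

    def put(self, data: bytes) -> str:
        cid = hashlib.sha256(data).hexdigest()
        self.blocks[cid] = data
        return cid

    def get(self, cid: str):
        return self.blocks.get(cid)

    def pin(self, cid: str) -> None:
        if cid not in self.blocks:
            raise KeyError("cannot pin a block this node does not hold")
        self.pins.add(cid)

    def gc(self):
        """Drop every block that is not pinned; return the dropped ids."""
        dropped = [cid for cid in self.blocks if cid not in self.pins]
        for cid in dropped:
            del self.blocks[cid]
        return dropped


if __name__ == "__main__":
    node = ToyNode()
    talk = node.put(b"archived talk recording")
    passing = node.put(b"something merely passing through")
    node.pin(talk)
    node.gc()
    print(node.get(talk) is not None)  # pinned content survives
    print(node.get(passing) is None)   # cached-only content is gone
```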
RustFest has always been a from-the-community, for-the-community event, and all RustFest content is available after the events for the community, so this approach meshes perfectly with our ingrained principles and we are extremely excited to finalize the details of this project and get started.
Before we get into the nitty-gritty of p2p streaming on IPFS, let's briefly talk about another option we have explored earlier for this project:
Peertube, like other fediverse systems, was built to provide an open (source) alternative to existing systems. Peertube is a complete system, complex and unruly, and definitely not intended to be a component embedded in another system.
While on the face of it I agree that integrating with the fediverse is a great idea (I am an ardent fedi person myself who hosts their own Peertube & Mastodon instances), that shouldn't mean we need to take the entire system wholesale; instead, it might make more sense in the future to integrate Waasabi itself with the fediverse through ActivityPub.
Furthermore, with RustFest we have been looking to experiment with new ideas precisely because the existing platforms were explicitly set in their ways -- ways we had no interest in replicating, which was another thing we didn't align on with Peertube. This is not to say Peertube or Mastodon are not interested in innovating and improving on their non-free counterparts, it's simply that our approach is a lot more deliberate and radical in this matter.
Unsurprisingly, under the hood Peertube's revamped video transmissions also build on the contemporary HLS and DASH segmented streaming formats, while also utilizing P2P file sharing over WebRTC in the browser. Unfortunately, neither of these libraries is particularly well maintained, nor are they tested or benchmarked heavily against the live streaming use case (particularly low-latency streaming), which is something we have been keeping a close eye on (in our testing, Peertube has had really bad G2G, or glass-to-glass, latencies during live streams, approaching and even exceeding one minute).
"P2P Media Loader starts to download initial media segments over HTTP(S) from source server or CDN. This allows beginning media playback faster. Also, in case of no peers, it will continue to download segments over HTTP(S) that will not differ from traditional media stream download over HTTP."
With the research done around Waasabi's P2P streaming I am now convinced that P2P live streaming is not simply technically challenging, but also requires carefully aligning incentives. Any implementation wherein the centralized use case (=downloading from an HTTP server) provides the users with a better experience will find that the odds are stacked against the incentivization of P2P sharing (a classic example of the tragedy of the commons).
Hence, an ideal solution would not only need to be technically superior to, or at the very least competitive with, the state of the art in centralized technology (low-latency HLS, LHLS, LL-DASH). It would also need to balance the necessarily higher barrier of entry of the peer-to-peer medium against low-barrier fallbacks to the traditional medium, without ruining the system's incentives and throwing the baby out with the bathwater: the original goal, after all, is to reduce our reliance on centralized services and the load on centrally-provisioned infrastructure.
Why so peer-to-peer??
So what are these goals then that I mention?
- Reduce the technical expertise, infrastructure & investment required for running live streaming events — for all event sizes, from small meetups to large online conferences
- Reduce the cost of running events with an online component, scale events automatically with their audience — especially when the online component or live stream is provided free-of-charge
- Improve the experience and accessibility of online, hybrid and in-person events with an online component — the new IPFS infrastructure is built with the experience of many RustFest Global events under our belt to enable inclusivity features such as free live streams, instant replays, watch togethers, multi-streamed extra content (sketchnoting, etc.) and multi-language streams with live translation
- Reduce broadcast latency to enable even more interactivity features and connect the online audience better with the presenters — significantly reducing livestreaming latencies (to the several-seconds range) enables next-level interactivity and user experiences
So let's see how we are planning to accomplish these, through the current proof-of-concept, and the work laid out through the last couple months of digging and planning.
The current proof-of-concept
We released an initial proof of concept earlier today. It doesn't do much, but it's honest, good work focusing on some of the features we wished we had over the past years (even with commercial providers), but didn't. I'm talking about the ability to programmatically start and stop RTMP ingest servers (=a server that can receive an incoming stream from a live studio, OBS Studio or other streaming software), live broadcast them using HLS, and immediately make the recording available after the stream ends.
One of the core features of Waasabi and RustFest Global was that all talks, presentations and performances were but a single, short-lived video broadcast. You can read more about this in the article linked above, but this has been a surprisingly hard feature to implement in the past, and the current PoC makes it trivially easy to accomplish.
Living on the edge
We pride ourselves on the fact that RustFest Global events in the past couple of years have been living on the edge when it came to experimenting with novel ideas, trying to feel out the strengths of the online medium. We don't often talk about this, but it's important to note here that this seemingly quite YOLO approach was much more careful experimentation than it initially seemed:
- We had paid a large professional live streaming provider to provide us with a livestream API to use as the ingest and broadcast medium
- We used an online studio provider to make sure nobody's machine, OBS Studio crashing or internet connection going down would be interrupting the show
- At some point, we even had an unlisted YouTube livestream, simultaneously running in case something went sideways that we weren't prepared for
The reason I'm bringing this up here is that I wanted to point out that while we are interested in, and what's more, excited to experiment with novel ideas, at the end of the day we still have an event to deliver to our attendees & audience, who trust us to do so, many of whom (including sponsors) pay us to do so... and so whatever risky business we embark on, we always have to make sure we try to mitigate any potentially arising issues well ahead of time.
This is why I find it so important that, at the end of the day (and at the current proof-of-concept level), Waasabi's IPFS service is merely a fancy HLS server that also publishes on IPFS. Should anything go wrong, we could spin up a couple of extra servers on short notice, load balance them, make the source IPFS server syndicate the incoming pinned segments onto them and voilà, we have a fully HTTP-based fallback until we work out the issues we were facing.
In fact, we have it on our future roadmap to explore spinning up new cloud nodes automatically and on-demand in areas where we have lots of viewers, to help reduce latencies and improve distribution. When we say we're living on the edge, we certainly mean it, you see...
So that's all the things we already have, let's talk about the things we don't (but we want them still, the known-unknowns, if you will):
Does this come in other sizes?
One of the main reasons HLS was invented in the first place was to support multiple renditions, alternate versions of the same content (be that a different resolution, or e.g. a language). This feature is notably missing from the PoC, mostly because in a P2P world renditions are highly, highly problematic.
Here's the issue: the default live stream of the server is 480p resolution. This is not ideal, so after people's complaints we want to add an alternate resolution, say 720p. 720p is a real crowd pleaser, but we run into the issue that we have split our audience up: some people are watching the 480p stream and thus requesting, downloading and re-sharing the 480p segments. Another group within our audience is playing the 720p segments. The two groups rarely meet, except when one of the clients upgrades or downgrades in resolution.
This practically means that people watching the 720p live stream are not helping with the distribution of the 480p stream and vice versa. This is a problem because we reduce the efficiency of the swarm in distributing the content.
One potential (and frankly really neat) solution to this problem could be upgrade packs. Think of the following configuration instead: everyone receives the baseline resolution (e.g. 480p) segments, and thus can re-share across the entire swarm. The trick comes in the form of additional upgrade packs, e.g. extra compressed image data that refines the lower-resolution baseline segments into higher-resolution footage. What's more, these upgrade packs would work after the event, too: imagine live-streaming in 720p resolution, but after the broadcast has ended, generating an upgrade pack to improve the broadcast resolution to 1080p or even 4K for the VOD viewers.
If all this sounds too good to be true, it's because it probably is. This needs heavy exploratory codec-level work to determine if it's even feasible, much less if it's actually worthwhile doing. The motivation is probably clear: data published on IPFS is immutable, and is already distributed to clients, so reducing redundancy (in the form of renditions) is favorable.
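The post does not settle on a mechanism for upgrade packs, so the following is only a toy illustration of the general layered-coding idea: a shared base layer that everyone receives, plus a residual that upgrades it. It operates on a plain list of numbers rather than video, and every name in it is made up for this sketch; a real codec-level solution would look very different.

```python
def upsample(base, factor, length):
    """Stretch the base layer back to full length by repeating samples."""
    return [base[i // factor] for i in range(length)]


def split_layers(samples, factor=2):
    """Split full-quality samples into a base layer and an upgrade pack."""
    base = samples[::factor]  # what every viewer receives and re-shares
    coarse = upsample(base, factor, len(samples))
    # The upgrade pack only stores what the base layer got wrong.
    residual = [s - c for s, c in zip(samples, coarse)]
    return base, residual


def reconstruct(base, residual, factor=2):
    """Combine the shared base layer with an upgrade pack."""
    coarse = upsample(base, factor, len(residual))
    return [c + r for c, r in zip(coarse, residual)]


if __name__ == "__main__":
    original = [10, 12, 20, 21, 30]
    base, pack = split_layers(original)
    print(base)                      # low-quality version
    print(pack)                      # upgrade pack
    print(reconstruct(base, pack))   # full quality restored
```

The point of the toy is the distribution property: the base layer is identical for all viewers, so the whole swarm can share it, while the upgrade pack is an optional extra.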
Let's talk, like a peer to a peer
Another thing that may seem to be conspicuously missing from the PoC is the actual peers. Browsers in the demo act as mere consumers of data published on the web gateway, and do not interact with the IPFS network or other peers.
P2P communication of IPFS nodes most commonly happens over TCP, or UDP, or sometimes QUIC. None of these are available to browsers so the options we are left with are:
- WebSockets — these connections may be used for server-to-browser communication. The complexities (the requirement for a publicly accessible IP address and TLS certificates for wss) don't make this an ideal option for communication between peers
- WebRTC with signalling — in this case, peers only use a central server as a guiding post for finding each other through webrtc-star. After a WebRTC connection is established between peers, communication happens in a peer-to-peer fashion (this is very similar to Peertube)
- TURN or central relays (p2p-circuit) — these don't make a lot of sense in this case, since we could just be using them to distribute content
- The WebTransport protocol — truly a holy grail in browser-interconnectivity that exposes low-level QUIC connections to the web platform. QUIC (itself using UDP under the hood) is an excellent match for fast P2P connectivity, the only drawback here is the poor browser support as this is a very recent feature, but browser libp2p support is already being heavily worked on.
The current plan is to use all three viable interfaces (WebSockets for server connections, WebRTC+signalling as a baseline for inter-peer communication and WebTransport for supporting browsers) going forward and further decentralize by switching the current centralized live streaming architecture over to pubsub.
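One way to read this plan is as a simple per-connection preference order. The sketch below is an assumption about how such a policy might look, not Waasabi's actual logic; the transport names are plain strings chosen for this example.

```python
def pick_transport(supported, peer_is_server):
    """Pick a transport for one connection.

    `supported` is the set of transports that both ends can use.
    """
    # Prefer WebTransport wherever both ends support it.
    if "webtransport" in supported:
        return "webtransport"
    # WebSockets need a public address and TLS, so only use them
    # when the other end is a server.
    if peer_is_server and "websocket" in supported:
        return "websocket"
    # Baseline for browser-to-browser connections.
    if "webrtc" in supported:
        return "webrtc"
    raise ValueError("no usable transport for this connection")


if __name__ == "__main__":
    print(pick_transport({"websocket", "webrtc"}, peer_is_server=True))
    print(pick_transport({"websocket", "webrtc"}, peer_is_server=False))
    print(pick_transport({"webtransport", "webrtc"}, peer_is_server=False))
```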
True peer-to-peer live streams with pubsub
The proof-of-concept uses standard HLS mechanics for updating the playlists. At a high level what this means is that our browser will poll a constantly-updating HLS manifest endpoint that tells the browser when new video segments are available, so the browser may request those. There are ways to improve this in a centralized manner (e.g. push updates through the WebRTC message subscription Waasabi already maintains), but that would again put a bottleneck on central infrastructure, so instead we could use peer-to-peer mesh networks to our advantage here through libp2p pubsub.
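For reference, the polling behaviour described above boils down to something like the following sketch. The manifest fetching is injected as a function so the example stays self-contained; the helper names and the two-second interval are assumptions, and real players handle many more playlist tags than this.

```python
import time


def segment_uris(manifest: str):
    """Return the segment URIs listed in an HLS media playlist."""
    uris = []
    for raw in manifest.splitlines():
        line = raw.strip()
        # Lines starting with '#' are tags or comments, the rest are URIs.
        if line and not line.startswith("#"):
            uris.append(line)
    return uris


def new_segments(manifest: str, seen: set):
    """Return URIs not seen in earlier polls and remember them."""
    fresh = [uri for uri in segment_uris(manifest) if uri not in seen]
    seen.update(fresh)
    return fresh


def poll(fetch_manifest, on_segment, interval=2.0, rounds=None,
         sleep=time.sleep):
    """Repeatedly fetch the manifest and report newly listed segments."""
    seen = set()
    done = 0
    while rounds is None or done < rounds:
        for uri in new_segments(fetch_manifest(), seen):
            on_segment(uri)
        done += 1
        sleep(interval)


if __name__ == "__main__":
    updates = iter([
        "#EXTM3U\n#EXTINF:4.0,\nseg0.ts\n#EXTINF:4.0,\nseg1.ts\n",
        "#EXTM3U\n#EXTINF:4.0,\nseg1.ts\n#EXTINF:4.0,\nseg2.ts\n",
    ])
    poll(lambda: next(updates), print, interval=0.1, rounds=2)
```

Every viewer runs this loop against the same endpoint, which is exactly the central bottleneck that a push-based mesh would avoid.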
Pubsub enables us to get the best of both worlds, combining the performance characteristics of segmented and real-time streaming:
- Reduce the length of a segment significantly — instead of individual segments of e.g. 4 seconds of video, the segments are e.g. 250ms, or even just a couple of frames long
- Peers watching subscribe to messages concerning the broadcast — each new video segment propagates through the entire mesh as new segments are published
- Greatly reduced latency & improved reliability — segments reach every subscriber quickly, efficiently and reliably
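To illustrate the difference from polling, here is a toy in-memory mesh in which a published segment is pushed from peer to peer. This is plain flooding with duplicate suppression, a much-simplified stand-in for what libp2p pubsub does, and none of the names below come from the libp2p API.

```python
from collections import deque


class Peer:
    def __init__(self, name):
        self.name = name
        self.neighbours = []
        self.seen = set()    # segment ids already handled
        self.received = []   # (segment id, payload, hop count)


def connect(a, b):
    """Create a bidirectional link between two peers."""
    a.neighbours.append(b)
    b.neighbours.append(a)


def publish(origin, segment_id, payload):
    """Push a segment through the mesh, delivering it once per peer."""
    origin.seen.add(segment_id)
    queue = deque([(origin, 0)])
    while queue:
        peer, hops = queue.popleft()
        peer.received.append((segment_id, payload, hops))
        for neighbour in peer.neighbours:
            # Skip peers that already have (or are about to get) it.
            if segment_id not in neighbour.seen:
                neighbour.seen.add(segment_id)
                queue.append((neighbour, hops + 1))


if __name__ == "__main__":
    a, b, c, d = (Peer(n) for n in "abcd")
    connect(a, b)
    connect(b, c)
    connect(c, d)
    connect(a, c)
    publish(a, "seg-1", b"a few frames of video")
    for peer in (a, b, c, d):
        print(peer.name, peer.received)
```

Nobody asks for the segment here: subscribers receive it as soon as a neighbour has it, which is what makes very short segments practical.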
⚠️ IPFS deep-dive warning ⚠️
In IPFS terms, large segments have their own inherent issues: bundling every individual segment into its own monolithic (1 chunk) DAG node limits the size of a segment to at most 1 megabyte, and reduces peering efficiency, as the download of a single chunk cannot be further parallelized. By subdividing the segment into multiple chunks, peering efficiency improves, but we add extra overhead for resolving the individual chunks.
By reducing segment sizes significantly, we can publish them as single-chunk DAG nodes, and can improve propagation by using pubsub, where both metadata and full message content (=the video segment) are efficiently distributed by the network. What's more, efficiency can be further increased using gossip-based pubsub, in particular episub, a proximity-aware pubsub variant. In these cases the mesh network of peers continuously self-improves, taking hop counts and peer latencies into account, automatically adjusting the underlying network to minimize latency and improve reliability.
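A quick back-of-the-envelope calculation shows why segment length matters here. The 1 megabyte limit is the figure quoted above; the 3 Mbit/s bitrate is purely an illustrative assumption, as are the function names.

```python
import math


def segment_bytes(bitrate_bps, duration_s):
    """Approximate size of one segment in bytes."""
    return math.ceil(bitrate_bps * duration_s / 8)


def chunks_needed(size_bytes, chunk_limit=1_000_000):
    """How many chunks a segment needs under a per-chunk size limit."""
    return max(1, math.ceil(size_bytes / chunk_limit))


if __name__ == "__main__":
    bitrate = 3_000_000  # assumed 3 Mbit/s stream
    for duration in (4.0, 0.25):
        size = segment_bytes(bitrate, duration)
        print(duration, size, chunks_needed(size))
```

Under these assumptions a 4 second segment comes to 1,500,000 bytes and no longer fits into a single chunk, while a 250ms segment is 93,750 bytes and fits with plenty of room to spare.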
Are we there yet?!
Through the blunt instrument of IPFS publication and public HTTP gateway access (as demonstrated in the first PoC), we can already achieve <30s broadcast latencies which is competitive with basic untuned HLS. This is our baseline for all devices and maximum reach (as this method, as described earlier, is basically equivalent to standard HLS from the perspective of the client's requirements), but we already gain multiple advantages in the form of instant replays, IPFS resiliency for published content and more.
With smart use of IPFS features (e.g. spinning up relay nodes for impromptu CDNs) and tuning of stream parameters, it's possible to reliably reduce this latency to under 20s, often marketed as reduced latency HLS by industry vendors.
Reaching LL-HLS territory (sub-10s latencies) is then possible by implementing mesh networks through pubsub/episub and high-performance client connectivity provided by WebTransport, while still keeping and even improving on IPFS-provided resiliency, network efficiency, and a significant reduction in centralized infrastructure requirements.
All of the above will take time, and a lot of experimentation and testing, but we are confident enough in the outlined step-by-step approach to say: we intend to use this new infrastructure for RustFest Global events in 2023, and if anyone is interested in furthering this effort, we always welcome contributors and collaborators as we forge our path forward.
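To see where these numbers come from, a very rough latency budget can help. The model below is a simplification and every input is an illustrative assumption, not a measurement: a segment has to be fully recorded before it can be published, it then has to reach the viewer, and the player keeps a few segments buffered before playing.

```python
def estimate_latency(segment_s, buffered_segments, encode_s, propagation_s):
    """Rough glass-to-glass latency estimate in seconds."""
    wait_for_full_segment = segment_s
    player_buffer = buffered_segments * segment_s
    return encode_s + wait_for_full_segment + propagation_s + player_buffer


if __name__ == "__main__":
    # Assumed: 1s of encoding, 2s to publish and fetch, 3 segments buffered.
    print(estimate_latency(4.0, 3, encode_s=1.0, propagation_s=2.0))
    print(estimate_latency(0.25, 3, encode_s=1.0, propagation_s=2.0))
```

With these made-up inputs, 4 second segments land at 19 seconds while 250ms segments land at 4 seconds, which shows how strongly segment length and buffering dominate the total.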
A note on restricting access to content and incentives
On a closing note, let us address a question that may have formed during the above read:
How about private content?
All of the above assumes that live streams are more-or-less freely available to anyone. This is not an issue for RustFest, as, similarly to all previous RustFest and RustFest Global events, we intend to make the live streams accessible to anyone.
When it comes to options of restricting access to content distributed via IPFS, the following come to mind:
- Security-by-obscurity — restricting access to the list of newly published segments (e.g. the HLS manifest)
- Encrypted segments — encrypting the published segments, with simple symmetric encryption (and distributing the decryption key to people with access), a rolling key, or going full loco and using some DRM mechanism with client support
- Private IPFS networks — isolating the IPFS network hosting the live stream segments from the public IPFS network and only giving access to specific clients
All of the above would represent some sort of trade-off between user experience, network efficiency and security (where security in most of these cases means the amount of additional effort required to access the live stream, which can also be interpreted as a degradation in user experience/accessibility, and not really security).
From the perspective of the live stream, RustFest Global has previously differentiated between free and paying viewers on two features:
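As an illustration of the rolling key option, the sketch below derives a fresh key for every group of segments from a single master secret. It only covers key derivation; actually encrypting segments would need a cipher from a cryptography library, which is left out here. The function names, the stream identifier and the rotation interval are all assumptions made for this example.

```python
import hashlib
import hmac


def epoch_for(segment_index, segments_per_key=10):
    """Segments are grouped into epochs; each epoch gets its own key."""
    return segment_index // segments_per_key


def segment_key(master_key: bytes, stream_id: str, epoch: int) -> bytes:
    """Derive the 32-byte key for one epoch of one stream."""
    message = f"{stream_id}:{epoch}".encode()
    return hmac.new(master_key, message, hashlib.sha256).digest()


if __name__ == "__main__":
    master = b"example-master-secret"  # hypothetical, never hardcode secrets
    for index in (0, 9, 10):
        key = segment_key(master, "main-stage", epoch_for(index))
        print(index, key.hex()[:16])
```

Handing a viewer the key for a single epoch limits what a leaked key exposes, whereas anyone holding the master secret can derive every key, so the master secret would have to stay with the broadcaster.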
- Stream resolution/quality
- The availability of the Replay feature
As for the Replay feature, as described above, this feature will now be by design part of the live stream experience in the IPFS infrastructure, and we do not intend to limit or restrict access to it in any way. The stream quality on the other hand is another story, and here we need to talk about incentives a bit more.
Free as in free lunch
When we say the broadcast is free for anyone to watch on the website, what we mean of course is that the cost of making the stream available is borne by someone other than the viewer. Who that someone else is depends on the event: it could be sponsors, the paying ticket holders, or the infrastructure provider making the viewer the product. RustFest has always been vehemently opposed to that last one, but the first two were quite alright, as we consider them a form of redistribution, paying it forward and giving back to the community.
Now, in the IPFS era things are a bit different. After all, we are going to all this trouble to reduce the live stream's reliance on centralized infrastructure, and we do this by making You, Dear Viewer, the sponsor. In this new architecture the cost of free viewers is largely borne by the viewers themselves: by participating in the distribution of the live stream, viewers contribute processing power and bandwidth to make the live stream accessible to others. In this setting the question is no longer who pays for the free viewers, but rather how we incentivize contributing to p2p redistribution and reduce the incentives for freeloading to keep the network healthy, without being so strict that it limits access. If anything, this question is a whole lot harder to solve and balance than the previous one, and it will need a lot of research and experimentation before we can even attempt to provide a satisfactory answer.
But we will never stop trying.
Thank you, Dear Reader, for reading through this gargantuan exposé into the expansive underbelly of Waasabi & RustFest's infrastructure! As always, we welcome feedback on social media or email:
- RustFest Twitter
- RustFest email
- Waasabi Twitter (Bay Area Tech Club)
- Waasabi email (Bay Area Tech Club)
- Waasabi Open Collective
We would also like to thank all our RustFest supporters who helped us over the years, and gave us the means & motivation to continue our work in supporting the Rust community.