Make ToxAV implementation pluggable #1369
Motivation
I'm currently doing some experiments with making ToxAV compatible with the standard RTP/WebRTC stack.
My impression is that Tox initially followed this path (encapsulating RTP in Tox), as evidenced by the naming and header fields of RTPMessage, but then diverged and invented custom solutions for standard problems (for example, inventing a "Large Frame" protocol with the data_length and offset fields instead of using RFC 7741 VP8 payloading).
The "batteries-included" approach of ToxAV, meaning that library users only deal with uncompressed buffers, makes it easy to build an AV-enabled Tox client and shields the application programmer from the nitty-gritty details of video streaming, but it also makes some features impossible, such as hardware-accelerated video decoding with zero-copy display.
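For comparison with the data_length/offset approach, here is a minimal sketch of how a receiver could parse the VP8 payload descriptor that RFC 7741 places in front of each RTP payload. The struct and function names (vp8_descriptor, vp8_parse_descriptor) are made up for illustration and are not part of toxcore; the bit layout follows my reading of RFC 7741 section 4.2 and should be checked against the RFC before relying on it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the descriptor fields this sketch extracts. */
typedef struct {
    bool     non_reference;      /* N bit: frame can be dropped safely */
    bool     start_of_partition; /* S bit: payload starts a VP8 partition */
    uint8_t  partition_id;       /* PID: partition index (3 bits) */
    bool     has_picture_id;     /* I bit set in the extension byte */
    uint16_t picture_id;         /* 7-bit or 15-bit PictureID */
    size_t   header_len;         /* descriptor length in bytes */
} vp8_descriptor;

/* Parses the descriptor at the start of buf. Returns false if truncated. */
static bool vp8_parse_descriptor(const uint8_t *buf, size_t len, vp8_descriptor *out)
{
    size_t pos = 0;

    if (len < 1) {
        return false;
    }

    memset(out, 0, sizeof(*out));

    const uint8_t b0 = buf[pos++];
    out->non_reference      = (b0 & 0x20) != 0;
    out->start_of_partition = (b0 & 0x10) != 0;
    out->partition_id       = b0 & 0x07;

    if (b0 & 0x80) { /* X bit: an extension byte follows */
        if (pos >= len) {
            return false;
        }
        const uint8_t ext = buf[pos++];

        if (ext & 0x80) { /* I bit: PictureID present */
            if (pos >= len) {
                return false;
            }
            const uint8_t p = buf[pos++];
            out->has_picture_id = true;
            if (p & 0x80) { /* M bit: 15-bit PictureID spans two bytes */
                if (pos >= len) {
                    return false;
                }
                out->picture_id = (uint16_t)(((p & 0x7F) << 8) | buf[pos++]);
            } else {
                out->picture_id = p;
            }
        }
        if (ext & 0x40) { /* L bit: skip TL0PICIDX */
            if (pos >= len) {
                return false;
            }
            pos++;
        }
        if (ext & 0x30) { /* T or K bit: skip the TID/Y/KEYIDX byte */
            if (pos >= len) {
                return false;
            }
            pos++;
        }
    }

    out->header_len = pos;
    return true;
}
```

A packet with the S bit set and a partition_id of 0 marks the beginning of a new frame, so a receiver can reassemble frames from these flags plus the RTP marker bit and sequence numbers, without carrying explicit length and offset fields.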
As a result, Tox now maintains an inadequate, incompatible implementation of an AV streaming stack, an entirely out-of-scope endeavour for a project this size.
IMHO, Tox should redefine its role as being a distributed, secure transport layer for a standards-compliant RTP implementation.
In the future, TokTok might still provide a basic AV implementation, but focus on application-layer compatibility, and avoid sinking developer time in developing isolated solutions for problems that have already been solved by a multitude of other projects.
Besides the benefits of using existing implementations, an RTP-compliant ToxAV would enable many other applications, like bridging to WebRTC peers, video-on-demand delivery and more.
Migration Path
Since legacy compatibility is an important requirement for Tox, replacing the protocol overnight isn't an option. A migration path could look like this:
The feasibility of 2) is currently being researched/demonstrated within github.com/strfry/gotox
API Changes
Signalling
Signalling commands (Call, Answer, Hangup, etc.) are currently sent on a special comm channel and handled internally by msi.c, which provides a callback-style interface.
This seemed like a good spot to hook into, since it's exactly what toxav_new does, which I intend to replace.
In my prototype, I just expose this internal API by means of the FFI ( TokTok/go-toxcore-c@master...strfry:feature/msi ), so no direct action is needed, but I think it is debatable whether the API described in msi.h would be a good cutting point for a public API.
Packet ID filtering
The lossy_packet interface basically enables us to send and receive AV packets.
The only problem is that it explicitly checks for AV-related packet IDs in order to multiplex between ToxAV and userspace.
This affects just a few lines that need to handle this condition in a different way: master...strfry:feature/pluggable_rtp
Of course, statically disabling the ToxAV codepath isn't an option because it would break existing AV clients. Another option would be to disable these checks when tox is built without AV support, but this isn't an ideal solution either.
Maybe this could depend on whether a ToxAV object has previously been allocated?
I'm not sure whether there would be unexpected side effects for existing ToxAV clients, which might get confused by video packets coming in through the custom_lossy_packet interface.
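The idea of making the check depend on a previously allocated ToxAV object could be sketched roughly as follows. This is not the actual toxcore code: the constants, the lossy_route enum and route_lossy_packet are hypothetical names, and the numeric ID ranges are assumptions chosen for illustration rather than values copied from the source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only: assumed layout of the lossy packet ID space. */
enum {
    LOSSY_ID_FIRST    = 192, /* assumed first lossy packet ID */
    LOSSY_AV_ID_COUNT = 8,   /* assumed number of IDs reserved for ToxAV */
    LOSSY_ID_LAST     = 254  /* assumed last lossy packet ID */
};

/* Where an incoming lossy packet should be delivered. */
typedef enum {
    LOSSY_REJECT,   /* not a lossy packet ID at all */
    LOSSY_TO_TOXAV, /* handled by the built-in ToxAV code path */
    LOSSY_TO_USER   /* handed to the application's lossy packet callback */
} lossy_route;

/*
 * Decides the route for a packet ID. AV-range IDs only go to ToxAV while a
 * ToxAV object exists; otherwise they fall through to the application, which
 * may then run its own RTP stack on top of them.
 */
static lossy_route route_lossy_packet(uint8_t packet_id, bool toxav_allocated)
{
    if (packet_id < LOSSY_ID_FIRST || packet_id > LOSSY_ID_LAST) {
        return LOSSY_REJECT;
    }

    const bool in_av_range = packet_id < LOSSY_ID_FIRST + LOSSY_AV_ID_COUNT;

    if (in_av_range && toxav_allocated) {
        return LOSSY_TO_TOXAV;
    }

    return LOSSY_TO_USER;
}
```

With this shape, existing ToxAV clients keep their current behaviour because they always allocate a ToxAV object, while a client that never does so receives the AV-range packets itself. The same decision would have to be mirrored on the send path, and it does not answer the open question about peers that mix both kinds of clients.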
Open Issues
The (Web)RTC stack uses the Session Description Protocol (SDP, RFC 4566) to negotiate various details, such as codecs, protocol extensions and ICE connection candidates.
SDP is a common point of criticism of the WebRTC spec, and was mostly accepted for easier compatibility with existing SIP networks.
Its complexity, and its integral support for features that are not necessary in the context of ToxAV, make it unfavorable for inclusion in Tox.
It is yet to be researched how the necessary options can be mapped to Tox capabilities, and how much can be skipped through higher base specifications within Tox. For example, we can assume support for the Opus audio codec, and don't need to negotiate about things like "PCMU/8000".
Another topic of research is mapping the various RTCP commands to existing comm channel commands for bandwidth regulation.