URI suffix: protocol-handler
Change controller: Fedi-To https://fedi-to.net/
Specification document(s): https://github.com/fedi-to/fedi-to.github.io/blob/main/protocol-handler.md
The protocol-handler
well-known resource endpoint is used to handle Web-First
Protocol Handlers. It allows Web-First Protocol URLs, that is, URLs with
schemes of the form web+example:
, with no otherwise known handler, to be
successfully handled provided that the URL contains an appropriate authority
section. (For the definition of “URL”, consult Appendix B.)
The protocol-handler
endpoint MUST be navigable, meaning a browser can be
pointed at it, and it MUST accept a target
query parameter. This query
parameter contains the target URL that should be opened. This target URL MUST
be a full URL, including the scheme, encoded using the
component percent-encode set. Effectively, the endpoint is used following the
steps in HTML #protocol-handler-invocation (as of 2022-12-26), as if the
registered protocol handler were for /.well-known/protocol-handler?target=%s
,
but skipping steps 2 and 3. (Indeed, this was chosen for compatibility with
registerProtocolHandler
.)
The authority section is parsed following the same WHATWG URL spec rules as
for https
URLs, except it MUST strictly follow two forward slashes (//
)
after the scheme section. That is, web+example:\\example
does not have an
authority section, but web+example://example\
does. The following table shows
some examples; note that all of these examples are valid URLs (they can be
handled by an explicit protocol handler), but only those with an authority can
be handled when no explicit protocol handler is known.
URL | Authority |
---|---|
web+example:/// |
None |
web+example://\ |
None |
web+example:///example |
None |
web+example:\\example |
None |
web+example://example.org\ |
example.org |
web+example://example.org |
example.org |
For one possible implementation of this process, refer to Appendix A.
When navigated to, this endpoint SHOULD appropriately handle the given target
URL. For optimal compatibility, it SHOULD return 404 Not Found
for unknown or
unsupported schemes and it SHOULD return 307 Temporary Redirect
or
308 Permanent Redirect
for schemes it can handle.
The protocol-handler
endpoint is a simple navigable web resource with a query
parameter, comparable to any other. However, care should be taken not to
implement it as an open redirect - if the protocol handler at example.org is
passed an URL with an example.net authority, it SHOULD NOT redirect to
example.net, but instead to an app in example.org that fetches the target URL
and provides the relevant content, with the appropriate security measures as
required by the app.
The following Rust code shows how Fedi-To implements the application side of
this specification. It uses version 2.3.1
of the url
crate.
use percent_encoding::utf8_percent_encode;
/// Error kind returned when trying to find the fallback protocol handler.
enum FallbackError {
/// Returned when the given URL, while valid, does not provide a fallback
/// handler.
NoHandler,
/// Returned when the given target is not an URL.
NotAnUrl,
}
/// Checks whether the `scheme` part of `web+scheme` satisfies the desired
/// constraints.
fn is_scheme_invalid(scheme: &str) -> bool {
// valid schemes are non-empty and are entirely ascii lowercase
// so invalid schemes are empty or contain non-ascii-lowercase.
scheme.is_empty() || !scheme.trim_start_matches(|c: char| -> bool {
c.is_ascii_lowercase()
}).is_empty()
}
/// Attempts to find a fallback protocol handler for the given target URL.
///
/// The target is assumed to be normalized, as per the WHATWG URL spec. (Note
/// that Fedi-To doesn't actually check that it is, but that's a Fedi-To
/// issue.)
fn get_fallback(target: &str) -> Result<String, FallbackError> {
use FallbackError::*;
// find the scheme
let scheme = {
let colon = target.find(':').ok_or(NotAnUrl)?;
let scheme = &target[..colon];
if !scheme.starts_with("web+") {
return Err(NotAnUrl);
}
let scheme = &scheme[4..];
if is_scheme_invalid(scheme) {
return Err(NotAnUrl);
}
scheme
};
// replace web+scheme with https
// this allows us to handle web+ URLs with the semantics we actually
// want, which is roughly the same as https, with a few differences
let mut as_if_https = target.to_string();
as_if_https.replace_range(0..4+scheme.len(), "https");
// the main difference is that unlike https, authority is optional.
// so, first check that there should be an authority.
if !as_if_https.starts_with("https://") {
return Err(NoHandler);
}
// then also check that the authority actually exists.
// this is necessary so we don't end up parsing web+example:///bar as
// web+example://bar/ (which would be wrong).
// note that we do parse web+example://bar\ as an authority! (but
// everything else - like the path - we treat as opaque to us)
if as_if_https.starts_with("https:///")
|| as_if_https.starts_with("https://\\") {
return Err(NoHandler);
}
// NOTE: we only do this parse to extract the domain/port, it is up to
// the protocol-handler to deal with malformed or malicious input.
// NOTE: this is the same URL parser as used by browsers when handling
// `href` so this is correct.
let mut url = url::Url::parse(&*as_if_https).map_err(|_| NoHandler)?;
url.set_path("/.well-known/protocol-handler");
let _ = url.set_username("");
let _ = url.set_password(None);
let mut params = "target=".to_owned();
params.extend(utf8_percent_encode(&*target, COMPONENT));
url.set_query(Some(&*params));
url.set_fragment(None);
Ok(url.into())
}
The term “URL” refers to the concept of URL as defined in the WHATWG URL spec.
It should not be confused with “URI”. This is a deliberate choice because
browsers follow the WHATWG URL spec for parsing e.g. the href
attribute in an
HTML <a>
element, and this spec needs to be consistent with what browsers do.