rfc: 1st draft for signed address records
yusefnapora committed Oct 4, 2019
commit 77e3b6689466ed894c6752ab9091ced38ede3e88
242 changes: 242 additions & 0 deletions RFC/0002-signed-address-records.md
@@ -0,0 +1,242 @@
# RFC 0002 - Signed Address Records

- Start Date: 2019-10-04
- Related Issues:
  - [libp2p/issues/47](https://github.com/libp2p/libp2p/issues/47)
  - [go-libp2p/issues/436](https://github.com/libp2p/go-libp2p/issues/436)

## Abstract

This RFC proposes a method for distributing _self-certified_ address records,
which contain a peer's publicly reachable listen addresses. The record also
includes a signature, which proves that the record was produced by the peer
itself and not tampered with in transit.

## Problem Statement

All libp2p peers keep a "peer store" (called a peer book in some
implementations), which maps [peer ids][peer-id-spec] to a set of known
addresses for each peer. When the application layer wants to contact a peer, the
dialer will pull addresses from the peer store and try to initiate a connection
on one or more addresses.

Addresses for a peer can come from a variety of sources. If we have already made
a connection to a peer, the libp2p [identify protocol][identify-spec] will
inform us of other addresses that they are listening on. We may also discover
their address by querying the DHT, checking a fixed "bootstrap list", or perhaps
through a pubsub message or an application-specific protocol.

In the case of the identify protocol, we can be fairly certain that the
addresses originate from the peer we're speaking to, assuming that we're using a
secure, authenticated communication channel. However, more "ambient" discovery
methods such as DHT traversal and pubsub depend on potentially untrustworthy
third parties to relay address information.

Even in the case of receiving addresses via the identify protocol, our
confidence that the address came directly from the peer is not actionable, because
the peer store does not track the origin of an address. Once added to the peer
store, all addresses are considered equally valid, regardless of their source.

We would like to have a means of distributing _verifiable_ address records,
which we can prove originated from the addressed peer itself. We also need a way to
track the "provenance" of an address within libp2p's internal components such as
the peer store. Once those pieces are in place, we will also need a way to
prioritize addresses based on their authenticity, with the strictest strategy
being to dial only certified addresses.
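
As a rough illustration of that last point, here is a minimal Python sketch of address prioritization. It assumes the peer store tags each address with a `certified` flag; `StoredAddr`, `dial_order`, and `strict` are hypothetical names used only for this example and do not correspond to any existing libp2p API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StoredAddr:
    """Hypothetical peer store entry: an address plus its provenance."""
    addr: str
    certified: bool  # True if the address came from a verified signed record


def dial_order(addrs: List[StoredAddr], strict: bool = False) -> List[str]:
    """Return addresses in the order the dialer should try them.

    Certified addresses come first. In strict mode, uncertified
    addresses are dropped entirely.
    """
    if strict:
        candidates = [a for a in addrs if a.certified]
    else:
        candidates = list(addrs)
    # sorted() is stable, so addresses keep their relative order within
    # the certified and uncertified groups.
    return [a.addr for a in sorted(candidates, key=lambda a: not a.certified)]
```

A real dialer would likely weigh other signals as well (transport, latency, past failures); this only shows where provenance could enter the ordering.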

### Complications

While producing a signed record is fairly trivial, there are a few aspects to
this problem that complicate things.

1. Addresses are not static. A given peer may have several addresses at any given
   time, and the set of addresses can change at arbitrary times.
2. Peers may not know their own addresses. It's often impossible to automatically
   infer one's own public address, and peers may need to rely on third-party
   peers to inform them of their observed public addresses.
3. A peer may inadvertently or maliciously sign an address that they do not
   control. In other words, a signature isn't a guarantee that a given address is
   valid.
4. Some addresses may be ambiguous. For example, addresses on a private subnet
   are valid within that subnet but are useless on the public internet.

The first point implies that the address record should include some kind of
temporal component, so that newer records can replace older ones as the state
changes over time. This could be a timestamp and/or a simple sequence number
that each node increments whenever it publishes a new record.
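
The replacement rule can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: `Record`, its `seq` field, and `accept` are hypothetical names, and signature checking is left out entirely.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical, simplified address record."""
    peer_id: str
    seq: int  # sequence number or epoch timestamp; must only increase
    addrs: Tuple[str, ...]


def accept(store: Dict[str, Record], incoming: Record) -> bool:
    """Store `incoming` only if it is newer than what we already hold.

    Returns True if the record was stored, False if it was stale.
    """
    current = store.get(incoming.peer_id)
    if current is not None and incoming.seq <= current.seq:
        return False  # stale or replayed record: keep the existing one
    store[incoming.peer_id] = incoming
    return True
```

Rejecting records with an equal or lower number also means an old, validly signed record cannot be replayed to roll a peer's addresses back.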

The second and third points highlight the limits of certifying information that
is itself uncertain. While a signature can prove that the addresses originated
from the peer, it cannot prove that the addresses are correct or useful. Given
the asymmetric nature of real-world NATs, it's often the case that a peer is
_less likely_ to have correct information about its own address than an outside
observer, at least initially.

This suggests that we should include some measure of "confidence" in our
records, so that peers can distribute addresses that they are not fully certain
are correct, while still asserting that they created the record. For example,
when requesting a dial-back via the [AutoNAT service][autonat], a peer could
send a "provisional" address record. When the AutoNAT peer confirms the address,
that address could be marked as publicly routable and advertised in a new record.
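
A minimal Python sketch of that promotion step follows. It assumes the dial-back has already succeeded and does not model the AutoNAT exchange itself; `AddrInfo` and `confirm` are hypothetical names for this example.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class AddrInfo:
    """Hypothetical, simplified address entry."""
    addr: str
    confidence: str  # "CONFIRMED", "UNCONFIRMED", "INVALID", or "UNKNOWN"


def confirm(version: int, infos: List[AddrInfo],
            dialed_back: str) -> Tuple[int, List[AddrInfo]]:
    """Build the contents of a new record after a successful dial-back.

    The dialed-back address is promoted to "CONFIRMED" and the version is
    bumped so the new record supersedes the provisional one.
    """
    updated = [
        replace(info, confidence="CONFIRMED") if info.addr == dialed_back else info
        for info in infos
    ]
    return version + 1, updated
```

The result would still need to be signed and wrapped before being advertised.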

Regarding the fourth point about ambiguous addresses, it would also be desirable
for the address record to include a notion of "routability," which would
indicate how "accessible" the address is likely to be. This would allow us to
mark an address as "LAN-only," if we know that it is not mapped to a publicly
reachable address but would still like to distribute it to local peers.
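
For IP-based addresses, a first guess at routability can come from the address range alone. The sketch below uses Python's standard `ipaddress` module and assumes the IP component has already been extracted from the multiaddr; `classify` is a hypothetical helper, and the returned strings are the routability labels proposed in this RFC.

```python
import ipaddress


def classify(ip: str) -> str:
    """Guess a routability label from a bare IP address string."""
    addr = ipaddress.ip_address(ip)
    # Check loopback first: loopback ranges also count as private.
    if addr.is_loopback:
        return "LOOPBACK"
    if addr.is_private or addr.is_link_local:
        return "LOCAL"
    if addr.is_global:
        return "GLOBAL"
    return "UNKNOWN"
```

This is a heuristic only: an address in a global range may still be unreachable behind a firewall, which is why routability is tracked separately from confidence.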

## Address Record Format

There are many potential data structures that we could use to store and transmit
address information. This section sketches out a possible design using
[IPLD][ipld], although we may end up adopting a different format. Everything in
this section is subject to change as part of the RFC process.

These types are defined using IPLD's Schema notation. The best reference I'm
currently aware of is [its own schema definition][ipld-schema-schema].

```sh
## How accessible we believe a given address to be.
## Maybe include params? We could potentially have a subnet mask for local addresses
type Routability enum {
  | "GLOBAL"   ## Available on the public internet
  | "LOCAL"    ## Available on a local network (probably in a private address range)
  | "LOOPBACK" ## Available on a loopback address on the same machine
  | "UNKNOWN"  ## Catch all (may include in-memory transports, etc)
}

## How confident we are in the validity of an address
type Confidence enum {
  | "CONFIRMED"   ## We have verified that we're reachable on this address
  | "UNCONFIRMED" ## We suspect, but have not confirmed that we're reachable
  | "INVALID"     ## We know that this address is invalid and should be deleted
  | "UNKNOWN"     ## No assertions about validity one way or another
}

## A tuple of an address, how "routable" (public / private, etc) the address is,
## and how confident we are in its validity.
type AddressInfo struct {
  addr Bytes ## Binary multiaddr
  routability Routability
  confidence Confidence
}

## A point-in-time snapshot of all addresses (plus their info) that we know
## about at the time we issued the record.
type AddressState struct {
  ## The subject of this record. Who do these addresses belong to?
  subject PeerRef

  ## When was this record constructed?
  issuedAt Timestamp

  ## A list of all AddressInfo records that apply at the current moment.
  addresses List {
    valueType &AddressInfo
  }
}

## A signed envelope containing an `AddressState` struct, our
## public key, and a signature of the state (verifiable with public key).
type AddressEnvelope struct {
  state AddressState

  ## Public key of issuer.
  pubkey Bytes

  ## Signature of `state`. Can be verified with `pubkey`.
  ## Maybe it's better to sign a merkle link to `state` instead...
  sig Bytes
}

## Unix epoch timestamp, UTC timezone. TODO: what precision?
type Timestamp Int

## Binary multihash of public key
type PeerId Bytes

## A peer id, plus a peer-specific version clock.
## Represents a peer _at a moment in time_, where time is loosely defined as
## a unit-less quantity that's always increasing. Version
## numbers must increase monotonically but do not need to be strictly
## sequential. If you don't want to preserve state across restarts or coordinate
## a counter, you can use epoch timestamps as version numbers.
type PeerRef struct {
  peer PeerId
  version Int
}
```

The idea with the structure above is that each address is sent along with some
metadata: its "routability" and the issuing peer's confidence in the validity of
the address. These are wrapped in an `AddressInfo` struct along with the address
itself.

The full list of `AddressInfo`s goes into an `AddressState`. An `AddressState`
identifies the `subject` of the record, who is also the issuing peer. We could
potentially split that out into separate `subject` and `issuer` fields, which
would let peers make statements about each other in addition to making
statements about themselves. That complicates things though, and may not be
worth it.

The state and a signature of it are wrapped in an `AddressEnvelope`, along with
the public key that produced the signature. Recipients must check that the
public key is consistent with the peer id of the `subject` and then verify the
signature against the state.
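
The two checks a recipient performs might look like the following Python sketch. It makes two loud simplifications. First, the peer id is computed as a sha2-256 multihash of the raw public key bytes, whereas real libp2p peer ids are derived from a serialized key and may use a different multihash for small keys. Second, signature verification is delegated to a caller-supplied `verify` function rather than a specific crypto library. `peer_id_from_pubkey` and `validate_envelope` are hypothetical names.

```python
import hashlib
from typing import Callable

# verify(pubkey, message, signature) -> bool, supplied by the caller's
# crypto library of choice (an assumption of this sketch).
Verifier = Callable[[bytes, bytes, bytes], bool]


def peer_id_from_pubkey(pubkey: bytes) -> bytes:
    """Simplified peer id: the sha2-256 multihash of the public key bytes.

    0x12 is the multihash code for sha2-256 and 0x20 is the digest length.
    """
    return bytes([0x12, 0x20]) + hashlib.sha256(pubkey).digest()


def validate_envelope(subject_peer: bytes, state_bytes: bytes,
                      pubkey: bytes, sig: bytes, verify: Verifier) -> bool:
    """Accept an envelope only if the key matches the subject and the
    signature over the state checks out."""
    if peer_id_from_pubkey(pubkey) != subject_peer:
        return False  # key does not belong to the claimed subject
    return verify(pubkey, state_bytes, sig)
```

Here `state_bytes` must be the serialized state exactly as received, since re-encoding it before verification could change the bytes and invalidate the signature.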

Here's an example. Alice has an address that she thinks is publicly reachable
but has not confirmed. She also has a LAN-local address that she knows is valid,
but not routable via the public internet:

```javascript
{
  pubkey: "<alice's public key>",
  state: {
    subject: {
      peer: "QmAlice...",
      version: 23456
    },
    issuedAt: 1570215229,

    addresses: [
      {
        addr: "/ip4/1.2.3.4/tcp/42/p2p/QmAlice",
        routability: "GLOBAL",
        confidence: "UNCONFIRMED"
      },
      {
        addr: "/ip4/10.0.1.2/tcp/42/p2p/QmAlice",
        routability: "LOCAL",
        confidence: "CONFIRMED"
      }
    ]
  },
  sig: "<signature of state>"
}
```

If Alice wants to publish her address to a public shared resource like a DHT,
she should omit `LOCAL` and other unreachable addresses, and peers should
likewise filter out `LOCAL` addresses from public sources.
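
The same filter can serve both sides: Alice applies it before publishing, and recipients apply it to records obtained from public sources. A minimal Python sketch, assuming address entries are plain dicts shaped like the example above; `publishable` is a hypothetical helper, and treating everything other than `GLOBAL` (including `UNKNOWN`) as unpublishable is an assumption of this sketch rather than something the RFC decides.

```python
from typing import Dict, List

# Assumption: only GLOBAL addresses belong in public resources.
PUBLIC_ROUTABILITY = {"GLOBAL"}


def publishable(addresses: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only entries suitable for a public resource such as a DHT."""
    return [a for a in addresses if a["routability"] in PUBLIC_ROUTABILITY]
```

Since the signature covers the whole state, Alice would have to sign the filtered list as its own record rather than strip entries from an already signed one.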

## TODO

Some things I'd like to cover but haven't got to or figured out yet:

- how to store signed records
  - should be separate from "working set" that's optimized for retrieval
  - need to store unaltered bytes
- how to surface routability and confidence via peerstore APIs
- figure out if IPLD is the way to go here. If not, what serialization format,
  etc.
- extend identify protocol to include signed records?
- how are addresses prioritized when dialing?


[identify-spec]: ../identify/README.md
[peer-id-spec]: ../peer-ids/peer-ids.md
[autonat]: https://github.com/libp2p/specs/issues/180
[ipld]: https://ipld.io/
[ipld-schema-schema]: https://github.com/ipld/specs/blob/master/schemas/schema-schema.ipldsch