mirror of
https://github.com/matrix-org/matrix-spec
synced 2026-04-14 07:04:09 +02:00
97 lines
4 KiB
Markdown
# MSC2134: Identity Hash Lookups

[Issue #2130](https://github.com/matrix-org/matrix-doc/issues/2130) was
recently created in response to a security issue raised by an independent
party. To summarise the issue, lookups (of Matrix user IDs) are performed using
non-hashed 3pids (third-party IDs), which means that the identity server can
identify and record every 3pid that the user wants to check, whether or not
that address is already known to the identity server.

If the 3pid is hashed, the identity service cannot determine the address
unless it has already seen that address in plain text during a previous call to
the /bind mechanism.

Note that in terms of privacy, this proposal does not stop an identity service
from mapping hashed 3pids to users, resulting in a social graph. However, the
address behind a hashed 3pid will at least remain unknown to the identity
service until /bind is used.

This proposal thus calls for the Identity Service’s /lookup API to use hashed
3pids instead of their plain-text counterparts.

## Proposal

This proposal suggests making changes to the Identity Service API's lookup
endpoints. Because the format of the `address` parameter changes, the new
endpoints should be on a `v2` path:

- `/_matrix/identity/api/v2/lookup`
- `/_matrix/identity/api/v2/bulk_lookup`

The parameters will remain the same, but `address` should no longer be in a
plain-text format. `address` will now take the SHA-256 hash of the address, and
the resulting digest should be encoded in unpadded base64. For example:

```python
import hashlib
import unpaddedbase64  # third-party package providing encode_base64

address = "user@example.org"
# Hash the UTF-8 bytes of the address with SHA-256.
digest = hashlib.sha256(address.encode()).digest()
# Encode the raw digest as base64 without trailing '=' padding.
result_address = unpaddedbase64.encode_base64(digest)
print(result_address)
# Output: CpvOgBf0hFzdqZD4ASvWW0DAefErRRX5y8IegMBO98w
```

### Example request

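As an illustration only, a `bulk_lookup` request body with hashed addresses might be assembled as in the sketch below. It assumes that the v2 body keeps the v1 shape (a `threepids` list of `[medium, address]` pairs), which this proposal implies by keeping the parameters the same but does not spell out. The helper names `hash_address` and `build_bulk_lookup_body` are hypothetical, and the standard library's `base64` module is used here in place of `unpaddedbase64`.

```python
import base64
import hashlib
import json


def hash_address(address: str) -> str:
    """SHA-256 the address and encode the digest as unpadded base64."""
    digest = hashlib.sha256(address.encode("utf-8")).digest()
    # Standard base64 of a 32-byte digest ends in one '=', which is stripped.
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def build_bulk_lookup_body(threepids):
    """Build a request body from (medium, plain-text address) pairs.

    Assumption: the body shape is carried over unchanged from v1.
    """
    return {
        "threepids": [[medium, hash_address(addr)] for medium, addr in threepids]
    }


body = build_bulk_lookup_body([("email", "user@example.org")])
# This JSON would be sent as the body of POST /_matrix/identity/api/v2/bulk_lookup.
print(json.dumps(body, indent=2))
```
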
SHA-256 has been chosen as it is [currently used
elsewhere](https://matrix.org/docs/spec/server_server/r0.1.2#adding-hashes-and-signatures-to-outgoing-events)
in the Matrix protocol. As time goes on, this algorithm may be changed provided
a spec bump is performed. Clients making a request to `/lookup` must then use
the hashing algorithm defined in whichever version of the CS spec they and the
IS have agreed to speak.

No parameter changes will be made to /bind, but identity services should keep a
hashed value for each address they know about so that lookups can be processed
more quickly. It is recommended that the hash is computed and stored at bind
time.

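The recommendation above can be sketched as follows. This is a minimal in-memory illustration, not a description of any existing identity server: the `BindingStore` class, its method names, and the choice to key entries on `(medium, hash)` are all assumptions made for the example.

```python
import base64
import hashlib


def hash_address(address: str) -> str:
    """SHA-256 the address and encode the digest as unpadded base64."""
    digest = hashlib.sha256(address.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


class BindingStore:
    """Hypothetical in-memory store; a real service would use a database."""

    def __init__(self):
        # Maps (medium, hashed address) -> Matrix user ID.
        self._by_hash = {}

    def bind(self, medium: str, address: str, mxid: str) -> None:
        # The plain-text address is available at bind time, so hash it once here.
        self._by_hash[(medium, hash_address(address))] = mxid

    def lookup(self, medium: str, hashed_address: str):
        # Lookups compare against precomputed hashes; nothing is hashed per request.
        return self._by_hash.get((medium, hashed_address))


store = BindingStore()
store.bind("email", "user@example.org", "@user:example.org")
print(store.lookup("email", hash_address("user@example.org")))  # @user:example.org
print(store.lookup("email", hash_address("other@example.org")))  # None
```
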
`v1` versions of these endpoints may be disabled at the discretion of the
implementation, and should return an `M_FORBIDDEN` `errcode` if so.

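A disabled `v1` endpoint could respond along the lines of the sketch below. The `errcode`/`error` body follows the usual Matrix error format; the HTTP 403 status, the `V1_ENABLED` flag, the handler name, and the error message text are assumptions for illustration and are not specified by this proposal.

```python
import json

V1_ENABLED = False  # hypothetical configuration flag


def handle_v1_lookup(medium: str, address: str):
    """Return a (status, body) pair for a v1 lookup request."""
    if not V1_ENABLED:
        # Assumed status code; the proposal only specifies the errcode.
        return 403, {
            "errcode": "M_FORBIDDEN",
            "error": "v1 lookup is disabled; use the v2 endpoints",
        }
    raise NotImplementedError("plain-text lookup is out of scope for this sketch")


status, body = handle_v1_lookup("email", "user@example.org")
print(status, json.dumps(body))
```
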
## Tradeoffs

* This approach means that the client now needs to calculate a hash itself, but
  the belief is that most languages provide a mechanism for doing so.
* There is a small cost incurred by hashing before each request, but this is
  outweighed by the privacy implications of sending plain-text addresses.

## Potential issues

This proposal does not force an identity service to stop handling plain-text
requests, because a large part of the Matrix ecosystem relies upon this
behavior. However, a conscious effort should be made by all users to use the
privacy-respecting endpoints outlined above. Identity services may disallow use
of the v1 endpoints.

Unpadded base64 has been chosen to encode the value due to its ubiquitous
support in many languages. However, it does mean that special characters in the
encoded `address` (such as `+` and `/`) will have to be escaped when it is used
as a parameter value.

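The escaping concern can be illustrated with the standard library. The sketch below assumes the single-lookup endpoint takes `medium` and `address` as query parameters, as in v1, and uses a fixed 32-byte stand-in for a digest, chosen so that its base64 form contains both `+` and `/`.

```python
import base64
from urllib.parse import parse_qs, urlencode

# Stand-in for a SHA-256 digest (32 bytes), picked so the encoding has '+' and '/'.
digest = b"\xfb\xff" * 16
hashed = base64.b64encode(digest).decode("ascii").rstrip("=")

# urlencode percent-escapes '+' as %2B and '/' as %2F.
query = urlencode({"medium": "email", "address": hashed})
print("/_matrix/identity/api/v2/lookup?" + query)

# Decoding the escaped query string recovers the exact hash.
escaped_ok = parse_qs(query)["address"] == [hashed]

# Without escaping, '+' is decoded as a space and the hash no longer matches.
naive_ok = parse_qs("address=" + hashed)["address"] == [hashed]
print(escaped_ok, naive_ok)  # True False
```
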
## Other considered solutions

Ideally identity servers would never receive plain-text addresses. However, the
identity server must send an email/SMS message during a bind, because it cannot
trust a homeserver to do so: the homeserver may be lying.
Additionally, storing only 3pid hashes at rest instead of the plain-text
versions is impractical if the hashing algorithm ever needs to be changed.

## Security considerations

None

## Conclusion

This proposal outlines an effective method to stop bulk collection of users'
contact lists and their social graphs without any disastrous side effects. All
functionality which depends on the lookup service should continue to function
unhindered by the use of hashes.