mirror of
https://github.com/matrix-org/matrix-spec
synced 2025-12-25 18:38:37 +01:00
Allow for changing the hashing algo and add at-rest details
This commit is contained in:
parent
f8dbf2b360
commit
d2b47a585d
|
|
@ -1,57 +1,59 @@
|
|||
# MSC2134: Identity Hash Lookups
|
||||
|
||||
[Issue #2130](https://github.com/matrix-org/matrix-doc/issues/2130) has been recently
|
||||
created in response to a security issue brought up by an independant party. To summarise
|
||||
the issue, lookups (of matrix user ids) are performed using non-hashed 3pids which means
|
||||
that the 3pid is identifiable to anyone who can see the payload (e.g. willh@matrix.org
|
||||
can be identified).
|
||||
[Issue #2130](https://github.com/matrix-org/matrix-doc/issues/2130) has been
|
||||
recently created in response to a security issue brought up by an independent
|
||||
party. To summarise the issue, lookups (of matrix user ids) are performed using
|
||||
non-hashed 3pids (third-party IDs) which means that the identity server can
|
||||
identify and record every 3pid that the user wants to check, whether that
|
||||
address is already known by the identity server or not.
|
||||
|
||||
The problem with this, is that a malicious identity service could then store the plaintext
|
||||
3pid and make an assumption that the requesting entity knows the holder of the 3pid, even
|
||||
if the identity service does not know of the 3pid beforehand.
|
||||
If the 3pid is hashed, the identity service could not determine the address
|
||||
unless it has already seen that address in plain-text during a previous call of
|
||||
the /bind mechanism.
|
||||
|
||||
If the 3pid is hashed, the identity service could not determine the owner of the 3pid
|
||||
unless the identity service has already been made aware of the 3pid by the owner
|
||||
themselves (using the /bind mechanism).
|
||||
Note that in terms of privacy, this proposal does not stop an identity service
|
||||
from mapping hashed 3pids to users, resulting in a social graph. However, the
|
||||
identity of the 3pid will at least remain a mystery until /bind is used.
|
||||
|
||||
Note that this proposal does not stop a identity service from mapping hashed 3pids to many
|
||||
users, in an attempt to form a social graph. However the identity of the 3pid will remain
|
||||
a mystery until /bind is used.
|
||||
|
||||
It should be clear that there is a need to hide any address from the identity service that
|
||||
has not been explicitly bound to it, and this proposal aims to solve that for the lookup API.
|
||||
This proposal thus calls for the Identity Service’s /lookup API to use hashed
|
||||
3pids instead of their plain-text counterparts.
|
||||
|
||||
## Proposal
|
||||
|
||||
This proposal suggests making changes to the Identity Service API's lookup endpoints. Due
|
||||
to the nature of this proposal, the new endpoints should be on a `v2` path:
|
||||
This proposal suggests making changes to the Identity Service API's lookup
|
||||
endpoints. Due to the nature of this proposal, the new endpoints should be on a
|
||||
`v2` path:
|
||||
|
||||
- `/_matrix/identity/api/v2/lookup`
|
||||
- `/_matrix/identity/api/v2/bulk_lookup`
|
||||
|
||||
The parameters will remain the same, but `address` should no longer be in a plain-text
|
||||
format. `address` will now take a SHA-256 format hash value, and the resulting digest should
|
||||
be encoded in base64 format. For example:
|
||||
The parameters will remain the same, but `address` should no longer be in a
|
||||
plain-text format. `address` will now take a hash value, and the resulting
|
||||
digest should be encoded in unpadded base64. For example:
|
||||
|
||||
```python
|
||||
address = "willh@matrix.org"
|
||||
address = "user@example.org"
|
||||
digest = hashlib.sha256(address.encode()).digest()
|
||||
result_address = base64.encodebytes(digest).decode()
|
||||
result_address = unpaddedbase64.encode_base64(digest)
|
||||
print(result_address)
|
||||
CpvOgBf0hFzdqZD4ASvWW0DAefErRRX5y8IegMBO98w=
|
||||
CpvOgBf0hFzdqZD4ASvWW0DAefErRRX5y8IegMBO98w
|
||||
```
|
||||
|
||||
### Example request
|
||||
|
||||
SHA-256 has been chosen as it is [currently used elsewhere](https://matrix.org/docs/spec/server_server/r0.1.2#adding-hashes-and-signatures-to-outgoing-events) in the Matrix protocol, and the only
|
||||
requirement for the hashing algorithm is that it cannot be used to guess the real value of the address
|
||||
SHA-256 has been chosen as it is [currently used
|
||||
elsewhere](https://matrix.org/docs/spec/server_server/r0.1.2#adding-hashes-and-signatures-to-outgoing-events)
|
||||
in the Matrix protocol. As time goes on, this algorithm may be changed provided
|
||||
a spec bump is performed. Then, clients making a request to `/lookup` must use
|
||||
the hashing algorithm defined in whichever version of the CS spec they and the
|
||||
IS have agreed to speaking.
|
||||
|
||||
No parameter changes will be made to /bind, but identity services should keep a hashed value
|
||||
for each address it knows about in order to process lookups quicker and it is the recommendation
|
||||
that this is done at the time of bind.
|
||||
No parameter changes will be made to /bind, but identity services should keep a
|
||||
hashed value for each address it knows about in order to process lookups
|
||||
quicker. It is the recommendation that this is done during the act of binding.
|
||||
|
||||
`v1` versions of these endpoints may be disabled at the discretion of the implementation, and
|
||||
should return a `M_FORBIDDEN` `errcode` if so.
|
||||
`v1` versions of these endpoints may be disabled at the discretion of the
|
||||
implementation, and should return a `M_FORBIDDEN` `errcode` if so.
|
||||
|
||||
|
||||
## Tradeoffs
|
||||
|
|
@ -59,20 +61,27 @@ should return a `M_FORBIDDEN` `errcode` if so.
|
|||
* This approach means that the client now needs to calculate a hash by itself, but the belief
|
||||
is that most languages provide a mechanism for doing so.
|
||||
* There is a small cost incurred by doing hashes before requests, but this is outweighed by
|
||||
the privacy implications of sending plaintext addresses.
|
||||
|
||||
the privacy implications of sending plain-text addresses.
|
||||
|
||||
## Potential issues
|
||||
|
||||
This proposal does not force a identity service to stop handling plaintext requests, because
|
||||
a large amount of the matrix ecosystem relies upon this behavior. However, a conscious effort
|
||||
should be made by all users to use the privacy respecting endpoints outlined above. Identity
|
||||
services may disallow use of the v1 endpoint.
|
||||
This proposal does not force an identity service to stop handling plain-text
|
||||
requests, because a large amount of the matrix ecosystem relies upon this
|
||||
behavior. However, a conscious effort should be made by all users to use the
|
||||
privacy respecting endpoints outlined above. Identity services may disallow use
|
||||
of the v1 endpoint.
|
||||
|
||||
Base64 has been chosen to encode the value due to it's ubiquitous support in many languages,
|
||||
however it does mean that special characters in the address will have to be encoded when used
|
||||
as a parameter value.
|
||||
Unpadded base64 has been chosen to encode the value due to its ubiquitous
|
||||
support in many languages, however it does mean that special characters in the
|
||||
address will have to be encoded when used as a parameter value.
|
||||
|
||||
## Other considered solutions
|
||||
|
||||
Ideally identity servers would never receive plain-text addresses, however it
|
||||
is necessary for the identity server to send an email/sms message during a
|
||||
bind, as it cannot trust a homeserver to do so as the homeserver may be lying.
|
||||
Additionally, only storing 3pid hashes at rest instead of the plain-text
|
||||
versions is impractical if the hashing algorithm ever needs to be changed.
|
||||
|
||||
## Security considerations
|
||||
|
||||
|
|
@ -80,6 +89,8 @@ None
|
|||
|
||||
## Conclusion
|
||||
|
||||
This proposal outlines a quick and effective method to stop bulk collection of user's contact
|
||||
lists and their social graphs without any disasterous side effects. All functionality which
|
||||
depends on the lookup service should continue to function unhindered by the use of hashes.
|
||||
This proposal outlines an effective method to stop bulk collection of user's
|
||||
contact lists and their social graphs without any disastrous side effects. All
|
||||
functionality which depends on the lookup service should continue to function
|
||||
unhindered by the use of hashes.
|
||||
|
||||
|
|
|
|||
Loading…
Reference in a new issue