instant-messengers: delist Riot #1047
Reference: privacyguides/privacytools.io#1047
We previously added a warning on privacy concerns in #1024 after
Prism-break had delisted Riot and Notes on privacy and data collection
of Matrix.org was released revealing concerns even with self-hosted
homeservers.
Today Libre Monde released another part of its privacy investigation
into Matrix.org [1], revealing that they are neither GDPR compliant nor
privacy friendly and behave shadily, such as by announcing removal of
data as a result of a GDPR request. [2]
Resolves: #840
Deploy preview for privacytools-io ready!
Built with commit
366b58217e
https://deploy-preview-1047--privacytools-io.netlify.com
I must disagree on many levels:
Matrix is the best chatting application for decentralization after XMPP.
Matrix has implemented good standards for encryption, like you cannot disable it once enabled.
Fully Free Software (no proprietary dependencies unlike Signal's BS)
What I would recommend is listing Matrix instead of Riot.
Suggestion:
Or, a complete re-do of the instant messengers.
Seems very odd that all listed providers have warnings.
Maybe demote it to the Worth Mentioning section and mark it for "Advanced users" only with instructions to use a server other than matrix.org.
Perhaps it is worth investigating Keybase and/or Briar as alternative recommendations.
What do you think about https://github.com/privacytoolsIO/privacytools.io/issues/1049 ?
The paper says:
I think there is an open issue about both, but Keybase has an antifeature of only the client being open source.
Hi everyone, it's Tom, the guy who handled the GDPR request in question and is project managing the current privacy work at New Vector. We have invested a lot of time and effort at NV to deliver a GDPR-compliant decentralised federated messaging service and we're proud of what we've achieved, so it saddens me to see our efforts so uncharitably characterised (and for that characterisation to be so uncritically accepted). There are some pretty wild assertions in that post, but to put a few things straight here:
I think that some of the concerns brought up in those documents are valid, however some of them make less sense. To keep this short and easy to read, I'm going to focus on only a few issues mentioned in them that I think lack context. The ones I don't mention are either ones I think might be significant in some way, or ones I'm not sufficiently familiar with.
This could almost certainly be done even if Welcome Bot didn't exist, and getting rid of Welcome Bot wouldn't eliminate the concern of single-user homeservers potentially revealing the user's general location.
2. Being able to see someone's Matrix ID based on their email address
Clarifying: I agree that this is an issue; users are under the assumption that their email is mainly used only to reset their password, which is misleading.
This is so people can invite you to rooms using your email address. This isn't a very important feature to me, but I can kinda see how it might affect usability. I still would have preferred to know about it though. However, I believe people need to know your email address in the first place in order to see what username it's tied to, I could be wrong about this though. However, I don't think they at all made it clear that it would be linked to your account in a public manner, which irritates me. This should probably be made more clear to users, and/or there should be an option to disable it.
The concern that other homeservers process your data (and not being able to know for sure what they do with it).
This is going to be a problem with most decentralized/federated services. If someone doesn't like that then decentralized/federated services might not be the best choice for them. I understand there are some situations where this might be a concern though.
Media from Riot being able to be copied to the clipboard
??? This is a feature. Also, if a picture couldn't be copied to the clipboard a user would still be able to take a screenshot of it.
Display names and avatars of users being visible to other users without their knowledge
It's very heavily implied that when someone uploads a profile picture or sets a display name that it will be, well, displayed. It is available to people without accounts, however some user data is already available to people without accounts (like logs of some channels for example), in order to display those logs a display name and/or username of users is needed.
Delayed responses to GDPR requests for user data on the Matrix.org homeserver
Matrix is still being developed, and based on what I've seen it looks like they've been attempting to develop ways to be able to comply with user data requests, I doubt delays are intentional. (https://github.com/matrix-org/synapse/pull/5589).
Deleting data in response to a user data request instead of complying with it
I don't know why they would do this instead of simply sending the data, but it's the kind of thing I would recommend asking them
While I absolutely believe that users should be able to request a copy of data their homeserver has about them, a delayed response or deleting data in response to a request doesn't necessarily mean that they're trying to cover something up or have malicious intentions. I agree that they should make a good faith effort to respond to requests within a reasonable amount of time though.
Again, some basic features are still being developed by the Matrix team and I doubt their intention is to make it difficult for people.
One more response:
1.3.0 is the current version of Riot. It only uses /unstable endpoints where the spec is not finalised.
If the criticism here is that our e2e key backup endpoints aren't finalised... the problem there is that delivering robust multi-device E2E encryption across group chat with persistent message history that is both secure and comprehensible to a non-expert user is extremely complex and requires a lot of iteration. We're getting there (there is even light at the end of this tunnel), but just calling the APIs stable won't make it so.
Hi @lampholder
I am not sure you have achieved GDPR compliance, as in addition to the issues you mention, there are many that you don't mention, some even coming from homeserver admins.
I made a tracking issue in vector-im/element-web#1049, but I guess it was late night for you too, so you haven't had time to comment.
Here are some concerns raised by @muppeth from Disroot.org:
Is there any work being done on those? @ara4n commented on some issues, but not all, in that thread, and it has been a while, so I wonder if these concerns have become outdated.
I think on Privacytools.io we also consider Cloudflare a problem, and I think we have left services with Tor problems unlisted in the past. Closer to Riot itself, in our room it's a somewhat frequently asked question how to use it with Tor. I think Riot is currently our only instant messenger that collects this much data, and it may set a record even in general, but there are too many apps mentioned for me to be familiar with them all.
I cannot reply to the other comments as this is a website for privacy tools and currently Riot seems very far from one to me, especially with the default settings (some of which cannot be configured), and I think a better place to list it would be an instant-messengers.io or similar website if one exists. I do hope these concerns get resolved and Riot can be relisted in the future.
Thanks for the link, I put it on top of vector-im/element-web#1049 while I see that there are several issues that have been listed e.g. in vector-im/element-web#840, but haven't become part of your tracker.
@lampholder Wonderful that you are here, and that I can ask you questions which you did not answer during our exchange.
So Tom, as the person who handles GDPR requests: I am hereby requesting that you clarify the following items, following on my GDPR Data access request of the 20th of June 2019. Please consider this as an official request.
extract-max.zip, which contained room event data. As per Annex D of the research document, the events were extracted from your database on the 17th of July 2019 around the same time, more or less 2h. Given that the two other files you provided me were insignificant in size and held no real information, what reason did you have to wait until the 19th to send me my data?
2.1. Why were you not transparent that I would not be able to get access to my Identity data in the morning, but instead let me discover it all by myself?
2.2. Why did you only provide to me the sydent log for vector.im, and failed to include matrix.org, which I knew (due to very recent interaction with the server) were written?
4.1 Every single backup you had from that data, up to October 2016 which is when I used Riot for the very first time.
4.2 Every single log file on all the servers you manage, as a Data Controller or Data processor.
4.3 Both Matrix.org and Vector.im Identity Server.
4.4 The data relevant to 3PID sessions, which is stored in a different table than 3PID mappings.
5.1 Which lawful basis you used to collect and process my data at that time?
5.2 Why is my data deleted, since it does not fit the scope of your deletion?
As per GDPR, you have one calendar month to reply to my request. Please reply here, in the same medium as I have used.
I hereby give you informed and explicit consent to share here and publicly any and every personal data you may need to answer my questions, which you could normally not in a public setting. I discharge you of any responsibility or misuse of my data that may arise from it.
@lampholder Given that you answer as a GDPR officer, I want to reply to your overall feedback by Raising a concern under GDPR law.
As an individual, I have concerns with the way you are processing my personal information. Your Supervisory Authority, ICO, advises me to Raise a concern with you. I am doing so in the medium you choose to reply to my review which I also consider Raising a concern with you.
As per your own Supervisory Authority's guidance:
Given that I have done all I possibly can, and have gone beyond what a regular individual should have to, I ended up raising my concern about your privacy practices with your Supervisory Authority and provided them with as much documentation, evidence and research as I could. I will raise a concern as an individual shortly, as I am not satisfied with how you handled my GDPR Data access request.
Despite me doing all of this, you are still not taking my concerns seriously and are downplaying them. You are responsible for a personal data breach which you are also downplaying in your announcement, as I have shown different numbers, backed with Room IDs and Event IDs. You do not give a link to my research document, preventing the public from knowing what you mean by "It was drawn to our attention this afternoon". By doing so, you do not make yourself accountable for your own practices AND prevent the public from knowing about the details of a personal data breach. You prevent the audience of your blog, which is highly correlated with the people you shared personal data about, from knowing the details of an event that directly affects their rights.
Ultimately, your supervisory authority will be the one deciding if indeed you fulfilled your GDPR responsibilities. I do not believe you have and have acted accordingly. Our research document is how we raised our concern which included questions which you MUST answer given our Right to be Informed. That we asked in a document, or in an informal setting, or even verbal is irrelevant for GDPR law. That you consider this an "off-hand comment", repeated to your CEO a few days later in an explicit manner and in bold is your choice which I will let ICO be the judge of.
Again, please consider this me raising a concern for the Nth time, that I lost count of.
@JonahAragon are you okay with this PR being further derailed by us answering Max's GDPR support requests?
In other news, I'm alarmed if one irate user from a hostile fork has the ability to trigger a delisting over concerns over a single GDPR request, which as you can see from the above, we're still validating.
As the author of this pull request and a Privacytools.io team member I am OK with you answering the GDPR support requests and I hope you have good answers that will show that you care about privacy to not be delisted now or to be hopefully relisted in near future after the concerns are resolved.
As I commented previously: Personally I don't find promises on something happening in the future very comforting as they don't help the users now, and as future cannot be predicted, what if the worst happens?
I am not sure if you consider me an "irate user from a hostile fork" (I assume so as I imagine Maxidorius would be referred to as an author of a "hostile fork"), but I am not affiliated with The Grid, I haven't been on their channels/rooms (that I remember), I am under the impression that The Grid doesn't currently even have a usable client and I have triggered the delisting due to multiple concerns (including, but not limited to the research papers / GDPR request) that have been previously raised in the comments of vector-im/element-web#562, vector-im/element-web#840 and most recently in vector-im/element-web#1049. It also seems to me that my previous concerns 6 hours ago in this PR are going unnoticed.
CC: @privacytoolsIO/editorial
Hi Max, I don't think a PR is really an appropriate place to have this discussion, but I will do as you ask.
Fulfilling GDPR requests is not my only job; I have to balance my time across a range of priorities (including project managing the privacy project more generally). We are committed to delivering your responses within the 30 day window, but precisely when it is delivered within that window is subject to a lot of other scheduling concerns. We're a small team with a lot to do, but with each GDPR request we handle we invest in our tooling so we can produce results more quickly with less effort over time.
It was published in a public blog post, and I advised in our conversation that we no longer had the Identity Server data due to the reasons identified in our blog post - in short: that we had deleted it in response to your first blog post.
Providing application logs is not a standard part of our GDPR Data Subject Access Request procedure and was undertaken in response to a specific request from you. I searched the logs for references to your email address for both vector.im and matrix.org Identity Servers. I found no hits to matrix.org recorded in the logs.
The retention policy is to keep data until we receive an instruction that the user no longer wishes it to be held (or their account is closed). As to why it wasn't deleted a year ago - you're right, nothing has changed except for our understanding of the situation. In light of this new understanding, we decided to delete this data as quickly as we could (it took a while to assess the service usage and formulate a plan that didn't unduly disrupt the operation of other homeservers).
It wasn't true on that day - the data was deleted on the morning of the 19th, shortly before your data dump was provided to you. As you understand, the service is delivered by a set of systems - it is not possible to dump data from Synapse, Sydent and Scalar instantaneously and atomically, so each dump was executed as it could be scheduled.
Your Identity Server data will be in our disk image snapshots we take on a regular basis to backup the live infrastructure, but these backups are non-trivial to restore. Seeing as the data is data you already have (you know your own email address and matrix ID) and we have been transparent that it was stored in our systems up until the point it was deleted, I believe it to be excessive and unreasonable for us to rehydrate our backups to provide this data.
We provided you with data from application logs from the two Identity Servers that we run. This data is very likely duplicated in our load balancer logs (I'm grepping to make sure) but will contain the same information.
Yes, this is correct. In case it wasn't clear from the blog post, data belonging to users for whom we couldn't be absolutely certain knew and understood how their data was being processed was deleted from both matrix.org and vector.im.
Our deletion schedule placed the tables pertaining to publicly visible data accessible via the application ahead of the other tables in the database, so on Friday we hadn't yet deleted the contents of these tables. There was one row in invite_tokens relating to you (an invite to you in March 2017):
These tables have now also been purged, and we've deployed code changes to clear up this data on an ongoing basis.
I've redacted the sender's MXID since, though you've clarified you're happy for me to share your public data here as necessary, I'm not comfortable including another user's MXID publicly even though the invite does relate to your user. I'll happily share it in private if you wish.
I answered this question on July 19th - your data was processed under Legitimate Interest
Your original blog post highlighted criteria under which, regardless of the legal basis under which our Identity Servers were set up to process user data, users might not be as aware as we would want of this processing. Users using riot.im/app /develop /staging or the mobile apps to connect to a non-New Vector homeserver would have seen the identity server when entering the custom homeserver details, but they might not have understood what it meant. Although outside of our span of control, users using other hosted Riot instances to connect to non-New Vector homeservers (without their own clear Privacy Policy) might not have seen the identity server referenced at all. Understanding this situation, we deemed it unacceptable and chose to erase the data, followed by some API and UX changes (still in flight) to ensure all users have seen a dedicated Identity Server privacy policy before using it (and to ensure that they understand that usage is optional).
We do have a lawful basis
I don't think anybody is treating you unfairly; we've put a lot of time and thought into the issues you have raised, including your GDPR request.
Duly replied.
@lampholder Either you have your time totally wrong, or you are trying to deceive us.
I have published the timeline of events precise to the minute in Annex A of my document.
This is the amended version with your answers and the most likely deletion time, which would be AFTER you posted about it.
So if you got the order right, just the time wrong:
You had every chance to be honest, forthcoming, transparent and tell me about it since we talked. You did not. You only ever acknowledged anything Identity server related after I got the ZIP file. This is simply deceptive, no matter how you see it. Thank you for confirming that I had my facts straight in terms of timeline.
Thank you for confirming that you indeed had the data. As a reminder, these are the claims you made on the 19th of July:
Once again, you had every chance to say that you do not have the data directly available on disk, but that you do have it in backups and it is non-trivial to fetch. Instead, you decided to omit a critical part of the answer, which changes it totally.
Obviously I know my own association, why would I be interested in that. I am interested in all the metadata you have about it.
You use sydent which was created by the current employees of New Vector Ltd. In the public source code, we can see the database schema and the precise columns for associations.
These are the other columns present across two tables:
ts
notBefore
notAfter
originServer
originId
sgAssoc
Again, you are being extremely deceptive by trying to convince me you have a reason NOT to send me the data AND you know this is my expertise in Matrix. Every time I request this data, you keep on coming up with yet another reason why you could not possibly give it to me, or have no reason to give it to me.
The deletion was not part of a retention schedule, or a planned maintenance. You were reacting to a one-off event to which my request was linked.
I have already quoted the ICO several times, which is adamant on that fact: because it was not part of normal day-to-day deletion activities, deleting my data is an offense under the Data Protection Act 2018 since your intention was to prevent its disclosure, regardless of the intent being well or ill-founded.
Because you unilaterally decided to delete my data when you had every chance to tell me before and give yourself a chance to not make it an excessive and unreasonable burden on yourself (by simply giving me my data before deleting it), this does not apply. I am still willing to accept you sending me this data before I file my personal complaint, which could improve your image in it.
And to be clear: This data was basically one database row. A mere kilobyte. This does not take hours to process. It's simply SELECT * FROM x WHERE address = '<my address>'; where you had a SQL index on address for the provision of the service in the first place. This would not take more than an instant.
Again, thank you for confirming that you indeed had data available for me, which you somehow did not include in the ZIP. As a reminder, you were adamant on Friday on two different occasions that you did NOT have any data in Identity.
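For illustration, the single-row lookup described above can be sketched with Python's built-in sqlite3 module. The table name `associations`, its columns, the index name and the sample rows are hypothetical stand-ins, not the actual Sydent schema; the sketch only shows that an equality lookup on an indexed `address` column is answered through the index rather than a full table scan.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical associations table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE associations ("
        "address TEXT NOT NULL, mxid TEXT NOT NULL, ts INTEGER NOT NULL)"
    )
    # The index that makes a lookup by address cheap.
    conn.execute(
        "CREATE INDEX idx_associations_address ON associations (address)"
    )
    conn.executemany(
        "INSERT INTO associations (address, mxid, ts) VALUES (?, ?, ?)",
        [
            ("alice@example.org", "@alice:example.org", 1476000000),
            ("bob@example.org", "@bob:example.org", 1490000000),
        ],
    )
    return conn


def rows_for_address(conn: sqlite3.Connection, address: str) -> list:
    """Return every stored row for one address, using a bound parameter."""
    cursor = conn.execute(
        "SELECT address, mxid, ts FROM associations WHERE address = ?",
        (address,),
    )
    return cursor.fetchall()


def uses_index(conn: sqlite3.Connection, address: str) -> bool:
    """Ask SQLite for its query plan and check that the index is used."""
    plan = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT address, mxid, ts FROM associations WHERE address = ?",
        (address,),
    ).fetchall()
    # The last column of each plan row is the human-readable detail string.
    return any("idx_associations_address" in row[-1] for row in plan)
```

Calling `rows_for_address(build_demo_db(), "alice@example.org")` returns the one matching row, and `uses_index` reports whether SQLite planned the query through the index. A production identity server would use its own schema and database engine, so this is a sketch of the technique only.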
Thank you for the answers you have provided and for replying to my GDPR Information request. I stand by what I said earlier, but I am glad I now have affirmative statements from yourself that you said "no" several times when it was in fact "yes". In my education this is cold, deceptive lying. I consider this exchange closed on my end.
@Mikaela thank you for giving me the chance to get answers to my questions. I hope the answers and my timeline visualization give you a good picture of what is going on and that you'll be able to make your decision.
I agree. Can we just have one discussion happening in vector-im/element-web#1049? It's not a good idea to have a conversation split between a PR and an issue. It's getting hard to follow.
The OP is about delisting Riot because of my research, and that research content is about my GDPR Data access request. The questions I asked are about that access request and will be integrated in the document in a few days. So it is the point of this PR, and is appropriate here since it's content which is not added yet, but must be taken into account to evaluate the OP PR. This is the fine details of this specific issue of anything involved in the research document.
Personally it's case closed. I got my answers, and I don't plan on further discussing my access request.
vector-im/element-web#1049 is an aggregate from what I can see, so that definitely doesn't feel appropriate to discuss the specific details of one of the concerns listed?
FYI, after having no further recourse, New Vector did give me my Email mapping data. As a reminder:
After the data was given to me, I asked why it was finally given to me and the answer was: "We checked with some external advice.". I hope this highlights that there was a constant and express intent to avoid their obligations under GDPR given the continuous changes in their story: Nothing changed in terms of scope or data between the moment I made my request and the moment I got that data, yet I had to fight for it and debunk any claim they presented, every step of the way.
To this day, I have not yet received all the data I have the Right of access to, and New Vector still claims this is all there is to it, even after going back on their word for Identity twice (Email mapping and Email validation session). I will be sure to further debunk their deceptive claims in the research document.
I will cover all of this in the v1.0 of the 2nd research document.
To our knowledge, you have all the data we have stored. We have done everything we could to supply this data, including restoring backups and mining application logs to help fill in the gaps where data had been deleted (because we cleaned out any data where there could have been confusion over processing, after your first report).
If you feel that we have not handled your request appropriately then we recommend you pursue with the ICO.
@lampholder The people who created the protocol and the reference implementations are the current directors and make the majority of employees with technical roles in New Vector Ltd currently. There is nobody else who knows best where data goes and where it is stored than yourselves.
Given that I have proven to you that I know there is more than what you claim for Identity, your organization had every chance to double check. It has now been 41 days since my GDPR data request, which gave you all the time you could possibly need to figure it out, as the most knowledgeable entity that ever has and most likely ever will exist on Matrix.
"To our knowledge" simply cannot apply here. To give you an analogy: you designed the rules of the game, you created the card deck, you are the dealer and the casino building: it's simply not conceivable you could have missed something. In any case, I will show how someone who's not knowledgeable of the fine details can figure this out on their own using simple means.
This is now the 4th time, if I count correctly, that you have told me you gave me all my data, being wrong twice about that already. As I said in our last private exchange, I will not ask you again for this data (but still very much expect to receive it) since you are clearly doing all you possibly can to deny me access until I prove it exists. I recommend you keep an eye on the v1.0 of the 2nd research doc if you are interested to know what you missed.
Based on the issues listed in https://github.com/privacytoolsIO/privacytools.io/issues/1049 I think it’s safer if we remove Riot for now.
I never understood why riot was listed in the first place, as it still does not enable end to end encryption by default, and even when you enable it, it often caused major breakage in bigger rooms.
@JonahAragon or @BurungHantu1605 could you comment here also? Should we finally merge as there are two approvals and I think @blacklight447-ptio's previous comment would agree while it's not a review.
@JonahAragon please can you give your rationale for the delisting so we can address it appropriately?
Yes @ara4n! @Mikaela has actually been handling most of this, so maybe she can chime in with some other concerns, but this is what I think the main issues were:
We would like the tracking issues to be addressed. The default integration server receiving data from users every time they change rooms in Riot despite not opening or interacting with the integration panel in any way seems unacceptable. Users should not be forced to trust Vector.im with all sorts of usage data simply because they're using their client. The fact that the integration server is evidently closed-source but gets special-treatment in the open-source clients is a separate issue we don't necessarily like, but isn't a major issue necessarily.
The identity server functionality is misleading at best. Especially with email password resets being included in Synapse now, it doesn't seem necessary to even include it by default. Most users seem to be under the assumption that emails will only be used for password reset purposes, and have no interest in being looked up by other people by their emails or phones. An opt-in window asking the user if they are willing to share their email and phone number with Vector.im in order to search for their contacts on Matrix and be found by other Matrix users would be far preferable.
E2EE device names seem oddly specific. I feel like instead of "https://riot.privacytools.io/ via Firefox on Windows" it could simply be named "Riot Web". I don't see why we need to give away specific Riot instances, browsers, and user operating systems. Yes, they can be changed, but it seems like a lot of users don't even realize there's device lists in the first place. Even better, a "Name This Device" pop-up on first sign-in seems like a good idea, with a warning like "This device name will be used to identify your E2E encryption keys and will be public on your profile."
Redactions are not properly removed from the database. This would probably be fine (in my personal opinion) if it was still called "Redact" (which seems more accurate) instead of "Remove" in Riot, and the pop-up warning stated that the event would only be removed from visibility but still stored on the server. The current implementation is very misleading. While I would be fine with renaming them (back) to redactions, I would still prefer you stick with "Remove", but actually remove the events. Even if they were removed after a set period (for moderation reasons perhaps) like you mentioned at https://github.com/matrix-org/synapse/issues/1287#issuecomment-515164610 that would be fine.
Regarding GDPR, the recent data breach in which you released a lot of data completely unrelated to the subject requesting their own personal data seems unacceptable. Ignoring the fact that most (but not all) of that data was public anyhow, flooding users requesting their data with thousands/millions of events not at all related to them does not seem necessary and makes it that much harder for users to determine exactly what kind of personal information you have on record. I also think you should disclose how many GDPR requests you've fulfilled that included the personal data of other users and unrelated events.
Finally, an independent privacy/security audit of your infrastructure and software would probably go a long way towards restoring trust in the community.
In the meantime, I don't think it makes sense to move away from Matrix for our official discussion rooms, I am still a fan of the project. But from a Privacy Tools perspective, recommending it for all use-cases doesn't really make sense in light of recent events.
FYI @Mikaela is a member of our organization as well, and the points she's outlined above and in other issues all also apply if I missed anything. Thanks for being active in the community, and feel free to ping her or me if you have any other questions.
Edit: On that note, she mentioned some more valid issues at https://github.com/privacytoolsIO/privacytools.io/pull/1047#issuecomment-514923100 that nobody from your team addressed, that might be worth looking into.
I also realize that some or all of my suggestions already have open issues in your tracker. For example you've mentioned naming devices at https://github.com/vector-im/riot-web/issues/2295 already, so it's clear you're on the right track. But a lot of these issues have been open for years (that one since 2016) seemingly regardless of their priority in your trackers. The suggestions and issues I've mentioned above are pretty much the bare minimum I would expect from a privacy-respecting tool, to be clear, and I don't think we would reconsider listing Matrix again until they're actually implemented and not just considered.
Is there an issue on Synapse to be able to set a "please_delete_after" attribute for a room so that old data will be removed?
I think that would be https://github.com/matrix-org/matrix-doc/issues/447
That's another privacy feature I'd love to see, something like Keybase's Exploding Messages or Signal's Disappearing Messages. Taking the Keybase route where there could be a room default, and the user is able to configure their own timeout separately would be fantastic (for example the PTIO Keybase room deletes messages after 30 days but many users delete theirs after 7 themselves it seems like).
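To make the "room default plus per-user timeout" idea concrete, here is a minimal sketch. Everything in it is hypothetical: the function names are made up for illustration, and the rule that the shorter of the two timeouts wins is an assumption, not how Keybase, Signal, or Matrix actually implement it.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def effective_ttl(room_default: Optional[timedelta],
                  user_override: Optional[timedelta]) -> Optional[timedelta]:
    """Return the retention period that applies to one user's messages.

    Assumption: when both a room default and a user timeout are set,
    the shorter one wins. None means "keep forever".
    """
    candidates = [t for t in (room_default, user_override) if t is not None]
    return min(candidates) if candidates else None


def is_expired(sent_at: datetime, now: datetime,
               room_default: Optional[timedelta],
               user_override: Optional[timedelta]) -> bool:
    """True if a message sent at `sent_at` should have been deleted by `now`."""
    ttl = effective_ttl(room_default, user_override)
    return ttl is not None and now - sent_at >= ttl


# Example: room keeps messages for 30 days, the user chose 7 days.
now = datetime(2019, 8, 1, tzinfo=timezone.utc)
sent = now - timedelta(days=10)
print(is_expired(sent, now, timedelta(days=30), timedelta(days=7)))
print(is_expired(sent, now, timedelta(days=30), None))
```

With these values the 10-day-old message is past the user's 7-day timeout but still inside the room's 30-day default.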
@JonahAragon thanks for the response. I'm travelling atm and have limited time to go through your points, but a few jump out as needing immediate clarification:
This was a stupid and non-malicious bug in Riot/Web; the fix (https://github.com/matrix-org/matrix-react-sdk/pull/3115) was put up for review within 3 days of the original post. The fix landed ~16 hours later. It was in riot-web v1.2.2-rc.2 (cut 4 days after the original post) and was released in v1.2.2 final the next day. This data has never been used for tracking or intended for tracking; it was just a bug.
After further analysis, it turns out we incorrectly released 24 private message events in the DSAR request in question (and 56 state events - i.e. membership changes) - see the update today to https://matrix.org/blog/2019/07/24/data-portability-tooling-bug.
Otherwise, we gave the user all the data their account could see in the ~2 years it's been on the platform. As they are a poweruser in lots of different rooms, and Matrix keeps history by default, this results in a lot of data - over 3.5M timeline events (or ~7M including state events). This is not "data unrelated to the subject" - it's just all the data attributed to their account, with the exception of the handful of events due to the bug earlier.
In Matrix, much like email, users are considered to own a copy of the messages which are sent to them. If I send you a message on Matrix, you get a copy of it. Just as if I did a GDPR data take-out request on an IMAP server I might expect a copy of all my IMAP spools - both messages sent and received, so too on Matrix.
We are categorically not maliciously flooding users with irrelevant data - we have bent over backwards to provide the most comprehensive GDPR DSAR tool we can offer, which lets the user do a data take-out on their whole account.
The request in question is the only one that suffered the bug that meant the 24 private messages got leaked (from 4 users over 2 rooms). Every other GDPR request we've done (I don't have the full count to hand) has however provided the user's whole dataset - i.e. all the messages they've sent and received on their account.
Sounds like a great idea; we'll try to raise the $ to cover it.
The reason these appear not to be addressed is because there's a whole separate issue going through them all in excruciating detail over at https://github.com/privacytoolsIO/privacytools.io/issues/1049.
Thank you @ara4n. I'll look into this and the issues outlined at vector-im/element-web#1049 further in a bit.
That's fantastic, and let me know when you do so and I'll try and direct some traffic over for that.
I just took a quick scan through this backstory and while I do think team Matrix does need to up their privacy game, let's slow down a minute here.
Matrix is not an ad-tech company, AFAIK they don't have a business model for using the data they collect. At the end of the day, they're open source developers and I know all too well what it's like to struggle to give something away and be spat on for it.
Some of these comments are casting them as some proto-facebook and for the record I do not think every comment here is written in good faith. Giving platform to people who build up their own ambitions by tearing down others is not making anything any better, so I think we should use a robust approach and:
@ara4n Thank you for actually explaining the logic of why you sent me 50x more events than I expected to receive. I just have a few questions to clarify some specifics which seem odd to me given my knowledge of how the protocol works.
Could you clarify on which access control mechanism you rely to know that? My account is not on Matrix.org, but on another server. I don't recall ever telling you how access control is done on my server at the Client API level when I made the GDPR request. I agree you might have an educated guess at what my server would see if it follows the specification, but that's state resolution, not access control.
As a simple example, you've banned 8chan rooms from Matrix.org. You did not do so at the state resolution algorithm. If I was to look at it, matrix.org users are still perfectly fine to join if such a join event was received by my server. What was done instead is block their access to the content. Is it unreasonable to think that, maybe, there might be the same kind of restrictions in place for a user from another server?
I got events that
In those cases, my own server does not have most of those events, but even if it did, I would not be allowed to even see some of them (like in the case of a ban). In some cases, my server was not even in the room and simply was not eligible to receive events. Could you explain how you reached the conclusion that such data, which is not accessible/given to me under the protocol's own rules, is attributed to me in the first place?
The account for which I made the request is on my own server, not on Matrix.org. Analogy with an IMAP server is not possible: I wouldn't have an account to log in with. I have no equivalent of an IMAP account on Matrix.org.
Also, events are sent to other servers because those servers are present in the room. While user membership is used to build the list of servers that should receive an event, it has nothing to do with the user. Matrix.org does not send room events to me, it sends them to my server. Right?
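To illustrate the point, here is a simplified sketch of how a list of destination servers can be derived from room membership. It is not Synapse's actual code: `destination_servers` is a hypothetical helper, and it only assumes that a Matrix user ID has the form `@localpart:server_name`.

```python
from typing import Iterable, Set


def destination_servers(joined_user_ids: Iterable[str], own_server: str) -> Set[str]:
    """Collect the remote servers that have at least one joined user.

    The server name is everything after the first colon, so a name
    with a port (e.g. "example.org:8448") stays intact.
    """
    servers = {user_id.split(":", 1)[1] for user_id in joined_user_ids}
    servers.discard(own_server)  # no need to federate to ourselves
    return servers


members = ["@alice:matrix.org", "@bob:example.org", "@carol:example.org:8448"]
print(destination_servers(members, "matrix.org"))
```

The output is a set of servers, not of users: however many users a remote server has in the room, the event is addressed to that server once.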
My GDPR request was not about my account - it was about the data you have about me. They are totally different things. I do not need to be given my account history, and it is not something you are technically capable of giving me either: my account is on my own server, not on yours. What I requested was the data you have about me on the various services: table records that would not be known from my own server, log files, etc. My request was very clear about this.
Maliciously or not, you did send me 50x more data than I cared for, most of which I already had. There was nothing straightforward in parsing and dealing with this data since I had to load the data in a database and sort it out myself. I was flooded with irrelevant data: irrelevant is a judgement call up to me: I never asked for it, I never cared for it, yet I have to deal with it somehow.
Also, the point was to know what data you hold about me which is not present in my Homeserver. There is literally no point getting data I already have. Instead of those 3.4M events that held no value to me, I wish all the room events actually related to me were included - you actually missed some it seems. But even that I'm not yet 100% sure of given I need to parse those unrelated events.
Which account are you talking about here? I do not have an account with Matrix.org. I have an account with my own server, which you federate with. Take-out of my whole account does not even make sense: Matrix.org has no knowledge of what "whole" is: not all rooms have Matrix.org involved, and you do not know anything about my settings or anything like that.
I can certainly appreciate you are trying to help your own users with some take-out option, but that is certainly not what I asked in my GDPR request, nor is it something you can actually offer to me. Or if you think it applies to me, could you please clarify how you define "account" in this case?
@cjdelisle like I mentioned in my other comment, I am a big fan of the work being done on Matrix and I strongly feel that they are the best contender at the moment for federated communication. There are just some significant issues (primarily the ones I outlined) before we can really consider relisting Matrix on the site.
What we're dealing with is the fact that privacytools.io is live and gets thousands of visitors a month. Visitors who are specifically looking for the best privacy solutions. Not the best open source solutions, nor a list of the coolest software projects out there. Recommending Riot/Matrix while it still has the issues I listed—regardless of whether they're being actively improved at the moment—would be a disservice to our visitors.
When the privacy scene at Matrix improves, which looks promising based on the tracker they linked, we will consider relisting them. Notably this delisting did not come with an alternative recommendation, because we do not believe there is a viable alternative to Matrix for what they're doing. We are not giving a competitor a platform, merely recommending people stick with Wire and Signal over federated services like Matrix at the moment if they wish to preserve their privacy from third-parties.
Regarding this specific GDPR issue, I am inclined to agree with @maxidorius. To me it seems like providing every single event associated with an account in a GDPR request would be like sending such a request to Google, and they send back not only every search query you've ever made, but the hundreds of pages of results for said query, intermixed with your data so you need to weed through them all.
The problem is that this analogy is just not accurate for a decentralised system like Matrix - you need instead to be comparing with something like Git or Email.
For instance, on Git: imagine that you made a GDPR req against a Git hosting provider for all data they have on you. Sure; by the letter of the law (although IANAL), I suspect they could just send you all the patches you ever sent into various repositories... which would be pretty useless to the point of being obstructive without the context of the patches. Instead, I hope they'd just hand over a copy of the whole repo, so the user can actually make use of the data - which is effectively what we are doing here.
Likewise, on Matrix: sure, we could have just sent over every event that the user had sent which we had on our server, which would be the bare legal minimum (alongside their other personal data, of course). However, from my perspective, providing someone with a bunch of meaningless one-sided conversations is less than helpful. As mentioned earlier (and in the Matrix.org server's privacy policy), we consider each user to own a copy of the messages they both send & receive in their conversations. Therefore, if they ask to get a copy of their data off our server, we give them the full conversations they were in. This could be invaluable for some user who (say) lost their homeserver, and wanted to rescue their conversations from other servers for their records without having to try to claw it back themselves via Matrix.
The data export was done with the best possible intentions (and in fact, we spent extra time to ensure we were as comprehensive as possible with the data shared - hence the bug that impacted this request) - and honestly we had no expectation that it would be used as a new source of complaints.
However, going forwards it sounds like the right approach will be to ask users who invoke the DSAR whether they want their full conversations (to the best of our ability to provide them) or just the messages they sent - and thus avoid complaints like this in future.
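As a rough sketch of what that choice could look like, the snippet below filters an export by mode. It is purely illustrative and not the actual DSAR tool: the function name and the mode strings are hypothetical, and events are assumed to be plain dicts carrying the standard `sender` field.

```python
from typing import Dict, List


def select_events(events: List[Dict], requester: str, mode: str) -> List[Dict]:
    """Pick which events go into a data export.

    mode == "full_conversations": every event in the rooms' visible history.
    mode == "sent_only": only events whose sender is the requester.
    """
    if mode == "full_conversations":
        return list(events)
    if mode == "sent_only":
        return [event for event in events if event["sender"] == requester]
    raise ValueError(f"unknown export mode: {mode!r}")


history = [
    {"sender": "@max:example.org", "body": "hello"},
    {"sender": "@someone:matrix.org", "body": "hi there"},
    {"sender": "@max:example.org", "body": "bye"},
]
print(len(select_events(history, "@max:example.org", "full_conversations")))
print(len(select_events(history, "@max:example.org", "sent_only")))
```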
@ara4n Could you also answer my questions please? They are not answered by the simplistic view of "conversations" you use.
@ara4n You are (once again) confusing what is going on here. Nobody is complaining about your intentions. I am personally complaining I did not get all the data to this day and have to fight my way to get it, and still need to. And the way you already provided me with some of that data is just making it so much harder to deal with for no good reason. I didn't lose my Homeserver, nor did I request the full history. Annex A of the doc has my request verbatim: it's the data you have on me across all services, not the history of my account on your Homeserver.
Good intentions or not, the consequences of your actions are the only thing that matters. I got handed nearly 8M raw events; I only asked for data which is about me, so ~77k. That's 100 times more! The "bug" is also the very problem here: you shared events I never had access to, you shared personal data I had no right to have. If you did it with other people, that means you did it for my own personal data too! And that is extremely problematic to me, since your privacy notice says you will respect Matrix access controls. But you did not.
We provided your conversation data to you based on whether the matrix.org server’s view of the room considered you to have access to it (modulo the bug that affected your export). As far as I know, you have not articulated anywhere what data you think we have that we still haven’t given you, so I can’t help you there - to my knowledge we have bent over backwards to give you everything you wanted (and more, apparently). Yours was the only DSAR fulfilled by the tool with the negative stream bug, as we have already stated several times. If you think your request was not handled correctly then please take it up with the ICO. We don’t have anything else to say on the subject here.
@Mikaela, @JonahAragon I suggest we use https://github.com/privacytoolsIO/privacytools.io/issues/1049 to track progress on addressing the privacy issues you’ve highlighted as blockers, and hopefully get Riot and Matrix reinstated once resolved.
Signing off this thread...
Happy one year anniversary to this issue. Has anything materially changed to warrant reinstating Riot (Element now) with PRISM-BREAK and PrivacyTools?
On the other hand, let alone GDPR - I find it mind-boggling that,
I don't know if the concerns @muppeth brought up at https://github.com/privacytools/privacytools.io/pull/562#issuecomment-457878353 (and other comments) were ever fixed or if anything has changed internally, but as a user, removed file uploads are still never removed from the server(s) https://github.com/matrix-org/synapse/issues/1263 (will the encryption protect them forever or will someone in the future be able to decrypt files of today?) while normal removed messages are removed from the server after a week (by default).
Riot/Element has been (re)listed on PrivacyTools for a long time though, I don't know about PRISM-BREAK.
Edit: see also https://github.com/matrix-org/synapse/labels/privacy-sprint and I remembered that the latest per-room nick/avatar are always leaked when searching for the user https://github.com/matrix-org/synapse/issues/5677.