Maven Imported 1.12 Million Fediverse Posts (Updated)
Here we go again...
💡 This article has been updated.
A recent investigation by Liaizon Wakest revealed that Maven, a new social network founded by former OpenAI Team Lead Ken Stanley, has been importing a vast number of posts from the Fediverse without anyone’s consent. Additionally, it’s pulling in Bluesky statuses connected via Bridgy Fed.
In addition to pulling in posts, the import process seems to be running AI sentiment analysis to add tags and relational data after content reaches Maven’s servers. This is a core part of Maven’s product: instead of follows or likes, a model trains itself on its own data in an attempt to surface unique content algorithmically.
It’s worth mentioning that Maven received 2 million dollars in funding from former Twitter CEO Ev Williams and OpenAI CEO Sam Altman. While I have little to say about Ev Williams, the relationship between Maven and OpenAI could be seen as more than a little problematic, as the funding could give both parties greater incentive for Maven to adopt OpenAI’s technologies and policies.
What’s Going On?
Digging into the situation, it looks like Maven is working on their own ActivityPub implementation. Jimmy Secretan, Maven’s CTO, confirmed this in a post.
I just can’t keep any secrets around here can I 😀? As you mentioned, we have actually started ingesting posts from Mastodon (toots as they call them 😀).
We are looking to mix them in to the feed, and are doing some limited tests with that now. The good news is that when you reply to these, it should generally work to communicate across systems through ActivityPub.
We are hoping to use this to help connect Maven to a larger audience and a wider world.
This is also supported by looking at Maven’s staging environment, which has ActivityPub response data enabled. The goal for Maven is to federate these posts back to the Fediverse for seamless communication, but the integration in their live environment seems to only go one way.
Mixed Expectations
On the one hand, Maven seems to have really dropped the ball here. One of the most important things about coming into this space as a developer is to communicate openly, and set expectations with the user community. A big part of the Fediverse cares deeply about consent, and the lack of any opt-in / opt-out mechanism feels like a missed opportunity.
On the other hand, we have to address the myths that crop up about privacy and content controls in the Fediverse. A lot of users have expectations about how their public content can be interacted with. Even 15 years in, we’re still not at a place where people have robust, conditional controls over who can view, interact with, or manipulate public content.
We also still don’t have great resources for setting cultural expectations for developers coming into the space. As we stated in our Content Nation article, most new developers have the ActivityPub spec, and little else. As a network, we need to take it upon ourselves to make our expectations front and center.
What Now?
Shortly after Liaizon made their post, Jimmy Secretan made an announcement on Mastodon that they’ve deleted the entirety of the import.
It’s clear from the feedback on this thread that even our experiments with the tech were confusing to users and didn’t fit with other people’s expectations of how it should work.
We are currently pausing this integration, at least until we can better understand how Maven can fit in as a good citizen of the Fediverse.
Searching within Maven’s app, it appears that thousands of Fediverse handles and posts are suddenly gone. This is a good development, but Maven probably has a long way to go before any part of the Fediverse will want anything to do with them.
Update
A short while ago, Jimmy Secretan posted this response on everything that happened today.
We have paused everything related to our Fediverse ingestion for now and we are removing everything ingested.
To be honest, the extreme negative reaction was a surprise to me, as I thought interaction between disparate systems was the entire point, but clearly we didn’t navigate the culture correctly.
The thing about participating in a social space, either as a user or as a service provider, is that you have to take the time to understand the norms. In a network of networks, those can be hard to pin down. Communicating your plans early and taking feedback go a long way towards setting expectations. You can’t simply implement a protocol and then pull in vast amounts of remote content to your network with no notice, and expect people to be okay with that.
I’ll leave you with this anonymous quote, since it feels appropriate: “Trust takes years to build, seconds to break, and a lifetime to repair.” If Maven wants to be a good steward of the Fediverse, it would be good for them to remember that.
Retraction Concerning Private Mentions
Update: an earlier version of this article stated that private DMs from Mastodon were mirrored onto Maven. Due to an extremely unfortunate set of circumstances, the user in question had originally created a public post by accident before hitting the “Delete and Redraft” button. What ended up on Maven was the cached public copy that never got deleted from another Mastodon instance.
We apologize for this error, and have updated the story to set the record straight, after the admin who initially reported the issue uncovered new details.
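For developers wondering how the delete-and-redraft case above is meant to be handled: when a post is deleted, the origin server sends out a Delete activity, and a receiving server should drop its cached copy. Below is a minimal sketch in Python, assuming the importer actually receives such activities and has already verified the sender's HTTP signature. The `cache` dictionary and the `handle_activity` function are hypothetical names for illustration, not Maven's or Mastodon's code.

```python
from typing import Any, Dict, Optional

# Hypothetical in-memory cache of remote posts, keyed by ActivityPub object id.
# Each cached post records its author in "attributedTo".
cache: Dict[str, Dict[str, Any]] = {}


def object_id(obj: Any) -> Optional[str]:
    """Return the id of an activity's object, which may be a bare URL
    string or an embedded object such as a Tombstone."""
    if isinstance(obj, str):
        return obj
    if isinstance(obj, dict):
        return obj.get("id")
    return None


def handle_activity(activity: Dict[str, Any]) -> bool:
    """Process an incoming activity. Returns True if a cached post was removed."""
    if activity.get("type") != "Delete":
        return False
    target = object_id(activity.get("object"))
    cached = cache.get(target) if target else None
    # Ignore deletes for unknown posts, and deletes sent by anyone
    # other than the post's original author.
    if cached is None or cached.get("attributedTo") != activity.get("actor"):
        return False
    del cache[target]
    return True
```

A service that pulls posts by polling an API, rather than receiving federated activities, would never see these Delete activities and would need to re-check the source on its own.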
@news wait wtf, private dms, maybe our current implementations has some leaks going on
@AmyIsCoolz @news Oh no. That’s bad. And, they haven’t disclosed how they did it either way ? Truly someone who doesn’t understand how tech works, even less ActivityPub.
As far as I’m concerned, Maven can go to hell. Some instances are already defederating from https://www.maven.ly/ & https://www.heymaven.com/. What else can we expect from these typical, appalling practices that the OpenAI crowd uses to train their AIs and prototypes, damn it.
Thank you for the information.
@weirdwriter I just want to let these thieving bastards’ crime destroy them
@MikeImBack Unfortunately, not right now. The closest thing would be to change your default post settings to followers only then lock your account to where you have to approve follow requests
@weirdwriter isn’t there some kind of AI self-destruct code we can post?
@saaste Leaking private messages would really require there to be some serious bug in Mastodon. It shouldn’t normally be possible.
@nen Right, the article considered that possibility too, but it didn’t have any clear answers. The screenshots did show, however, that a private conversation held within a single instance had ended up on Maven’s side.
I understand you’re saying that this was a major violation of trust, but it seems like they are being accused of violating an unwritten set of norms. Maven followed the ActivityPub spec and the terms of service. They downloaded publicly accessible data using Mastodon servers and services as designed. They then analyzed that data and ran an algorithm to add labels, similar to how every fediverse server does. The difference here is that Maven used machine learning to add some labels, whereas others add labels such as timestamps when the local server downloads the data without using newer machine learning tech.
Bing, DuckDuckGo, and Google also do this; they crawl the fediverse, use AI and machine learning to label content, and display it in different contexts.
The tags that Maven adds are pretty innocent. They are just adding hashtag-like labels for discoverability.
Furthermore, many people are upset that Maven is leaking people’s DMs. This is like living in a house where you refuse to have a front door or curtains on your windows and then getting very upset when somebody wanders in and sits down in your living room or looks in from across the street. The fediverse, by design, has no privacy. DMs are public! It says right there in Mastodon that these aren’t private. Nor are Bluesky’s DMs, by the way. There is no end-to-end encryption in the fediverse yet. Evan Prodromou is actually working on this, likely adapting the MLS standard, which is great but doesn’t exist yet.
So my question is this: Why does the fediverse rely on unwritten and undocumented norms that are not mentioned in either the specs or terms of service? And why are people constantly surprised when others don’t follow these hidden social conventions?
Hey Rabble, thanks for taking the time to share your thoughts!
So, yeah, you hit the nail on the head: the primary problem, first and foremost, was a general lack of communication. A criticism that I myself hold is that we still rely on unspoken knowledge that lives in the heads of a few people. As I stated in some of my other articles, a big problem is the disconnect between community principles and expectations, and technical specifications. I’ll readily admit that people crossing their arms and saying “well, you should have known!” about some elusive subject is an objectively bad experience. An idea that I keep coming back to involves the possibility of launching a portal for prospective Fediverse developers to not only find the ActivityPub protocol and example code, but also explain some of the norms and expectations within the Fediverse, and some of the protocol extensions to ActivityPub that exist today.
The main problem with this situation, in my opinion, is that Maven neglected to reach out to the community first, and study how the network operated, what the norms were, and how good stewards did things. This is not to say that they had to bend over backwards or anything, but: if you’re going to do something like ingest a million posts…maybe make yourself known beforehand, and explain what it is you’re doing. I can’t guarantee that there won’t be pushback, but it would’ve cemented at least a little bit of goodwill.
Maven’s ActivityPub implementation was super janky, and the appearance of remote content that did not look or act like remote content led to a lot of confusion. The key concern about Maven isn’t just that they did all of this, ran a huge amount of data into their own platform, analyzed it with AI, and added metadata to posts that look like they came from Maven. The bigger concern is that, in addition to doing it without asking if anybody was cool with it, there’s a non-zero possibility that some of whatever was imported ended up in the training data that the AI uses for its algorithms. In practice, this isn’t a huge issue…but, when it comes to consent and varying perspectives on copyright, it’s kind of fucked up.
As for Mastodon’s private messaging system: yeah, I hear you loud and clear. It sucks. I look forward to the day that something better comes along, and I’m hopeful about Evan Prodromou’s work with bringing E2EE to ActivityPub DMs. Sometimes, it’s situations like this that are necessary for us to realize how crappy some of our tech is, and that now is the time to iterate on something better.
I’ll conclude by stating that I agree with you on all points about the underlying problems, and I think surfacing knowledge resources that are easy to access, and easy to understand, are vital. I do think that the Fediverse’s negative reaction is largely valid in this situation, but it may have more to do with the alignment of user consent and community biases against AI and Silicon Valley than almost anything else. Regardless, this is a situation where we actually have to put our money where our mouth is, and do something, rather than have a cycle that repeats over and over again.
Not everyone using the fediverse agrees on how public data should be used. We shouldn’t assume there is a consensus just because a large group of the fediverse spoke loudly. I don’t think they did anything wrong, but I agree we should have better tools to control privacy if a user chooses to do so.
> Furthermore, many people are upset that Maven is leaking people’s DMs. This is like living in a house where you refuse to have a front door or curtains on your windows and then getting very upset when somebody wanders in and sits down in your living room or looks in from across the street. The fediverse, by design, has no privacy. DMs are public!
I think “DMs are public” is an exaggeration. Yes there are plenty of UX problems with it. No E2EE, which means the admin[s] of the instances involved in the conversation can read it if they want to, as well as other people getting easily added unintentionally to the mention-only conversation. I’ve come to accept these flaws and keep it in the threat model inside my mind. It’s just a janky email/IRC channel/unencrypted XMPP MUC.
What I didn’t expect is how in the world Maven was able to access those mention-only posts when they are not mentioned at all AFAIK nor the admin of hackers.town as shown here. The Mastodon software of the instance shouldn’t have allowed Maven to fetch the post. There’s clearly a security bug in Mastodon’s implementation at least, and Maven didn’t even acknowledge that.
Gargron has made a statement about this at https://mastodon.social/@Gargron/112608441965799612 and I withdraw my accusation about it being a security bug in Mastodon, which in hindsight may have been done hastily. This is most likely a delete-and-redraft situation where the redraft put it in mention-only visibility instead of the previous visibility being Public, and Maven probably didn’t implement honoring delete requests, therefore allowing the Public post that they’ve fetched before deletion to stay up.
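For anyone following along, the visibility levels discussed in this thread come down to how a post is addressed. Here is a rough Python sketch of the addressing convention Mastodon uses, as far as I understand it. The `visibility` function and the `followers_url` parameter are made up for illustration, and the sketch ignores compact forms of the Public collection such as `as:Public`.

```python
# The special ActivityStreams collection that marks a post as public.
PUBLIC = "https://www.w3.org/ns/activitystreams#Public"


def as_list(value):
    """Normalize a 'to'/'cc' field, which may be missing, a string, or a list."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def visibility(post, followers_url):
    """Classify a post by its addressing (assumed Mastodon-style convention)."""
    to = as_list(post.get("to"))
    cc = as_list(post.get("cc"))
    if PUBLIC in to:
        return "public"
    if PUBLIC in cc:
        return "unlisted"
    if followers_url in to or followers_url in cc:
        return "followers-only"
    # Addressed only to specific actors: a mention-only post ("DM").
    return "mention-only"
```

Under this convention, a redraft that changes the visibility produces a new post with different addressing, while the earlier public copy only disappears from remote servers that honor the delete.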
Looked into this yesterday and at least one Threads account is listed over there, too. Not sure if this was consensual.
And I fully agree: It is not about not wanting to grow the fediverse but about at least getting notified when your content gets pulled to another platform.
@jpavonabian it blows my mind how surprised these AI bros are that there’s a negative reaction to their well-intentioned “take the money and run” actions.
This is because most instances are using Mastodon, which has no “private” setting. My instance runs on Pleroma, and is set as private. So Maven tried to harvest, and got 401. This is what happens when you use software which is not interested in granting security features.
You can restrict the viewing of the local and federated timelines for logged-out users in Mastodon too. It’s not exclusive to *oma.
believe they understand what #noBot tag in profile means… do they? :>
@news Thanks for the write-up! Especially the leaked private message is crazy, and tells me that probably my Mastodon web interface should not call it a private message… 😛
@news@wedistribute.org not really a fan of people saying “DMs aren’t e2ee, so it’s perfectly fine”
yes it’s a problem that they’re not e2ee but that doesn’t mean people are supposed to be fetching them as public
if I sit on my front steps w/o locking the door and someone runs past me to go into my house, that’s still incredibly invasive?
[edit: realized people will see this comment later, so clarifying, it turned out they did not have a DM, so they didn’t do any kind of interception.]
@brettk I am solidly in the ‘open protocol means anyone can participate’ camp, but in this specific case they didn’t implement the protocol, they just used an API from mastodon.social and scraped the federated timeline.
@brettk I think we’re basically in agreement. But I think my main point is unlike any of those other projects, they aren’t fediverse citizens at all. You can’t be a citizen of the fediverse if you aren’t using/implementing the protocol the fediverse runs on.
I feel sorry for Jimmy Secretan. The amount of hate and harassment developers get for trying to integrate the Fediverse is absolutely horrible, and the amount of misunderstanding, from multiple independent developers, shows that the loud minority on the Fediverse (the “Fediverse culture”) doesn’t actually gel well with how the rest of society works. Shame on you, mob.
@cos There was at least screenshot-level evidence of the private messages being carried over. How was it done? I don’t know. Were the screenshots genuine? I don’t know that either.