Digesting the “Child Safety on Federated Social Media” Report

Child Safety on the Fediverse, from our side.

Sean Tilley, August 5, 2023

Editor’s Note: This article covers an extremely sensitive topic, and includes some screenshots of publicly available status updates that some people may consider disturbing. This content exists solely to give context to different aspects of the situation. None of the images depict gore or pornography.

I have been a part of the Fediverse since it began. I guided community development for Diaspora, and watched the network grow. I’ve seen difficult times, such as the presence of ISIS on Diaspora instances, and right-wing hate speech emerging on the Fediverse. As someone who lives and breathes this network, I want to start a discussion.

A controversial report on the Fediverse was released last week, titled Child Safety on Federated Social Media. It’s a study performed at the Stanford Internet Observatory, by researchers David Thiel and Renée DiResta. The paper focuses on the challenges of large-scale moderation, the presence of Child Sexual Abuse Material (CSAM) on various servers, and the lack of appropriate tooling for admins to handle it.

It’s a difficult, uncomfortable subject, but the researchers are thorough in explaining everything they’ve done. Their methodology, analysis, and recommendations are fully realized. In the interest of speaking earnestly on this subject, we need to study the report as it was written.

To be clear: this response is not an effort to refute the work done by the members of the Internet Observatory. Instead, it is an effort to cut through sensational headlines, and give context to the study.

The Facts

Before digging into everything, I would highly recommend that you pause, and read the paper for yourself. It’s a brief read, clocking in at about 13 pages of actual content. It’s short, well-written, and deeply insightful.

Methodology of the Study

The researchers at the Internet Observatory leveraged a tool called PhotoDNA, a product developed by Microsoft. In a nutshell, this program turns images into a sort of fingerprint, called a hash. PhotoDNA compares hashes it sees in an online environment to a library of stored hashes representing known reported CSAM. This accomplishes two things:

Prevents operators from being directly exposed to CSAM themselves.
Checks to see if that picture indeed matches up in a repository.
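To make the fingerprint-and-compare idea concrete, here is a minimal sketch in Python. It is not PhotoDNA, whose algorithm and hash library are not public. It swaps in a toy "average hash" over a tiny grayscale grid, and every name in it (average_hash, matches_known_hash, the sample pixel values) is an assumption made purely for illustration.

```python
from typing import Iterable, List

Pixels = List[List[int]]  # grayscale values 0-255, row-major


def average_hash(pixels: Pixels) -> int:
    """Toy perceptual hash (NOT PhotoDNA): one bit per pixel,
    set when the pixel is brighter than the image's mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def matches_known_hash(candidate: int, known_hashes: Iterable[int],
                       max_distance: int = 0) -> bool:
    """True when the candidate is within max_distance bits of any stored hash."""
    return any(hamming_distance(candidate, known) <= max_distance
               for known in known_hashes)


# Hypothetical hash library holding the fingerprint of one known image.
known_hashes = {average_hash([[10, 200], [220, 30]])}

# A slightly brightened copy still produces the same fingerprint.
brightened_copy = [[12, 205], [225, 33]]
is_match = matches_known_hash(average_hash(brightened_copy), known_hashes)
```

The operator only ever handles the integer fingerprints, never the images behind the stored hashes, which is the first of the two benefits listed above.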

The top 25 accessible Mastodon servers reported by Fediverse Observer were targeted, and the JSON metadata was fed into PhotoDNA through Mastodon’s real-time status updates system. The study captured two days’ worth of data.

For ethical purposes, the team did not download, archive, or store media files. They reported offending content to the National Center for Missing and Exploited Children (NCMEC). Only public statuses were subjected to this process. Full user profiles and followers were not scraped – only the metadata for individual posts was downloaded for analysis.

Findings & Observations

Here are the most damning results, from the paper itself:

Out of approximately 325,000 posts analyzed over a two-day period, researchers detected 112 instances of known CSAM. They also identified 554 instances of sexually explicit content, with matching hashtags or keywords commonly used by child exploitation communities.
713 uses of the top 20 CSAM-related hashtags on posts containing media, as well as 1,217 posts containing no media. The text content primarily related to off-site CSAM trading or grooming of minors.
From post metadata, the researchers saw content categories including Computer-Generated CSAM (CG-CSAM) as well as Self-Generated CSAM (SG-CSAM).
A test run of this analysis pipeline detected its first instance of known CSAM in approximately 5 minutes of runtime. All instances of CSAM detected were reported to NCMEC for triage.

The vast majority of results are on Japanese instances, due to how Japan legally handles Computer-Generated CSAM. Unfortunately, allowing this content on a server has led to the exchange of not just virtual material, but actual material as well. This doesn’t just mean images, either, but discussions and offers to groom children. Here’s what David Thiel, one of the researchers, has to say about it:

Hits were primarily on a not-to-be-named Japanese instance, but a secondary test to see how far they propagated did show them getting federated to other servers. A number of matches were also detected in posts originating from the big mainstream servers. Some of the posts that triggered matches were removed eventually, but the origin servers did not seem to consistently send "delete" events when that happened, which I hope doesn't mean the other servers just continued to store it.

— David Thiel (@det) 2023-07-24T14:13:34.387Z

Comparing the number of results from the sample body, the viewer is left to determine what this data means. Is 0.08% of a network’s posts in two days a little? Is it a lot? While it’s safe to say that the ideal amount present should be zero, the reality is that it’s not. This isn’t even the definitive number, but a value taken from a relatively small sample size over a brief period.

Comparisons between the dataset of CSAM results vs the entire body of data itself.

The paper notes that, while this happens much more frequently on Instagram and Twitter, our network is also affected. Federation brings some unique problems, too. For example: post and media deletion does not always correctly federate between servers. It’s entirely possible for unsuspecting servers to still have a copy, but not know anything about it.
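As a rough illustration of the deletion problem, here is a sketch of what a receiving server has to do when a delete arrives. ActivityPub does define a Delete activity, but everything else here is assumed for the example: the in-memory cache, the apply_delete helper, and the flattened post shape. A real server would also verify the HTTP signature on the request before trusting it.

```python
from typing import Any, Dict


def object_id(activity: Dict[str, Any]) -> str:
    """The object of a Delete may be a bare URL or an embedded object/Tombstone."""
    obj = activity.get("object")
    if isinstance(obj, dict):
        obj = obj.get("id")
    return obj if isinstance(obj, str) else ""


def apply_delete(activity: Dict[str, Any],
                 cache: Dict[str, Dict[str, Any]]) -> bool:
    """Drop the locally cached copy of a remote post (and its media references).

    Returns True only when something was actually removed.
    """
    if activity.get("type") != "Delete":
        return False
    target = object_id(activity)
    cached = cache.get(target)
    if cached is None:
        return False  # we never stored it, or the Delete arrived twice
    if cached.get("attributedTo") != activity.get("actor"):
        return False  # only the original author may delete the post
    del cache[target]
    return True


# Hypothetical local cache of one remote post.
cache = {
    "https://a.example/posts/1": {
        "attributedTo": "https://a.example/users/alice",
        "media": ["https://a.example/media/1.png"],
    }
}
delete = {
    "type": "Delete",
    "actor": "https://a.example/users/alice",
    "object": {"id": "https://a.example/posts/1", "type": "Tombstone"},
}
removed = apply_delete(delete, cache)
```

The failure mode is visible in the sketch: if the origin server never sends the Delete, apply_delete never runs, and the cached copy simply stays where it is.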

Traditionally the solution here has been to defederate from freezepeach servers and…well, all of Japan. This is commonly framed as a feature and not a bug, but it's a blunt instrument and it allows the damage to continue. With the right tooling, it might be possible to get the large Japanese servers to at least crack down on material that's illegal there (which non-generated/illustrated CSAM is).

— David Thiel (@det) 2023-07-24T14:13:54.515Z

David makes an excellent point here, and it’s something I’ve really had to take to heart. The problem isn’t the amount of positive results produced by the study. The problem is that it’s happening. As we continue to defederate from these spaces to protect our own communities, those spaces continue to operate with even less scrutiny than before.

Expert Recommendations

Far from only offering criticism, the research concludes with several recommendations to remedy the problem. The main recommendations include:

A blocklist source of CSAM hosts
Admin-level filtering of hashtags and content
Methods to rewrite remote content locally (similar to Pleroma’s MRF system)
Better moderation tools
PhotoDNA / CyberTipline reporting
Attestation support in ActivityPub

That’s a lot to unpack. Some of these solutions might cause the network to trend towards centralization, based on how the tools work today. Let’s briefly step through what these ideas entail, and how they might work.

Subscribable Blocklists / Auto-Filtering

Mastodon offers some tools for filtering out bad actors, problematic domains, and keywords that a given user doesn’t want to see. Unfortunately, this pulls all levels of stream curation to the user level, which is far from ideal. Being able to filter and block these things at an admin level could be a major relief. It would take a lot of work off of the user, and likely lead to a healthier network.
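A minimal sketch of what admin-level filtering could look like, assuming a server can pull several subscribed blocklists as plain lists of domains. The function names and example domains are hypothetical and do not correspond to any existing Mastodon API.

```python
from typing import Iterable, Set


def merge_blocklists(*lists: Iterable[str]) -> Set[str]:
    """Combine several subscribed blocklists into one normalized set of domains."""
    merged: Set[str] = set()
    for entries in lists:
        for domain in entries:
            cleaned = domain.strip().lower().rstrip(".")
            if cleaned:
                merged.add(cleaned)
    return merged


def is_blocked_domain(domain: str, blocked: Set[str]) -> bool:
    """Check the domain and each parent, so media.bad.example is caught by bad.example."""
    labels = domain.strip().lower().rstrip(".").split(".")
    return any(".".join(labels[i:]) in blocked for i in range(len(labels)))


def has_blocked_hashtag(tags: Iterable[str], blocked_tags: Set[str]) -> bool:
    """blocked_tags holds lowercase tags without the leading '#'."""
    normalized = {tag.lstrip("#").lower() for tag in tags}
    return not normalized.isdisjoint(blocked_tags)


# Two hypothetical subscribed lists with overlapping, messy entries.
blocked = merge_blocklists(["Bad.example ", "worse.example"], ["bad.example.", ""])
```

Because the check runs once at the server boundary, individual users would not each have to discover and block the same domains and hashtags themselves.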

Content Rewriting

One of Pleroma’s best innovations was Message Rewrite Facility, or MRF for short. MRF is extremely powerful – you can catch posts that contain something, and automatically enact a policy on them. Want to put a cover on images from a server that sends out unfiltered pictures of spiders? You got it. Want to drop posts coming from free speech instances that mostly post slurs? No problem.

For admins, this kind of ability is a godsend. It makes stream curation bearable, while dealing with the worst cases on the network holistically.
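The two examples above can be sketched as a chain of small policies. This is written in Python for illustration and is only inspired by MRF, not a copy of it: Pleroma’s real policies are Elixir modules, and the flattened post dictionary, the policy names, and the example hosts here are all assumptions.

```python
from typing import Any, Callable, Dict, Iterable, Optional, Set
from urllib.parse import urlparse

Post = Dict[str, Any]
# A policy returns a (possibly rewritten) post, or None to drop it entirely.
Policy = Callable[[Post], Optional[Post]]


def host_of(post: Post) -> str:
    """Hostname of the actor that authored the post."""
    return urlparse(post.get("actor", "")).hostname or ""


def mark_media_sensitive(hosts: Set[str]) -> Policy:
    """Put a cover on any media arriving from the listed hosts."""
    def policy(post: Post) -> Optional[Post]:
        if host_of(post) in hosts and post.get("attachment"):
            return {**post, "sensitive": True}
        return post
    return policy


def reject_from(hosts: Set[str]) -> Policy:
    """Drop everything arriving from the listed hosts."""
    def policy(post: Post) -> Optional[Post]:
        return None if host_of(post) in hosts else post
    return policy


def run_policies(post: Post, policies: Iterable[Policy]) -> Optional[Post]:
    """Apply each policy in order, stopping as soon as one drops the post."""
    for policy in policies:
        result = policy(post)
        if result is None:
            return None
        post = result
    return post


policies = [reject_from({"slurs.example"}), mark_media_sensitive({"spiders.example"})]
spider_post = {"actor": "https://spiders.example/users/a",
               "attachment": [{"url": "https://spiders.example/m/1.png"}]}
covered = run_policies(spider_post, policies)
```

Each policy stays tiny and composable, which is a large part of why this style of rewriting is attractive to admins.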

Better Moderation Tools

Moderation tools in the fediverse range from “terrible” to “pretty useful”, but the recommendations in the paper are solid. The biggest takeaway is that when it comes to horribly offensive content, moderator teams can succumb to fatigue over time. One suggestion involves filtering images to lessen the psychological effect of viewing them, a solution some corporate moderation teams use.

Built-in PhotoDNA / CyberTipline Reporting

Easy access to a CSAM hash database would be huge for admins. The paper proposes implementing a feature where admins can just put in some API keys, encouraging an easy setup that takes only a few minutes.

There are a few headaches, of course. Right now, PhotoDNA really isn’t set up to take thousands of requests on a single resource. Countless servers reporting on the same piece of media might end up being really impractical on the receiving end. This isn’t insurmountable, though: maybe there’s a method instances can use to share consensus on this kind of reporting. A third-party service might be able to consolidate that information, prior to dispatching it to CyberTipline.
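One way such a consolidating service might work is to key incoming reports by media hash, so that many servers flagging the same file collapse into a single outgoing report. The sketch below is purely hypothetical: no such service exists that I know of, and the class and field names are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ConsolidatedReport:
    media_hash: str
    reporting_servers: Set[str] = field(default_factory=set)
    post_urls: Set[str] = field(default_factory=set)


class ReportConsolidator:
    """Collects reports from many servers and groups them by media hash."""

    def __init__(self) -> None:
        self._pending: Dict[str, ConsolidatedReport] = {}

    def submit(self, server: str, media_hash: str, post_url: str) -> bool:
        """Record a report. Returns True if this hash had not been reported yet."""
        is_new = media_hash not in self._pending
        report = self._pending.setdefault(media_hash, ConsolidatedReport(media_hash))
        report.reporting_servers.add(server)
        report.post_urls.add(post_url)
        return is_new

    def flush(self) -> List[ConsolidatedReport]:
        """Hand back one report per distinct hash and clear the queue."""
        reports = list(self._pending.values())
        self._pending = {}
        return reports


consolidator = ReportConsolidator()
consolidator.submit("a.example", "hash-1", "https://a.example/posts/1")
consolidator.submit("b.example", "hash-1", "https://b.example/posts/9")
consolidator.submit("c.example", "hash-2", "https://c.example/posts/4")
outgoing = consolidator.flush()
```

Three incoming reports become two outgoing ones, and the receiving end still learns which servers saw the media.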

Attestation Support in ActivityPub

This suggestion proposes that, if PhotoDNA isn’t a viable solution, maybe we can extend ActivityPub to pass along vital information. We can check whether an image in a post has been hashed, scanned, and matched with an existing CSAM hash. Servers would be able to analyze this data, verify its status, and take measures to get rid of offending content. It could be supported natively within the protocol.
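To give a feel for what attestation could involve, here is a rough sketch. None of it is part of ActivityPub today: the field names (mediaDigest, scanner, result, signature) are made up, and the shared-key HMAC is only a stand-in for the public-key signature a real scanning service would presumably use.

```python
import hashlib
import hmac
import json
from typing import Any, Dict


def _payload(media_digest: str, scanner: str, result: str) -> bytes:
    """Canonical bytes that get signed (sorted keys keep it deterministic)."""
    body = {"mediaDigest": media_digest, "scanner": scanner, "result": result}
    return json.dumps(body, sort_keys=True).encode("utf-8")


def make_attestation(media: bytes, scanner: str, result: str, key: bytes) -> Dict[str, str]:
    """What a hypothetical scanning service would attach to a post's media."""
    digest = hashlib.sha256(media).hexdigest()
    signature = hmac.new(key, _payload(digest, scanner, result), hashlib.sha256).hexdigest()
    return {"mediaDigest": digest, "scanner": scanner,
            "result": result, "signature": signature}


def verify_attestation(media: bytes, attestation: Dict[str, Any], key: bytes) -> bool:
    """True only if the attestation covers exactly these bytes and is untampered."""
    fields = [attestation.get(name)
              for name in ("mediaDigest", "scanner", "result", "signature")]
    if not all(isinstance(value, str) for value in fields):
        return False
    digest, scanner, result, signature = fields
    if digest != hashlib.sha256(media).hexdigest():
        return False  # the attestation was issued for different media
    expected = hmac.new(key, _payload(digest, scanner, result), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)


key = b"demo-only-shared-secret"  # assumption: stands in for a real signing key
media = b"\x89PNG...example bytes"
attestation = make_attestation(media, "scanner.example", "no_match", key)
is_valid = verify_attestation(media, attestation, key)
```

A receiving server could then skip its own scan when a valid attestation is present, and treat a missing or invalid one as a reason to scan or quarantine the media.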

What’s at Stake

Within hours of this paper being published, major media outlets reported that Mastodon was “rife with Child Sexual Abuse Material”, and “has a Child Abuse Content Problem”.

These headlines oversimplify the issue considerably. It sounds as though the space is flooded with child porn and pedophiles, when only an isolated corner was involved. The purpose of the study was not to make this point, but to highlight a serious problem, and offer solutions.

Given that people are more likely to respond to an attention-grabbing headline than actually read the story, a less-informed reader might see the title, and conclude that the Fediverse is complicit in the procurement and distribution of CSAM. There are instant connotations and potential repercussions that can come from this.

Harassment Campaigns

Certain media personalities, such as Libs of TikTok, will absolutely call out this kind of issue. They will point to queer communities on Mastodon as being guilty by association.

This is the kind of thing we’re working with here: People who associate “queer people existing in a public space, possibly accessible to kids” with “THEY’RE COMING FOR YOUR CHILDREN”

The Fediverse, historically speaking, has a significant queer community within its userbase, and has for a very long time. With the advent of Meta’s Threads app enabling ActivityPub, and Meta’s own unwillingness to block accounts like Libs of TikTok, this could be a recipe for disaster. As of this writing, that account is still active on Threads. It continues to post queerphobic propaganda for all to see.

The media reporting on the Fediverse could potentially hurt vulnerable communities because of an incomplete narrative. Sensationalism can distort the context about the challenges we face, and the kinds of people who are here. Lax moderation policies by large social platforms could incite an unprecedented level of harassment for just about anybody present.

Bad Internet Bills

Shifting away from harassment on social media, let’s talk about civic policy.

When the story about Mastodon’s CSAM problem broke, Illinois Senator Dick Durbin decided to seize the moment. To Durbin, the obvious solution is to change how the Internet works at a legislative level.

On paper, the STOP CSAM Act sounds great: apps and services knowingly promoting or facilitating child exploitation can now be charged with a federal crime, and are open to civil lawsuits by those affected.

Some of the provisions of the STOP CSAM Act are unnecessarily burdensome to website operators, though, and could end up causing a lot of headaches for independently-hosted communities. The bill in its current form would require mandatory filtering of content on websites, as well as the potential to knock sites entirely offline without the need for a court order. Much like previous bills such as FOSTA and SESTA, there’s a chilling effect on public discourse and free association.

The biggest, most negative aspects of STOP CSAM are two-fold: it would force people, organizations, and companies to stop offering end-to-end encryption online, and would heighten powers for police surveillance and enforcement. In a country where Facebook can turn around and provide messages to the police about a woman having an abortion, this collision is deeply problematic.

Problems and Challenges

So, let’s say that federated social platforms decide to start implementing CSAM detection and reporting tools. It’s certainly not impossible to do, but there are pretty big questions worth asking on how to do it.

Availability of Hash List Services

Some Fediverse developers have started looking at Microsoft’s PhotoDNA product as a potential solution, where each server plugs into Microsoft’s solution to assist in filtering and reporting CSAM. On paper, this sounds like a great idea: many nodes with varying reach can help submit offending content to the authorities, and NCMEC gets a bunch of free crowdsourced reporting. Win-win, right?

The problem is, services like PhotoDNA are only available to a few organizations, after a vetting process. Alex Gleason (yes, sigh, THAT Alex Gleason) posted an interesting thread highlighting his own efforts to try and apply for the IWF Hash List to work with his fork of Pleroma. While I’d take just about anything that guy says with a grain of salt, his highlighting of the problem is legitimate. What do we do if no instance in the Fediverse can qualify?

The horror story doesn’t end there, I’m afraid.

Terrorist and Violent Extremist Content

A secondary problem is that PhotoDNA isn’t solely used to combat CSAM, but also to report on “terrorist and violent extremist” content. While PhotoDNA is described as being strictly used for CSAM purposes, Microsoft’s Digital Safety Report and Facebook’s own newsroom reveal that they’ve used PhotoDNA for reporting on violent extremist content.

What qualifies as “terrorist or violent extremist content”? Google’s own policy for YouTube videos unhelpfully explains:

Content that violates our policies against violent extremism includes material produced by government-listed foreign terrorist organizations. We do not permit terrorist organizations to use YouTube for any purpose, including recruitment. YouTube also strictly prohibits content that promotes terrorism, such as content that glorifies terrorist acts or incites violence. We make allowances for content shared in an educational, documentary, scientific, or artistic context.

The answer is that it’s subjective, relative to interpretation, and depends on the reviewer’s ability to classify the content, as well as a given authority’s interpretation of who is or is not a terrorist.

The thing is, solutions like PhotoDNA don’t just flag content for an admin to review. They automatically report that content to an authority. The ease of use via automation is the point. Under such a system, there is a real possibility of Fediverse nodes being made to snitch on people for talking about things other than CSAM or stochastic terrorism.

Hash Collisions

The concept behind how any of this works is simple: take a file, generate a fingerprint from it, use the fingerprint to identify it. Here’s where things can break down: images can be manipulated to result in matching fingerprints, even though the content is completely different.

A result from *This Hash Collision is Not Porn*, showing two different pictures producing the same hash value.

This could be one reason why there doesn’t seem to be a public resource for everybody to use. Critics have made the argument that revealing the hash values of an existing CSAM database could allow the people distributing it to avoid detection. All they’d have to do is make simple changes to their pictures.

A demonstration of how small alterations to an image can completely alter the hash values. Credit: Hacker Factor Blog

The Hacker Factor Blog has a fascinating teardown of how PhotoDNA works, along with its limitations. In short, the service has major issues in detecting images if there are minor edits to the overlap regions in an image.
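Both failure modes, collisions and evasion through small edits, can be reproduced with a toy fingerprint. The sketch below uses a simple "average hash" on 2×2 grayscale grids. It is not PhotoDNA, and the pixel values are invented for the demonstration, so treat it as an illustration of the general weakness rather than an attack on the real system.

```python
from typing import List


def toy_hash(pixels: List[List[int]]) -> str:
    """Toy fingerprint (NOT PhotoDNA): '1' where a pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if value > mean else "0" for value in flat)


# Collision: two visibly different images share one fingerprint.
checkerboard = [[0, 255], [255, 0]]
soft_gradient = [[50, 200], [180, 60]]
collision = toy_hash(checkerboard) == toy_hash(soft_gradient)

# Evasion: nudging a single pixel by 8 levels flips a bit,
# so an exact-match lookup no longer finds the altered copy.
original = [[120, 130], [130, 120]]
tweaked = [[120, 130], [130, 128]]
evaded = toy_hash(original) != toy_hash(tweaked)
```

Real perceptual hashes are far more robust than this, but the trade-off is the same: the more tolerant the matching, the more false positives, and the stricter the matching, the easier it is to slip past.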

Conclusion

This article has been one of the most difficult pieces for me to write. When I started, I did so with the intention of clearing up the CSAM controversy about the Fediverse. The majority of people here aren’t involved in that! The network is predominantly safe for most people to use, as long as they follow best practices that have long existed for online communities.

I also wanted to hear out the researchers and give credit where credit is due. As a network, we owe it to ourselves to evolve, to develop better protections for people in need, and to combat horrific things when they enter the network. Unfortunately, the large corporate/NGO solutions are a massive can of worms, and they raise serious questions about whether we should use them.

Moving Forward

So, where does that leave us? My belief is that, for the network to clean up its act, we need to look at some of the existing grassroots efforts. Fediblock has been instrumental in warning people against bad actors and abusive admins. Oliphant offers a variety of tiered blocklists against known bad operators. The Bad Space tracks problematic instances and offers context as to what happened. Independent Federated Trust and Safety (IFTAS) seems to be a newer project dedicated to furthering those efforts.

Individually, these are solutions maintained by a few volunteers with limited resources. Maybe what the Fediverse really needs is a foundation – an umbrella organization that raises money for critical project infrastructure, offers a CSAM database and reporting tools of its own, and uses proceeds to pay the people doing the hard work. It’s not fun or glamorous, but maybe the real solution is that members of the network come together to build it.