Web Excursions 2022-11-06
Thread by Yishan on the problem of Twitter’s content moderation
Hey Yishan, you used to run Reddit. How do you solve the content moderation problems on Twitter?
The first thing most people get wrong is not realizing that moderation is a signal-to-noise management problem, not a content problem.
Then you end up down this rabbit hole of trying to produce some sort of “council of wise elders”
what’s really going to happen is that everyone on the council of wise elders will get tons of death threats and eventually quit
you’ll end up with a council of third-rate minds and politically-motivated hacks, and the situation will be worse than when you started.
they have to be public and “accountable”
a useful framing to consider in this discussion:
imagine that you are doing content moderation for a social network and you cannot understand the language.
all you’re able to detect is meta-data about the content, e.g. frequency and user posting patterns.
a “ladder” of things often subject to content moderation:
spam
non-controversial topics
controversial topics (politics, religion, culture, etc)
Vigorous debate, even outright flamewars, is typically beneficial for a small social network: it generates activity and engages users.
It doesn’t usually result in offline harm, which is what typically prompts calls to moderate content.
Moderating spam is very interesting:
it is almost universally regarded as okay to ban (i.e. CENSOR), but spam is in no way illegal.
yet everyone agrees: yes, it’s okay to moderate (censor) spam.
there IS no principled reason for banning spam. We ban spam for purely outcome-based reasons
It affects the quality of experience for users we care about, and users having a good time on the platform makes it successful.
Not only that, but you can usually moderate (identify and ban) spam without understanding the language.
Spam is typically easily identified due to the repetitious nature of the posting frequency, and simplistic nature of the content (low symbol pattern complexity).
Machine learning algorithms are able to accurately identify spam, [and] to identify spam about things they haven’t seen before.
Spam filters (whether based on keywords, frequency of posts, or content-agnostic-pattern-matching) are just a tool
that a social media platform owner uses to improve the signal-to-noise ratio of content on their platform.
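That metadata-only framing is concrete enough to sketch in code. The following is a hypothetical illustration (the function names and thresholds are invented here, not anything Reddit or Twitter actually runs) of a content-agnostic spam check driven purely by posting frequency, duplication, and how compressible the text is:

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class UserActivity:
    """Metadata only: when a user posted and the raw text of each post."""
    timestamps: list = field(default_factory=list)  # epoch seconds
    messages: list = field(default_factory=list)


def compressibility(text: str) -> float:
    """Proxy for 'symbol pattern complexity': repetitive text compresses
    toward 0, varied text stays closer to 1."""
    raw = text.encode("utf-8")
    if not raw:
        return 1.0
    return len(zlib.compress(raw)) / len(raw)


def looks_like_spam(activity: UserActivity,
                    max_posts_per_minute: float = 5.0,
                    max_duplicate_ratio: float = 0.5,
                    min_complexity: float = 0.3) -> bool:
    """Flag a user from posting behavior alone; no language understanding."""
    if len(activity.messages) < 3:
        return False  # not enough signal yet

    # 1. Posting frequency over the observed span
    span_minutes = max((activity.timestamps[-1] - activity.timestamps[0]) / 60.0, 1 / 60.0)
    rate = len(activity.timestamps) / span_minutes

    # 2. Share of exact-duplicate messages
    duplicate_ratio = 1.0 - len(set(activity.messages)) / len(activity.messages)

    # 3. Low complexity (highly repetitive content) across everything posted
    complexity = compressibility(" ".join(activity.messages))

    return (rate > max_posts_per_minute
            or duplicate_ratio > max_duplicate_ratio
            or complexity < min_complexity)
```

The point of the sketch is the same outcome-based logic described above: nothing in it knows the language or the topic; it reacts only to posting behavior.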
non-controversial topics usually go fine,
but sometimes one of the following pathologies erupts:
a) ONE particular user gets tunnel-vision and begins to post the same thing over and over, or brings up his opinion every time someone mentions a peripherally-related topic.
b) An innocuous topic sparks a flamewar
Just like spam, none of those topics ever comes close to being illegal content.
But, in any outcome-based world, stuff like that makes users unhappy with your platform and less likely to use it, and as the platform owner, if you could magically have your druthers, you’d prefer it if those things didn’t happen.
It is pretty easy to get most people really worked up.
there will be NO relation between the topic of the content and whether you moderate it, because it’s the specific posting behavior that’s a problem.
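As a purely hypothetical sketch of that behavior-over-topic point (the class name and thresholds below are invented for illustration), a platform could watch how fast a discussion is moving rather than what it is about:

```python
from collections import deque


class ThreadIntensityMonitor:
    """Flags a discussion thread when posting behavior spikes,
    without knowing anything about the topic being discussed."""

    def __init__(self, window_seconds: float = 300.0,
                 normal_posts_per_minute: float = 2.0,
                 spike_factor: float = 10.0):
        self.window_seconds = window_seconds
        self.normal_posts_per_minute = normal_posts_per_minute
        self.spike_factor = spike_factor
        self._recent = deque()  # timestamps of posts inside the rolling window

    def record_post(self, timestamp: float) -> bool:
        """Record one post; return True if the thread now looks 'blown up'."""
        self._recent.append(timestamp)
        # Drop posts that have aged out of the rolling window
        while self._recent and timestamp - self._recent[0] > self.window_seconds:
            self._recent.popleft()
        posts_per_minute = len(self._recent) / (self.window_seconds / 60.0)
        return posts_per_minute > self.normal_posts_per_minute * self.spike_factor
```

Nothing in a monitor like this can tell the platform in advance that a topic has turned controversial; it only notices after the behavior changes, which is exactly the limitation described below.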
Here, there is a parallel to the usage of “Lorem Ipsum” in the world of design.
When people look at moderation decisions by a platform, they are not just subconsciously influenced by the nature of the content that was moderated; they are heavily - overwhelmingly - influenced by the nature of the content!
Because non-controversial topics become controversial topics organically - they get culture-linked to something in the latter category, or whatever -
and then you’re confronting controversial topics or proxies for controversial topics.
Like the AI, human content moderators cannot predict when a new topic is going to start presenting problems that are sufficiently threatening to the operation of the platform.
The only thing they can do is observe if the resultant user behavior is sufficiently problematic.
[Yet] all [the users] see is the sensationalized (mainstream news) headlines saying TWITTER/FACEBOOK bans PROMINENT USER for posts about CONTROVERSIAL TOPIC.
This is because old-media journalists always think it’s about content.
old-media controversy is far, far below the intensity level of problematic behavior that would, e.g., threaten the ongoing functioning or continued consumer consumption of that old-media news outlet.
So we observe the following events:
1: innocuous discussion
2: something blows up and user(s) begin posting with some disruptive level of frequency and volume
2a: maybe a user does something offline as a direct result of that intensity
...
3: platform owner moderates the discussion to reduce the intensity
4: media reporting describes the moderation as targeting the content topic discussed
5: platform says, “no, it’s because they <did X specific bad behavior> or <broke established rules>”
...
6: no one believes them
7: media covers the juiciest angle, i.e. "Is PLATFORM biased against TOPIC?"
Controversial topics are just overrepresented in instances where people get heated,
and when people get heated, they engage in behavior they wouldn’t otherwise engage in.
One of the things that hamstrings platforms is that, unlike judicial proceedings in the real world, platforms do not or cannot reveal all the facts and evidence to the public for review.
At Reddit, we’d have to issue moderation decisions (e.g. bans) on users and then couldn’t really release all the evidence of their wrongdoing,
like abusive messages or threats, or spamming with multiple accounts, etc.
The justification is that private messages are private, or sometimes compromising to unrelated parties, but whatever the reasons, that leaves fertile ground for unscrupulous users to claim that they were victimized and politically interested parties to amplify their message that the platform is biased against them.
I had long wondered about a model like “put up or shut up” where any users challenging a moderation decision would have to consent to having ALL the evidence of their behavior made public by the platform, including private logs and DMs.
You can’t just dump data, because it’s a heightened situation of emotional tension: the first time you try, something extra will get accidentally disclosed, and you’ll have ANOTHER situation on your hands. Now you have two problems.
Warning: don’t over-rotate on controversial topics and try to do all your content moderation through AI.
Facebook tried that, and ended up with a bizarre inhuman dystopia.
Comments on the (alleged) “war room team” that Elon has apparently put to work at Twitter:
I don’t know the other people super well (tho Sriram is cool; he was briefly an investor in a small venture of mine), but I’m heartened to know that @DavidSacks is involved.
Sacks is a remarkably good operator, possibly one of the best ones in the modern tech era. He was tapped to lead a turnaround at Zenefits when that company got into really hot water
“Content moderation” is the most visible issue with Twitter (the one talking heads love to obsess over) but it’s always been widely known that Twitter suffers from numerous operational problems that many CEOs have tried in vain to fix.