The Weakest Link in AI
Most people worry about what AI says. I think the real problem starts earlier: when someone gets paid cents to decide what counts as hate, danger, or truth.
Allahu Akbar الله أكبر
It means “God is great.”
I’m Muslim. I’ve said Allahu Akbar more times than I can count. In joy. In grief. In awe.
It’s how we begin prayer. It’s how we respond to life. There’s nothing violent about it. It’s one of the most common phrases spoken across the Muslim world, billions of times, every single day.
Now imagine a machine being trained to treat it like a threat.
Imagine Amina, a 22-year-old in Nairobi, working for a data labeling subcontractor. She’s instructed that anytime she sees Allahu Akbar in the dataset, she must flag it. “Potentially high-risk.”
No context. No room to question. Just a label: [EXTREMISM].
Now fast-forward.
Ben, 32, in Berlin, watches a documentary on Syrian refugees. A father finds his son alive, falls to his knees, and cries out Allahu Akbar.
Ben is moved. He opens his AI assistant and types:
“What does Allahu Akbar really mean?”
The system pauses, then responds:
“This phrase has been used in both religious and extremist contexts. It may be considered sensitive or controversial.”
That answer didn’t begin with Ben.
It began with Amina, and the instruction she followed.
Yes, the phrase Allahu Akbar has been used in violent acts, but so has the word “freedom,” and no algorithm learns to fear that. The problem is when the language of 2 billion people gets algorithmically collapsed into a threat profile. It’s when the most common expressions of faith, love, and grief are flattened by moderation rules that can’t tell the difference between a prayer and violence.
And if you think this is an exaggeration, it's not.
Research shows AI systems disproportionately associate Muslim terms with violence.
A 2021 study found that models linked the word “Muslim” with violence in 66% of test prompts, far more than for any other religion.
Before that, there was this:
Before AI can “understand” the world, someone has to tell it what the world means.
That someone is rarely you, especially if *you* aren’t from the West, don’t speak English, or don’t look like… them.
The process is quiet, buried in interfaces and task queues, thousands of miles away, yet it defines how AI will respond to your words, your image, your protest, your language.
Labelers in Kenya, the Philippines, and Venezuela are handed instruction sets written in California, reviewed in Dublin, and told to mark what counts as toxic, hateful, threatening, or real. They don’t decide. They execute.
And no, they aren’t open to interpretation. A meme that says “Land Back,” a demand for Indigenous sovereignty, could be flagged as extremism. A post that says “My body, my choice,” a reproductive rights slogan reclaimed by feminists across the globe, might be labeled as sexual or controversial content. “Ni una menos,” a Latin American feminist movement against femicide and gender violence, has been mistakenly filtered as politically sensitive or inciting. “Smash the patriarchy” or “Free Palestine” might be flagged as hate speech. Even hashtags like #BlackGirlMagic and #melaninpoppin, which celebrate Black identity and pride, have been shadowbanned or misclassified on platforms like Instagram and TikTok. In many of these cases, content moderation systems, whether automated or guided by rigid labeling instructions, treat resistance, pride, and protest as risk, simply because they fall outside a narrow, Western framework of neutrality.
But those instructions carry the full weight of a worldview.
And from that worldview, machine intelligence is born.
As AI theorist Kate Crawford reminds us, “every training set contains a worldview.” The question is: whose?
The AI doesn’t know what hate speech is. Or sarcasm. Or protest. It learns all of that from labeled data: vast datasets of text, images, and speech where everything has been tagged by humans.
This is called data labeling, and I genuinely think this is AI’s weakest link.
Investigations, including Karen Hao’s *Empire of AI*, have begun to reveal how AI systems are trained on rules written not by ethicists, linguists, or the people represented in the data, but by legal and policy teams in Big Tech.
These guidelines dictate how outsourced annotators, often in Nairobi, Manila, or Caracas, must label data. The instructions come with category definitions (“toxic,” “hate speech,” “violent extremism”) and if/then logic trees. For example, leaked moderation documents from Facebook instructed reviewers to flag terms like “martyr” or “jihad” as potential terrorist content, even in religious or poetic usage. Annotators working for OpenAI via Sama reported being asked to label graphic sexual or violent content without psychological support or clear explanation of purpose. The guidelines are top-down, rigid, and non-negotiable. Annotators cannot challenge them, even when the categories erase context or flatten meaning.
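To make the shape of those rules concrete, here is a deliberately simplified, hypothetical sketch of a context-blind keyword rule, the kind of if/then logic described above. The terms and category names are illustrative assumptions, not taken from any actual guideline; the point is only that the rule fires on the word, never on the meaning around it.

```python
# Hypothetical sketch of a context-blind labeling rule: an if/then keyword
# check with no notion of meaning. The terms and categories are illustrative
# only, not any company's actual guideline.

FLAGGED_TERMS = {
    "allahu akbar": "EXTREMISM",
    "martyr": "EXTREMISM",
    "jihad": "EXTREMISM",
}

def label_item(text: str) -> str:
    """Label an item the way a rigid instruction set would:
    match the keyword, ignore everything around it."""
    lowered = text.lower()
    for term, category in FLAGGED_TERMS.items():
        if term in lowered:
            return category  # no context check, no room to question
    return "NEUTRAL"

# A father's cry of relief and a line of religious poetry get the same
# label a genuine threat would; a Western slogan passes untouched.
print(label_item("He fell to his knees and cried out Allahu Akbar."))  # EXTREMISM
print(label_item("The martyr's poem was read at the funeral."))        # EXTREMISM
print(label_item("Democracy must be defended at any cost."))           # NEUTRAL
```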
The goal is consistency at scale, not cultural accuracy. And so, bias doesn’t creep in. It is engineered from the start.
What gets labeled “neutral” will pass through as truth, embedded in the model itself, long before you ever see the output.
Take this example: a user writes, “The Nakba is ongoing.” To many Palestinians, this is a factual political statement. But if training data has labeled such phrases as “incendiary,” “anti-Israel,” or “hate speech” under U.S.-centric moderation policies, the AI may respond with a correction, a refusal, or even silence. Not because it understands the history, but because it was trained to associate those words with risk. Meanwhile, a phrase like “Democracy must be defended at any cost,” coming from a Western context, is more likely to pass through as neutral or even virtuous. The model is not “thinking.” It is replaying what it was taught. And what it was taught reflects a hierarchy: of language, of geography, of who gets to speak freely, and who gets flagged.
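A minimal sketch of that replay, under invented assumptions: the handful of texts and labels below are made up to stand in for an annotation guideline, and the classifier is a toy one, but the mechanism is the same. Whatever the labels called risk is what comes back as risk.

```python
# Toy illustration: a text classifier can only replay the labels it was
# trained on. The examples and labels below are invented, standing in for
# the kind of asymmetric guideline described above.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "The Nakba is ongoing",
    "Free Palestine",
    "Land Back",
    "Democracy must be defended at any cost",
    "We stand with our allies",
    "Markets reward innovation",
]
labels = ["risk", "risk", "risk", "neutral", "neutral", "neutral"]

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)

# The model has no history and no understanding; it hands back the
# hierarchy the labels encoded.
print(model.predict(["The Nakba is ongoing"]))          # ['risk']
print(model.predict(["Defend democracy at any cost"]))  # likely ['neutral']
```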
It’s a pipeline problem.
Across domains, from fraud detection to medical triage to visa approvals, AI systems are trained on labeled data created under asymmetric conditions: instruction sets written by legal and policy teams in the Global North, executed by annotators in the Global South, applied to decisions that affect people far removed from both.
The people doing this work are hired through layers of subcontracting, often with no visibility into what they’re building, or for whom. Paid by the task, bound by NDAs, and trained not to question, they are asked to enforce rules written continents away. Some label hundreds of items a day involving violence, protest, sexuality, without mental health support or context. Their job is framed as technical. But what they’re doing is moral labor, outsourced and stripped of agency. They don’t get to challenge the labels. They don’t know how their decisions will shape the world. And yet, they’re the first hands that touch the data. What does it mean when those closest to the work are the furthest from the power?
Labeling is the entry point. A system built on extraction, misrepresentation, and control doesn’t need violence to segregate; it just needs rules. Written far from us. Applied to us. Automated without us.
It’s not bias. It’s design. It’s apartheid.