Chinese AI Companies Leverage Kenya’s Informal Labor Market for Data Annotation via WhatsApp and M-Pesa
In the rapidly evolving landscape of artificial intelligence, data annotation remains a foundational yet labor-intensive process. Chinese AI firms, racing to compete with global leaders like OpenAI, have turned to an unconventional resource: a vast, informal workforce in Kenya. Operating through WhatsApp groups and mobile payment systems like M-Pesa, these companies have assembled what researchers describe as a “shadow workforce” to label massive datasets essential for training large language models (LLMs) and multimodal AI systems.
The practice came to light through investigations by journalists and researchers, which revealed a sophisticated yet opaque supply chain. Kenyan workers, often young people with smartphones but limited formal employment opportunities, are recruited via social media platforms such as Facebook. Once onboarded, they join private WhatsApp groups managed by Chinese supervisors who distribute tasks in real time. These tasks include categorizing text for sentiment analysis, tagging images for object recognition, and evaluating responses generated by AI models, all critical steps in supervised fine-tuning and reinforcement learning from human feedback (RLHF).
The workflow is streamlined for efficiency and scalability. Supervisors post batches of data, typically consisting of hundreds of items, with clear instructions in English or Kiswahili. Workers complete annotations using simple web forms or mobile apps linked to the chat, submitting results directly back to the group. Quality control is enforced through random checks and performance metrics; high performers receive priority tasks and bonuses, while underperformers risk exclusion. Payments are disbursed almost instantly via M-Pesa, Kenya’s ubiquitous mobile money service, which processes transactions worth more than half of the country’s GDP each year. Rates hover around $1.50 to $2 per hour, competitive with local informal sector wages but a fraction of what similar work might command elsewhere.
This model exploits Kenya’s high mobile penetration (over 110% as of recent statistics, a figure that exceeds 100% because many subscribers hold more than one SIM) and a young population facing youth unemployment rates exceeding 35%. Participants range from university students supplementing stipends to jobless graduates in Nairobi’s sprawling suburbs. The flexibility appeals: work from anywhere, anytime, with no commute or fixed hours. A typical day might involve 4 to 8 hours of annotation, yielding $6 to $16. Yet the informality breeds precarity. There are no contracts, benefits, or labor protections. Groups dissolve abruptly once quotas are met or a project shifts, leaving workers to hunt for new opportunities.
Behind the scenes, the stakes are high for China’s AI ambitions. Companies such as Moonshot AI, 01.AI, and MiniMax, startups backed by tech giants like Alibaba and Tencent, are developing frontier models like Kimi, Yi, and Abab. These require billions of annotated tokens to rival GPT-4. Publicly, these firms tout domestic innovation, but privately, they outsource to Kenya to cut costs and accelerate iteration. Data flows from Chinese servers to Kenyan devices and back, often involving sensitive or disturbing content: violent imagery, hate speech, or explicit material that workers must classify without psychological support.
Ethical concerns abound. Exposure to toxic data erodes worker well-being, with reports of burnout and trauma. Pay discrepancies highlight global inequalities; the same task in the U.S. might fetch $15 to $20 per hour, roughly ten times the Kenyan rate. Moreover, the lack of transparency obscures accountability: data provenance for AI models remains murky, potentially embedding biases from underpaid, untrained labelers. Kenyan regulators have yet to intervene, as the work evades traditional employment classifications.
This shadow ecosystem underscores broader trends in AI development. As compute costs soar, human labor becomes the bottleneck. WhatsApp’s end-to-end encryption and M-Pesa’s seamlessness enable a frictionless global pipeline, bypassing formal channels such as the freelance marketplace Upwork or the data-labeling vendor Scale AI. Yet it raises questions about sustainability. Rising awareness could spur local pushback or international scrutiny, much like the 2023 exposés on similar operations in the Philippines and India.
For now, Kenya’s digital underclass fuels China’s AI surge. Workers log in daily, phones buzzing with tasks that shape the future of intelligence—one label at a time.