Staff+

Language Classification for Posts

classification, ranking, nlp, ads, infrastructure

Let me walk you through how I'd design a language classification system for social media posts at massive scale. I'll go stage by stage: business and ML objectives, high-level design, data and feature strategy, embeddings, pipeline architecture (including modeling and training details), infrastructure, evaluation, and robustness. This problem sounds deceptively simple ("just figure out what language this post is in"), but it's a critical infrastructure component that needs to run on billions of posts daily with sub-millisecond latency and feed dozens of downstream systems. A wrong label cascades into failures across the entire platform.
