People You May Know
People You May Know (PYMK) is a graph edge (link) prediction problem at extreme scale: the social graph has billions of nodes, so there are far more candidate pairs than can be scored exhaustively. I'll work through business and ML objectives, system architecture, data and features, modeling, infrastructure, evaluation, and robustness.
Solution Walkthrough
Business Objective
The objective is to maximize high-quality connection formation while maintaining platform health and user trust. "High-quality" means connections that lead to ongoing engagement (messaging, commenting, reacting to posts), not connections that are accepted and then forgotten. We want to grow the social graph in ways that make the platform more valuable to users.
There's a critical balance here. We could maximize connection formation by showing everyone to everyone, but that would include many low-quality suggestions that annoy users. We could maximize precision by only suggesting obvious connections, but that would miss growth opportunities. The target operating point lies between these extremes: suggestions that are relevant, safe, and likely to lead to engaged relationships.
Platform health considerations matter too. We don't want to enable stalking, harassment, or unwanted contact. Some potential connections should never be suggested, even if the model predicts a high accept probability. That means safety filters and policy constraints have to act as hard constraints that override the ranking score, not as one more feature in the model.
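To make the "never suggest" requirement concrete, here is a minimal sketch of a filter that runs on scored candidates before anything is shown. The names `Candidate`, `blocked_pairs`, and `opted_out` are hypothetical and only for illustration; a real system would have many more policy rules. The point is that a high score cannot rescue a candidate who fails a policy check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    user_id: str
    score: float  # model-predicted value of suggesting this person


def apply_safety_filters(viewer_id, candidates, blocked_pairs, opted_out):
    """Drop candidates that policy forbids, regardless of model score.

    blocked_pairs: set of (blocker_id, blocked_id) tuples (assumed format).
    opted_out: set of user ids who asked not to appear in suggestions.
    """
    allowed = []
    for cand in candidates:
        # A block in either direction removes the pair entirely.
        if (viewer_id, cand.user_id) in blocked_pairs:
            continue
        if (cand.user_id, viewer_id) in blocked_pairs:
            continue
        if cand.user_id in opted_out:
            continue
        allowed.append(cand)
    # Rank only what survived the policy checks.
    return sorted(allowed, key=lambda c: c.score, reverse=True)


if __name__ == "__main__":
    scored = [Candidate("x", 0.9), Candidate("y", 0.5), Candidate("z", 0.7)]
    shown = apply_safety_filters("v", scored, {("x", "v")}, {"y"})
    print([c.user_id for c in shown])
```

In this toy run, "x" has the highest score but has blocked the viewer, so it is removed before ranking.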
The growth aspect is important. New users need to quickly build their initial network or they churn. Power users need fresh suggestions to keep growing their network. The system needs to work for users at all stages of network maturity.
ML Objective
From an ML perspective, this is a link prediction problem at massive scale. Given a user and the billions of other users on the platform, we need to rank people by the probability that this user will send them a friend request and that the request will be accepted.
But there are complications. It's not just binary prediction; we care about downstream engagement. A connection that's accepted but never interacted with is lower quality than one that leads to regular messaging. We need to predict not just P(accept) but P(engaged_connection).
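One way to read P(engaged_connection) is as a chain of conditional probabilities: the user sends a request, the request is accepted, and the accepted connection turns into ongoing interaction. The sketch below assumes three separate estimates are available, which is my assumption and not something the walkthrough specifies. The function and parameter names are hypothetical.

```python
def engaged_connection_score(p_request, p_accept_given_request, p_engaged_given_accept):
    """Chain-rule estimate of P(engaged_connection).

    Each argument is assumed to be a probability produced by some upstream
    model; how those models are trained is outside this sketch.
    """
    for p in (p_request, p_accept_given_request, p_engaged_given_accept):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_request * p_accept_given_request * p_engaged_given_accept


if __name__ == "__main__":
    # Likely to be accepted, but rarely leads to interaction afterwards.
    accept_and_forget = engaged_connection_score(0.5, 0.9, 0.1)
    # Less likely to be accepted, but much more likely to stay engaged.
    real_relationship = engaged_connection_score(0.3, 0.6, 0.5)
    print(real_relationship > accept_and_forget)
```

Ranking by this product instead of by acceptance alone pushes "accept and forget" candidates down the list, which matches the business objective stated earlier.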
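We also need to handle asymmetry. User A might want to connect with user B, but B might not want to connect with A. Classical link prediction is often formulated on undirected graphs, where an edge means the same thing from both ends, but in social networks one-sided interest is common. We need to model both directions: whether A would send the request, and whether B would accept it.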
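A small sketch of what modeling both directions can look like: keep directed estimates keyed by ordered pairs, and score a suggestion using the sender's side and the receiver's side separately. The dictionaries `p_send` and `p_accept` are hypothetical stand-ins for model outputs, and treating a missing pair as probability 0.0 is an illustrative choice, not a recommendation.

```python
def directed_suggestion_score(viewer, candidate, p_send, p_accept):
    """Score for showing `candidate` to `viewer`.

    p_send[(u, v)]   : estimated probability that u sends v a request.
    p_accept[(v, u)] : estimated probability that v accepts a request from u.
    Missing pairs default to 0.0 (illustrative assumption).
    """
    send = p_send.get((viewer, candidate), 0.0)
    accept = p_accept.get((candidate, viewer), 0.0)
    return send * accept


if __name__ == "__main__":
    # A is keen on B, but B is unlikely to accept; the reverse is not true.
    p_send = {("A", "B"): 0.6, ("B", "A"): 0.05}
    p_accept = {("B", "A"): 0.1, ("A", "B"): 0.9}
    a_sees_b = directed_suggestion_score("A", "B", p_send, p_accept)
    b_sees_a = directed_suggestion_score("B", "A", p_send, p_accept)
    print(a_sees_b != b_sees_a)
```

Because the lookups use ordered pairs, the score for showing B to A is generally different from the score for showing A to B, which a single symmetric edge score cannot express.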
The ranking is personalized per user and needs to update as their network evolves. When you accept a suggestion, your network changes, which affects future suggestions through second-order effects (friends-of-friends).
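The second-order effect is easy to see on a toy graph. The sketch below uses a plain adjacency dictionary and counts mutual friends for each friend-of-friend; the helper names are hypothetical, and a production system would not recompute this from scratch on every change.

```python
from collections import Counter


def friends_of_friends(graph, user):
    """Return a Counter mapping each friend-of-friend to the mutual-friend count."""
    friends = graph.get(user, set())
    counts = Counter()
    for friend in friends:
        for fof in graph.get(friend, set()):
            # Skip the user themselves and people they already know.
            if fof != user and fof not in friends:
                counts[fof] += 1
    return counts


def add_connection(graph, a, b):
    """Record an accepted connection as an undirected edge."""
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


if __name__ == "__main__":
    graph = {
        "alice": {"bob"},
        "bob": {"alice", "carol"},
        "carol": {"bob", "dave"},
        "dave": {"carol"},
    }
    print(friends_of_friends(graph, "alice"))  # carol is reachable via bob
    add_connection(graph, "alice", "carol")
    print(friends_of_friends(graph, "alice"))  # carol is now a friend; dave appears
```

Before the new connection, alice's only friend-of-friend is carol. Once alice connects with carol, carol drops out of the candidate set and dave enters it, so a single accept changes what should be suggested next.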