How Pinterest Built a Real‑Time Radar for Violative Content Using AI
Pinterest measures policy-violating content with a metric called prevalence: the percentage of all views that go to harmful content. Prevalence complements user reports by surfacing under-reported harms and tracking trends over time. Historically, measuring it depended on human review, which was slow and expensive.

To address this, Pinterest developed an AI-assisted workflow that measures prevalence daily. The workflow samples user impressions and labels the sample at scale with a multimodal LLM. Guided by expert-written prompts and by subject matter experts, the LLM significantly reduces latency and cost while maintaining accuracy. Prevalence is reported daily with confidence intervals and can be broken down by policy area, sub-policy, and content surface.

Risk scores from enforcement models are used to sample efficiently, but they never serve as labels. Inverse-probability weighting corrects for the resulting unequal sampling probabilities, so the prevalence statistic still reflects what users actually saw and stays comparable over time even when enforcement thresholds change. Machine learning therefore does two jobs: it makes sampling efficient without biasing the estimate, and it makes labeling fast enough to detect risks sooner and respond proactively.

These measurements enable quicker product iteration, better-informed policy development, and strategic decisions such as setting goals and allocating resources. Known challenges, including wide confidence intervals for rare categories and policy drift, are managed through adaptive sampling and continuous monitoring. Future plans include expanding pivoting capabilities, optimizing LLM usage, and refining human-in-the-loop processes to improve accuracy and reduce bias.
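
To make the sampling and weighting idea concrete, here is a minimal sketch of risk-weighted sampling followed by an inverse-probability-weighted prevalence estimate with a confidence interval. It is an illustration under stated assumptions, not Pinterest's implementation: the summary does not specify the sampling design, the estimator, or how the interval is computed. The sketch assumes independent (Poisson) sampling, a self-normalized (Hajek) estimator, and a normal-approximation interval. The names `LabeledImpression`, `inclusion_probability`, `estimate_prevalence`, and `simulate`, as well as every numeric parameter, are hypothetical, and a random draw stands in for the LLM label.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class LabeledImpression:
    inclusion_prob: float  # probability with which this impression was sampled
    is_violating: bool     # label from the reviewer (LLM or human), never the risk score


def inclusion_probability(risk_score: float, floor: float = 0.02, boost: float = 0.5) -> float:
    # Hypothetical design: oversample risky impressions, but keep a non-zero
    # floor so every impression can be sampled and the weights stay finite.
    return min(1.0, floor + boost * risk_score)


def estimate_prevalence(sample, z: float = 1.96):
    """Self-normalized (Hajek) IPW estimate with an approximate normal CI."""
    if not sample:
        raise ValueError("sample is empty")
    weights = [1.0 / s.inclusion_prob for s in sample]
    total_weight = sum(weights)  # estimates the total number of impressions
    p_hat = sum(w for w, s in zip(weights, sample) if s.is_violating) / total_weight
    # Linearized variance, assuming independent (Poisson) sampling.
    variance = sum(
        (1.0 - s.inclusion_prob) * w * w * ((1.0 if s.is_violating else 0.0) - p_hat) ** 2
        for w, s in zip(weights, sample)
    ) / total_weight ** 2
    half_width = z * math.sqrt(variance)
    return p_hat, max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


def simulate(seed: int = 0, n_impressions: int = 50_000):
    """Synthetic day of traffic; all numbers here are made up for illustration."""
    rng = random.Random(seed)
    sample, n_violating = [], 0
    for _ in range(n_impressions):
        risk = rng.random() ** 4               # most impressions look low-risk
        violating = rng.random() < 0.2 * risk  # risk is informative, but not a label
        n_violating += violating
        p = inclusion_probability(risk)
        if rng.random() < p:                   # Poisson sampling: independent draws
            sample.append(LabeledImpression(p, violating))  # stand-in for LLM labeling
    true_prevalence = n_violating / n_impressions
    naive = sum(s.is_violating for s in sample) / len(sample)  # ignores the weights
    return true_prevalence, naive, estimate_prevalence(sample)


if __name__ == "__main__":
    true_prevalence, naive, (p_hat, low, high) = simulate()
    print(f"true={true_prevalence:.4f} naive={naive:.4f} "
          f"ipw={p_hat:.4f} ci=({low:.4f}, {high:.4f})")
```

Because each weight is the reciprocal of the recorded inclusion probability, the unweighted share of violating items in the sample overstates prevalence (the sample is deliberately enriched with risky content), while the weighted estimate targets the share across all impressions. Under this design, changing how risk scores drive sampling affects the width of the interval rather than the quantity being estimated, as long as the probabilities actually used are logged.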