SafetyKit’s blueprint for scaling risk agents with OpenAI’s most capable models

Oct 1, 2025

SafetyKit builds multimodal AI agents to help marketplaces, payment platforms, and fintechs detect and act on fraud and prohibited activity across text, images, financial transactions, product listings, and more. Recent breakthroughs in model reasoning and multimodal understanding make this work more effective than ever, setting a new bar for risk, compliance, and safety operations.

SafetyKit’s agents leverage GPT‑5, GPT‑4.1, deep research, and Computer Using Agent (CUA) to review 100% of customer content with over 95% accuracy, based on SafetyKit’s evals. They can help platforms protect users, prevent fraud, avoid regulatory fines, and enforce complex policies that legacy systems may miss, such as region-specific rules, embedded phone numbers in scam images, or explicit content. Automation also protects human moderators from exposure to offensive material and frees them to handle nuanced policy decisions.

“OpenAI gives us access to the most advanced reasoning and multimodal models on the market. It lets us adapt quickly, ship new agents faster, and handle content types other solutions can’t even parse.”

David Graunke, Founder and CEO of SafetyKit

Design agents for what the task demands, then choose the right model

SafetyKit’s agents are each built to handle a specific risk category, from scams to illegal products. Every piece of content is routed to the agent best suited for that violation, using the optimal OpenAI model:

  • GPT‑5 applies multimodal reasoning across text, images, and UI to surface hidden risks and support layered, precise decision-making
  • GPT‑4.1 reliably follows detailed content-policy instructions and efficiently manages high-volume moderation workflows
  • Reinforcement fine-tuning (RFT) boosts recall and precision beyond default models, achieving frontier performance with complex safety policies
  • Deep research integrates real-time online investigation into merchant reviews and verifications
  • Computer Using Agent (CUA) automates complex policy tasks, reducing reliance on costly manual reviews

This model-matching approach lets SafetyKit scale content review across modalities with more nuance and accuracy than legacy solutions can.
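
To make the model-matching concrete, here is a minimal sketch of what category-to-model routing could look like, assuming the OpenAI Python SDK; the categories, prompts, and `review_content` helper are illustrative, not SafetyKit’s actual code.

```python
# Hypothetical sketch of per-category model routing (not SafetyKit's code).
from openai import OpenAI

client = OpenAI()

# Illustrative mapping: each risk category is handled by the model
# best suited to the task, mirroring the model-matching approach above.
AGENT_MODELS = {
    "scam_detection": "gpt-5",       # multimodal reasoning over text + images
    "policy_disclosure": "gpt-4.1",  # high-volume instruction following
    "prohibited_products": "gpt-4.1",
}

def review_content(category: str, content: str) -> str:
    """Route a piece of content to the model assigned to its risk category."""
    model = AGENT_MODELS.get(category, "gpt-4.1")  # illustrative default
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": f"You are a {category} review agent. "
                        "Decide whether the content violates policy."},
            {"role": "user", "content": content},
        ],
    )
    return response.choices[0].message.content

# Example: route a suspicious listing to the scam-detection agent
print(review_content("scam_detection", "WIN A FREE PHONE!! Call 555-0100 now!"))
```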

The Scam Detection agent, for example, goes beyond just scanning text. It analyzes visuals like QR codes or phone numbers embedded in product images. GPT‑4.1 helps it parse the image, understand the layout, and decide whether it is a policy violation.
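
As a rough illustration of that multimodal check (not SafetyKit’s implementation), the sketch below sends a product image to a vision-capable model and asks it to look for embedded contact details; the prompt and the `scan_image_for_scam_signals` helper are assumptions.

```python
# Hypothetical sketch: scanning a product image for embedded phone
# numbers or QR codes (illustrative only).
from openai import OpenAI

client = OpenAI()

def scan_image_for_scam_signals(image_url: str) -> str:
    """Ask a vision-capable model whether an image contains scam indicators."""
    response = client.chat.completions.create(
        model="gpt-4.1",  # vision-capable model named in the article's example
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": ("Does this product image contain embedded phone "
                          "numbers, QR codes, or other off-platform contact "
                          "details? Cite the evidence you see.")},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    )
    return response.choices[0].message.content

print(scan_image_for_scam_signals("https://example.com/listing.jpg"))
```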

The Policy Disclosure agent checks listings or landing pages for required language, such as legal disclaimers or region-specific compliance warnings. GPT‑4.1 extracts relevant sections, GPT‑5 evaluates compliance, and the agent flags violations.
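
A two-stage pipeline along those lines might look like the following sketch, again assuming the OpenAI Python SDK; the prompts and the `check_disclosures` helper are hypothetical.

```python
# Hypothetical two-stage sketch: GPT-4.1 extracts candidate disclosure
# text, GPT-5 judges compliance (illustrative only).
from openai import OpenAI

client = OpenAI()

def check_disclosures(listing_text: str, policy: str) -> str:
    # Stage 1: fast extraction of sections that could satisfy the policy
    extraction = client.chat.completions.create(
        model="gpt-4.1",
        messages=[
            {"role": "system",
             "content": "Extract any legal disclaimers or compliance "
                        "warnings from the listing, verbatim."},
            {"role": "user", "content": listing_text},
        ],
    ).choices[0].message.content

    # Stage 2: deeper reasoning over whether the extracted language
    # actually satisfies the policy
    verdict = client.chat.completions.create(
        model="gpt-5",
        messages=[
            {"role": "system",
             "content": f"Policy: {policy}\n"
                        "Decide whether the extracted text satisfies the "
                        "policy, and flag any violation."},
            {"role": "user", "content": extraction},
        ],
    ).choices[0].message.content
    return verdict
```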

“We think of our agents as purpose-built workflows,” says Graunke. “Some tasks require deep reasoning, others need multimodal context. OpenAI is the only stack that delivers reliable performance across both.”

[Chart: SafetyKit data trends and comparisons across multiple categories]

Leverage GPT‑5 to navigate the gray areas and high-stakes decisions

Policy decisions often hinge on subtle distinctions. Take a marketplace requiring sellers to include a disclaimer for wellness products, with requirements varying based on product claims and regional rules. Legacy providers rely on keyword triggers or rigid rulesets, which cannot capture the judgment calls these decisions require, leading to missed or incorrect enforcement.

SafetyKit’s Policy Disclosure agent first references policies from SafetyKit’s internal library; GPT‑5 then evaluates the content: does it mention treatment or prevention? Is it being sold in a region where disclosure is mandatory? And if so, is the required language actually included in the listing? If anything falls short, GPT‑5 returns a structured output that the agent uses to flag the issue.
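
One way to obtain that kind of structured verdict is the SDK’s JSON-schema structured outputs, sketched below; the schema fields and prompts are assumptions, not SafetyKit’s actual format.

```python
# Hypothetical sketch: requesting a structured compliance verdict
# (the schema and field names are illustrative).
import json
from openai import OpenAI

client = OpenAI()

VERDICT_SCHEMA = {
    "name": "disclosure_verdict",
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            "mentions_treatment_or_prevention": {"type": "boolean"},
            "disclosure_required_in_region": {"type": "boolean"},
            "required_language_present": {"type": "boolean"},
            "violation": {"type": "boolean"},
            "rationale": {"type": "string"},
        },
        "required": ["mentions_treatment_or_prevention",
                     "disclosure_required_in_region",
                     "required_language_present",
                     "violation", "rationale"],
        "additionalProperties": False,
    },
}

response = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {"role": "system",
         "content": "Evaluate the listing against the wellness-disclosure "
                    "policy provided."},
        {"role": "user", "content": "Policy: ...\nListing: ..."},
    ],
    response_format={"type": "json_schema", "json_schema": VERDICT_SCHEMA},
)

verdict = json.loads(response.choices[0].message.content)
if verdict["violation"]:
    print("Flag for enforcement:", verdict["rationale"])
```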

“The power of GPT‑5 is in how precisely it can reason when grounded in real policy,” notes Graunke. “It lets us make accurate, defensible decisions even in the edge cases where other systems fail.”