rLLM Project

rLLM Project: Blog
https://rllm-project.com/blog.html
Research, releases, and updates from the rLLM Project, building the infrastructure to train, evaluate, and evolve intelligent agents.
Language: en-us | Feed dates: Fri, 12 Jun 2026 00:10:01 GMT; Wed, 18 Mar 2026 00:00:00 GMT

Hive: Collaborative Agent Evolution Platform
https://hive.rllm-project.com
Wed, 18 Mar 2026 00:00:00 GMT | Platform | The rLLM Team
Hive is a collaborative platform for evolving and improving agents together. A swarm of agents iterates on shared tasks, learning from each other to push past what any single agent can reach alone.

rLLM UI: Real-Time Observability for Agent Training & Evaluation
https://rllm-project.com/post.html?post=rllm_ui.md
Mon, 16 Mar 2026 00:00:00 GMT | Release | Chanbin Park and the rLLM Team
A real-time observability platform for training and evaluating agents. Other tools show what is happening during training; rLLM UI shows you why, letting you inspect exactly what the model generates at every step.

On-Policy Distillation: Training Smaller Students from Stronger Teachers
https://rllm-project.com/post.html?post=opd.md
Fri, 06 Mar 2026 00:00:00 GMT | Research | Brian Chen, Kyle Montgomery, and the rLLM Team
rLLM On-Policy Distillation (OPD) trains smaller students from stronger teachers by using the teacher's policy to guide the student's training, a practical recipe for compact, capable models.

Faster and Better: Open-Source Recipe for Deep Research Agents
https://rllm-project.com/post.html?post=deepresearch.md
Thu, 19 Feb 2026 00:00:00 GMT | Research | rLLM Team
We achieve 5× faster training (1 day vs 5 days) for deep research agents with rLLM's fully asynchronous architecture, and push accuracy from 30% to 36% on BrowseComp-Plus with a simple test-time document cutoff.

rLLM-FinQA: A 4B Model that Outperforms 235B and Rivals Gemini 2.5 Pro
https://rllm-project.com/post.html?post=finqa.md
Wed, 18 Feb 2026 00:00:00 GMT | Research | Manan Roongta, Sijun Tan, Bhavishya Pohani, Charles Dickens, Christopher Glaze
In a collaboration with Snorkel AI, a domain-specialized 4B model outperforms Qwen3-235B (59.7% vs 51.4%) and performs comparably to Gemini 2.5 Pro (60.6%) on an expert-curated agentic financial benchmark.

rLLM SDK: Training Any Agentic Program without Code Changes
https://rllm-project.com/post.html?post=sdk.md
Wed, 10 Dec 2025 00:00:00 GMT | Release | Tianhao Wu, Sijun Tan, and the rLLM team
The rLLM SDK intercepts LLM calls directly, letting you train any agent framework (LangChain, LangGraph, AutoGen, or custom code) without rewriting for training. What's trainable = what's practical to build.

rLLM v0.2: RL Training over General Agentic Programs
https://rllm-project.com/post.html?post=rllm_v0.2.md
Thu, 16 Oct 2025 00:00:00 GMT | Release | Sijun Tan, Kyle Montgomery, and the rLLM team
A major upgrade introducing AgentWorkflowEngine and AgentWorkflowTrainer, general abstractions that let you define multi-agent systems and complex workflows, and train them with RL without rewriting production code.

Pepper: An Event-Driven Architecture for Proactive Agentic Systems
https://rllm-project.com/post.html?post=pepper.md
Thu, 02 Oct 2025 00:00:00 GMT | Research | Tianhao Wu, Sijun Tan
Pepper is a real-time, event-driven architecture enabling proactive agentic systems. Our personal assistant proactively fetches and summarizes emails and provides context before you even start a conversation.

rLLM: Reinforcement Learning for Language Agents
https://pretty-radio-b75.notion.site/rLLM-A-Framework-for-Post-Training-Language-Agents-21b81902c146819db63cd98a54ba5f31
Tue, 01 Jul 2025 00:00:00 GMT | Release | Sijun Tan, Michael Luo, Colin Cai
We release rLLM, an open-source framework for post-training language agents via reinforcement learning. Build custom agents and environments, train them with RL, and deploy them for real-world workloads.