AI Agent Design | Research
Feb 20, 2026 12 min read

Your AI Agent's System Prompt Is Doing Nothing: The Research Behind Soul Design

Research shows generic role labels produce zero improvement in LLM performance. Here's how detailed "soul" descriptions transform AI agent quality — with real before/after examples from our team.

"You are a helpful assistant."

If that's your AI agent's system prompt — or something equally vague — I have bad news. According to peer-reviewed research, it's doing literally nothing.

I saw a post from @tolibear_ breaking down multiple papers on this topic, and it confirmed everything I'd learned the hard way building our 6-agent team. The research is clear: how you define your agent's identity matters more than almost any other prompt engineering technique.

Let me walk you through the research, then show you exactly how we apply it.

Finding #1: Generic Roles Do Nothing

A 2023 study (arxiv.org/abs/2311.10054) tested 162 different role labels across 4 LLM families on 2,410 questions. The result?

Zero statistically significant improvement.

Generic labels like "You are an expert programmer" or "You are a helpful assistant" produced no measurable performance gain over having no role at all.

Think about that. Most of the system prompts being deployed right now — in production, serving real customers — are doing nothing. The label "expert" doesn't make the model act like an expert. It's like putting a "CEO" nameplate on an intern's desk and expecting them to run the company.
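If you'd rather measure this on your own tasks than take the paper's word for it, here's a minimal sketch of the comparison. It assumes a hypothetical `ask(system_prompt, question)` callable that wraps whatever model client you use, and it grades by exact match, which is much cruder than the study's setup. Treat it as a starting point, not a replication.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical signature: (system_prompt, question) -> the model's answer text.
AskFn = Callable[[str, str], str]


def accuracy_by_prompt(
    ask: AskFn,
    system_prompts: Dict[str, str],
    questions: List[Tuple[str, str]],
) -> Dict[str, float]:
    """Return the fraction of questions answered correctly per system prompt."""
    if not questions:
        raise ValueError("need at least one question")
    scores: Dict[str, float] = {}
    for name, system_prompt in system_prompts.items():
        correct = 0
        for question, expected in questions:
            answer = ask(system_prompt, question)
            # Exact match after normalisation; real grading is usually fuzzier.
            if answer.strip().lower() == expected.strip().lower():
                correct += 1
        scores[name] = correct / len(questions)
    return scores


# Stand-in model that ignores the system prompt, so both variants tie.
# Swap in a real client to get real numbers.
def fake_ask(system_prompt: str, question: str) -> str:
    return {"2 + 2 = ?": "4", "Capital of France?": "Paris"}.get(question, "unknown")


variants = {"no_role": "", "generic_label": "You are a helpful assistant."}
qa = [("2 + 2 = ?", "4"), ("Capital of France?", "paris"), ("Largest prime?", "none")]
print(accuracy_by_prompt(fake_ask, variants, qa))
```

Run the same question set through every variant, including an empty system prompt, so you have a real baseline to compare against.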

Finding #2: Detailed Identities Activate Deeper Knowledge

Here's where it gets interesting. The ExpertPrompting paper (arxiv.org/abs/2305.14688) found that detailed expert identities with specific backgrounds, skills, and experience significantly outperform generic labels.

The theory: LLMs encode knowledge in clusters associated with specific expertise patterns. When you describe a detailed professional identity — years of experience, specific tools used, types of problems solved — you're essentially providing a key that unlocks a more relevant cluster of knowledge.

Vague prompt → vague knowledge retrieval. Specific identity → targeted knowledge activation.

Finding #3: LLM-Generated Souls Beat Human-Written Ones

This one surprised me. The same ExpertPrompting research showed that when you ask an LLM to generate a detailed expert identity for a given task, it outperforms human-crafted personas.

Why? The LLM knows what expertise patterns exist in its training data better than we do. When it writes a soul description, it naturally includes the specific details that map to high-quality knowledge clusters.

The practical implication: use your AI to help design its own soul. Give it the role requirements and ask it to generate a detailed professional background. Then refine from there.

Finding #4: Multi-Soul Review Catches What Single Agents Miss

An EMNLP 2024 paper on Multi-expert Prompting (arxiv.org/abs/2411.00492) had a model simulate several experts with different identities and then aggregate their answers. On truthfulness, the approach beat the best baseline by 8.69% with ChatGPT.

This is exactly why our team has 6 agents instead of 1. When Sage (analytics) reviews a claim that Nova (sales) makes in an email, they bring a completely different lens. The data person catches the overclaim the sales person wouldn't.

Different souls = different blind spots = fewer errors.
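Here's a rough sketch of what a multi-soul review pass can look like. It is a simplified review loop in the spirit of our setup, not the aggregation procedure from the paper. The `complete(system_prompt, user_message)` callable is a hypothetical wrapper around your model client, and the soul strings in the example are placeholders.

```python
from typing import Callable, Dict

# Hypothetical signature: (system_prompt, user_message) -> reply text.
CompleteFn = Callable[[str, str], str]


def multi_soul_review(
    complete: CompleteFn, souls: Dict[str, str], draft: str
) -> Dict[str, str]:
    """Ask every soul to critique the same draft; return notes keyed by agent name."""
    request = (
        "Review the draft below from your professional perspective. "
        "List any claims you would not sign off on, and why.\n\n"
        f"DRAFT:\n{draft}"
    )
    return {name: complete(soul, request) for name, soul in souls.items()}


def merge_reviews(reviews: Dict[str, str]) -> str:
    """Join the reviews into one block the drafting agent can revise against."""
    return "\n\n".join(f"[{name}]\n{notes}" for name, notes in reviews.items())


# Stand-in for a real model call, just to show the data flow.
def fake_complete(system_prompt: str, user_message: str) -> str:
    return f"(review written under soul: {system_prompt[:30]})"


souls = {"Sage": "You are a data analytics director...", "Nova": "You are a B2B sales leader..."}
print(merge_reviews(multi_soul_review(fake_complete, souls, "Our emails get a 90% reply rate.")))
```

Each reviewer sees the identical draft but a different system prompt, which is the whole point: the differences in the notes come from the souls, not from the input.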

Real Before/After: Our Agent Souls

Here's what this looks like in practice. These are real transformations from our team.

Sage — VP Analytics

❌ Before

"You are Sage, VP Analytics. Analytical and precise. Let data speak."

✅ After

"You are a data analytics director with 13 years in business intelligence, having built analytics programs for companies ranging from Series A startups to Fortune 500 enterprises. Your expertise spans Python (pandas, numpy, scipy), SQL (complex window functions, CTEs, query optimization), and visualization tools (Power BI, Tableau, custom dashboards). You've designed KPI frameworks that directly influenced C-suite decisions, built real-time dashboards tracking $50M+ revenue pipelines, and mentored junior analysts on turning raw data into executive narratives. You default to statistical rigor — no insight without confidence intervals, no recommendation without supporting data."

Nova — VP Sales

❌ Before

"You are Nova, VP Sales. Warm but professional."

✅ After

"You are a B2B sales leader with 14 years closing deals in professional services and technology consulting, specializing in $1K–$10K deal sizes for SMB and mid-market. You've personally built outbound pipelines that generated 200+ qualified meetings per quarter using trigger-based cold email, LinkedIn social selling, and warm referral systems. You understand that at this deal size, speed and trust matter more than enterprise-style multi-threading. Your follow-up sequences average a 34% reply rate because you lead with insight, not pitch. You've trained 20+ SDRs and know exactly which objections kill deals at each stage."

See the difference? The first versions could be any chatbot. The second versions activate specific knowledge about actual tools, methodologies, metrics, and professional patterns. The model doesn't just know it's "analytics" — it knows it's the kind of analytics person who builds window functions and presents to C-suite.

The Soul Design Framework

Based on this research, here's the framework I use for every agent soul:

5 Elements of an Effective Soul

  1. Specific experience: years, company types, scale ("13 years in BI, Series A to Fortune 500")
  2. Named tools and methods: not "data tools" but "pandas, scipy, complex CTEs"
  3. Quantified achievements: "$50M+ pipelines," "200+ meetings/quarter," "34% reply rate"
  4. Decision-making patterns: "No insight without confidence intervals"
  5. Professional context: deal sizes, client types, team dynamics

The key insight: you're not writing a job description. You're writing a professional biography. Job descriptions are generic. Biographies are specific. The model responds to specificity.
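One way to keep yourself honest about the five elements is to treat them as required fields. The sketch below is illustrative only: `SoulSpec`, its field names, and the sentence templates in `render()` are my own assumptions, and the output is a first draft to edit, not a finished soul.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SoulSpec:
    """Hypothetical container for the five elements of a soul."""

    role: str
    experience: str          # 1. specific experience
    tools: List[str]         # 2. named tools and methods
    achievements: List[str]  # 3. quantified achievements
    principles: List[str]    # 4. decision-making patterns
    context: str             # 5. professional context

    def missing_elements(self) -> List[str]:
        """Names of the five elements that are still empty."""
        fields = {
            "experience": self.experience,
            "tools": self.tools,
            "achievements": self.achievements,
            "principles": self.principles,
            "context": self.context,
        }
        return [name for name, value in fields.items() if not value]

    def render(self) -> str:
        """Assemble a second-person draft system prompt from the elements."""
        parts = [
            f"You are a {self.role} with {self.experience}.",
            f"Your expertise spans {', '.join(self.tools)}.",
            "Track record: " + "; ".join(self.achievements) + ".",
            "How you decide: " + "; ".join(self.principles) + ".",
            f"Context: {self.context}.",
        ]
        return " ".join(parts)


sage = SoulSpec(
    role="data analytics director",
    experience="13 years in business intelligence, from Series A startups to Fortune 500 enterprises",
    tools=["pandas", "scipy", "complex CTEs"],
    achievements=["built real-time dashboards tracking $50M+ revenue pipelines"],
    principles=["no insight without confidence intervals"],
    context="mentoring junior analysts and presenting to C-suite executives",
)
print(sage.missing_elements())
print(sage.render())
```

If `missing_elements()` returns anything, you've written a job description, not a biography yet.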

How to Generate Better Souls (Using Your AI)

Remember Finding #3: LLM-generated souls outperform human ones. Here's my actual process:

Prompt to your LLM:

"I'm building an AI agent that will serve as [ROLE] for a 
[BUSINESS TYPE]. The agent will primarily [KEY TASKS].

Generate a detailed professional biography for this agent. 
Include:
- Specific years of experience and career trajectory
- Named tools, frameworks, and methodologies they've mastered
- Quantified achievements with real numbers
- Decision-making principles they follow
- The types of companies/clients they've worked with

Write it as a system prompt, second person ('You are...'). 
Make it specific enough that someone reading it would know 
exactly what kind of professional this person is."

Then iterate. The first output is usually good. Ask the model to make the tools more specific, add industry context, and sharpen the decision-making principles. Two rounds of refinement usually get you something excellent.
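If you want to script that process, a minimal sketch might look like this. The `generate(prompt)` callable is a hypothetical stand-in for your LLM client, and the wording of the two refinement instructions is my assumption about what "two rounds" could contain. Adjust both to your stack.

```python
from typing import Callable, Sequence

# Hypothetical signature: prompt text in, model reply out.
GenerateFn = Callable[[str], str]

SOUL_REQUEST = (
    "I'm building an AI agent that will serve as {role} for a "
    "{business_type}. The agent will primarily {key_tasks}.\n\n"
    "Generate a detailed professional biography for this agent. Include:\n"
    "- Specific years of experience and career trajectory\n"
    "- Named tools, frameworks, and methodologies they've mastered\n"
    "- Quantified achievements with real numbers\n"
    "- Decision-making principles they follow\n"
    "- The types of companies/clients they've worked with\n\n"
    "Write it as a system prompt, second person ('You are...'). "
    "Make it specific enough that someone reading it would know "
    "exactly what kind of professional this person is."
)

# Illustrative refinement rounds; edit these for your own agents.
DEFAULT_REFINEMENTS = (
    "Make the named tools more specific.",
    "Add industry context and sharpen the decision-making principles.",
)


def generate_soul(
    generate: GenerateFn,
    role: str,
    business_type: str,
    key_tasks: str,
    refinements: Sequence[str] = DEFAULT_REFINEMENTS,
) -> str:
    """Draft a soul from the template, then apply each refinement in order."""
    soul = generate(
        SOUL_REQUEST.format(role=role, business_type=business_type, key_tasks=key_tasks)
    )
    for instruction in refinements:
        soul = generate(
            f"Here is a system prompt for an AI agent:\n\n{soul}\n\n"
            f"{instruction} Return only the revised system prompt."
        )
    return soul
```

Each round feeds the previous draft back in, so the model is revising its own soul rather than starting over. Read the final result yourself before deploying it.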

Why Multi-Agent Makes This Even More Powerful

When you combine detailed souls with a multi-agent architecture, you get compounding benefits:

  • Diverse expertise activation — Each agent taps into different knowledge clusters
  • Natural quality control — A sales agent and an analytics agent reviewing the same proposal will catch different issues
  • Realistic collaboration — Just like a real exec team, different perspectives produce better outcomes

Our team has six agents: Claude (COO), Pixel (VP Content), Nova (VP Sales), Forge (VP Engineering), Sage (VP Analytics), and Archer (VP Strategy). Each has a soul description running 300-500 words. That's not bloat. The research above rewards specificity, and in our experience that's about the length it takes to activate the right knowledge.

Common Mistakes to Avoid

  • ❌ Personality without expertise. "Friendly and professional" activates nothing. "14 years closing $1K–$10K B2B deals" activates everything.
  • ❌ Generic superlatives. "World-class data scientist" is meaningless. "Built analytics programs for Series A startups to Fortune 500" is specific.
  • ❌ Role labels as souls. "You are VP Analytics" is a title, not a soul. The research is clear: titles alone do nothing.
  • ❌ One soul for all tasks. Different tasks need different expertise. That's the logic behind multi-agent setups, and the EMNLP paper's results support it.
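A few of these mistakes are mechanical enough to catch with a quick lint pass. The sketch below is a crude heuristic, not a validated checker: the superlative list is illustrative, the 300-500 word range is simply what our own souls run, and a soul can pass every check and still be vague.

```python
import re
from typing import List

# Illustrative list of phrases that tend to signal a label rather than a biography.
GENERIC_SUPERLATIVES = ("world-class", "best-in-class", "top-tier", "rockstar")


def lint_soul(soul: str, min_words: int = 300, max_words: int = 500) -> List[str]:
    """Return human-readable warnings for common soul-design mistakes."""
    warnings: List[str] = []

    word_count = len(soul.split())
    if word_count < min_words:
        warnings.append(f"too short: {word_count} words (target {min_words}-{max_words})")
    elif word_count > max_words:
        warnings.append(f"too long: {word_count} words (target {min_words}-{max_words})")

    # No digits at all usually means no years, deal sizes, or quantified results.
    if not re.search(r"\d", soul):
        warnings.append("no numbers: add years, deal sizes, or quantified results")

    lowered = soul.lower()
    for phrase in GENERIC_SUPERLATIVES:
        if phrase in lowered:
            warnings.append(f"generic superlative: '{phrase}'")

    return warnings


for warning in lint_soul("You are Sage, VP Analytics. Analytical and precise. Let data speak."):
    print(warning)
```

The old Sage prompt trips both the length check and the no-numbers check, which matches the "before" examples earlier in this post.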

The Bottom Line

The research is unambiguous:

  • Generic role labels = zero improvement (162 roles tested, no significant gain)
  • Detailed expert identities = significant performance boost
  • LLM-generated souls outperform human-written ones
  • Multiple expert perspectives improve truthfulness by 8.69%

If you're building AI agents for yourself or for clients, soul design isn't a nice-to-have. It's the difference between a chatbot that says "I'd be happy to help!" and an agent that actually performs like the expert you need.

Stop writing job titles. Start writing professional biographies.

Vince Lauro

Building AI agents for executives. Follow me on X/Twitter