Summarise a research paper — 3-LLM consensus
Three LLMs independently summarise a paper, and the substrate picks the most-agreed-upon summary. CLI, TypeScript SDK, and Python SDK variants.
Problem
You want a reliable one-paragraph summary of a research paper, plus a 1-5 rating of methodology quality. A single LLM is good but occasionally hallucinates or anchors on the abstract. You'd rather fan out to three models and take the answer that most agree on.
Recipe
One query. Three LLM responders. consensus fold. min_quorum: 2 so the query doesn't block on a slow third responder.
Via CLI
spl infer "Summarise this paper in one paragraph and rate methodology 1-5: <paste abstract + methods>" \
--kind core.summary.v1 \
--responder llm:sonnet \
--responder llm:gpt-4o \
--responder llm:gemini-1.5 \
--fold consensus \
--min-quorum 2 \
--top-k 3 \
--budget 0.30 \
--timeout 60 \
  --wait

Expected output (text mode):
answer:
{
  "summary": "The authors propose a transformer-based approach to...",
  "rating": 4,
  "confidence": 0.88
}
provenance (3 contributors)
trust_summary: mean 0.77 min 0.72 max 0.82 n=3
cost: $0.1840
fold: consensus
thread: th_9d4e2a...
know_record: 2f7c91b8...

Via TypeScript SDK
import { Client } from "@syncropel/sdk";

const client = new Client({ baseUrl: "http://localhost:9100" });

const result = await client.infer({
  input: {
    goal: "Summarise this paper in one paragraph and rate methodology 1-5",
    context: { abstract: "...", methods: "..." }
  },
  responders: [
    { kind: "llm", model: "~sonnet" },
    { kind: "llm", model: "~gpt-4o" },
    { kind: "llm", model: "~gemini-1.5" }
  ],
  fold: { function: "consensus", min_quorum: 2 },
  answer_shape: {
    kind: "core.summary.v1",
    required_fields: ["body.summary", "body.rating"]
  },
  relevance: { top_k: 3 },
  side_effects: { max_cost_usd: 0.30, max_latency_secs: 60 }
});

console.log(result.answer);
console.log(`Trust: ${result.trust_summary.mean.toFixed(2)}`);
console.log(`Cost: $${result.cost_actual_usd.toFixed(4)}`);

Via Python SDK
import asyncio

from syncropel import Client


# The SDK client is async; top-level await isn't valid in a plain
# Python script, so run the query inside an event loop.
async def main() -> None:
    client = Client(base_url="http://localhost:9100")
    result = await client.infer({
        "input": {
            "goal": "Summarise this paper in one paragraph and rate methodology 1-5",
            "context": {"abstract": "...", "methods": "..."},
        },
        "responders": [
            {"kind": "llm", "model": "~sonnet"},
            {"kind": "llm", "model": "~gpt-4o"},
            {"kind": "llm", "model": "~gemini-1.5"},
        ],
        "fold": {"function": "consensus", "min_quorum": 2},
        "answer_shape": {
            "kind": "core.summary.v1",
            "required_fields": ["body.summary", "body.rating"],
        },
        "relevance": {"top_k": 3},
        "side_effects": {"max_cost_usd": 0.30, "max_latency_secs": 60},
    })
    print(result.answer)
    print(f"Trust: {result.trust_summary['mean']:.2f}")
    print(f"Cost: ${result.cost_actual_usd:.4f}")


asyncio.run(main())

How consensus picks the winner
Each response body is serialised to canonical JSON (keys sorted alphabetically, no whitespace). Bodies with identical canonical JSON group together. A group's weight is the sum of its contributors' trust scores, each clamped to a floor of 0.05. The highest-weight group wins.
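A minimal sketch of that tally in TypeScript, assuming responses arrive as { body, trust } pairs (the names below are illustrative, not the daemon's internals):

type Response = { body: unknown; trust: number };

// Canonical JSON: keys sorted alphabetically, no whitespace.
function canonical(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonical).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Group identical canonical bodies; a group's weight is the sum of its
// contributors' trust, each clamped to the 0.05 floor.
function consensusWinner(responses: Response[]): unknown {
  const groups = new Map<string, { body: unknown; weight: number }>();
  for (const { body, trust } of responses) {
    const key = canonical(body);
    const group = groups.get(key) ?? { body, weight: 0 };
    group.weight += Math.max(trust, 0.05);
    groups.set(key, group);
  }
  return [...groups.values()].sort((a, b) => b.weight - a.weight)[0]?.body;
}

With trusts of 0.8 and 0.7 on one body and 0.9 on another, the agreeing pair wins 1.5 to 0.9 even though the dissenter has the highest individual trust.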
If the three summaries are textually different but the rating is the same, consensus on the full body won't group them — structural equality is exact. For text-heavy answers, ensemble_weighted with a similarity-based weight expression is usually the better fit; alternatively, prompt the LLMs to emit a single structured rating and let consensus match on that field alone, as sketched below.
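One way to apply that last suggestion with the same SDK surface as above: run a narrower query whose only required field is the rating, so identical ratings serialise to identical canonical bodies and can actually group. A sketch (the prompt wording and cost cap are illustrative; client is the one constructed earlier):

const ratingResult = await client.infer({
  input: {
    goal: 'Rate the methodology of this paper 1-5. Respond with JSON {"rating": <integer>} and nothing else.',
    context: { abstract: "...", methods: "..." }
  },
  responders: [
    { kind: "llm", model: "~sonnet" },
    { kind: "llm", model: "~gpt-4o" },
    { kind: "llm", model: "~gemini-1.5" }
  ],
  fold: { function: "consensus", min_quorum: 2 },
  answer_shape: { kind: "core.summary.v1", required_fields: ["body.rating"] },
  side_effects: { max_cost_usd: 0.30, max_latency_secs: 60 }
});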
The trade-off
Three LLMs at ~5-15s each cost 3× a single query. You're paying for variance reduction. If a single LLM's summary is reliably good enough for your use case, use single_shot with one responder; save the fan-out for cases where you've empirically seen divergent answers.
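For comparison, the single-responder version is the same call minus the fan-out. This sketch assumes that sending one responder without a fold falls back to the default single_shot pattern referenced below; the tighter cost cap is illustrative:

const single = await client.infer({
  input: {
    goal: "Summarise this paper in one paragraph and rate methodology 1-5",
    context: { abstract: "...", methods: "..." }
  },
  responders: [{ kind: "llm", model: "~sonnet" }],
  answer_shape: {
    kind: "core.summary.v1",
    required_fields: ["body.summary", "body.rating"]
  },
  side_effects: { max_cost_usd: 0.10, max_latency_secs: 60 }
});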
min_quorum: 2 keeps you from blocking on a single slow responder. If you raise it to 3, a single timeout kills the query — you'd see quorum_not_met instead of a KNOW.
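If you do raise the quorum to 3, guard the call. This sketch assumes the SDK surfaces the failure as a thrown error whose text carries the quorum_not_met code; the exact error shape is an assumption, not documented here:

try {
  const strict = await client.infer({
    input: { goal: "...", context: { abstract: "...", methods: "..." } },
    responders: [
      { kind: "llm", model: "~sonnet" },
      { kind: "llm", model: "~gpt-4o" },
      { kind: "llm", model: "~gemini-1.5" }
    ],
    fold: { function: "consensus", min_quorum: 3 }
  });
  console.log(strict.answer);
} catch (err) {
  // Assumption: quorum failures are distinguishable from other errors.
  if (String(err).includes("quorum_not_met")) {
    console.warn("A responder timed out; retry, or drop back to min_quorum: 2.");
  } else {
    throw err;
  }
}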
Run this against a dev daemon
With three LLM providers configured in ~/.syncro/providers.yaml, the CLI invocation above runs end-to-end. If you only have one provider, point the other two responders at the same provider under different model aliases, or replace them with a pattern or system responder.
See also
- Fold Functions — consensus — how the tally is computed.
- Orchestration Patterns — single_shot — the default pattern used here.
- Cost-bounded query waterfall — cascade instead of fan-out.
- Audit export to Elastic — pipe Syncropel's security-relevant record stream into Elasticsearch or another SIEM: JSONL in, indexed and searchable out.
- Translate with human verification — Claude Sonnet translates, GPT-4 verifies, and Alice is the tiebreaker: a verify-pattern query that only pulls in the human when the two LLMs actually disagree.