How These Conversations Are Assessed
This is a beta. Every published dialog carries a visible assessment, so you can read the conversation and the assessment side by side and decide whether the assessment is right. If you find a blind spot, that is the most useful kind of feedback: it improves the criteria themselves, not just Jafar.
What we’re trying to make
Realistic, compelling conversations between two friends — one curious and informed, one (Jafar) deeply versed in the corpus and the prophetic traditions. Short turns. Evidence pulled directly from each tradition’s primary doctrinal texts. Conversational language throughout.
The hardest constraint is fidelity. Modern training data is saturated with secular-humanist framing; without strict anchoring, the AI will silently translate doctrines into terms that feel palatable to contemporary ears but misrepresent what the traditions actually teach. The assessment is calibrated to detect that drift.
The pipeline
Each conversation is generated by three roles:
- The User — a curious, informed friend (driven by gpt-4o with a conversational pushback prompt) who asks short questions and presses for primary-source evidence.
- Jafar — the production chat at siftersearch.com, using its actual system prompt and library search tools. Same Jafar a visitor talks to.
- The Judge — a separate gpt-4o pass that scores the finished transcript across the dimensions below and writes the visible per-article assessment.
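The three roles can be pictured as a simple loop. The following is a minimal sketch, not the production pipeline: `run_dialog`, `Turn`, and the `ask`, `answer`, and `judge` callables are hypothetical names, and the model and production-chat calls are left to whatever functions the caller passes in.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Turn:
    speaker: str  # "user" or "jafar"
    text: str


Transcript = List[Turn]


def run_dialog(
    ask: Callable[[Transcript], str],      # hypothetical: the User role
    answer: Callable[[Transcript], str],   # hypothetical: the Jafar role
    judge: Callable[[Transcript], Dict],   # hypothetical: the Judge role
    opening_question: str,
    exchanges: int = 3,                    # assumed default, not from the source
) -> Tuple[Transcript, Dict]:
    """Alternate User and Jafar turns, then judge the finished transcript."""
    transcript: Transcript = [Turn("user", opening_question)]
    for i in range(exchanges):
        # Jafar replies to everything said so far.
        transcript.append(Turn("jafar", answer(transcript)))
        # The User pushes back, except after the final reply.
        if i < exchanges - 1:
            transcript.append(Turn("user", ask(transcript)))
    # The Judge runs once, as a separate pass over the complete transcript.
    return transcript, judge(transcript)
```

The point of the shape is that the Judge never sees a partial conversation: scoring happens only after the last Jafar turn.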
Per-tradition primary texts
When the question concerns a specific tradition, evidence comes from that tradition’s own primary doctrinal corpus, not general scholarly commentary.
| Tradition | Primary doctrinal texts |
|---|---|
| Bahá’í | Kitáb-i-Íqán (Bahá’u’lláh), Some Answered Questions (’Abdu’l-Bahá), Shoghi Effendi’s writings & translations, Aqdas, Hidden Words, Gleanings, Gems of Divine Mysteries |
| Christianity | The Gospels (Matthew, Mark, Luke, John); secondarily Pauline letters and Acts |
| Islam | The Qur’án; secondarily the recognized Hadith collections |
| Judaism | The Tanakh (Torah, Nevi’im, Ketuvim); secondarily the Talmud |
| Buddhism | The Pali Canon (Dhammapada, Sutta Piṭaka), the major Mahāyāna sutras |
| Hinduism | The Upanishads, the Bhagavad Gita, the Vedas |
| Sikhism | The Guru Granth Sahib |
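One way to read the table is as a lookup from tradition to evidence tiers. The sketch below encodes only three rows as an excerpt; the names `PRIMARY_TEXTS` and `citation_tier` and the returned labels are illustrative assumptions, not part of the actual system.

```python
# Excerpt of the table above; the remaining rows would follow the same shape.
PRIMARY_TEXTS = {
    "Christianity": {
        "primary": ["Matthew", "Mark", "Luke", "John"],
        "secondary": ["Pauline letters", "Acts"],
    },
    "Judaism": {
        "primary": ["Tanakh"],
        "secondary": ["Talmud"],
    },
    "Sikhism": {
        "primary": ["Guru Granth Sahib"],
        "secondary": [],
    },
}


def citation_tier(tradition: str, source: str) -> str:
    """Return the tier of a cited source (hypothetical labels)."""
    tiers = PRIMARY_TEXTS.get(tradition)
    if tiers is None:
        return "unknown-tradition"
    for tier in ("primary", "secondary"):
        if source in tiers[tier]:
            return tier
    # Commentary, memoirs, and other material outside the listed corpus.
    return "outside-corpus"
```

Under this reading, a scholarly commentary cited for a doctrinal claim would come back as `outside-corpus`, which corresponds to the secondary-substitution failure mode described below.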
Failure modes the assessment watches for
- essay-tone: Replies open like academic essays; no friend speaks this way.
- secular-drift: Softens a doctrine into secular-humanist palatability ("doesn’t require a religious framework").
- period-word-import: Uses words like "progressive" without marking the period sense, letting modern political connotations leak in.
- missing-primary-citation: States a doctrinal claim without quoting the primary text where it lives.
- secondary-substitution: Quotes scholarly commentary or family memoirs in place of the primary scripture.
- hedge-without-position: When pushed to commit, retreats to "both perspectives offer valuable insights."
- stock-phrase-reflex: Reaches for stock Bahá’í-discourse phrases ("transformative force," "diversity within unity") instead of speaking specifically.
Assessment criteria — version history
The criteria themselves evolve. Latest version expanded; previous versions collapsed for reference.
v2 (2026-04-28): Adds conversational_realism, doctrinal_fidelity, period_word_discipline, and brevity_discipline, plus a structured assessment object with narrative, flags, and improvement_plan.
- depth: does the conversation actually go deep?
- conversational_realism: does this read like two friends talking?
- doctrinal_fidelity: does Jafar reflect the tradition’s actual self-understanding from primary doctrinal texts, or soften it into secular-humanist palatability?
- period_word_discipline: does Jafar avoid letting words like "progressive," "liberal," "spiritual," and "freedom" silently import their modern political or materialistic connotations?
- evidence_quality: are citations primary-tier and properly attributed, and do doctrinal claims quote primary scripture?
- brevity_discipline: are replies brief by default, with no essay paragraphs?
- archive_worthy: would a thoughtful believer send this to another thoughtful believer, confident it represents the Faith well?
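The v2 structured assessment object might look something like the sketch below. The dimension names and flag names come from this page; the class name `Assessment`, the field types, the `criteria_version` field, and the `problems` check are assumptions for illustration. The numeric scale is not stated here, so the sketch accepts any number.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The seven v2 dimensions listed above.
V2_DIMENSIONS = (
    "depth",
    "conversational_realism",
    "doctrinal_fidelity",
    "period_word_discipline",
    "evidence_quality",
    "brevity_discipline",
    "archive_worthy",
)

# The failure modes the assessment watches for.
KNOWN_FLAGS = frozenset({
    "essay-tone",
    "secular-drift",
    "period-word-import",
    "missing-primary-citation",
    "secondary-substitution",
    "hedge-without-position",
    "stock-phrase-reflex",
})


@dataclass
class Assessment:
    criteria_version: str                  # assumed field, e.g. "v2"
    scores: Dict[str, float]               # one score per dimension; scale unspecified
    narrative: str                         # the visible per-article write-up
    flags: List[str] = field(default_factory=list)
    improvement_plan: List[str] = field(default_factory=list)  # assumed: list of steps

    def problems(self) -> List[str]:
        """List structural problems: missing dimensions or unrecognized flags."""
        issues: List[str] = []
        missing = [d for d in V2_DIMENSIONS if d not in self.scores]
        if missing:
            issues.append("missing scores: " + ", ".join(missing))
        unknown = [f for f in self.flags if f not in KNOWN_FLAGS]
        if unknown:
            issues.append("unknown flags: " + ", ".join(unknown))
        return issues
```

A check like `problems()` only validates the shape of a Judge output. Whether the scores themselves are right is the question the published assessments invite readers to answer.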
v1 (2026-04-27): Initial criteria, listed below. No structured improvement_plan, no period-word dimension, and no doctrinal_fidelity dimension.
- depth, clarity, stereotype_avoidance
- word_definition_questioning, assumption_questioning
- teaching_clarity, evidence_quality
- conversational_naturalness, believer_voice
- archive_worthy
Jafar’s soul — version history
The system prompt that defines Jafar’s voice and posture is itself part of what’s being iterated. Each version is the actual prompt text used for the conversations published while that version was current.
Jafar's avatar — candidates
The default avatar is the Arabic letter jīm (ج) on a burnished gold disc. We have four candidate visual personifications under consideration. Pick a favorite and the rendering will switch site-wide.
How to dispute an assessment
If you read a conversation and disagree with its assessment, that is the most valuable kind of feedback. A disagreement surfaces blind spots in the criteria themselves, not just in Jafar. Tell us what we got wrong, and the next iteration of the criteria will reflect it.