Dialog Assessment Methodology — How We Read These

What we’re trying to make

Realistic, compelling conversations between two friends — one curious and informed, one (Jafar) deeply versed in the corpus and the prophetic traditions. Short turns. Evidence pulled directly from each tradition’s primary doctrinal texts. Conversational language throughout.

The hardest constraint is fidelity. Modern training data is saturated with secular-humanist framing; without strict anchoring, the AI will silently translate doctrines into terms that feel palatable to contemporary ears but misrepresent what the traditions actually teach. The assessment is calibrated to detect that drift.

The pipeline

Each conversation involves three roles:

The User — a curious, informed friend (driven by gpt-4o with a conversational pushback prompt) who asks short questions and presses for primary-source evidence.
Jafar — the production chat at siftersearch.com, using its actual system prompt and library search tools. Same Jafar a visitor talks to.
The Judge — a separate gpt-4o pass that scores the finished transcript across the dimensions below and writes the visible per-article assessment.
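The division of labor among the three roles can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the production pipeline. The callables `user_turn`, `jafar_turn`, and `judge` are hypothetical stand-ins for the gpt-4o User, the siftersearch.com Jafar chat, and the gpt-4o Judge. The fixed `max_exchanges` cap is also an assumption about how a conversation ends.

```python
from dataclasses import dataclass, field
from typing import Callable

# A turn is (speaker, text); speakers are "User" and "Jafar".
Turn = tuple[str, str]


@dataclass
class Conversation:
    topic: str
    turns: list[Turn] = field(default_factory=list)


def run_conversation(
    topic: str,
    user_turn: Callable[[Conversation], str],   # hypothetical: gpt-4o + pushback prompt
    jafar_turn: Callable[[Conversation], str],  # hypothetical: the production Jafar chat
    judge: Callable[[Conversation], dict],      # hypothetical: separate gpt-4o scoring pass
    max_exchanges: int = 6,                     # assumption: fixed exchange cap
) -> tuple[Conversation, dict]:
    convo = Conversation(topic)
    for _ in range(max_exchanges):
        # The User asks; Jafar answers with access to the full transcript so far.
        convo.turns.append(("User", user_turn(convo)))
        convo.turns.append(("Jafar", jafar_turn(convo)))
    # The Judge runs once, on the finished transcript only.
    return convo, judge(convo)


if __name__ == "__main__":
    convo, verdict = run_conversation(
        "Progressive revelation",
        user_turn=lambda c: f"Question {len(c.turns) // 2 + 1}?",
        jafar_turn=lambda c: "Short answer with a primary-source quote.",
        judge=lambda c: {"turns_scored": len(c.turns)},
        max_exchanges=3,
    )
    print(verdict)  # {'turns_scored': 6}
```

Keeping the Judge outside the loop means it never influences the live exchange; it only reads what a visitor would read.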

Per-tradition primary texts

When the question concerns a specific tradition, evidence comes from that tradition’s own primary doctrinal corpus, not general scholarly commentary.

Tradition	Primary doctrinal texts
Bahá’í	Kitáb-i-Íqán (Bahá’u’lláh), Some Answered Questions (’Abdu’l-Bahá), Shoghi Effendi’s writings and translations, Kitáb-i-Aqdas, Hidden Words, Gleanings, Gems of Divine Mysteries
Christianity	The Gospels (Matthew, Mark, Luke, John); secondarily Pauline letters and Acts
Islam	The Qur’án; secondarily the recognized Hadith collections
Judaism	The Tanakh (Torah, Nevi’im, Ketuvim); secondarily the Talmud
Buddhism	The Pali Canon (especially the Sutta Piṭaka, which includes the Dhammapada); the major Mahāyāna sutras
Hinduism	The Upanishads, the Bhagavad Gita, the Vedas
Sikhism	The Guru Granth Sahib
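One way to make the primary/secondary distinction in the table checkable is a lookup from tradition to cited work. The sketch below is hypothetical and abridged: `TEXTS` covers only four traditions, names only a few Pauline letters and Hadith collections, and omits Shoghi Effendi’s writings because they are not a single title. The `citation_tier` function and its exact-title matching are illustrative assumptions, not how the Judge evaluates evidence.

```python
import unicodedata

# Hypothetical, abridged encoding of the table above.
TEXTS: dict[str, dict[str, list[str]]] = {
    "Bahá’í": {
        "primary": ["Kitáb-i-Íqán", "Some Answered Questions", "Kitáb-i-Aqdas",
                    "Hidden Words", "Gleanings", "Gems of Divine Mysteries"],
        "secondary": [],
    },
    "Christianity": {
        "primary": ["Matthew", "Mark", "Luke", "John"],
        "secondary": ["Acts", "Romans", "1 Corinthians", "Galatians"],
    },
    "Islam": {
        "primary": ["Qur’án"],
        "secondary": ["Sahih al-Bukhari", "Sahih Muslim"],
    },
    "Judaism": {
        "primary": ["Tanakh", "Torah", "Nevi’im", "Ketuvim"],
        "secondary": ["Talmud"],
    },
}


def _norm(title: str) -> str:
    """Drop diacritics and apostrophes so 'Qur’án' and 'Quran' compare equal."""
    decomposed = unicodedata.normalize("NFKD", title)
    bare = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return bare.replace("’", "").replace("'", "").casefold().strip()


def citation_tier(tradition: str, work: str) -> str:
    """Classify a cited work as 'primary', 'secondary', or 'other' for a tradition."""
    tiers = next((v for k, v in TEXTS.items() if _norm(k) == _norm(tradition)), None)
    if tiers is None:
        return "other"
    target = _norm(work)
    for tier in ("primary", "secondary"):
        if any(_norm(t) == target for t in tiers[tier]):
            return tier
    return "other"  # e.g. scholarly commentary or a family memoir


if __name__ == "__main__":
    print(citation_tier("Islam", "Quran"))          # primary
    print(citation_tier("Christianity", "Acts"))    # secondary
    print(citation_tier("Christianity", "1 John"))  # other
```

Exact matching on normalized titles keeps "1 John" (an epistle) from being mistaken for the Gospel of John; extending the sketch to the remaining rows is a matter of adding entries.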

Failure modes the assessment watches for

essay-tone: Replies open like academic essays; no friend speaks this way.
secular-drift: Softens a doctrine into secular-humanist palatability ("doesn’t require a religious framework").
period-word-import: Uses words like "progressive" without marking the period sense, letting modern political connotations leak in.
missing-primary-citation: States a doctrinal claim without quoting the primary text where it lives.
secondary-substitution: Quotes scholarly commentary or family memoirs in place of the primary scripture.
hedge-without-position: When pushed to commit, retreats to "both perspectives offer valuable insights."
stock-phrase-reflex: Reaches for stock Bahá’í-discourse phrases ("transformative force," "diversity within unity") instead of speaking specifically.
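The Judge is a gpt-4o pass, not a keyword matcher, but a few of these failure modes leave lexical fingerprints. As an illustration only, here is a hypothetical pre-filter that flags a reply containing the example phrases quoted in the list above. `PHRASE_FLAGS` and `phrase_flags` are invented names, and the phrase lists are deliberately limited to what the list itself quotes.

```python
# Hypothetical lexical pre-filter; it is NOT the Judge, which is a gpt-4o pass.
# The phrases are the examples quoted in the failure-mode list above.
PHRASE_FLAGS: dict[str, tuple[str, ...]] = {
    "secular-drift": ("doesn't require a religious framework",),
    "hedge-without-position": ("both perspectives offer valuable insights",),
    "stock-phrase-reflex": ("transformative force", "diversity within unity"),
}


def phrase_flags(reply: str) -> list[str]:
    """Return the sorted failure-mode labels whose example phrases occur in reply."""
    # Normalize curly apostrophes and case so "Doesn’t" matches "doesn't".
    text = reply.replace("’", "'").casefold()
    return sorted(
        flag for flag, phrases in PHRASE_FLAGS.items()
        if any(p in text for p in phrases)
    )


if __name__ == "__main__":
    reply = "Honestly, both perspectives offer valuable insights; faith is a transformative force."
    print(phrase_flags(reply))  # ['hedge-without-position', 'stock-phrase-reflex']
```

A hit is a reason to read the reply closely, not a verdict. Most of the failure modes (essay-tone, period-word-import, secondary-substitution) have no fixed phrasing, which is why the assessment relies on a reading pass rather than string matching.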

Assessment criteria — version history

The criteria themselves evolve. The current version is listed first, with each dimension described; earlier versions follow for reference.

v2 (2026-04-28): Adds conversational_realism, doctrinal_fidelity, period_word_discipline, and brevity_discipline, and introduces a structured assessment object containing a narrative, flags, and an improvement_plan.

depth — does the conversation actually go deep?
conversational_realism — does this read like two friends talking?
doctrinal_fidelity — does Jafar reflect the tradition's actual self-understanding from primary doctrinal texts, or soften into secular-humanist palatability?
period_word_discipline — does Jafar avoid letting words like "progressive," "liberal," "spiritual," "freedom" silently import their modern political/materialistic connotations?
evidence_quality — primary-tier citations? properly attributed? primary scripture for doctrinal claims?
brevity_discipline — replies brief by default, no essay paragraphs?
archive_worthy — would a thoughtful believer send this to another thoughtful believer, confident it represents the Faith well?
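The v2 structured assessment object could be modeled roughly as below. The field names `narrative`, `flags`, and `improvement_plan` come from the changelog, and the dimension names from the list above; the `scores` dictionary, the 1 to 5 integer scale, and the validation rules are assumptions made for illustration, not the Judge’s actual output schema.

```python
import json
from dataclasses import asdict, dataclass, field

# The seven v2 dimensions, as listed above.
V2_DIMENSIONS = (
    "depth", "conversational_realism", "doctrinal_fidelity",
    "period_word_discipline", "evidence_quality", "brevity_discipline",
    "archive_worthy",
)


@dataclass
class Assessment:
    scores: dict[str, int]  # assumption: one integer per dimension, 1 to 5
    narrative: str          # the visible per-article write-up
    flags: list[str] = field(default_factory=list)             # e.g. "essay-tone"
    improvement_plan: list[str] = field(default_factory=list)  # concrete next steps

    def __post_init__(self) -> None:
        # Reject an assessment that skips a dimension or uses an out-of-range score.
        missing = [d for d in V2_DIMENSIONS if d not in self.scores]
        if missing:
            raise ValueError(f"missing dimensions: {missing}")
        for dim, score in self.scores.items():
            if not 1 <= score <= 5:
                raise ValueError(f"{dim} score out of range: {score}")

    def to_json(self) -> str:
        # ensure_ascii=False keeps diacritics (Bahá’í, Qur’án) readable.
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)


if __name__ == "__main__":
    a = Assessment(
        scores={d: 4 for d in V2_DIMENSIONS},
        narrative="Short turns, quotes the Íqán directly.",
        flags=["stock-phrase-reflex"],
        improvement_plan=["Replace 'transformative force' with the specific claim."],
    )
    print(a.to_json())
```

Validating at construction time means a Judge response that silently drops a dimension fails loudly instead of publishing an incomplete assessment.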

v1 (2026-04-27): Initial set of dimensions: depth, clarity, stereotype_avoidance, word_definition_questioning, assumption_questioning, teaching_clarity, evidence_quality, conversational_naturalness, believer_voice, archive_worthy. It had no structured improvement_plan, no period-word dimension, and no doctrinal_fidelity dimension.


Jafar’s soul — version history

The system prompt that defines Jafar’s voice and posture is itself part of what’s being iterated. Each version is the actual prompt text used for the conversations published while that version was current.

Jafar's avatar — candidates

The default avatar is the Arabic letter jīm (ج) on a burnished gold disc. We have four candidate visual personifications under consideration; whichever is chosen will be used site-wide.

A: Calligraphic jīm

B: Wise figure in profile

How to dispute an assessment

If you read a conversation and disagree with its assessment, that’s the most valuable kind of feedback. Disagreements surface blind spots in the criteria themselves, not just in Jafar. Tell us what we got wrong, and the next iteration of the criteria will reflect it.