Chat Template Viewer

Rendered chat templates for several vision-language models, each shown for a zero-shot prompt and a few-shot prompt with three answered examples.
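
The viewer shows only rendered output, not the input conversations. As a minimal sketch, the two prompt shapes could be represented as message lists like the following. The helper name `build_messages` is hypothetical, and the typed content-part layout is an assumed input convention rather than something stated on this page.

```python
from typing import Dict, List

QUESTION = "What color is the square in the image?"


def build_messages(example_answers: List[str]) -> List[Dict]:
    """Build one answered turn per example, followed by an open user query.

    Hypothetical helper: the {"type": "image"} / {"type": "text"} parts are
    an assumed convention for multimodal chat messages.
    """
    user_turn = {
        "role": "user",
        "content": [{"type": "image"}, {"type": "text", "text": QUESTION}],
    }
    messages: List[Dict] = []
    for answer in example_answers:
        messages.append(user_turn)
        messages.append(
            {"role": "assistant", "content": [{"type": "text", "text": answer}]}
        )
    messages.append(user_turn)  # final query, left for the model to answer
    return messages


zero_shot = build_messages([])
few_shot = build_messages(["Red.", "Blue.", "Green."])
```

With this layout, the zero-shot prompt is a single user turn, and the few-shot prompt is three user/assistant pairs followed by the same open question.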

CohereLabs/aya-vision-8b

Zero-shot

<BOS_TOKEN><|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|># System Preamble
You are in contextual safety mode. You will reject requests to generate child sexual abuse material and child exploitation material in your responses. You will accept to provide information and creative content related to violence, hate, misinformation or sex, but you will not provide any content that could directly or indirectly lead to harmful outcomes. When analyzing images, carefully describe and interpret their content while avoiding any promotion of harm, misinformation, or bias.

You are Aya Vision, a vision-language model built by Cohere for AI. You have been trained on data in English, French, Spanish, Italian, German, Portuguese, Japanese, Korean, Modern Standard Arabic, Mandarin, Russian, Indonesian, Turkish, Dutch, Polish, Persian, Vietnamese, Czech, Hindi, Ukrainian, Romanian, Greek and Hebrew. You are capable of interpreting images, including describing them, answering questions about their contents, extracting textual information, and analyzing visual context. Your responses must maintain the highest standards of quality, accuracy, and safety.

# Default Preamble
The following instructions are your defaults unless specified elsewhere in developer preamble or user prompt.
- Your name is Aya Vision.
- You are a large language model built by Cohere for AI.
- You reply conversationally with a friendly and informative tone and often include introductory statements and follow-up questions.
- If the input is ambiguous, ask clarifying follow-up questions.
- Use Markdown-specific formatting in your response (for example to highlight phrases in bold or italics, create tables, or format code blocks).
- Use LaTeX to generate mathematical notation for complex equations.
- When responding in English, use American English unless context indicates otherwise.
- When outputting responses of more than seven sentences, split the response into paragraphs.
- Prefer the active voice.
- Adhere to the APA style guidelines for punctuation, spelling, hyphenation, capitalization, numbers, lists, and quotation marks. Do not worry about them for other elements such as italics, citations, figures, or references.
- Use gender-neutral pronouns for unspecified persons.
- Limit lists to no more than 10 items unless the list is a set of finite instructions, in which case complete the list.
- Use the third person when asked to write a summary.
- When asked to extract values from source material, use the exact form, separated by commas.
- When generating code output, please provide an explanation after the code.
- When generating code output without specifying the programming language, please generate Python code.
- If you are asked a question that requires reasoning, first think through your answer, slowly and step by step, then answer.
<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|><image>What color is the square in the image?<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>

Few-shot

<BOS_TOKEN><|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|># System Preamble
You are in contextual safety mode. You will reject requests to generate child sexual abuse material and child exploitation material in your responses. You will accept to provide information and creative content related to violence, hate, misinformation or sex, but you will not provide any content that could directly or indirectly lead to harmful outcomes. When analyzing images, carefully describe and interpret their content while avoiding any promotion of harm, misinformation, or bias.

You are Aya Vision, a vision-language model built by Cohere for AI. You have been trained on data in English, French, Spanish, Italian, German, Portuguese, Japanese, Korean, Modern Standard Arabic, Mandarin, Russian, Indonesian, Turkish, Dutch, Polish, Persian, Vietnamese, Czech, Hindi, Ukrainian, Romanian, Greek and Hebrew. You are capable of interpreting images, including describing them, answering questions about their contents, extracting textual information, and analyzing visual context. Your responses must maintain the highest standards of quality, accuracy, and safety.

# Default Preamble
The following instructions are your defaults unless specified elsewhere in developer preamble or user prompt.
- Your name is Aya Vision.
- You are a large language model built by Cohere for AI.
- You reply conversationally with a friendly and informative tone and often include introductory statements and follow-up questions.
- If the input is ambiguous, ask clarifying follow-up questions.
- Use Markdown-specific formatting in your response (for example to highlight phrases in bold or italics, create tables, or format code blocks).
- Use LaTeX to generate mathematical notation for complex equations.
- When responding in English, use American English unless context indicates otherwise.
- When outputting responses of more than seven sentences, split the response into paragraphs.
- Prefer the active voice.
- Adhere to the APA style guidelines for punctuation, spelling, hyphenation, capitalization, numbers, lists, and quotation marks. Do not worry about them for other elements such as italics, citations, figures, or references.
- Use gender-neutral pronouns for unspecified persons.
- Limit lists to no more than 10 items unless the list is a set of finite instructions, in which case complete the list.
- Use the third person when asked to write a summary.
- When asked to extract values from source material, use the exact form, separated by commas.
- When generating code output, please provide an explanation after the code.
- When generating code output without specifying the programming language, please generate Python code.
- If you are asked a question that requires reasoning, first think through your answer, slowly and step by step, then answer.
<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|><image>What color is the square in the image?<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|><|START_RESPONSE|>Red.<|END_RESPONSE|><|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|><image>What color is the square in the image?<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|><|START_RESPONSE|>Blue.<|END_RESPONSE|><|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|><image>What color is the square in the image?<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|><|START_RESPONSE|>Green.<|END_RESPONSE|><|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|><image>What color is the square in the image?<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>
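
The turn structure after the system preamble can be reproduced with plain string concatenation. The sketch below is not the model's actual template. It assumes every user turn carries exactly one image placed before the text, and `render_turns` is a hypothetical name.

```python
from typing import List, Tuple

START = "<|START_OF_TURN_TOKEN|>"
END = "<|END_OF_TURN_TOKEN|>"


def render_turns(turns: List[Tuple[str, str]]) -> str:
    """Render the (role, text) turns that follow the system turn.

    Hypothetical helper; assumes one image per user turn, before the text.
    """
    out = []
    for role, text in turns:
        if role == "user":
            out.append(f"{START}<|USER_TOKEN|><image>{text}{END}")
        else:
            # Completed chatbot turns wrap their text in response markers.
            out.append(
                f"{START}<|CHATBOT_TOKEN|><|START_RESPONSE|>{text}<|END_RESPONSE|>{END}"
            )
    # The generation prompt opens a chatbot turn without a response marker.
    out.append(f"{START}<|CHATBOT_TOKEN|>")
    return "".join(out)


QUESTION = "What color is the square in the image?"
zero_shot_tail = render_turns([("user", QUESTION)])
```

Note that completed chatbot turns are wrapped in `<|START_RESPONSE|>` and `<|END_RESPONSE|>`, while the trailing generation prompt stops at `<|CHATBOT_TOKEN|>`.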

HuggingFaceTB/SmolVLM2-2.2B-Instruct

Zero-shot

<|im_start|>User:<image>What color is the square in the image?<end_of_utterance>
Assistant:

Few-shot

<|im_start|>User:<image>What color is the square in the image?<end_of_utterance>
Assistant: Red.<end_of_utterance>
User:<image>What color is the square in the image?<end_of_utterance>
Assistant: Blue.<end_of_utterance>
User:<image>What color is the square in the image?<end_of_utterance>
Assistant: Green.<end_of_utterance>
User:<image>What color is the square in the image?<end_of_utterance>
Assistant:
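
The layout above can be mirrored in a few lines of Python. This is a sketch that matches the rendered text only, not the model's own template; `render_smolvlm` is a hypothetical name, and each user turn is assumed to start with one image.

```python
from typing import List, Tuple


def render_smolvlm(turns: List[Tuple[str, str]]) -> str:
    """Re-create the rendered layout shown above (hypothetical helper)."""
    lines = []
    for role, text in turns:
        if role == "user":
            # No space after "User:" because the turn starts with an image.
            lines.append(f"User:<image>{text}<end_of_utterance>")
        else:
            lines.append(f"Assistant: {text}<end_of_utterance>")
    lines.append("Assistant:")  # generation prompt
    return "<|im_start|>" + "\n".join(lines)


QUESTION = "What color is the square in the image?"
few_shot = render_smolvlm(
    [
        ("user", QUESTION), ("assistant", "Red."),
        ("user", QUESTION), ("assistant", "Blue."),
        ("user", QUESTION), ("assistant", "Green."),
        ("user", QUESTION),
    ]
)
```

The `<|im_start|>` marker appears once at the start of the conversation rather than once per turn.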

ibm-granite/granite-vision-3.2-2b

Zero-shot

<|system|>
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
<|user|>
<image>
What color is the square in the image?
<|assistant|>

Few-shot

<|system|>
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
<|user|>
<image>
What color is the square in the image?
<|assistant|>
Red.<|end_of_text|><|user|>
<image>
What color is the square in the image?
<|assistant|>
Blue.<|end_of_text|><|user|>
<image>
What color is the square in the image?
<|assistant|>
Green.<|end_of_text|><|user|>
<image>
What color is the square in the image?
<|assistant|>
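
A plain-Python sketch of the same layout follows. It is not the model's actual template: `render_granite` is a hypothetical name, one image per user turn is assumed, and the trailing newline after the final `<|assistant|>` is an assumption the rendered text cannot confirm.

```python
from typing import List, Tuple

SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def render_granite(turns: List[Tuple[str, str]]) -> str:
    """Re-create the rendered layout shown above (hypothetical helper)."""
    out = [f"<|system|>\n{SYSTEM}\n"]
    for role, text in turns:
        if role == "user":
            out.append(f"<|user|>\n<image>\n{text}\n")
        else:
            # Assistant turns end with <|end_of_text|> and no newline, so the
            # next <|user|> marker follows on the same line.
            out.append(f"<|assistant|>\n{text}<|end_of_text|>")
    out.append("<|assistant|>\n")  # trailing newline is an assumption
    return "".join(out)


QUESTION = "What color is the square in the image?"
zero_shot = render_granite([("user", QUESTION)])
```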

Qwen/Qwen2.5-VL-3B-Instruct

Zero-shot

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
<|vision_start|><|image_pad|><|vision_end|>What color is the square in the image?<|im_end|>
<|im_start|>assistant

Few-shot

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
<|vision_start|><|image_pad|><|vision_end|>What color is the square in the image?<|im_end|>
<|im_start|>assistant
Red.<|im_end|>
<|im_start|>user
<|vision_start|><|image_pad|><|vision_end|>What color is the square in the image?<|im_end|>
<|im_start|>assistant
Blue.<|im_end|>
<|im_start|>user
<|vision_start|><|image_pad|><|vision_end|>What color is the square in the image?<|im_end|>
<|im_start|>assistant
Green.<|im_end|>
<|im_start|>user
<|vision_start|><|image_pad|><|vision_end|>What color is the square in the image?<|im_end|>
<|im_start|>assistant

Few-shot (with add_vision_id=True)

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Picture 1: <|vision_start|><|image_pad|><|vision_end|>What color is the square in the image?<|im_end|>
<|im_start|>assistant
Red.<|im_end|>
<|im_start|>user
Picture 2: <|vision_start|><|image_pad|><|vision_end|>What color is the square in the image?<|im_end|>
<|im_start|>assistant
Blue.<|im_end|>
<|im_start|>user
Picture 3: <|vision_start|><|image_pad|><|vision_end|>What color is the square in the image?<|im_end|>
<|im_start|>assistant
Green.<|im_end|>
<|im_start|>user
Picture 4: <|vision_start|><|image_pad|><|vision_end|>What color is the square in the image?<|im_end|>
<|im_start|>assistant
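
The effect of `add_vision_id` can be sketched as a running image counter that prefixes each image with `Picture N: `. The code below reproduces the rendered text by string concatenation and is not the model's actual template. `render_qwen` is a hypothetical name, and one image per user turn is assumed.

```python
from typing import List, Tuple

IMAGE = "<|vision_start|><|image_pad|><|vision_end|>"


def render_qwen(turns: List[Tuple[str, str]], add_vision_id: bool = False) -> str:
    """Re-create the rendered layout shown above (hypothetical helper)."""
    parts = ["<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"]
    image_count = 0
    for role, text in turns:
        if role == "user":
            image_count += 1  # numbering runs across the whole conversation
            label = f"Picture {image_count}: " if add_vision_id else ""
            text = f"{label}{IMAGE}{text}"
        parts.append(f"<|im_start|>{role}\n{text}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")  # generation prompt
    return "".join(parts)


QUESTION = "What color is the square in the image?"
turns = [
    ("user", QUESTION), ("assistant", "Red."),
    ("user", QUESTION), ("assistant", "Blue."),
    ("user", QUESTION), ("assistant", "Green."),
    ("user", QUESTION),
]
plain = render_qwen(turns)
numbered = render_qwen(turns, add_vision_id=True)
```

The two outputs differ only in the `Picture N: ` labels, which lets a prompt refer to a specific image by number.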

trl-internal-testing/tiny-LlavaForConditionalGeneration

Zero-shot

USER: <image>
What color is the square in the image? ASSISTANT:

Few-shot

USER: <image>
What color is the square in the image? ASSISTANT: Red. USER: <image>
What color is the square in the image? ASSISTANT: Blue. USER: <image>
What color is the square in the image? ASSISTANT: Green. USER: <image>
What color is the square in the image? ASSISTANT:
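
This format has no special turn tokens, only uppercase role labels separated by spaces. Here is a minimal sketch that matches the rendered text, assuming one image per user turn. `render_llava` is a hypothetical name and the code is not the model's actual template.

```python
from typing import List, Tuple


def render_llava(turns: List[Tuple[str, str]]) -> str:
    """Re-create the rendered layout shown above (hypothetical helper)."""
    out = []
    for role, text in turns:
        # The image placeholder sits on its own line ahead of the user text.
        image = "<image>\n" if role == "user" else ""
        out.append(f"{role.upper()}: {image}{text} ")
    out.append("ASSISTANT:")  # generation prompt
    return "".join(out)


QUESTION = "What color is the square in the image?"
zero_shot = render_llava([("user", QUESTION)])
```

Because turns are delimited only by the `USER:` and `ASSISTANT:` labels, the boundaries between few-shot examples rely entirely on those strings.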