The cost equation
Cost scales with the number of messages and the number of tokens each one carries. Messages = conversations × messages-per-conversation, and conversations = traffic × open rate (the share of visitors who start a chat). Each message's input includes the system prompt, the user's text, some conversation history and any retrieved knowledge.
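The equation above can be sketched in a few lines of Python. This is a minimal illustration, not a billing formula for any particular provider: the function and field names are hypothetical, and the output-token term and both per-million-token prices are assumptions added so the sum is complete.

```python
from dataclasses import dataclass


@dataclass
class MessageTokens:
    """Average token counts for one message (all values are assumptions)."""
    system_prompt: int
    user_text: int
    history: int
    retrieved: int
    output: int  # the bot's reply; not part of the input, billed separately


def monthly_messages(traffic: float, open_rate: float,
                     messages_per_conversation: float) -> float:
    # conversations = traffic x open rate
    conversations = traffic * open_rate
    # messages = conversations x messages-per-conversation
    return conversations * messages_per_conversation


def monthly_cost(messages: float, tokens: MessageTokens,
                 input_price_per_million: float,
                 output_price_per_million: float) -> float:
    # Input is everything sent with each message.
    input_tokens = (tokens.system_prompt + tokens.user_text
                    + tokens.history + tokens.retrieved)
    per_message = (input_tokens * input_price_per_million
                   + tokens.output * output_price_per_million) / 1_000_000
    return messages * per_message


if __name__ == "__main__":
    # Hypothetical figures: 50,000 visitors, 4% open the chat, 6 messages each.
    msgs = monthly_messages(traffic=50_000, open_rate=0.04,
                            messages_per_conversation=6)
    tokens = MessageTokens(system_prompt=400, user_text=50,
                           history=600, retrieved=450, output=200)
    # Placeholder prices in currency units per million tokens.
    cost = monthly_cost(msgs, tokens, input_price_per_million=0.50,
                        output_price_per_million=1.50)
    print(f"{msgs:.0f} messages, estimated cost {cost:.2f}")
```

Swapping in your own traffic, token averages and your provider's published prices gives a first-order estimate; the history and retrieval terms are the ones the following sections look at.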
Why long conversations cost more
Each new message resends prior turns as context, so input tokens grow faster than the message count: a 12-message conversation costs considerably more than two 6-message ones. Designing the bot to resolve quickly is both better UX and cheaper.
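A small worked example makes the effect concrete. It assumes the full history is resent on every message with no truncation or summarising, and the token sizes (a 400-token system prompt, 50-token user messages, 250 tokens per prior exchange) are illustrative guesses rather than measurements.

```python
def conversation_input_tokens(n_messages: int,
                              system_prompt: int = 400,
                              user_tokens: int = 50,
                              exchange_tokens: int = 250) -> int:
    """Total input tokens for one conversation.

    Assumes every message resends the system prompt, the new user text,
    and all prior exchanges (user text plus bot reply) in full.
    """
    total = 0
    for prior_exchanges in range(n_messages):
        history = prior_exchanges * exchange_tokens
        total += system_prompt + user_tokens + history
    return total


if __name__ == "__main__":
    one_long = conversation_input_tokens(12)
    two_short = 2 * conversation_input_tokens(6)
    print(one_long, two_short)  # prints "21900 12900"
```

With these assumed sizes, the single 12-message conversation uses 21,900 input tokens against 12,900 for two 6-message conversations, even though both send 12 messages in total. The gap comes entirely from the resent history, which grows with each turn.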
RAG adds input tokens
Knowledge-base retrieval improves answers but adds retrieved chunks to every message's input, plus a small query-embedding cost. Retrieve only what you need and cache stable knowledge.
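As a rough sketch of the retrieval overhead, the function below estimates the extra cost per message from retrieved chunks and the query embedding. The parameter names, chunk sizes and prices are all hypothetical, and it does not model caching discounts, which vary by provider.

```python
def rag_cost_per_message(top_k: int, chunk_tokens: int, query_tokens: int,
                         input_price_per_million: float,
                         embedding_price_per_million: float) -> float:
    """Extra cost one message pays for retrieval (assumed pricing model)."""
    # Retrieved chunks are appended to the model input on every message.
    retrieved_tokens = top_k * chunk_tokens
    retrieval_cost = retrieved_tokens * input_price_per_million / 1_000_000
    # The user's query is embedded once to search the knowledge base.
    embedding_cost = query_tokens * embedding_price_per_million / 1_000_000
    return retrieval_cost + embedding_cost


if __name__ == "__main__":
    # Placeholder prices per million tokens; substitute your provider's.
    for k in (5, 2):
        extra = rag_cost_per_message(top_k=k, chunk_tokens=300, query_tokens=50,
                                     input_price_per_million=0.50,
                                     embedding_price_per_million=0.02)
        print(f"top_k={k}: extra cost per message {extra:.6f}")
```

Under these assumptions the embedding term is negligible next to the retrieved chunks, so lowering the number or size of chunks is the lever that matters.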
Model choice
Most support and FAQ bots work well on a small, fast model. Premium models are worth it only for complex reasoning or regulated domains. Compare models for your traffic in the chatbot cost calculator.
The business case is deflection
The token bill is usually small next to the labour you save by deflecting tickets from humans. Model that with the support ROI calculator.
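The comparison can be sketched as a single subtraction. The deflection rate, the cost of a human-handled ticket and the token bill below are made-up inputs for illustration, and the function name is hypothetical; it is not the formula used by any specific calculator.

```python
def monthly_net_savings(conversations: float, deflection_rate: float,
                        cost_per_human_ticket: float,
                        token_bill: float) -> float:
    """Labour saved by deflected tickets minus the model's token bill."""
    deflected_tickets = conversations * deflection_rate
    labour_saved = deflected_tickets * cost_per_human_ticket
    return labour_saved - token_bill


if __name__ == "__main__":
    # Hypothetical: 2,000 conversations, 30% resolved without a human,
    # 6.00 per human-handled ticket, 12.60 monthly token bill.
    net = monthly_net_savings(conversations=2_000, deflection_rate=0.30,
                              cost_per_human_ticket=6.00, token_bill=12.60)
    print(f"net monthly savings: {net:.2f}")
```

Even with conservative inputs, the labour term tends to dominate, which is why the deflection rate deserves more attention than small differences in token price.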