TL;DR: In Flowise's ChatOpenAI or ChatAnthropic nodes, set the Base Path to
https://tokonomics.ca/proxy/openaiand use your Tokonomics API key. Every chain execution is metered — cost per chatflow, per model, per day. No code changes to your flows.
The Cost Blind Spot in Flowise
Flowise is a powerful open-source tool for building LLM applications visually. Drag a ChatOpenAI node, connect a prompt template, add a vector store — and you have a working AI chatbot.
But Flowise has no concept of cost. You can see how many times a chatflow ran. You cannot see that those runs consumed $280 in tokens, or that your RAG chain is sending 8,000 input tokens per query because the retriever pulls too many chunks.
AI agents in Flowise are especially dangerous for costs. An agent that calls tools, reasons over multiple steps, and retries on failure can consume 10-50x more tokens than a simple chat completion. Without cost tracking, a single agent chatflow can burn through $500 before anyone notices.
How the Integration Works
Flowise's LLM nodes (ChatOpenAI, ChatAnthropic) support a Base Path override. Set it to the Tokonomics proxy URL, and every LLM call routes through the proxy before reaching the provider.
Before: Flowise → api.openai.com → response
After: Flowise → tokonomics.ca/proxy/openai → api.openai.com → response
The proxy records tokens, cost, model, and latency for each call, then returns the response unchanged. Your chatflows work exactly as before.
Step-by-Step: ChatOpenAI Node
- Open your Flowise chatflow
- Click on the ChatOpenAI node
- Under Connect Credential, create a new OpenAI API credential:
- API Key:
mk_your_tokonomics_key
- API Key:
- In the node settings, find Base Path (under Additional Parameters)
- Set it to:
https://tokonomics.ca/proxy/openai - Save and test
Every call from this node is now metered. Check your Tokonomics dashboard to see the cost.
Step-by-Step: ChatAnthropic Node
- Click on the ChatAnthropic node
- Create a new Anthropic credential:
- API Key:
mk_your_tokonomics_key
- API Key:
- Set Base URL to:
https://tokonomics.ca/proxy/anthropic - Save and test
The same pattern works for any LLM node that supports a base URL override.
Tracking Agent Costs
Flowise agents (OpenAI Function Agent, ReAct Agent, Conversational Agent) make multiple LLM calls per user query. A single agent interaction might involve:
- 1 initial reasoning call (500 tokens)
- 2-3 tool calls (300 tokens each)
- 1 final synthesis call (800 tokens)
That's 4-5 LLM calls for one user message. At scale, agent costs compound fast.
With Tokonomics, each of these sub-calls is recorded individually. The dashboard shows total cost per interaction and helps you identify which agent chains are expensive and why.
Common agent cost issues:
- Too many tool calls — agent loops through 5+ tools when 2 would suffice
- Expensive reasoning model — using GPT-4o for tool selection when GPT-4o-mini handles it fine
- Long conversation history — every turn resends the full history, growing input tokens linearly
RAG Chain Cost Optimization
RAG (Retrieval-Augmented Generation) chains in Flowise are a common source of hidden costs. The cost comes from input tokens — the more chunks your retriever returns, the more tokens you send to the LLM.
| Chunks retrieved | Avg chunk size | Input tokens | Cost per query (GPT-4o) |
|---|---|---|---|
| 3 | 500 tokens | 1,500 | $0.00375 |
| 5 | 500 tokens | 2,500 | $0.00625 |
| 10 | 500 tokens | 5,000 | $0.0125 |
Reducing from 10 chunks to 5 cuts input cost by 50%. Most RAG applications get diminishing returns past 3-5 chunks. Use the Tokonomics dashboard to see your actual input token counts — if they're consistently high, reduce your retriever's topK parameter.
For a deep dive on prompt efficiency, see our cost optimization strategies guide.
Multi-Model Chatflows
Flowise makes it easy to chain multiple models. A common pattern:
- Classifier (GPT-4o-mini) — route the query to the right handler
- RAG chain (Claude Sonnet) — answer complex queries with context
- Summarizer (GPT-4o-mini) — condense the response
With Tokonomics, each model call is tracked separately. The dashboard breaks down cost by model, so you can see that Claude Sonnet accounts for 80% of your spend and optimize accordingly.
Self-Hosted Flowise: Same Integration
Flowise is typically self-hosted. The Tokonomics proxy is a URL — it doesn't matter where Flowise runs. As long as your server can make outbound HTTPS requests to tokonomics.ca, the integration works.
For Flowise running in Docker:
- No container changes needed
- No environment variables beyond what's configured in the UI
- The proxy URL is set per-node in the chatflow editor
Frequently Asked Questions
Does this work with Flowise's streaming mode?
Yes. The Tokonomics proxy supports streaming — chunks are forwarded as they arrive from the provider. The user experience is identical to direct streaming.
Can I track costs per chatflow?
Yes, by using different Tokonomics API keys per chatflow, or by adding custom headers via Flowise's HTTP configuration. Each API key gets its own usage breakdown in the dashboard.
What about LangChain nodes in Flowise?
Flowise wraps LangChain components. Any LangChain node that calls an LLM (ChatOpenAI, ChatAnthropic, etc.) can be routed through the proxy by setting the base URL. The proxy is transparent to LangChain — it sees a standard API endpoint.
How much latency does the proxy add?
Approximately 30ms per call (benchmark data). For LLM calls that take 500ms-3,000ms, this is unnoticeable.
Get Started
- Create a free Tokonomics account (100 calls/month free)
- Copy your API key
- Set the Base Path in your ChatOpenAI/ChatAnthropic nodes
- Test one chatflow — check the dashboard
- Set a budget alert to catch overspend early
All sources retrieved June 2026. Pricing: GPT-4o at $2.50/1M input tokens (OpenAI Pricing), Claude Sonnet 4 at $3.00/1M input tokens (Anthropic Pricing).