Hi HN! I built Autocache, an intelligent proxy for the Anthropic Claude API that automatically reduces costs by up to 90% and latency by up to 85%.
**The Impact:**
If you're spending $100/day on Claude API calls with system prompts and tools, Autocache can cut that to roughly $10/day with zero code changes. For a 1,000-token system prompt reused across requests, you pay 1.25× the base input price once to write the cache, then 0.1× on every subsequent request.
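Using just the multipliers above (1.25× to write the cache, 0.1× to read it), the savings for the 1,000-token example can be sketched as follows. This is a rough illustration in base-input-token cost units, not real dollar prices:

```go
package main

import "fmt"

func main() {
	const promptTokens = 1000.0
	const requests = 100.0

	// Cost in base-input-token units, using the multipliers quoted above:
	// cache write = 1.25x base price, cache read = 0.1x base price.
	uncached := promptTokens * requests
	cached := promptTokens*1.25 + promptTokens*0.1*(requests-1)

	fmt.Printf("uncached: %.0f units\n", uncached) // 100000
	fmt.Printf("cached:   %.0f units\n", cached)   // 1250 + 9900 = 11150
	fmt.Printf("savings:  %.0f%%\n", 100*(1-cached/uncached))
}
```

Because the write premium is only 0.25×, the cache pays for itself on the second request; the long-run savings approach the 90% headline figure.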
**The Problem:**
Anthropic's Prompt Caching requires manually placing cache breakpoints in your API requests. For applications like n8n workflows, Flowise chatbots, or any complex integration with system prompts, tools, and conversation history, you either can't access the request structure to optimize it, or doing so by hand is extremely tedious.
**How Autocache Works:**
It's a transparent drop-in proxy. For each request, it:
1. Analyzes token counts across system prompts, tools, and message content
2. Calculates ROI scores for potential cache breakpoints (write costs vs. read savings)
3. Automatically injects cache-control fields at optimal positions
4. Returns X-Autocache-* headers showing projected savings and break-even points
**Perfect for:**
- n8n AI workflows (change base URL in Claude node)
- Flowise chatbots (configure HTTP endpoint)
- LangChain/LlamaIndex apps
- Custom Claude integrations
- Any app where you can't manually optimize prompts
**Try it in 30 seconds:**
```bash
docker run -d -p 8080:8080 \
  -e ANTHROPIC_API_KEY=sk-ant-... \
  ghcr.io/montevive/autocache:latest
```
Point your app at http://localhost:8080/v1/messages and check the response headers for actual savings metrics on your workload.
GitHub: https://github.com/montevive/autocache
I've tested this with n8n workflows and seen $200→$25/day cost reductions on production workloads. The ROI algorithm uses conservative estimates, but I'd love feedback on edge cases or strategies I haven't considered.
Tech: Go, ~29MB Docker image, multi-arch, MIT licensed.
Good code design, regardless of paradigm used, encapsulates complexity in order to minimize the amount of context needed to understand any one bit of code. The better the encapsulation, the better we can scale our limited mental capacity to larger and more complex systems.
If the complexity is better encapsulated, it's also faster and easier for the reviewer to acquire the necessary context.
When I was first learning OOP, I wish someone had explained why classes are useful: they leverage our natural mental machinery for abstraction. Walking down the road, we think of each car as a whole and ignore the thousands of parts that make it up. Crossing a busy street would be overwhelming if we considered the millions of car components flowing past, or dwelt on the differences between individual cars, instead of abstracting each differing collection of parts as just "car". All of this happens subconsciously, and we can consciously tap into some of that machinery to reason about code. Of course, we don't need OOP to leverage it, but OOP is one technique for doing so.
I think the same. It should be quick to consult, for juniors and seniors alike. At more than a sheet or two, it stops being a cheat sheet and starts looking like a small reference manual.