Hi HN! I built Autocache, an intelligent proxy for the Anthropic Claude API that automatically reduces costs by up to 90% and latency by up to 85%.
**The Impact:**
If you're spending $100/day on Claude API calls with system prompts and tools, Autocache can cut that to roughly $10/day with zero code changes. For a 1,000-token system prompt reused across requests, you pay 1.25× the base input price once to write the cache, then 0.1× on every subsequent request.
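Using just the multipliers above (1.25× to write the cache, 0.1× to read it), the savings for the 1,000-token example can be sketched as follows. This is a rough illustration in base-input-token cost units, not real dollar prices:

```go
package main

import "fmt"

func main() {
	const promptTokens = 1000.0
	const requests = 100.0

	// Cost in base-input-token units, using the multipliers quoted above:
	// cache write = 1.25x base price, cache read = 0.1x base price.
	uncached := promptTokens * requests
	cached := promptTokens*1.25 + promptTokens*0.1*(requests-1)

	fmt.Printf("uncached: %.0f units\n", uncached) // 100000
	fmt.Printf("cached:   %.0f units\n", cached)   // 1250 + 9900 = 11150
	fmt.Printf("savings:  %.0f%%\n", 100*(1-cached/uncached))
}
```

Because the write premium is only 0.25×, the cache pays for itself on the second request; the long-run savings approach the 90% headline figure.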
**The Problem:**
Anthropic's Prompt Caching requires manually placing cache breakpoints in your API requests. For applications like n8n workflows, Flowise chatbots, or any complex integration with system prompts, tools, and conversation history, you either can't access the request structure to optimize it, or doing so by hand is extremely tedious.
**How Autocache Works:**
It's a transparent drop-in proxy. For each request, it:
1. Analyzes token counts across system prompts, tools, and message content
2. Calculates ROI scores for potential cache breakpoints (write costs vs. read savings)
3. Automatically injects cache-control fields at optimal positions
4. Returns X-Autocache-* headers showing projected savings and break-even points
**Perfect for:**
- n8n AI workflows (change base URL in Claude node)
- Flowise chatbots (configure HTTP endpoint)
- LangChain/LlamaIndex apps
- Custom Claude integrations
- Any app where you can't manually optimize prompts
**Try it in 30 seconds:**
```bash
docker run -d -p 8080:8080 \
  -e ANTHROPIC_API_KEY=sk-ant-... \
  ghcr.io/montevive/autocache:latest
```
Point your app at http://localhost:8080/v1/messages and check the response headers for actual savings metrics on your workload.
GitHub: https://github.com/montevive/autocache
I've tested this with n8n workflows and seen $200→$25/day cost reductions on production workloads. The ROI algorithm uses conservative estimates, but I'd love feedback on edge cases or strategies I haven't considered.
Tech: Go, ~29MB Docker image, multi-arch, MIT licensed.
Good code design, regardless of paradigm used, encapsulates complexity in order to minimize the amount of context needed to understand any one bit of code. The better the encapsulation, the better we can scale our limited mental capacity to larger and more complex systems.
If the complexity is better encapsulated, it's also faster and easier for the reviewer to acquire the necessary context.
When I was first learning OOP, I wish someone had explained why classes are useful: they leverage our natural mental machinery for abstraction. Walking down the road, we think of each car as a whole and ignore the thousands of parts that make it up. Crossing a busy street would be overwhelming if we considered the millions of car components flowing past, or dwelt on the differences between individual cars, instead of abstracting each differing collection of parts as just "car". All of this happens subconsciously, and we can consciously tap into some of that machinery to reason about code. Of course, we don't need OOP to leverage it, but OOP is one technique for doing so.
I think the same. It should be quick to consult, for juniors and seniors alike. At more than a sheet or two, it stops being a cheat sheet and starts looking like a small reference manual.