jedwhite's comments

That's an interesting insight about "stacking tricks" together. I'm curious where you found that approach hit limits, and what, if anything, gives you an advantage against others copying it. Getting real-time streaming with a 20B-parameter diffusion model at 20fps on a single GPU seems objectively impressive. It's hard to resist just saying "wow" looking at the demo, but I know that's not helpful here. It is clearly a substantial technical achievement, and I'm sure lots of other folks here would be interested in the limits of the approach and how generalizable it is.


Good question! Software gets democratized so fast that I am sure others will implement similar approaches soon. And, to be clear, some of our "speed upgrades" are pieced together from recent DiT papers. I do think getting everything running on a single GPU at this resolution and speed is totally new (as far as I have seen).

I think people will just copy it, and we just need to continue moving as fast as we can. I do think that a bit of a revolution is happening right now in real-time video diffusion models. There are so many great papers being published in that area in the last 6 months. My guess is that many DiT models will be real time within 1 year.


> I do think getting everything running on a single GPU at this resolution and speed is totally new

Thanks, it seemed to be the case that this was really something new, but HN tends to be circumspect so I wanted to check. It's an interesting space and I try to stay current, but everything is moving so fast. I was pretty sure I hadn't seen anyone do that. It's a huge achievement to do it first and make it work for real like this! So well done!


One thing that is interesting: LLM pipelines have been highly optimized for speed (since speed is directly related to cost for companies). That is just not true for real-time DiTs. So there is still lots of low-hanging fruit for how we (and others) can make things faster and better.


Curious about the memory bandwidth constraints here. 20B parameters at 20fps seems like it would saturate the bandwidth of a single GPU unless you are running int4. I assume this requires an H100?
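Back-of-envelope, assuming every parameter is streamed from HBM once per frame (activation and KV traffic ignored, so this is a lower bound), the question's numbers work out like this:

```shell
# Rough bandwidth estimate: bytes/sec = params * bytes_per_param * fps.
# Assumes each of the 20B parameters is read once per frame.
awk 'BEGIN {
  params = 20e9; fps = 20
  printf "fp16: %d GB/s\n", params * 2.0 * fps / 1e9
  printf "int8: %d GB/s\n", params * 1.0 * fps / 1e9
  printf "int4: %d GB/s\n", params * 0.5 * fps / 1e9
}'
```

So fp16 weights alone need on the order of 800 GB/s, which is above what most consumer cards sustain but well within H100 HBM3 peak (roughly 3.35 TB/s).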


Yep, the model is running on Hopper architecture. Anything less was not sufficient in our experiments.


Thanks for posting the original ideas that led to all this. "Runtime for prose" is the new "literate programming" - early days but a pointer to some pretty cool future things, I think.

It's already made a bunch of tasks that used to be time-consuming to automate much easier for me. I'm still learning where it does and doesn't work well. But it's early days.

You can tell something is a genuinely interesting new idea when someone posts about it on X and then:

1. There are multiple launches on HN based on the idea within a week, including this one.

2. It inspires a lot of discussion on X, here and elsewhere - including many polarized and negative takes.

Hats off for starting a (small but pretty interesting) movement.


I shared a repo on HN last week that lets you use remote execution with these kinds of script files autonomously - if you want to. It had some interesting negative and positive discussion.

The post mentioned Pete Koomen's install.md idea as an example use case. So now with this launch you can try it with a real installation script!

I think it's a really interesting idea worth experimentation and exploration. So it's a positive thing to see Mintlify launch this, and that it's already on Firecrawl.dev's docs!

We can all learn from it.

Show HN discussion of executable markdown here:

https://news.ycombinator.com/item?id=46549444

The claude-run tool lets you execute files like this autonomously if you want to experiment with it.

    curl -fsSL https://docs.firecrawl.dev/install.md | claude-run --permission-mode bypassPermissions

GitHub repo:

https://github.com/andisearch/claude-switcher

This is still a very early-stage idea, but I'm really stoked to see this today. For anyone interested in experimenting with it, it's a good idea to try in a sandboxed environment.


We made some improvements to support remote markdown script execution and piping.

Run scripts from the web:

```bash
curl -fsSL https://andisearch.github.io/ai-scripts/analyze.md | claude-run

echo "Explain what a Makefile does" | claude-run    # Simple prompt
```
Shebang flags in the markdown (like --permission-mode bypassPermissions) are honored.

There is a new initiative installmd.org from Nick Khami at Mintlify to support experiments with this approach.

https://installmd.org


They're quite different, I think. Some advantages with claude-run include:

- You make standard Markdown files directly executable using a shebang line.

- No special Markdown formatting or syntax needed. The Markdown itself is clean and standard, rather than using variable placeholders or any kind of special syntax.

- Regular filenames work: no special filename format needed. It just works like regular shell scripts, with flags and piping.

- Works with any text file format and file extension (.xml, .yaml, .ag, etc.)

- Includes support for session isolation

- Keeps script use separate from your regular Claude Code subscription

- Allows you to specify the cloud provider / model in scripts, or switch them on the fly.

It is intended to be more unix-like in philosophy.
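As a concrete sketch of the shebang mechanism described above (the filename and prompt text are made up for illustration, and the example assumes claude-run is on your PATH):

```shell
# Create a plain markdown file whose first line names claude-run as its
# interpreter. The prompt body is hypothetical; nothing else is special.
cat > greet.md <<'EOF'
#!/usr/bin/env claude-run
# Greeting

Print the word "hello" and nothing else.
EOF
chmod +x greet.md

# The kernel reads the shebang line to pick the interpreter, so the file
# can now be run as ./greet.md like any shell script.
head -1 greet.md
```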


I know there are two polarized camps on the topic of AI coding. Even for people who are concerned about it and prefer to use traditional scripting, there are some benefits worth considering in having runnable, composable prompt modules.

There are some tasks that are challenging to achieve with traditional code, but where modern LLMs perform strongly.

Examples include summarization, complex content formatting and restructuring, natural language classification, and evaluation judgements.

I’ve found that it is useful to be able to easily incorporate these along with traditional Shell scripts and command line tools as part of workflow pipelines. And I hope it can be useful for other people too.
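A minimal hypothetical sketch of that kind of pipeline (the classify.md prompt module and the final claude-run stage are illustrative, so the LLM step is shown as a comment):

```shell
# Deterministic half of a hypothetical pipeline: traditional tools filter
# and order the data before any LLM step sees it.
printf 'error: disk full\ninfo: started\nerror: timeout\n' \
  | grep '^error' \
  | sort
# For the judgement-based half, you would append e.g.:
#   | claude-run classify.md   # classify.md is a hypothetical prompt module
```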


Certainly LLM AI is good for something, but I don't think your shell script installer is the right place for it.

Imagine a world where you go to install some software and instead get goatse wallpaper because 4chan poisoned the LLM.

Imagine a world where you need a $200/month Claude XP Pro subscription to download and install a tarball.


At the moment, it looks like Claude Code does not support 'temperature' or 'seed' flags. It would be awesome if they added that.

Requesting a seed within the prompt means that when Claude writes the code, it could use that seed inside the randomization functions it writes. But sadly it wouldn't affect the determinism of Claude's own text generation.

There is active interest on GitHub in supporting this, but the most recent issue I could find was closed in July as "not planned".


Thank you, and yes! That is what I already frequently do for quick automation tasks.

As you say, Claude is actually very good at writing shell scripts and using tools on-the-fly. But I know there is an AI-confidence factor involved for developers making the choice to leverage that.

For simple tasks (in practice) I already find you can often prompt the whole thing.

For tasks where you already have the other traditional scripts or building blocks, or where it is complex, then you might break it up.

Interestingly, you can intermix these approaches.

You can have runnable markdown that writes and runs scripts on the fly, mix in command line tools, chain it all together with traditional tools in a bash script, and then call that script from a runnable markdown that passes in test results, or analyzes the code base and passes recommendations in.

The composability and ability to combine and embed code blocks and tool use within plain language is quite powerful. I’m still learning how to use this.

I’m glad it is already useful and thank you.


Also +1 to using containers and sandboxed environments! It means you can yolo it and skip permissions dangerously to experiment with vibe automation :)

More seriously, I agree that setting permissions to the minimum needed for the task and using sandboxed containers is sensible.


There are some tasks that LLMs are good at, but which can be hard to do with traditional command line tools or scripts. This is true even when you are a skilled coder and expert in Shell scripting. Examples include summarization, judgement-based evaluation, formatting etc.

Executable markdown provides a method of building these tasks into traditional pipelines as small, single-task-focused, composable modules. They also have the advantage that they can be easily shared and re-used.


I’m still not clear on why a shell script with comments (shell being a very clear way to express commands to execute) is not as good as a text file which then has some poor man’s shell superimposed on it.


Thanks, it’s great to see people trying different approaches to runnable prompts and variations on literate programming. I think it’s an area with a lot of potential, and I expect there will be a lot of interesting ideas come out of it.

