Hacker News | nickandbro's comments

It does well on SVGs beyond the "pelican riding on a bicycle" test. For example, this prompt:

"create a svg of a unicorn playing xbox"

https://www.svgviewer.dev/s/NeKACuHj

The final result still needs some tweaks, but given how much the ARC-AGI benchmark jumped, I'm guessing the model's improved visual abilities are what allow it to do this well.
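For anyone curious why "tweaks to the final result" are easy: an SVG is just XML text, so a model's drawing can be inspected and edited directly. A minimal Python sketch with a toy hand-written SVG (not the linked unicorn file):

```python
import xml.etree.ElementTree as ET

# A toy SVG of the kind a model might emit: plain XML text.
svg_text = """<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <circle cx="50" cy="40" r="20" fill="white" stroke="black"/>
  <rect x="35" y="60" width="30" height="25" fill="green"/>
</svg>"""

# Because it's just XML, tweaking the result is an ordinary tree edit:
root = ET.fromstring(svg_text)
ns = "{http://www.w3.org/2000/svg}"
for rect in root.iter(ns + "rect"):
    rect.set("fill", "gray")  # recolor one element in place

print(ET.tostring(root, encoding="unicode").count("gray"))  # 1
```

The same approach works for deleting stray shapes or nudging coordinates the model got slightly wrong.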


Animated SVGs are one of the examples in the press release. Which is fine; I just think the quirky SVG benchmark is now dead. Gemini has beaten it, and the remaining differences come down to taste.

I don't know if it gained these abilities through generalization or if Google gave it a dedicated animated-SVG RL suite that drove this much improvement between models.

Regardless, we need a new vibe-check benchmark à la the pelican on a bicycle.


What benchmark, though? There is very clearly a lot of room for improvement in its SVG making capabilities. The fact that it can now, finally, make a pelican on a bike that isn’t completely wrong is not an indicator that SVG generation is now a solved problem.

Interesting how it went a bit more 3D with the style of that one compared to the pelican I got.

Unfortunately, it still fails my personal SVG benchmark (an educational 2D cross-section of the human heart), even after multiple iterations and rounds of screenshot feedback. Oh well, back to the (human) drawing board.

I'm thinking now that as models get better and better at generating SVGs, there could be a point where we can use them to just make arbitrary UIs and interactive media with raw SVGs in realtime (like flash games).

> there could be a point where we can use them to just make arbitrary UIs and interactive media with raw SVGs

So render ui elements using xml-like code in a web browser? You’re not going to believe me when I tell you this…


Or quite literally a game where SVG assets are generated on the fly using this model
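The game idea above boils down to asset generation at runtime. A minimal sketch in Python, assuming the model (or any generator) picks parameters rather than pixels; the `make_asset` helper and its values here are hypothetical, not model output:

```python
def make_asset(cx: int, cy: int, r: int, color: str) -> str:
    """Build a game sprite as an SVG string from runtime parameters.

    In the scenario above, a model would choose these parameters (or
    emit the markup itself) during gameplay; here they are hard-coded.
    """
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{2*r+4}" height="{2*r+4}">'
        f'<circle cx="{cx}" cy="{cy}" r="{r}" fill="{color}"/>'
        "</svg>"
    )

# Each frame or game event can request a fresh, resolution-independent asset:
coin = make_asset(cx=20, cy=20, r=18, color="gold")
print("circle" in coin)  # True
```

Because SVG is vector-based, each generated asset scales cleanly at any resolution, which is part of the appeal over generating raster sprites on the fly.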

That's one step before another long-term milestone: realtime generation of 3D mesh content during gameplay.

That's the "left brain" approach, versus the "right brain" approach of coming at dynamic video games from the diffusion-model direction, which is what the Gemini Genie thing seems to be about.


On the other hand, generation of other vector image formats (e.g. "create a postscript file showing a walrus brushing its teeth") hasn't improved nearly as much.

Perhaps they're deliberately optimising for SVG generation.


Can we move on from SVGs to 3D models at some point?

Image to model is already a thing, and it's pretty good.

Currently working on:

https://vimgolf.ai

to show newbies how to use Vim. It's not complete yet and has major issues, so feel free to give it a go, but please hold your judgement: not all shortcuts have been added.


I have found GPT 5.3-Codex to do exceedingly well when working with graphics rendering pipelines. They must have better training data or RL approaches than Anthropic: I gave the same prompt and config to Opus 4.6, and it added unwanted rendering artifacts. This may just be an issue specific to my use case, but since OpenAI is partnered with MSFT, which makes lots of games, I wonder if this is an area they invested in heavily.

That's insane. Deflock is a map of Flock cameras.

The definition of terrorism is:

the unlawful use of violence and intimidation, especially against civilians, in the pursuit of political aims.

Deflock couldn't be further from that.


While I think the use of the term “terrorist” is unwarranted, I do think Deflock is seeking political change. The decision to use Flock is a government policy choice, right?


Just the people’s choice, right? They voted for this government policy, right???!? https://www.coloradopolitics.com/2025/10/22/denver-mayor-ext...

>> “I was stunned to learn late yesterday that after convening a task force of local and national experts, Mayor Johnston has been negotiating secretly with the discredited CEO of Flock Safety and signing another unilateral extension of this mass surveillance contract with no public process and no vote from the City Council or input from his own task force,” Councilmember Sarah Parady told The Denver Gazette.


What is the point of this comment? Are you saying that deflock are not terrorists but are terrorist adjacent? Why respond to someone defining terrorism by pointing out that 2 words at the end of the definition also apply to deflock? Do those not apply to basically everyone who participates in their country's society, including literally everyone who votes and all politicians?

Political parties seek political change too, but that doesn't make them terrorists. Deflock isn't trying to intimidate or cause violence to citizens.

If corporations can be people, cameras can be people too! Think of the cameras! /s


Please don't give them ideas like that, even in jest.

I am very curious whether this app is making money, or whether users are just using the two generators and then leaving. If it's the former, I am very impressed with your wrapper around the image-gen models.


I can imagine the reverse model could be very profitable with every real estate agent using it to make dreary photos look great.


Reverse model aimed at estate agents already posted in this thread by someone: https://news.ycombinator.com/item?id=46829566


This landing page is a lead-gen tool for the architect at the bottom.


Ahh, I see that. Thanks


This could be the future of film. Instead of prompting blind, not knowing what the model will produce, you could use fine-grained motion controls to get the shot you are looking for. If you wanted to adjust the shot afterward, you could checkpoint the model there by taking a screenshot and rerunning. Crazy.


I feel like people are already currently doing this. Essentially storyboarding first.

This guy a month ago for example: https://youtu.be/SGJC4Hnz3m0


Great work! I really respect AI2; they open-source everything: the model, the weights, the training pipeline, the inference stack, and the corpus.


Interesting, I use cloudflare containers and it takes roughly 6-7 seconds to boot up using a very lightweight image.


Maybe show how it works instead of making the home page a login screen.


I wonder if some of the docs from https://app.wafer.ai/docs could be used to make the model better at writing GGML kernels. Interesting use case.

