MagMueller's comments

Interesting read. Agree that GUI is super hard for agents. Did you see "skills" from browser-use? We directly interact with network requests now.

I worked for 2 years in a co-working space full of founders next to ETH Zurich. The most consistent worker? The cleaning lady. Every morning at 6 am, she did not miss a single day.

I grew up in a small village in Germany. 500 people, 5000 cows. Only farmers and a cheese factory. In the factory, we worked on Christmas, Easter, and New Year's Eve every morning at 5 am. Farmers don't take days off because cows don't take days off.

Maybe it's not the healthiest way of life, but I don't think we physically need to take time off.


We could do a hackathon where you're only allowed to change one line.


I would love to fix my docs with this. I have them in the main browser-use repo. What do you recommend so that the agent never pushes to the main browser-use branch, but only to its own branch?


Yeah, you can easily tweak this in the generated prompt.md to push to a branch or a fork instead.


Yes, so you can run the same form over and over again with different input variables. It's very reliable, fast, and cheap.


In the main library this feature could help you with that: https://github.com/browser-use/browser-use/pull/1437


One option could be for the main apps like WhatsApp to have defined custom actions, which are almost like an API to the service. I think the interplay between LLM and automation scripts will succeed here:

Agent call 1: Send WhatsApp message (to=Magnus, text=hi)
Inside, you open WhatsApp and search for Magnus (without LLM)

Agent call 2: Select contact from all possible Magnus contacts
Script 3: Type the message and click send

So in total, 2 calls - with Gemini, you could already achieve this in 10-15 seconds.
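
A rough sketch of that interplay, in plain Python rather than any real browser-use or WhatsApp API. All names here (`send_message`, `search_contacts`, `choose`, `type_and_send`) are hypothetical placeholders: the two scripted steps and the one model-backed step are passed in as callables so the flow itself stays visible. Skipping the model when only one contact matches is an extra shortcut, not something the flow above requires.

```python
from typing import Callable, List


def send_message(
    to: str,
    text: str,
    search_contacts: Callable[[str], List[str]],  # scripted step, no LLM (hypothetical)
    choose: Callable[[List[str]], str],           # LLM-backed step (hypothetical)
    type_and_send: Callable[[str, str], None],    # scripted step, no LLM (hypothetical)
) -> str:
    """Send `text` to the contact matching `to` and return the chosen contact."""
    # Scripted: open the app and search, no model involved.
    candidates = search_contacts(to)
    if not candidates:
        raise LookupError(f"no contact matches {to!r}")

    # Model call only when the search result is ambiguous.
    contact = candidates[0] if len(candidates) == 1 else choose(candidates)

    # Scripted: type the message and click send.
    type_and_send(contact, text)
    return contact
```

The agent only ever sees one high-level action; everything that can be done deterministically stays in the script.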


We see people replacing UIs and using browser-use to fill out the real UI. So there could be a world where everyone has their own UI, and you could have that filter option.

Furthermore, valid point: if Pepsi spends $1M on ads, why don't you get a piece of it if they pitch to you?


I use browser-use. I use use-browser. I use mac-use. I use use.


It could be useful to run a prompt/test once, get the xPaths, and rerun it deterministically. When it breaks, you know something is wrong, and the LLM could be used as a fallback to fix the script.
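
A minimal sketch of that record-and-replay idea, assuming the first run left behind a mapping from step name to XPath. The page is modelled as a plain set of XPaths that currently resolve, and `relocate` is a hypothetical stand-in for the LLM fallback; none of this is a real browser-use API.

```python
from typing import Callable, Dict, List, Set

# Hypothetical stand-in for a live page: the XPaths that currently resolve.
Page = Set[str]


def replay_with_fallback(
    recorded: Dict[str, str],              # step name -> XPath captured on the first run
    page: Page,
    relocate: Callable[[str, Page], str],  # hypothetical LLM-backed repair function
) -> List[str]:
    """Replay recorded steps; call `relocate` only for steps whose XPath broke.

    Returns the names of the repaired steps and updates `recorded` in place.
    """
    repaired: List[str] = []
    for step, xpath in list(recorded.items()):
        if xpath in page:
            continue  # deterministic path still works, no model call needed
        # A broken XPath is the signal that something on the site changed.
        recorded[step] = relocate(step, page)
        repaired.append(step)
    return repaired
```

A non-empty return value doubles as the "something is wrong" alert: the script kept running, but the listed steps should be reviewed.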

