Browser Use creator here; we are working on prototypes like this but always find ourselves stuck with the safety vs freedom questions. We are very well aware how easy it is to inject stuff into the browser and do something malicious hence sandboxed browser still seem to like a very good idea. I guess in the long run we will not even need browsers, just a background agent that does stuff in the background. Is there any good research for guardrails of how to prevent “go to my bank and send the money to nigerian prince” style prompts?
Less flippantly that was sort of my thought. I’m probably a paranoid idiot and I’m not really sure I can articulate this idea properly but I can imagine a less concise but broader prompt and an agent configured in a way it has privileges you dont want it to have or a path to escalate them and its not quite AGI but its a virus on steroids - like a company or resource (think utilities) killer. I hope Im just missing something but these models seem pretty capable of wreaking all kinds of havoc if they just keep looping and have access nobody in their right mind wants.
I really like both nodriver and pydoll. I am definitely keeping the option of switching to them open, but we just wanted to have full control for now and see how painful CDP-use is to maintain first and then reconsider.
I mean... Playwright was built and is maintained by Microsoft, so I don't think VC money argument really makes sense here.
By the very nature of how Playwright is built we can't contribute to it - it runs inside a JS subprocess and does not expose a bunch of CDP apis that we NEED (for example to make cross origin iframes work).
Scrolling to an element doesn’t always work because somehow the element might not be ready. You need to add ids to the element and select by that to ensure it works properly.
thanks! yeah, playwright was a huge improvement there -- waiting until an element was actually ready. the official posture from the selenium project ("figure it out, be explicit") wasn't always the most user friendly messaging.
having to add ids to elements is one of those classic tradeoffs -- the alternative was to use css or xpath selectors, which can be even worse, maintenance-wise. i'm secretly hoping ai code-gen apps pumped out by things like Lovable or Claude Code automagically generate element test-ids and the tests for you and we never have to worry about it again.
It’s literally the only issue I’ve ever encountered with selenium, and having ids would be good for more reasons than just testing, so for me it wasn’t even an actual issue.
i'm at the edge of my chrome internals knowledge here, but i'd answer the question with a question: isn't backendnodeid only stable within a single session?
that might not matter if the agent is re-finding the element between sessions anyway, but then you're paying a lookup cost (time + tokens) each time. compared to just using document.getelementbyid() on an explicit id.
iirc it's stable across sessions until the tab closes, even though their docs dont guarantee it.
we cant modify the dom to add IDs because we'd get detected by block-blockers very quickly. we're gradually trying to get rid of all DOM tampering entirely for that reason.
I think it's awesome (we are close friends with Erik from Pig so slightly biased) - one extreme is Browser Use, which is just an agent that does everything for the first time, the other extreme is Workflow Use, which is almost deterministic. I think the winner product lies somewhere in the middle - Browser Use + Cache is easier to do for browser trajectories than for pure images! We will definitely try this direction!
1) we made this sick function in browser use library which analyses when there are no more requests going through - so we just reuse that!
2) yeah good question. The end goal is to completely regenerate the flow if it breaks (let browser use explore the “new” website and update the original flow). But let’s see, soo much could be done here!
1. Oh yes right. I remember trying it out thinking it was going to be brittle because of analytics etc but it filters for those surprisingly well.
2. We are working on https://www.launchskylight.com/ , agentic QA. For the self onboarding version we are using pure CUA without caching. (We wanted to avoid playwright to make it more flexible for canvas+iframe based apps,where we found HTML based approaches like browser-use limited, and to support desktop apps in the future).
We are betaing caching internally for customers, and releasing it for the self-onboarding soon. We use CUA actions for caching instead of playwright. Caching with pixel native models is def a bit more brittle for clicks and we focus on purely vision based analysis to decide to proceed or not. I think for scaling though you are 100% right, screenshots every step for validating are okay/worth it, but running an agent non-deterministically for actions is def an overkill for enterprise, that was what we found as well.
Geminis video understanding is also an interesting way to analyze what went wrong in more interactive apps. Apart from that i think we share quite a bit of the core thinking, would be interested to chat, will DM!
we heard a lot of complaints about Browser Use being slow. Last few weeks we focused a lot on improving the speed, while keeping the same accuracy.
Go try it out. It's really fun to see it glide the web.