Hey HN, we’re Gregor and Magnus, the founders of browser-use (https://browser-use.com/), an easy way to connect AI agents with the browser. Our agent library is open-source (https://github.com/browser-use/browser-use) and we have what is the biggest open-source community for browser agents. And now we have a cloud offering—hence our Launch HN today!
Check out this video to see it in action: https://preview.screen.studio/share/r1h4DuAk. There are lots more demos at https://github.com/browser-use/browser-use on how we control the web with prompts.
We started coding a decade ago with Selenium bots and macros to automate tasks. Then we both moved into ML. Last November, we asked ourselves, “How hard could it be to build the interface between LLMs and the web?”
We launched on Show HN (https://news.ycombinator.com/item?id=42052432) and have since been addressing various challenges of browser automation, such as: - Automation scripts break when the website changes - Automation scripts are annoying to build - Captchas and rate limits - parsing errors and API key management - and perhaps worst of all, login screens.
People use us to fill out their forms, extract data behind login walls, or automate their CRM. Others use the xPaths browser-use clicked on and build their scripts faster, or directly rerun the actions of browser-use deterministically. We’re currently working on robust task reruns, agent memory for long tasks, parallelization for repetitive tasks, and many other sweet improvements.
One interesting aspect is that some companies now want to change their UI to be more agent-friendly. Some developers even replace ugly UIs with nice ones and use browser-use to copy data over.
Besides the open-source we have an API. We host the browser and LLMs for you and help you with handling proxy rotation, persistent sessions and allowing you to run multiple instances in parallel. We price at $30/month—significantly lower than OpenAI’s Operator.
On the open-source side, browser use remains free. You can use any LLM, from Gemini to Sonnet, Qwen, or even DeepSeek-R1. It’s licensed under MIT, giving you full freedom to customize it.
We’d love to hear from you—what automation challenges are you facing? Any thoughts, questions, experiences are welcome!
Thank you! It's constructive, helps the people who are making things while giving them the benefit of the doubt, keeps users safe, and educate those who have enough technical understanding on the topic.
Could you go into a bit more detail about this? Why is exposing devtools to the agent a problem? What's the attack vector? That the agent might do something malicious to exfil saved passwords?
Forget the agent, browser-use's published setup instructions to use with your own Chrome profile and passwords [https://docs.browser-use.com/customize/real-browser, https://github.com/browser-use/browser-use/blob/495714e2dd38...] launches a Chrome session with Remote Debugging enabled.
These tools they are guiding users to setup and execute are "inherently insecure" [https://issues.chromium.org/issues/40056642].
So if you go to a site that can take advantage of these loopholes then your browser is likely to be compromised and could escalate from their.
Thanks, for the benefit of others the risk is that the devtools port has no Auth so is vulnerable to XSS.
I would surmise that this will stop being a problem if you switch to using a unix socket for the CDP.
If they’re launching a cloud-based service, doesn’t this effectively remove the risk of running it locally ?
Their key offering is an open source solution that you can run on your own laptop and Chrome browser, but their approach to doing this presents a huge security risk.
They do have a cloud offering that should not have these risks but then you have to enter your passwords into their cloud browser environment, presenting a different set of risks. Their cloud offering is basically similar to SkyVerne or even a higher cost tier subscription we have at rtrvr.ai
Cross origin is still a problem when those settings are off
This is very very important. It's completely unusable if this isn't solved. The agent could access a website that takes control of your machine.
how would that work? Can you control the browser without debug mode? Especially in production the browsers are anyway running on single instance docker containers so the file system is not accesible... are there exploits that can do harm from a virtual machine?
Yes, you can control the browser without debug mode, and the common way to do it is ChromeDriver[1].
[1] https://developer.chrome.com/docs/chromedriver/get-started
Yes, I was able to figure out a secure way to control the browser with AI Agents at rtrvr.ai without using debugger permissions/tools so it is most definitely possible.
I meant by in production in the sense how you are advising your users to setup the local installation. Even if you launch browser use locally within a container but your restarting the user's Chrome in debug mode and controlling it with CDP from within the container, then the door is wide open to exploits and the container doesn't do anything?!
Injecting JS into the page or controlling it using extension APIs is not a secure way to control the browser.
I never mentioned injecting JS into the page, and besides injecting LLM generated code or generally remote code won't be approved by the Chrome Store https://developer.chrome.com/docs/extensions/develop/migrate....
Your claim is analogous to saying that Apple's app store is not secure. We had to go through stringent vetting and testing by Google to list in the Chrome Store. Any basis or reasoning you can provide for your claim?
Regardless, its a wild leap to claim a Chrome Store Chrome Extension is more insecure than this arbitrary binary?
Yeah, sorta feels like docker on a new instance is safer than connecting to actual browsers and injecting js code there… would love to skip cdp protocol though, it’s quite restrictive
Are you making a straw man argument? I am not injecting js code, we solved this problem in a secure way with minimal permissions taken by our Chrome Extension, which runs in safe and secure sandbox within the browser.
Perhaps we are talking past each other, your literally giving instructions to your users to connect to their actual browsers: https://docs.browser-use.com/customize/real-browser Where under the hood your launching Chrome with debugging mode but with the user's credentials and passwords. This browser is then controlled via CDP by a highly insecure browser-use binary running in a container. Your users are bound to get pwned with this setup! https://github.com/browser-use/browser-use/blob/70ae758a3bfa...
Embed a WebView instead of launching browser?
I don’t know what problems they wanted to solve but one issue is browsers are trusted, views are not, so you have to workaround fingerprinting.