<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Paul Dufour]]></title><description><![CDATA[Software Engineer at Uber, creating AI experiences]]></description><link>https://pdufour.substack.com</link><image><url>https://substackcdn.com/image/fetch/$s_!wOcI!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcb7e1d9-0b50-4de3-a59d-0a5ad462de4a_2736x2736.jpeg</url><title>Paul Dufour</title><link>https://pdufour.substack.com</link></image><generator>Substack</generator><lastBuildDate>Sun, 21 Jun 2026 13:13:43 GMT</lastBuildDate><atom:link href="https://pdufour.substack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Paul Dufour]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[pdufour@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[pdufour@substack.com]]></itunes:email><itunes:name><![CDATA[Paul Dufour]]></itunes:name></itunes:owner><itunes:author><![CDATA[Paul Dufour]]></itunes:author><googleplay:owner><![CDATA[pdufour@substack.com]]></googleplay:owner><googleplay:email><![CDATA[pdufour@substack.com]]></googleplay:email><googleplay:author><![CDATA[Paul Dufour]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Writing a browser-use agent from scratch - Part 1/3 - The Capture]]></title><description><![CDATA[There is no better way to learn how a system works than rebuilding it from the ground up. 
In this article series I tackle a solved problem (browser-use) but engineer the entire stack to run in WASM.]]></description><link>https://pdufour.substack.com/p/writing-a-browser-use-agent-from</link><guid isPermaLink="false">https://pdufour.substack.com/p/writing-a-browser-use-agent-from</guid><dc:creator><![CDATA[Paul Dufour]]></dc:creator><pubDate>Tue, 16 Jun 2026 02:06:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!G88y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe653a08d-d415-48dc-b00a-9eb6bb215617_1774x887.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!G88y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe653a08d-d415-48dc-b00a-9eb6bb215617_1774x887.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!G88y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe653a08d-d415-48dc-b00a-9eb6bb215617_1774x887.png 424w, https://substackcdn.com/image/fetch/$s_!G88y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe653a08d-d415-48dc-b00a-9eb6bb215617_1774x887.png 848w, https://substackcdn.com/image/fetch/$s_!G88y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe653a08d-d415-48dc-b00a-9eb6bb215617_1774x887.png 1272w, 
https://substackcdn.com/image/fetch/$s_!G88y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe653a08d-d415-48dc-b00a-9eb6bb215617_1774x887.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!G88y!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe653a08d-d415-48dc-b00a-9eb6bb215617_1774x887.png" width="1200" height="600" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e653a08d-d415-48dc-b00a-9eb6bb215617_1774x887.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:728,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:1521757,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://pdufour.substack.com/i/201958188?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe653a08d-d415-48dc-b00a-9eb6bb215617_1774x887.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!G88y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe653a08d-d415-48dc-b00a-9eb6bb215617_1774x887.png 424w, https://substackcdn.com/image/fetch/$s_!G88y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe653a08d-d415-48dc-b00a-9eb6bb215617_1774x887.png 848w, 
https://substackcdn.com/image/fetch/$s_!G88y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe653a08d-d415-48dc-b00a-9eb6bb215617_1774x887.png 1272w, https://substackcdn.com/image/fetch/$s_!G88y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe653a08d-d415-48dc-b00a-9eb6bb215617_1774x887.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>There are a few steps to creating an AI browser-use agent.
This gets extra complicated if you do it entirely in the browser (without sending images to a server). At first I thought it might not even be possible, which made coming up with a solution all the more fun.</p><p>Part of the fun was the novelty: people had built the parts separately before (there were grounding models, and there were webpage-capture libraries), but as far as I could find, no one had brought everything together.</p><p>The solution I&#8217;ve developed is here: <strong><a href="https://github.com/pdufour/browser-use-wasm">https://github.com/pdufour/browser-use-wasm</a></strong>. In the next few articles, I want to cover the core components of a browser-use agent and what I learned along the way. Follow me on LinkedIn at <a href="https://www.linkedin.com/in/pauldufour/">https://www.linkedin.com/in/pauldufour/</a> to hear about my next post.</p><h2><strong>The core parts of a browser-use agent</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank"
href="https://substackcdn.com/image/fetch/$s_!5Hyl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76d7433e-43e8-42c5-9941-4733acc5350c_2400x922.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5Hyl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76d7433e-43e8-42c5-9941-4733acc5350c_2400x922.png 424w, https://substackcdn.com/image/fetch/$s_!5Hyl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76d7433e-43e8-42c5-9941-4733acc5350c_2400x922.png 848w, https://substackcdn.com/image/fetch/$s_!5Hyl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76d7433e-43e8-42c5-9941-4733acc5350c_2400x922.png 1272w, https://substackcdn.com/image/fetch/$s_!5Hyl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76d7433e-43e8-42c5-9941-4733acc5350c_2400x922.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5Hyl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76d7433e-43e8-42c5-9941-4733acc5350c_2400x922.png" width="1456" height="559" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/76d7433e-43e8-42c5-9941-4733acc5350c_2400x922.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:559,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:390027,&quot;alt&quot;:&quot;1 Capture Live page iframe or DOM under your app Screenshot buffer Bitmap the VLA reads - not the live DOM tree SnapDOM drawElementImage 2 
Agent loop VLA grounding Task + screenshot &#8594; action dict(s) with position click input enter Act on live DOM elementFromPoint at each position click type select Pipeline diagram&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://pdufour.substack.com/i/201958188?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76d7433e-43e8-42c5-9941-4733acc5350c_2400x922.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="1 Capture Live page iframe or DOM under your app Screenshot buffer Bitmap the VLA reads - not the live DOM tree SnapDOM drawElementImage 2 Agent loop VLA grounding Task + screenshot &#8594; action dict(s) with position click input enter Act on live DOM elementFromPoint at each position click type select Pipeline diagram" title="1 Capture Live page iframe or DOM under your app Screenshot buffer Bitmap the VLA reads - not the live DOM tree SnapDOM drawElementImage 2 Agent loop VLA grounding Task + screenshot &#8594; action dict(s) with position click input enter Act on live DOM elementFromPoint at each position click type select Pipeline diagram" srcset="https://substackcdn.com/image/fetch/$s_!5Hyl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76d7433e-43e8-42c5-9941-4733acc5350c_2400x922.png 424w, https://substackcdn.com/image/fetch/$s_!5Hyl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76d7433e-43e8-42c5-9941-4733acc5350c_2400x922.png 848w, https://substackcdn.com/image/fetch/$s_!5Hyl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76d7433e-43e8-42c5-9941-4733acc5350c_2400x922.png 1272w, 
https://substackcdn.com/image/fetch/$s_!5Hyl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76d7433e-43e8-42c5-9941-4733acc5350c_2400x922.png 1456w" sizes="100vw"></picture></div></a></figure></div><p>As shown above, the core parts of a browser-use agent are the capture, ground, and act steps.
For now, let&#8217;s stick to capture.</p><p>You have three options for the capture step:</p><ol><li><p>Send the entire DOM tree (all the HTML markup) to a text-based LLM, which responds with the actions to take (for example, which element to click); the agent then executes them.</p></li><li><p>Don&#8217;t send any HTML to the LLM; instead, have the LLM query the DOM with CSS selectors it generates (e.g. the user says &#8220;click the Order button&#8221; and the LLM looks for buttons with &#8220;Order&#8221; in their text).</p></li><li><p>Send an image of the page to a vision-language-action (VLA) model (the preferred option).</p></li></ol><p>Option 1 isn&#8217;t practical, though, because DOM trees are huge. And that&#8217;s before considering that for an LLM to truly &#8220;understand&#8221; a page, it would also need the styles. The following table compares the DOM and screenshot options for a few sample webpages.</p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/ZUOL7/1/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/46240762-e855-4755-8d0a-7b964a25d219_1220x572.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bb33af26-4cbe-43e1-b1b8-5ba4892979e4_1220x642.png&quot;,&quot;height&quot;:316,&quot;title&quot;:&quot;Capture Options&quot;,&quot;description&quot;:&quot;&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/ZUOL7/1/" width="730" height="316" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var
r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><p>As you can see, the screenshot option is far more practical in terms of context size for a browser-use library, and context size is very limited on WebGPU. Running Qwen2.5-0.5B (q4) in Chrome on an M4 Max gives the following limits:</p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/tFStz/6/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/10a83723-015d-4fcb-9b71-90c9ebf4981c_1220x250.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cd23a504-1efc-450d-a1ff-f2d361dbfcde_1220x374.png&quot;,&quot;height&quot;:260,&quot;title&quot;:&quot;Chrome WebGPU context limits (Qwen2.5-0.5B q4)&quot;,&quot;description&quot;:&quot;MacBook Pro &#183; Apple M4 Max &#183; 64 GB&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/tFStz/6/" width="730" height="260" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><p>These constraints are good for end users, though: their machines won&#8217;t grind to a halt because a client-side LLM is running.</p><p>As for option 2 (having the LLM generate DOM query selectors), I didn&#8217;t even attempt it; most likely the results would be too unreliable to make it worthwhile.
I can think of many edge cases where it breaks down:</p><ul><li><p><strong>Iframes</strong>: the code required to traverse every iframe and include its contents in the context would be a) very difficult and b) likely to hit security restrictions, since a page can&#8217;t read the DOM of a cross-origin iframe. A vision model handles this elegantly because it sees what the user sees, iframe or not.</p></li><li><p><strong>Canvas / WebGL / &lt;video&gt;:</strong> the markup says nothing about what these elements display, but a vision model can actually &#8220;see&#8221; their content, so you could ask things like &#8220;click the video that has a panda in it&#8221;.</p></li><li><p><strong>&#8220;Click the green button&#8221;</strong>: natural for a VLA, but a CSS selector can&#8217;t match an element by its rendered color.</p></li></ul><h2>The capture implementation</h2><p>Now that we&#8217;ve covered why we are using a vision-based approach, let&#8217;s talk about the actual capture implementation. To recap, by capture I mean the browser-use agent taking the page you are on and converting it into a screenshot that a VLA can read.</p><p>There are a few options we&#8217;ll look at:</p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/kJYQ5/13/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e1078bac-b291-4a12-8c2c-b621750d7385_1220x652.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b7b28d9e-7424-4b34-af9b-18fb08c2ea41_1220x788.png&quot;,&quot;height&quot;:480,&quot;title&quot;:&quot;DOM-to-screenshot capture libraries&quot;,&quot;description&quot;:&quot;&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/kJYQ5/13/" width="730" height="480" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a
in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><p>html2canvas (<a href="https://github.com/niklasvh/html2canvas">https://github.com/niklasvh/html2canvas</a>) was probably one of the first libraries to do this and has existed for years. However, it hasn&#8217;t been maintained for some time, which led to forks such as html2canvas-pro (<a href="https://github.com/yorickshan/html2canvas-pro">https://github.com/yorickshan/html2canvas-pro</a>).</p><p>In recent years, newer options have appeared that take a different approach. One of them, SnapDOM, quickly became popular, as seen below:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!n8BV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc4e5a4b-a802-42d9-a267-6e2eefc21cdb_3152x2466.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!n8BV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc4e5a4b-a802-42d9-a267-6e2eefc21cdb_3152x2466.png 424w, https://substackcdn.com/image/fetch/$s_!n8BV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc4e5a4b-a802-42d9-a267-6e2eefc21cdb_3152x2466.png 848w, https://substackcdn.com/image/fetch/$s_!n8BV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc4e5a4b-a802-42d9-a267-6e2eefc21cdb_3152x2466.png 1272w,
https://substackcdn.com/image/fetch/$s_!n8BV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc4e5a4b-a802-42d9-a267-6e2eefc21cdb_3152x2466.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!n8BV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc4e5a4b-a802-42d9-a267-6e2eefc21cdb_3152x2466.png" width="1456" height="1139" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dc4e5a4b-a802-42d9-a267-6e2eefc21cdb_3152x2466.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1139,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:395281,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://pdufour.substack.com/i/201958188?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc4e5a4b-a802-42d9-a267-6e2eefc21cdb_3152x2466.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!n8BV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc4e5a4b-a802-42d9-a267-6e2eefc21cdb_3152x2466.png 424w, https://substackcdn.com/image/fetch/$s_!n8BV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc4e5a4b-a802-42d9-a267-6e2eefc21cdb_3152x2466.png 848w, 
https://substackcdn.com/image/fetch/$s_!n8BV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc4e5a4b-a802-42d9-a267-6e2eefc21cdb_3152x2466.png 1272w, https://substackcdn.com/image/fetch/$s_!n8BV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc4e5a4b-a802-42d9-a267-6e2eefc21cdb_3152x2466.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Let&#8217;s compare the options and see why SnapDOM is preferable for a browser-use agent.</p><h3><strong>html2canvas</strong></h3><p>html2canvas and its fork share the same methodology: walk the DOM, gather computed styles, and redraw everything onto a canvas using 2D canvas draw commands. In effect, they re-implement the browser&#8217;s painting in JavaScript. Let&#8217;s take an example:</p><p><strong>Live DOM</strong></p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;html&quot;,&quot;nodeId&quot;:&quot;430be266-a9fc-4be8-accc-63af3bbba455&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-html">&lt;button id="submit" style="
  background: #22c55e;
  color: white;
  padding: 8px 16px;
  border-radius: 8px;
  font: 600 14px Inter;
"&gt;Submit&lt;/button&gt;</code></pre></div><p><strong>Html2Canvas</strong> will look at this and execute roughly the following:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;33e116a7-866e-41aa-b62c-a492edad5f68&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">
// Sketch of the draw calls the library effectively issues (illustrative, not its source)
const canvas = document.createElement('canvas');
const ctx = canvas.getContext('2d');
// &lt;button&gt; background + border-radius: there is no compositor here, so
// the pill shape has to be drawn by hand
ctx.save();
ctx.fillStyle = '#22c55e';
ctx.beginPath();
ctx.roundRect(120, 48, 88, 36, 8);  // x, y, w, h, r from layout math
ctx.fill();
ctx.restore();
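// Hypothetical helper, a simplified sketch (labelOrigin is not part of
// html2canvas's API, and its real code differs): where the label below lands.
// Buttons center their label by default, so the text starts at the box center
// minus half the text width that ctx.measureText() would report, and with
// textBaseline = 'middle' the y coordinate is the vertical center of the box.
function labelOrigin(box, textWidth) {
  return {
    x: box.x + (box.width - textWidth) / 2,
    y: box.y + box.height / 2,
  };
}
// Example with an assumed text width of 64px:
// labelOrigin({ x: 120, y: 48, width: 88, height: 36 }, 64) gives { x: 132, y: 66 },
// the coordinates passed to fillText below.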
// "Submit" label (font metrics parsed and re-measured)
ctx.save();
ctx.fillStyle = '#ffffff';
ctx.font = '600 14px Inter';
ctx.textBaseline = 'middle';
ctx.fillText('Submit', 132, 66);
ctx.restore();</code></pre></div><h3><strong>SnapDOM</strong></h3><p>For a browser-use library, performance is critical, and redrawing every element by hand in JavaScript is slow and prone to visual inaccuracies, so we needed another method. SnapDOM instead hands the rendering to the browser&#8217;s own engine, using the following methodology:</p><ol><li><p>Clone the DOM and associated metadata: it records where the user has scrolled, which input fields have been filled out, and any other context needed to recreate the page.</p></li><li><p>Style computation: it extracts the styles for each element (including pseudo-elements) using getComputedStyle.</p></li><li><p>Asset embedding: fonts are embedded so they work within the SVG context, and images are inlined as data URLs.</p></li><li><p>Serialization: this is the trick. Once you have the full clone, styles and all, SnapDOM wraps the entire HTML inside an SVG using &lt;foreignObject&gt;. Now you have a faithful representation of your page as an SVG.</p></li><li><p>Rendering: SnapDOM then draws the SVG onto a canvas element, from which it can be exported to other formats such as PNG and JPG.</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8or2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e9010ac-2e8e-43a0-bdfe-65e3cde1e424_2400x712.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8or2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e9010ac-2e8e-43a0-bdfe-65e3cde1e424_2400x712.png 424w, https://substackcdn.com/image/fetch/$s_!8or2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e9010ac-2e8e-43a0-bdfe-65e3cde1e424_2400x712.png 848w,
https://substackcdn.com/image/fetch/$s_!8or2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e9010ac-2e8e-43a0-bdfe-65e3cde1e424_2400x712.png 1272w, https://substackcdn.com/image/fetch/$s_!8or2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e9010ac-2e8e-43a0-bdfe-65e3cde1e424_2400x712.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8or2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e9010ac-2e8e-43a0-bdfe-65e3cde1e424_2400x712.png" width="1456" height="432" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0e9010ac-2e8e-43a0-bdfe-65e3cde1e424_2400x712.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:432,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:174424,&quot;alt&quot;:&quot;SnapDOM Clone &#8594; Style &#8594; Embed &#8594; SVG &#8594; Canvas Clone DOM + metadata Scroll, filled inputs Style getComputedStyle Pseudo-elements inlined Embed Fonts for SVG context Images &#8594; data URLs Serialize Wrap in <foreignObject> Full page inside SVG Render SVG &#8594; <canvas> Export PNG &#183; JPG SnapDOM methodology&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://pdufour.substack.com/i/201958188?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e9010ac-2e8e-43a0-bdfe-65e3cde1e424_2400x712.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="SnapDOM Clone &#8594; Style &#8594; Embed &#8594; SVG &#8594; Canvas Clone DOM + metadata Scroll, filled 
inputs Style getComputedStyle Pseudo-elements inlined Embed Fonts for SVG context Images &#8594; data URLs Serialize Wrap in <foreignObject> Full page inside SVG Render SVG &#8594; <canvas> Export PNG &#183; JPG SnapDOM methodology" title="SnapDOM Clone &#8594; Style &#8594; Embed &#8594; SVG &#8594; Canvas Clone DOM + metadata Scroll, filled inputs Style getComputedStyle Pseudo-elements inlined Embed Fonts for SVG context Images &#8594; data URLs Serialize Wrap in <foreignObject> Full page inside SVG Render SVG &#8594; <canvas> Export PNG &#183; JPG SnapDOM methodology" srcset="https://substackcdn.com/image/fetch/$s_!8or2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e9010ac-2e8e-43a0-bdfe-65e3cde1e424_2400x712.png 424w, https://substackcdn.com/image/fetch/$s_!8or2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e9010ac-2e8e-43a0-bdfe-65e3cde1e424_2400x712.png 848w, https://substackcdn.com/image/fetch/$s_!8or2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e9010ac-2e8e-43a0-bdfe-65e3cde1e424_2400x712.png 1272w, https://substackcdn.com/image/fetch/$s_!8or2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e9010ac-2e8e-43a0-bdfe-65e3cde1e424_2400x712.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Below is a benchmark comparing SnapDOM&#8217;s capture time with html2canvas-pro, and the numbers don&#8217;t lie!</p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/OYr7Q/5/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ebba20d6-f2ef-423a-a9c5-516d37149875_1220x330.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f60e5288-edf7-45c3-846f-077f84a19981_1220x454.png&quot;,&quot;height&quot;:400,&quot;title&quot;:&quot;SnapDOM vs html2canvas-pro capture time&quot;,&quot;description&quot;:&quot;MacBook Pro &#183; Apple M4 Max &#183; 64 GB &#183; median of 5 runs&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/OYr7Q/5/" width="730" height="400" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 
0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><h3><strong>Blitz</strong></h3><p>I didn&#8217;t include this option as a contender because it&#8217;s still quite experimental, but I did want to give it a shout-out. Blitz (<a href="https://github.com/DioxusLabs/blitz">https://github.com/DioxusLabs/blitz</a>) is completely different from the other options: it renders web pages without using Chrome&#8217;s rendering engine. You still write HTML and CSS, but they are rendered by a custom rendering engine.</p><p>I have an example of what that looks like here: <a href="https://github.com/pdufour/browserbrowserbrowser">https://github.com/pdufour/browserbrowserbrowser</a>. Although it may look like a simple iframe, the canvas that renders the web page is completely controlled by Blitz, a Rust library compiled to a WASM module. So WASM is in fact rendering your web page. Cool stuff!</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;7b6ef2ea-83f8-4d12-a866-de9d99d52f6d&quot;,&quot;duration&quot;:null}"></div><h2><strong>Using SnapDOM in a browser-use agent</strong></h2><p>So we&#8217;ve established why SnapDOM is a good fit for a browser-use agent. Let&#8217;s go over the trials and tribulations of actually using it.</p><p>To build an accurate browser-use agent, the pixels have to be perfect. Clicking on [10, 11] is very different from clicking on [10, 12]: one could be a button, the other could be whitespace! Much to my dismay, this was exactly the problem I ran into after integrating SnapDOM into my browser-use library.</p><p>The issue is tracked here: <a href="https://github.com/zumerlab/snapdom/issues/421">https://github.com/zumerlab/snapdom/issues/421</a>. It took me some time to understand it, though. 
At first all I knew was that my button clicks weren&#8217;t registering.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BUoK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdc96594-fe4d-4d17-a9eb-59d460c8b0dc_2400x1360.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BUoK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdc96594-fe4d-4d17-a9eb-59d460c8b0dc_2400x1360.png 424w, https://substackcdn.com/image/fetch/$s_!BUoK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdc96594-fe4d-4d17-a9eb-59d460c8b0dc_2400x1360.png 848w, https://substackcdn.com/image/fetch/$s_!BUoK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdc96594-fe4d-4d17-a9eb-59d460c8b0dc_2400x1360.png 1272w, https://substackcdn.com/image/fetch/$s_!BUoK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdc96594-fe4d-4d17-a9eb-59d460c8b0dc_2400x1360.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BUoK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdc96594-fe4d-4d17-a9eb-59d460c8b0dc_2400x1360.png" width="1456" height="825" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fdc96594-fe4d-4d17-a9eb-59d460c8b0dc_2400x1360.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:825,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:317795,&quot;alt&quot;:&quot;Vertical drift - for browser-use, pixels matter: clicking [10, 11] vs [10, 12] can be a button vs whitespace. SnapDOM&#8217;s canvas export paints text slightly lower than the live page. The break is SVG &#8594; canvas rasterization, not &#8220;wait longer for fonts.&#8221; Live compositor Complete your order Review details before you submit. Cancel Submit SnapDOM canvas (what the VLA sees) Live heading (4&#215; zoom) Complete your order Review details before you submit. Cancel Submit Canvas heading (4&#215; zoom) Heading ink sits +3.5px lower on the canvas. Same normalized [x, y] on live vs screenshot can miss the control the model intended. Live page vs SnapDOM capture SnapDOM capture Heading zoom - same crop on live and canvas&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://pdufour.substack.com/i/201958188?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdc96594-fe4d-4d17-a9eb-59d460c8b0dc_2400x1360.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Vertical drift - for browser-use, pixels matter: clicking [10, 11] vs [10, 12] can be a button vs whitespace. SnapDOM&#8217;s canvas export paints text slightly lower than the live page. The break is SVG &#8594; canvas rasterization, not &#8220;wait longer for fonts.&#8221; Live compositor Complete your order Review details before you submit. 
Cancel Submit SnapDOM canvas (what the VLA sees) Live heading (4&#215; zoom) Complete your order Review details before you submit. Cancel Submit Canvas heading (4&#215; zoom) Heading ink sits +3.5px lower on the canvas. Same normalized [x, y] on live vs screenshot can miss the control the model intended. Live page vs SnapDOM capture SnapDOM capture Heading zoom - same crop on live and canvas" title="Vertical drift - for browser-use, pixels matter: clicking [10, 11] vs [10, 12] can be a button vs whitespace. SnapDOM&#8217;s canvas export paints text slightly lower than the live page. The break is SVG &#8594; canvas rasterization, not &#8220;wait longer for fonts.&#8221; Live compositor Complete your order Review details before you submit. Cancel Submit SnapDOM canvas (what the VLA sees) Live heading (4&#215; zoom) Complete your order Review details before you submit. Cancel Submit Canvas heading (4&#215; zoom) Heading ink sits +3.5px lower on the canvas. Same normalized [x, y] on live vs screenshot can miss the control the model intended. 
Live page vs SnapDOM capture SnapDOM capture Heading zoom - same crop on live and canvas" srcset="https://substackcdn.com/image/fetch/$s_!BUoK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdc96594-fe4d-4d17-a9eb-59d460c8b0dc_2400x1360.png 424w, https://substackcdn.com/image/fetch/$s_!BUoK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdc96594-fe4d-4d17-a9eb-59d460c8b0dc_2400x1360.png 848w, https://substackcdn.com/image/fetch/$s_!BUoK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdc96594-fe4d-4d17-a9eb-59d460c8b0dc_2400x1360.png 1272w, https://substackcdn.com/image/fetch/$s_!BUoK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdc96594-fe4d-4d17-a9eb-59d460c8b0dc_2400x1360.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>From spotting the drift to finding the root cause</strong></p><p>As you can see in the image, the red and blue lines don&#8217;t match up!</p><p>I tried many different fixes for this drift, including:</p><ol><li><p><strong>Wait longer before capture</strong></p></li></ol><p>I tried a mix of timing tricks: waiting for fonts to load, running a second image-encode pass, and waiting for <code>requestAnimationFrame</code> before encoding the image.</p><p><strong>Result: none of this had any effect on the text shift</strong></p><ol start="2"><li><p><strong>Rounding consistency fix</strong></p></li></ol><p>SnapDOM turns your page into a canvas bitmap, and a canvas is sized in device pixels. On a MacBook Retina screen the device pixel ratio (DPR) is 2, so a page box that is 316 CSS px tall should map to 316 &#215; 2 = 632 device pixels.</p><p>However, layout often produces fractional sizes, which can give a device-pixel height like 632.5. A canvas can only have a whole number of pixels, so that value has to be rounded up or down: instead of 632.5 you get 632 or 633. 
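</p><p>To make that rounding decision concrete, here is a minimal sketch in plain JavaScript. The helper <code>backingStoreHeights</code> is hypothetical and not part of SnapDOM; it simply multiplies a CSS height by the device pixel ratio and compares what <code>Math.floor</code>, <code>Math.round</code> and <code>Math.ceil</code> would choose for the canvas height, reporting each error back in CSS pixels.</p><pre><code class="language-javascript">// Hypothetical helper (not a SnapDOM API): for a CSS height and a
// devicePixelRatio, compute the canvas height each rounding mode would pick
// and how far that lands from the exact layout size, in CSS pixels.
function backingStoreHeights(cssHeight, dpr) {
  const exact = cssHeight * dpr; // fractional device pixels, e.g. 632.6
  const modes = { floor: Math.floor, round: Math.round, ceil: Math.ceil };
  const result = {};
  for (const [name, roundFn] of Object.entries(modes)) {
    const devicePx = roundFn(exact);
    result[name] = {
      devicePx,
      // Negative: content clipped at the bottom. Positive: empty space added.
      errorCssPx: Number(((devicePx - exact) / dpr).toFixed(2)),
    };
  }
  return result;
}

// A 316.3 px tall card on a DPR 2 (Retina) display.
console.log(backingStoreHeights(316.3, 2));</code></pre><p>For the 316.3 px card in the figure below, at DPR 2 the exact height is 632.6 device px: <code>Math.floor</code> picks 632 (about 0.3 CSS px clipped), while <code>Math.round</code> and <code>Math.ceil</code> both pick 633 (about 0.2 CSS px of empty space). At DPR 2, a single rounding step moves an edge by less than half a CSS pixel, far smaller than the 3.5 px heading shift in the earlier capture.</p><p>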
So I tried rounding using floor and ceil operations whenever I saw a fractional number to see if this would fix it.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iZ0H!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd6fa915-bb54-4212-83cf-507af0e4aa5f_2400x3250.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iZ0H!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd6fa915-bb54-4212-83cf-507af0e4aa5f_2400x3250.png 424w, https://substackcdn.com/image/fetch/$s_!iZ0H!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd6fa915-bb54-4212-83cf-507af0e4aa5f_2400x3250.png 848w, https://substackcdn.com/image/fetch/$s_!iZ0H!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd6fa915-bb54-4212-83cf-507af0e4aa5f_2400x3250.png 1272w, https://substackcdn.com/image/fetch/$s_!iZ0H!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd6fa915-bb54-4212-83cf-507af0e4aa5f_2400x3250.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iZ0H!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd6fa915-bb54-4212-83cf-507af0e4aa5f_2400x3250.png" width="1456" height="1972" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bd6fa915-bb54-4212-83cf-507af0e4aa5f_2400x3250.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1972,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:716649,&quot;alt&quot;:&quot;Fractional pixels. To render the DOM to canvas you need real device pixels. If the layout height is fractional, like 316.3px, you multiply by devicePixelRatio and the canvas engine has to round the result. Different rendering engines may round differently. CSS height: 316.3 px &#215; DPR: 2 = 632.6 device px - not an integer - canvas.height must pick floor, round, or ceil. floor / round / ceil on canvas.height floor(632.6) = 632 canvas.height = 632 0.6 device px clipped (~0.30 CSS px) round(632.6) = 633 canvas.height = 633 &#183; SnapDOM default 0.4 device px empty below (~0.20 CSS px) ceil(632.6) = 633 canvas.height = 633 same as round 0.4 device px empty below (~0.20 CSS px) Bottom edge magnified - only the last ~2.0 device px (real gap is sub-pixel) Layout bottom - 632.6 device px floor canvas - 632 device px round canvas - 633 device px ceil canvas - 633 device px Card previews look the same, but when you zoom in you can see the difference - the magnified ruler shows how much floor, round, and ceil can shift the layout. Takeaway: Rounding fractional pixels may be an issue, and rounding everything beforehand to whole numbers does fix some issues, but the font issue still persists. 
Fractional device pixels on a text card&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://pdufour.substack.com/i/201958188?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd6fa915-bb54-4212-83cf-507af0e4aa5f_2400x3250.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Fractional pixels. To render the DOM to canvas you need real device pixels. If the layout height is fractional, like 316.3px, you multiply by devicePixelRatio and the canvas engine has to round the result. Different rendering engines may round differently. CSS height: 316.3 px &#215; DPR: 2 = 632.6 device px - not an integer - canvas.height must pick floor, round, or ceil. floor / round / ceil on canvas.height floor(632.6) = 632 canvas.height = 632 0.6 device px clipped (~0.30 CSS px) round(632.6) = 633 canvas.height = 633 &#183; SnapDOM default 0.4 device px empty below (~0.20 CSS px) ceil(632.6) = 633 canvas.height = 633 same as round 0.4 device px empty below (~0.20 CSS px) Bottom edge magnified - only the last ~2.0 device px (real gap is sub-pixel) Layout bottom - 632.6 device px floor canvas - 632 device px round canvas - 633 device px ceil canvas - 633 device px Card previews look the same, but when you zoom in you can see the difference - the magnified ruler shows how much floor, round, and ceil can shift the layout. Takeaway: Rounding fractional pixels may be an issue, and rounding everything beforehand to whole numbers does fix some issues, but the font issue still persists. Fractional device pixels on a text card" title="Fractional pixels. To render the DOM to canvas you need real device pixels. If the layout height is fractional, like 316.3px, you multiply by devicePixelRatio and the canvas engine has to round the result. 
Different rendering engines may round differently. CSS height: 316.3 px &#215; DPR: 2 = 632.6 device px - not an integer - canvas.height must pick floor, round, or ceil. floor / round / ceil on canvas.height floor(632.6) = 632 canvas.height = 632 0.6 device px clipped (~0.30 CSS px) round(632.6) = 633 canvas.height = 633 &#183; SnapDOM default 0.4 device px empty below (~0.20 CSS px) ceil(632.6) = 633 canvas.height = 633 same as round 0.4 device px empty below (~0.20 CSS px) Bottom edge magnified - only the last ~2.0 device px (real gap is sub-pixel) Layout bottom - 632.6 device px floor canvas - 632 device px round canvas - 633 device px ceil canvas - 633 device px Card previews look the same, but when you zoom in you can see the difference - the magnified ruler shows how much floor, round, and ceil can shift the layout. Takeaway: Rounding fractional pixels may be an issue, and rounding everything beforehand to whole numbers does fix some issues, but the font issue still persists. Fractional device pixels on a text card" srcset="https://substackcdn.com/image/fetch/$s_!iZ0H!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd6fa915-bb54-4212-83cf-507af0e4aa5f_2400x3250.png 424w, https://substackcdn.com/image/fetch/$s_!iZ0H!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd6fa915-bb54-4212-83cf-507af0e4aa5f_2400x3250.png 848w, https://substackcdn.com/image/fetch/$s_!iZ0H!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd6fa915-bb54-4212-83cf-507af0e4aa5f_2400x3250.png 1272w, https://substackcdn.com/image/fetch/$s_!iZ0H!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd6fa915-bb54-4212-83cf-507af0e4aa5f_2400x3250.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a></figure></div><p><strong>Result: this fixed some scenarios by luck, but it wasn&#8217;t consistent</strong></p><ol start="3"><li><p><strong>Font fix</strong></p></li></ol><p>By now most people would have given up, but without accurate capture your entire browser-use agent falls apart: clicks won&#8217;t land and typing won&#8217;t work. Luckily, it was around this point that I noticed what every instance of the drift had in common: it was always within text elements. 
So I came up with an example that clearly demonstrated the problem:</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;1b181ac5-ed69-42d2-94c9-8b767537c2b1&quot;,&quot;duration&quot;:null}"></div><p>Thankfully, the author of SnapDOM helped me out here and pointed me to a <a href="https://github.com/zumerlab/snapdom/issues/429#issuecomment-4614557496">built-in font issue</a>:</p><blockquote><p>There is also another related cause: snapDOM does not embed system fonts such as <code>system-ui</code>. That is not something <code>embedFonts: true</code> can solve by itself, because browsers do not expose system font files to JavaScript.</p><p>To make the captured result match the live render more closely, you need to use an actual font file, either locally through <code>localFonts</code>, or externally through something like a CDN or Google Fonts, with <code>embedFonts</code> enabled.</p></blockquote><p>Font metrics are crucial. 
CSS uses the metrics from font files to determine where to place text within a line.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MRGe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79a3880b-12d8-4e17-8e8e-f432d8d889ac_2400x2064.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MRGe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79a3880b-12d8-4e17-8e8e-f432d8d889ac_2400x2064.png 424w, https://substackcdn.com/image/fetch/$s_!MRGe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79a3880b-12d8-4e17-8e8e-f432d8d889ac_2400x2064.png 848w, https://substackcdn.com/image/fetch/$s_!MRGe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79a3880b-12d8-4e17-8e8e-f432d8d889ac_2400x2064.png 1272w, https://substackcdn.com/image/fetch/$s_!MRGe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79a3880b-12d8-4e17-8e8e-f432d8d889ac_2400x2064.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MRGe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79a3880b-12d8-4e17-8e8e-f432d8d889ac_2400x2064.png" width="1456" height="1252" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/79a3880b-12d8-4e17-8e8e-f432d8d889ac_2400x2064.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1252,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:462429,&quot;alt&quot;:&quot;How CSS places text. font-family is only a name. The engine lays out each line using metrics baked into the font file - ascender height, descender depth, cap height, x-height, and baseline. Change the file (or substitute a fallback) and the same CSS moves ink even when the family name stays the same. em box (line box) Hh gyp ascender cap height x-height baseline descender Real system-ui text on one baseline. Metric lines show where the font file places ascender, cap height, x-height, baseline, and descender. line-height and vertical alignment use the font&#8217;s ascender/descender tables - not the painted glyph bbox alone. system-ui resolves to an OS face at runtime; there is often no .woff2 URL for capture tools to load. A loaded webfont (@font-face, Google Fonts, CDN) ships the same metrics file the layout engine already used live. CSS font metrics primer Font em box: ascender, cap height, x-height, baseline, descender on Hh gyp&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://pdufour.substack.com/i/201958188?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79a3880b-12d8-4e17-8e8e-f432d8d889ac_2400x2064.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="How CSS places text. font-family is only a name. The engine lays out each line using metrics baked into the font file - ascender height, descender depth, cap height, x-height, and baseline. 
Change the file (or substitute a fallback) and the same CSS moves ink even when the family name stays the same. em box (line box) Hh gyp ascender cap height x-height baseline descender Real system-ui text on one baseline. Metric lines show where the font file places ascender, cap height, x-height, baseline, and descender. line-height and vertical alignment use the font&#8217;s ascender/descender tables - not the painted glyph bbox alone. system-ui resolves to an OS face at runtime; there is often no .woff2 URL for capture tools to load. A loaded webfont (@font-face, Google Fonts, CDN) ships the same metrics file the layout engine already used live. CSS font metrics primer Font em box: ascender, cap height, x-height, baseline, descender on Hh gyp" title="How CSS places text. font-family is only a name. The engine lays out each line using metrics baked into the font file - ascender height, descender depth, cap height, x-height, and baseline. Change the file (or substitute a fallback) and the same CSS moves ink even when the family name stays the same. em box (line box) Hh gyp ascender cap height x-height baseline descender Real system-ui text on one baseline. Metric lines show where the font file places ascender, cap height, x-height, baseline, and descender. line-height and vertical alignment use the font&#8217;s ascender/descender tables - not the painted glyph bbox alone. system-ui resolves to an OS face at runtime; there is often no .woff2 URL for capture tools to load. A loaded webfont (@font-face, Google Fonts, CDN) ships the same metrics file the layout engine already used live. 
CSS font metrics primer Font em box: ascender, cap height, x-height, baseline, descender on Hh gyp" srcset="https://substackcdn.com/image/fetch/$s_!MRGe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79a3880b-12d8-4e17-8e8e-f432d8d889ac_2400x2064.png 424w, https://substackcdn.com/image/fetch/$s_!MRGe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79a3880b-12d8-4e17-8e8e-f432d8d889ac_2400x2064.png 848w, https://substackcdn.com/image/fetch/$s_!MRGe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79a3880b-12d8-4e17-8e8e-f432d8d889ac_2400x2064.png 1272w, https://substackcdn.com/image/fetch/$s_!MRGe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79a3880b-12d8-4e17-8e8e-f432d8d889ac_2400x2064.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>If those metrics are not available, you are bound to have issues. Luckily, in this case the issue was resolved, and I could begin the next stage of building an AI browser-use agent: the grounding.</p>]]></content:encoded></item></channel></rss>