Your Crawler Is Testing the Same Button 50 Times. Ours Isn't.

Research • 2.11.2026 • by Rahul Krishna Yandrapally

Modern single-page applications pose a particular challenge for vulnerability scanners. As users navigate through dashboards, settings panels, and data views, the application generates thousands of distinct "states", many of which look different but function identically. Traditional crawlers treat each state as unique, wasting precious scan time re-testing the same components over and over. To address this problem, we've upgraded our crawler with FragGen, a fragment-based crawling approach that dramatically improves both coverage and speed.

The Problem: Redundant Testing at Scale

Consider a typical enterprise dashboard with 50 pages. Each page shares the same header, navigation menu, and footer. A traditional crawler sees 50 unique pages and dutifully tests the logout button, search bar, and navigation links on every single one - testing the same functionality 50 times.

This matters because:

Wasted scan time on redundant testing means less time for real exploration
Limited scan budgets leave deeper application states unexplored
Vulnerabilities hiding in unique functionality go undetected while the crawler re-tests the same header

The Solution: Fragment-Based Crawling

FragGen, based on academic research, takes a different approach to crawling. Instead of treating a web page as a single functional unit, FragGen treats it as a set of independent functional blocks. It uses visual analysis to break each page into visually coherent fragments - the header, navigation, content area, forms, and footer that users actually see.
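To make the idea concrete, here is a minimal Python sketch (our illustration, not FragGen's code) that models each page of the 50-page dashboard as a list of fragments and counts how many fragment tests a page-level crawler performs versus one that tests each distinct fragment once. Fragments are assumed to be (tag, children) tuples, `fingerprint` and `leaf` are hypothetical helpers, and exact structural hashing stands in for the fuzzier matching FragGen actually performs.

```python
import hashlib


def fingerprint(fragment):
    """Hypothetical helper: a fixed-length digest of a fragment's tags and nesting."""
    tag, children = fragment
    inner = ",".join(fingerprint(child) for child in children)
    return hashlib.sha256(f"{tag}({inner})".encode()).hexdigest()


def leaf(tag):
    """Hypothetical helper: a fragment node with no children."""
    return (tag, ())


# Hypothetical site: 50 pages sharing a header, nav, and footer,
# each with a structurally different main content area.
header = ("header", (leaf("a"), leaf("input")))
nav = ("nav", (leaf("a"), leaf("a"), leaf("a")))
footer = ("footer", (leaf("p"),))
pages = [
    [header, nav, ("main", tuple(leaf("div") for _ in range(i + 1))), footer]
    for i in range(50)
]

# Page-level crawling tests every fragment on every page.
page_level = sum(len(page) for page in pages)
# Fragment-level crawling tests each distinct fragment once.
fragment_level = len({fingerprint(f) for page in pages for f in page})
print(page_level, fragment_level)  # 200 53
```

Exact hashing only catches identical fragments; near-identical ones, such as a navigation menu with one extra link, need a distance-based comparison instead.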

Here's how it works:

Visual segmentation: Each page is divided into visual blocks using the VIPS (Vision-based Page Segmentation) algorithm, a layout-analysis technique originally developed to help search engines understand page structure.
Fragment recognition: When the same navigation menu appears across 50 pages, FragGen identifies it once and tracks it.
Smart comparison: Tree edit distance algorithms compare fragment structures rather than raw HTML strings - like comparing family trees instead of reading novels line by line.
Efficient exploration: The crawler prioritizes unexplored fragments, not just unexplored pages.
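The tree edit distance step can be sketched with the classic forest recurrence for ordered trees, the decomposition that Zhang and Shasha's algorithm is built on, memoized here rather than optimized. This is an illustration under stated assumptions, not FragGen's implementation: fragments are (tag, children) tuples holding only tag names, every insert, delete, and relabel costs 1, and `similarity` is a hypothetical normalization rather than a published FragGen metric.

```python
from functools import lru_cache

# A fragment is an ordered tree: (tag, (child, child, ...)).
# A forest is a tuple of such trees.


def forest_size(forest):
    """Total number of nodes in a forest."""
    return sum(1 + forest_size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost ordered tree edit distance between forests f and g."""
    if not f and not g:
        return 0
    if not g:  # only deletions remain: delete f's rightmost root, promote its children
        return forest_distance(f[:-1] + f[-1][1], g) + 1
    if not f:  # only insertions remain
        return forest_distance(f, g[:-1] + g[-1][1]) + 1
    (tag_v, kids_v), (tag_w, kids_w) = f[-1], g[-1]
    return min(
        forest_distance(f[:-1] + kids_v, g) + 1,  # delete rightmost root v
        forest_distance(f, g[:-1] + kids_w) + 1,  # insert rightmost root w
        # map v onto w (relabel if tags differ), then match their children
        # and, separately, the rest of each forest
        forest_distance(kids_v, kids_w)
        + forest_distance(f[:-1], g[:-1])
        + (tag_v != tag_w),
    )


def tree_distance(t1, t2):
    return forest_distance((t1,), (t2,))


def similarity(t1, t2):
    """Hypothetical score: 1.0 for identical structure, lower as edits accumulate."""
    return 1 - tree_distance(t1, t2) / max(forest_size((t1,)), forest_size((t2,)))


def leaf(tag):
    return (tag, ())


nav_a = ("nav", (leaf("a"), leaf("a"), leaf("a")))
nav_b = ("nav", (leaf("a"), leaf("a"), leaf("button")))
nav_c = ("nav", (leaf("a"), leaf("a")))
print(tree_distance(nav_a, nav_b))  # 1 (relabel one a -> button)
print(tree_distance(nav_a, nav_c))  # 1 (delete one link)
print(similarity(nav_a, nav_b))     # 0.75
```

Because the distance counts structural edits, two navigation menus that differ by one link score as near-identical even when their text and attributes differ, since those are not part of the tree being compared.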

This visual understanding also improves our LLM-powered form handling - by extracting just the relevant form fragment, we provide cleaner context for generating realistic inputs.
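A rough sketch of pulling out just the form fragment, using Python's standard `html.parser`. As a simplifying assumption, the DOM `<form>` element stands in for the visually segmented fragment FragGen would produce; `FormFragmentExtractor` is a hypothetical name, and how the extracted markup is then handed to the LLM is left out.

```python
from html.parser import HTMLParser


class FormFragmentExtractor(HTMLParser):
    """Collect the raw markup of each <form>, dropping the rest of the page."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one markup string per form found
        self._parts = []      # pieces of the form currently being read
        self._inside = False  # nested forms are invalid HTML and are not handled

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self._inside = True
        if self._inside:
            self._parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <input ... />: keep them exactly as written.
        if self._inside:
            self._parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self._inside:
            return
        self._parts.append(f"</{tag}>")
        if tag == "form":
            self.forms.append("".join(self._parts))
            self._parts, self._inside = [], False

    def handle_data(self, data):
        if self._inside:
            self._parts.append(data)


page = (
    "<html><body><header><a href='/logout'>Logout</a></header>"
    "<form action='/settings' method='post'><label>Email</label>"
    "<input name='email' type='email' /><button>Save</button></form>"
    "<footer>Help</footer></body></html>"
)
extractor = FormFragmentExtractor()
extractor.feed(page)
extractor.close()
print(extractor.forms[0])
```

Only the form markup survives; the logout link and footer never reach the model's context.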

Making It Production-Ready

FragGen was originally designed as a research technique. While the core idea was sound, applying it to real-world applications at production scale introduced serious performance and concurrency challenges. A scan that should take minutes was taking hours. We invested heavily in closing the gap between research prototype and production-grade scanner.

Faster fragment analysis: The original algorithm compared every new fragment against every existing one - an expensive operation that grew quadratically. We restructured the comparison pipeline to quickly rule out non-matches and avoid redundant work, cutting analysis time by orders of magnitude.
Non-blocking architecture: Fragment comparison is computationally intensive, but it shouldn't slow down crawling. We moved the heavy analysis off the critical path so browsers continue exploring while fragment processing runs in the background.
Multi-browser scaling: Running multiple browsers in parallel should multiply throughput, but the original architecture wasn't built for it. We redesigned the core data structures for thread safety and eliminated lock contention, enabling parallel crawling with zero errors under high concurrency.
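One way these three changes could fit together, sketched in Python as an illustration rather than NightVision's implementation. `FragmentRegistry` and `levenshtein` are hypothetical: crawling threads call `submit` and get a `Future` back immediately, so they keep exploring; a size-bucket prefilter skips comparisons that cannot match; and a briefly held lock plus a re-check of fragments added concurrently keeps the registry consistent. The prefilter rests on an assumption about the distance: for unit-cost tree edit distance, and for the Levenshtein distance on tag sequences used here as a stand-in, the distance is never smaller than the difference in size.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FragmentRegistry:
    """Hypothetical thread-safe registry of known fragments (illustration only)."""

    def __init__(self, distance, max_distance=1, size=len, workers=4):
        self._distance = distance  # must satisfy |size(a) - size(b)| <= distance(a, b)
        self._max = max_distance   # fragments within this distance count as already seen
        self._size = size
        self._buckets = {}         # size -> append-only list of known fragments
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def submit(self, fragment):
        """Called by a crawling browser; returns a Future instead of blocking."""
        return self._pool.submit(self._classify, fragment)

    def _classify(self, fragment):
        n = self._size(fragment)
        sizes = range(n - self._max, n + self._max + 1)
        # Prefilter: only buckets within max_distance in size can hold a match.
        with self._lock:  # held just long enough to snapshot the candidates
            seen = {s: len(self._buckets.get(s, [])) for s in sizes}
            candidates = [k for s in sizes for k in self._buckets.get(s, [])]
        # The expensive comparisons run without holding the lock.
        if any(self._distance(k, fragment) <= self._max for k in candidates):
            return False
        with self._lock:
            # Re-check only fragments other workers added after our snapshot.
            for s in sizes:
                for k in self._buckets.get(s, [])[seen[s]:]:
                    if self._distance(k, fragment) <= self._max:
                        return False
            self._buckets.setdefault(n, []).append(fragment)
        return True

    def close(self):
        self._pool.shutdown(wait=True)


def levenshtein(a, b):
    """Stand-in distance: edit distance between two tag sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


nav = ("nav", "a", "a", "a")
header = ("header", "a", "a", "img")
fragments = [nav, header] * 50 + [("footer", "p"), ("form", "input", "input", "button")]
registry = FragmentRegistry(levenshtein, max_distance=1)
futures = [registry.submit(frag) for frag in fragments]
new = sum(fut.result() for fut in futures)
registry.close()
print(new)  # 4
```

The lock is held only for snapshots and for re-checking the few fragments that arrived concurrently, which is one way to keep contention low without risking two workers registering the same fragment twice.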

The Results

In production testing, FragGen delivers:

Faster scans: Multi-browser crawling scales linearly, so scans complete proportionally faster.
Deeper coverage: By skipping redundant components, the crawler reaches application states that were previously left unexplored within the scan budget.
More vulnerabilities found: Deeper coverage means the scanner tests functionality that other tools never reach - the admin panel behind three navigations, the edge-case form buried in settings.
Reliable at scale: The upgraded architecture handles complex, large-scale applications without errors or slowdowns.

We tested FragGen against our previous crawler across three different web applications - an informational web app, a form-heavy learning management system, and a simple CRUD application - each given the same 30-minute crawl budget.

Metric	Before	After (FragGen)	Improvement
Application states discovered	26 avg	98 avg	3.5× more
Crawl efficiency (states/min)	0.86 avg	4.5 avg	5× faster
Unique URLs covered	23 avg	44 avg	1.9× more

In one case, the upgraded crawler achieved complete application coverage in just 6 minutes on an application the previous crawler couldn't finish in 30 minutes. The benefits are most dramatic in applications with a large state space.

Try It Now

FragGen is available now in NightVision for scans longer than 2 hours. Start a scan and see deeper coverage on your next test.

