CAPTCHA-Solver

Completed

Two-Stage Computer Vision System

YOLO11, Synthetic Data, OpenCV, Python

Role: AI/Vision Engineer

Duration: 2 months

Overview

CAPTCHA-Solver is a computer vision system that automatically solves SHEIN CAPTCHA challenges, which require clicking specific icons in a prescribed order. It uses a two-stage YOLO11 architecture and handles the full interaction loop: it first reads the visual instructions, then performs the required sequence of clicks.

Architecture

The system relies on a sequential two-stage YOLO11 pipeline. Stage 1 parses the instruction prompt to identify target icons and determine the required click order. Stage 2 then scans the main CAPTCHA image to locate the coordinates of these specific targets and executes clicks in the exact sequence identified by the first stage. Training is supported by a dynamic data generator creating 10k+ highly augmented samples.
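The handoff between the two stages can be sketched as plain data plumbing. This is a minimal illustration, not the project's actual code: the `Detection` type, the function names, and in particular the assumption that the instruction prompt lists its icons left to right in the required order are all hypothetical. Detector output is represented as simple tuples of label, box, and score, so no model is loaded here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detector output (hypothetical container, not a YOLO11 type)."""
    label: str    # class name predicted by the detector
    x1: float
    y1: float
    x2: float
    y2: float
    score: float  # detection confidence

    @property
    def center(self):
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)


def instruction_order(prompt_dets):
    """Stage 1: read target labels from the prompt.

    Assumption: icons appear left to right in the required order.
    """
    return [d.label for d in sorted(prompt_dets, key=lambda d: d.x1)]


def plan_points(order, image_dets):
    """Stage 2: map each target label to the center of its best detection."""
    best = {}
    for d in image_dets:
        # Keep only the highest-scoring detection per label.
        if d.label not in best or d.score > best[d.label].score:
            best[d.label] = d
    missing = [label for label in order if label not in best]
    if missing:
        raise LookupError(f"targets not found in image: {missing}")
    return [best[label].center for label in order]
```

Keeping the order as an explicit list passed from one function to the other mirrors the description above: Stage 2 never has to infer the sequence, it only resolves labels to coordinates.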

Key Features

Two-Stage Vision Architecture

Stage 1 identifies the target icons and their specific order, while Stage 2 precisely locates and interacts with the targets in the required sequence.

Synthetic Data Generation Engine

Built an automated engine that synthesized 10k+ labeled training samples, incorporating 360° rotation, color variations, and scaling.
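The three augmentations named above can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions: the function names, the scale range of 0.6 to 1.4, and the use of a hue shift for color variation are hypothetical choices, not values taken from the project. The `rotated_extent` helper shows why full rotation matters for labeling: the axis-aligned box around an icon changes size as the icon rotates.

```python
import colorsys
import math
import random


def sample_params(rng):
    """Draw one set of augmentation parameters (ranges are assumed)."""
    return {
        "angle_deg": rng.uniform(0.0, 360.0),  # full 360-degree rotation
        "scale": rng.uniform(0.6, 1.4),        # assumed scale range
        "hue_shift": rng.uniform(0.0, 1.0),    # fraction of the hue circle
    }


def rotated_extent(w, h, angle_deg, scale):
    """Width and height of the axis-aligned box around a rotated, scaled w x h icon."""
    a = math.radians(angle_deg)
    c, s = abs(math.cos(a)), abs(math.sin(a))
    return scale * (w * c + h * s), scale * (w * s + h * c)


def shift_hue(rgb, hue_shift):
    """Rotate the hue of an 8-bit RGB color, keeping saturation and value."""
    r, g, b = (v / 255.0 for v in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    r2, g2, b2 = colorsys.hsv_to_rgb((h + hue_shift) % 1.0, s, v)
    return tuple(round(x * 255) for x in (r2, g2, b2))
```

Passing a seeded `random.Random` instance into `sample_params` keeps a generated dataset reproducible, which helps when comparing training runs.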

Real-World Background Integration

Improved model robustness by training on synthetic data built on actual backgrounds from the original SHEIN CAPTCHAs, reducing the risk of overfitting to artificial scenes.
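When an icon is placed on a real background, the generator also has to emit a matching label. The sketch below shows one way to do that, assuming the labels use the normalized `class x_center y_center width height` text format commonly used for YOLO training. The function names and the uniform random placement are hypothetical; the actual pixel compositing (for example with OpenCV) is omitted.

```python
import random


def place_icon(bg_w, bg_h, icon_w, icon_h, rng):
    """Pick a top-left corner so the icon lies fully inside the background."""
    if icon_w > bg_w or icon_h > bg_h:
        raise ValueError("icon is larger than the background")
    x = rng.randint(0, bg_w - icon_w)
    y = rng.randint(0, bg_h - icon_h)
    return x, y


def yolo_label(class_id, x, y, icon_w, icon_h, bg_w, bg_h):
    """Format one label line with coordinates normalized to the image size."""
    cx = (x + icon_w / 2) / bg_w
    cy = (y + icon_h / 2) / bg_h
    return (
        f"{class_id} {cx:.6f} {cy:.6f} "
        f"{icon_w / bg_w:.6f} {icon_h / bg_h:.6f}"
    )
```

Here `icon_w` and `icon_h` are meant to be the icon's axis-aligned extent after any rotation and scaling, so the label still encloses the transformed icon.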

YOLO11 Detection

Uses the YOLO11 architecture for object detection in both stages, achieving a near-perfect solve rate.

Tech Stack

AI/ML

YOLO11

State-of-the-art object detection model powering both vision stages

OpenCV

Image processing operations for handling CAPTCHA inputs and synthetic backgrounds

Backend

Python

Core language for model training, synthetic data generation, and inference logic

Data

Synthetic Data

Custom engine generating rotated, scaled, and color-varied training sets

Challenges & Solutions

Challenge

Standard object detection models cannot inherently understand the required interaction sequence for complex CAPTCHAs.

Solution

Engineered a two-stage vision system where Stage 1 explicitly extracts the target icons and order from the prompt, passing that state to Stage 2 for sequential execution.

Challenge

Extremely limited availability of labeled CAPTCHA datasets containing the necessary edge cases and background noise.

Solution

Built a data generation engine that synthesized 10k+ labeled training samples.

Challenge

The model struggled to generalize across visual distortions such as rotation and variable sizing.

Solution

Incorporated 360° rotation, scaling, color variations, and actual CAPTCHA backgrounds into the synthetic data pipeline.

Results

Achieved near-100% accuracy in solving image-based CAPTCHAs

Successfully implemented a complex two-stage identification and execution pipeline

Synthesized 10k+ training samples with 360° rotation and scaling
