APIAS (API Auto Scraper) is an AI-powered Python script that scrapes any API documentation website and converts the pages to a compact, structured form tailored for LLMs.
APIAS (AI Powered API Documentation Scraper) is a powerful tool that helps you extract and convert API documentation from various sources into structured formats.
Features
Scrape API documentation from web pages
Support for multiple documentation formats
AI-powered content extraction and structuring
Command-line interface for easy use
Multiple output formats (Markdown, JSON, YAML)
Batch processing mode with interactive TUI
Requirements
Python 3.10 or higher (Python 3.9 is not supported)
max_retries: 0 = Never retry (give up immediately on any error)
max_retries: 3 = Try up to 3 times before giving up
max_retries: 5 = Very persistent, keeps trying longer
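To make the retry count concrete, here is a minimal sketch of a retry loop. It assumes that max_retries counts the extra attempts made after the first failure (so 0 means a single attempt); APIAS may count attempts differently. The helper name `call_with_retries` and the `delay_seconds` parameter are hypothetical and not part of APIAS.

```python
import time


def call_with_retries(func, max_retries=3, delay_seconds=0.0):
    """Call func() and retry on any exception.

    Assumption: max_retries is the number of retries AFTER the first
    attempt, so max_retries=0 gives up immediately on the first error.
    """
    retries_used = 0
    while True:
        try:
            return func()
        except Exception:
            if retries_used >= max_retries:
                # Out of retries: surface the last error to the caller.
                raise
            retries_used += 1
            time.sleep(delay_seconds)
```

A higher value makes the scraper more tolerant of flaky network or API errors, at the cost of spending longer on pages that will never succeed.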
chunk_size - How Big Are the Pieces?
chunk_size: 50000 # Default: 50,000 characters
Web pages can be HUGE. We can't send a giant page to the AI all at once (it would choke!). So we cut it into smaller pieces called "chunks":
Giant Web Page (200,000 characters)
====================================
Gets cut into pieces:
[ Chunk 1 ]  [ Chunk 2 ]  [ Chunk 3 ]  [ Chunk 4 ]
 (50,000)     (50,000)     (50,000)     (50,000)
     |            |            |            |
     v            v            v            v
    AI           AI           AI           AI
     |            |            |            |
     v            v            v            v
[Result 1]   [Result 2]   [Result 3]   [Result 4]
Then all results get merged back together!
chunk_size: 30000 = Smaller pieces (more API calls, but safer for complex pages)
chunk_size: 50000 = Default balance
chunk_size: 100000 = Bigger pieces (fewer API calls, but might hit token limits)
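The split-process-merge flow described above can be sketched in a few lines. This is only an illustration: it cuts at fixed character offsets, whereas APIAS may split at smarter boundaries. The names `split_into_chunks` and `process_page`, and the `process_chunk` callback standing in for the AI call, are hypothetical.

```python
def split_into_chunks(text, chunk_size=50000):
    """Cut text into consecutive pieces of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def process_page(text, process_chunk, chunk_size=50000):
    """Run process_chunk on every chunk and merge the results in order.

    process_chunk is a placeholder for whatever sends one chunk to the
    model and returns its structured output as a string.
    """
    results = [process_chunk(chunk) for chunk in split_into_chunks(text, chunk_size)]
    return "".join(results)
```

With this sketch, a 200,000-character page yields 4 chunks at the default size and 7 chunks at chunk_size 30000, which is why smaller chunks mean more API calls.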
model - Which AI Brain to Use?
model: gpt-5-nano # Default: fast, affordable, and highly capable
OpenAI GPT-5 models offer excellent quality at different price points. Prices shown below are approximate and may change - check OpenAI Pricing for current rates:
| Model | Context | Input Price | Output Price | Best For |
|-------|---------|-------------|--------------|----------|
| gpt-5-nano | 272K | Very Low | Very Low | Most scraping tasks (recommended default) |
| gpt-5-mini | 272K | Low | Low | Complex documentation |
| gpt-5 | 272K | Medium | Medium | Premium quality extraction |
| gpt-5.1 | 272K | Medium | Medium | Agentic tasks, coding (newest) |
| gpt-5-pro | 400K | High | High | Extended context, highest quality |
Note: Most GPT-5 models support 128K output tokens; gpt-5-pro supports 272K output tokens. The gpt-5-nano model offers the best cost-performance ratio for API documentation scraping.
limit - Maximum Pages to Scrape
limit: 50 # Only scrape up to 50 pages (null = no limit)
In batch mode, a website might have thousands of pages. Use limit to control how many:
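A minimal sketch of the idea, assuming the batch run starts from a list of discovered page URLs. The helper `apply_limit` and the example URLs are hypothetical and not part of APIAS; Python's None plays the role of null in the config.

```python
from itertools import islice


def apply_limit(urls, limit=None):
    """Return at most `limit` URLs, or all of them when limit is None."""
    if limit is None:
        return list(urls)
    if limit < 0:
        raise ValueError("limit must be None or a non-negative integer")
    # islice stops early, so a huge URL source is not fully consumed.
    return list(islice(urls, limit))


# Hypothetical site with 120 discovered pages.
discovered = [f"https://example.com/docs/page-{n}" for n in range(120)]
to_scrape = apply_limit(discovered, limit=50)
```

Here only the first 50 discovered pages would be scraped; with limit set to None, all 120 would be.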
Tip: The Conservative estimate is typically accurate for well-structured API documentation. Use the Worst Case estimate for budget planning with complex or messy HTML.