Windows MCP Server

Windows automation that actually works.

Let AI Control Windows Apps — By Name, Not Pixels

Windows MCP Server gives your AI assistant direct access to Windows applications through the Windows UI Automation API, the same API that screen readers use to read buttons, menus, and text fields.

Your AI says “click Save” and the server finds the Save button by name. No screenshots. No pixel parsing. No coordinate guessing.


What You Can Do

Ask your AI assistant to control any Windows application.

Works with GitHub Copilot, Claude Desktop, Cursor, and any MCP client.


Why This Approach

Most automation tools take screenshots and ask vision models to find buttons in the pixels. That approach is slow, expensive, and breaks when windows move or themes change.

Windows MCP Server asks Windows directly: “What buttons exist?” Windows knows. It’s deterministic. Same command works every time, regardless of DPI, theme, or resolution.


Tested with Real AI Models

Tool descriptions that seem clear to humans often confuse AI. Parameters get misunderstood. Actions get skipped.

We test every tool with real AI models (GPT-4.1, GPT-5.2) using agent-benchmark. 54 automated tests. 100% pass rate required for release.

If the AI can’t use it correctly, we fix the tool — not the prompt.

View latest LLM test results →


Quick Start

Install from VS Code Marketplace →

Other MCP Clients

Download from GitHub Releases. Add to your MCP config:

{
  "servers": {
    "windows": {
      "command": "path/to/Sbroenne.WindowsMcp.exe"
    }
  }
}
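
If you write this file by hand on Windows, remember that backslashes in the executable path must be escaped in JSON. A minimal sketch, assuming the config shape shown above, lets `json.dumps` handle the escaping. The helper name `build_mcp_config` and the install path in the example are made up for illustration.

```python
import json


def build_mcp_config(exe_path: str) -> str:
    """Return MCP config JSON in the shape shown above.

    exe_path is wherever you unpacked the release. The path used in the
    example below is hypothetical, not a documented install location.
    """
    config = {"servers": {"windows": {"command": exe_path}}}
    # json.dumps escapes the backslashes in Windows paths for us.
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    print(build_mcp_config(r"C:\Tools\WindowsMcp\Sbroenne.WindowsMcp.exe"))
```

Some MCP clients may expect a different top-level key or file location, so check your client's documentation before copying the output.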

Tools

| Tool | What It Does |
| --- | --- |
| ui_click | Click buttons, checkboxes, menu items by name |
| ui_type | Type into text fields |
| ui_find | Discover elements in a window (with timeout/retry) |
| ui_read | Read text (with OCR fallback) |
| file_save | Save files via Save As dialog |
| screenshot_control | Get element metadata (image optional) |
| window_management | Find, activate, move, resize windows |
| mouse_control | Coordinate-based clicks (fallback for games) |
| keyboard_control | Hotkeys and key sequences |
| app | Launch applications |

Complete tool reference →
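
For a sense of what an MCP client sends when your assistant uses one of these tools, here is a rough sketch of a `tools/call` request as a JSON-RPC 2.0 message. The envelope follows the MCP protocol. The argument names passed to `ui_click` (`window`, `name`) are assumptions for illustration only, so use the tool reference for the real parameter schema.

```python
import json
from itertools import count

# Simple incrementing request ids, as JSON-RPC expects a unique id per request.
_request_ids = count(1)


def tool_call_request(tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


if __name__ == "__main__":
    # Hypothetical arguments: the real ui_click schema may differ.
    print(tool_call_request("ui_click", {"window": "Notepad", "name": "Save"}))
```

In practice your MCP client builds these messages for you, so the sketch is only meant to show that a tool is addressed by name and receives structured arguments.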


Key Features

🧠 Semantic UI Access

Find elements by name, type, or ID — not coordinates. Works regardless of DPI, theme, resolution, or window position.
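
To illustrate why lookup by name, type, or ID does not depend on position, here is a toy model in Python. It is not the server's implementation and not the real UI Automation API: the `Element` class, the `find` helper, and the sample window are all invented for this sketch.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Element:
    """Toy stand-in for a UI element; not the real UI Automation API."""
    name: str
    control_type: str
    automation_id: str = ""
    position: tuple = (0, 0)
    children: List["Element"] = field(default_factory=list)


def walk(root: Element) -> Iterator[Element]:
    """Yield root and all of its descendants, depth-first."""
    yield root
    for child in root.children:
        yield from walk(child)


def find(root: Element, name: Optional[str] = None,
         control_type: Optional[str] = None,
         automation_id: Optional[str] = None) -> Optional[Element]:
    """Return the first element matching every criterion that was given."""
    for element in walk(root):
        if name is not None and element.name != name:
            continue
        if control_type is not None and element.control_type != control_type:
            continue
        if automation_id is not None and element.automation_id != automation_id:
            continue
        return element
    return None


if __name__ == "__main__":
    save = Element("Save", "Button", "btnSave", position=(40, 300))
    window = Element("Untitled - Notepad", "Window",
                     children=[Element("File", "MenuItem"), save])
    assert find(window, name="Save", control_type="Button") is save
    # Moving the button changes its coordinates but not the lookup result.
    save.position = (900, 120)
    assert find(window, name="Save", control_type="Button") is save
    print("found:", find(window, automation_id="btnSave").name)
```

The lookup never reads `position`, which is the point: a layout change, a different theme, or a moved window leaves the query and its result unchanged.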

🧪 LLM-Tested

Every tool tested with real AI models before release. 54 automated tests across 7 scenarios. 100% pass rate required.

💻 Broad App Support

Tested against classic Windows apps, modern Windows 11 apps, and Electron apps (VS Code, Teams, Slack). Same commands work across all.

📺 Multi-Monitor

Full support for multiple displays with per-monitor DPI scaling. Move windows between monitors, capture any screen.
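
As background on what per-monitor DPI scaling means for coordinates, the arithmetic can be sketched as follows. Windows treats 96 DPI as 100% scaling, so a monitor at 144 DPI scales by 1.5. The function names are illustrative and this is not taken from the server's code.

```python
BASELINE_DPI = 96  # Windows treats 96 DPI as 100% scaling


def scale_factor(dpi: int) -> float:
    """Scaling factor for a monitor, e.g. 144 DPI -> 1.5 (150%)."""
    return dpi / BASELINE_DPI


def to_physical(logical_px: int, dpi: int) -> int:
    """Convert a logical (96-DPI) length to physical pixels on a monitor."""
    return round(logical_px * dpi / BASELINE_DPI)


if __name__ == "__main__":
    # The same 200-unit-wide window on two monitors with different scaling.
    for dpi in (96, 144):
        print(f"{dpi} DPI: scale {scale_factor(dpi):.2f}, "
              f"width {to_physical(200, dpi)} px")
```

This is why raw pixel coordinates captured on one monitor may not line up on another, while name-based lookups are unaffected.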

🔄 Full Fallback

Screenshot + mouse + keyboard for games and custom controls. Annotated screenshots return element metadata — image omitted by default to save tokens.


⚠️ Caution

This MCP server controls your Windows desktop. Use responsibly.