10 specialized tools for comprehensive Windows automation
Windows MCP uses the Windows UI Automation API as the primary interaction method. This gives AI agents semantic understanding of applications — finding elements by name, type, and state rather than parsing screenshots.
All tool responses are designed for LLM efficiency, minimizing token usage while preserving information:
| Optimization | Description | Token Savings |
|---|---|---|
| Short Property Names | ok instead of success, h instead of handle, ec instead of errorCode | ~40% |
| Omitted Null Values | Null/empty fields are not included in responses | ~15% |
| Compact Element Data | UI elements use n (name), t (type), id (elementId), c (coordinates) | ~30% |
| JPEG Screenshots | Default JPEG at 60% quality instead of PNG | ~70% smaller |
| Auto-Scaling | Screenshots auto-scale to 1568px width (vision model native limit) | ~50% smaller |
Example response comparison:
// Standard JSON (~180 tokens)
{ "success": true, "errorCode": "success", "message": "Clicked element", "element": { "name": "Save", "controlType": "Button", "handle": "123" } }
// Optimized JSON (~60 tokens)
{ "ok": true, "ec": "success", "msg": "Clicked", "el": { "n": "Save", "t": "Button", "h": "123" } }
This reduces LLM costs by ~60% and improves response times when processing tool results.
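A client that wants the long property names back can expand the compact keys itself. The following is a minimal sketch, not part of the server: the `SHORT_TO_LONG` map and the `expand_keys` helper are hypothetical, assembled only from the short names shown in the table and example above. The table gives `t` as type while the long-form example uses `controlType`; the sketch follows the table.

```python
# Hypothetical short-to-long key map, assembled from the table and example above.
# This is an illustration, not the server's own schema.
SHORT_TO_LONG = {
    "ok": "success",
    "ec": "errorCode",
    "msg": "message",
    "el": "element",
    "n": "name",
    "t": "type",
    "h": "handle",
    "id": "elementId",
    "c": "coordinates",
}


def expand_keys(value):
    """Recursively replace short property names with their long forms."""
    if isinstance(value, dict):
        # Unknown keys are passed through unchanged.
        return {SHORT_TO_LONG.get(k, k): expand_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_keys(item) for item in value]
    return value


compact = {"ok": True, "ec": "success", "msg": "Clicked",
           "el": {"n": "Save", "t": "Button", "h": "123"}}
expanded = expand_keys(compact)
print(expanded)
```

Keys that are not in the map are left untouched, so the helper is safe to run on responses that mix short and long names.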
Every tool is tested with real AI models using agent-benchmark to ensure LLMs understand tool descriptions and use them correctly.
| Test Suite | Tests | Models | Pass Rate |
|---|---|---|---|
| Window Management | 8 | GPT-4.1, GPT-5.2 | 100% |
| Notepad UI Operations | 10 | GPT-4.1, GPT-5.2 | 100% |
| Paint UI Operations | 16 | GPT-4.1, GPT-5.2 | 100% |
| File Dialog Handling | 6 | GPT-4.1, GPT-5.2 | 100% |
| Screenshot Capture | 6 | GPT-4.1, GPT-5.2 | 100% |
| Keyboard & Mouse | 8 | GPT-4.1, GPT-5.2 | 100% |
| Real-World Workflows | 8 | GPT-4.1, GPT-5.2 | 100% |
Why LLM testing matters:
LLM tests run as part of every release. See CONTRIBUTING.md for how to run them yourself.
| Scenario | Tool | Why |
|---|---|---|
| Discover UI elements | ui_find | Find elements by name, type, or ID (with timeout/retry) |
| Click a button by name | ui_click | Semantic, works at any DPI/theme |
| Type text into a field | ui_type | Direct text input with clear option |
| Read text from elements | ui_read | Get text via UIA or OCR |
| Wait for windows | window_management | Use wait_for action for new windows |
| Save files | file_save | Handle Save As dialogs automatically |
| Discover UI visually | screenshot_control | Annotated screenshots with element data |
| Press hotkeys (Ctrl+S) | keyboard_control | Direct keyboard input |
| Custom controls / games | mouse_control | Coordinate-based fallback |
| Find/move windows | window_management | Window lifecycle control |
| Tool | Description |
|---|---|
| app | Launch applications |
| ui_find | Find UI elements by name, type, or ID (with timeout/retry via timeoutMs) |
| ui_click | Click buttons, tabs, checkboxes |
| ui_type | Type text into edit controls |
| ui_read | Read text from elements (UIA + OCR) |
| file_save | Save files via Save As dialog (English Windows only) |
| screenshot_control | Annotated screenshots for discovery + fallback |
| keyboard_control | Keyboard input and hotkeys |
| mouse_control | Coordinate-based mouse input (fallback) |
| window_management | Window control and management |
app: Launch applications and get their window handles for subsequent operations.
| Parameter | Description | Required |
|---|---|---|
| programPath | Program to launch (e.g., 'notepad.exe', 'C:\Program Files\…\app.exe') | Yes |
| arguments | Command-line arguments | No |
| workingDirectory | Working directory for the process | No |
| waitForWindow | Wait for window to appear (default: true) | No |
app(programPath='notepad.exe') → handle='123456'
ui_type(windowHandle='123456', text='Hello World')
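On the wire, an MCP client sends each of these calls as a JSON-RPC 2.0 `tools/call` request. The sketch below shows how the two calls above might be packaged. The `tool_call` helper is hypothetical, and transport (stdio, HTTP) is left to whichever MCP client library is in use.

```python
import json


def tool_call(request_id, name, arguments):
    """Build an MCP `tools/call` JSON-RPC 2.0 request as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Step 1: launch Notepad.
launch = tool_call(1, "app", {"programPath": "notepad.exe"})

# Step 2: type into the window. '123456' is the placeholder handle from the
# example above; a real client would read the handle from the app response.
type_text = tool_call(2, "ui_type", {"windowHandle": "123456", "text": "Hello World"})

print(json.dumps(launch))
print(json.dumps(type_text))
```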
ui_find: Find and discover UI elements by name, type, or automation ID.
| Parameter | Description | Required |
|---|---|---|
| windowHandle | Target window handle | Yes |
| name | Exact element name | No |
| nameContains | Partial name match | No |
| namePattern | Regex pattern for name | No |
| automationId | Automation ID (most reliable) | No |
| controlType | Control type (Button, Edit, CheckBox, etc.) | No |
| maxResults | Maximum elements to return | No |
| sortByProminence | Sort by bounding box area | No |

ui_click: Click buttons, tabs, checkboxes, and other interactive elements.
| Parameter | Description | Required |
|---|---|---|
| windowHandle | Target window handle | Yes |
| elementId | Element ID from ui_find | No* |
| name / nameContains | Element name/partial match | No* |
| automationId | Automation ID | No* |
| controlType | Control type filter | No |
*One of elementId, name, nameContains, or automationId required.
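The one-of rule above is easy to check on the client before a call is sent. The following is a minimal sketch of such a check. The `ui_click_args` helper is hypothetical, and the server's own validation may behave differently.

```python
# Selector parameters named in the footnote above.
UI_CLICK_SELECTORS = ("elementId", "name", "nameContains", "automationId")


def ui_click_args(window_handle, control_type=None, **selectors):
    """Build ui_click arguments, enforcing the one-of-selectors rule client-side."""
    unknown = set(selectors) - set(UI_CLICK_SELECTORS)
    if unknown:
        raise ValueError(f"unknown selector(s): {sorted(unknown)}")
    # Drop selectors that were passed explicitly as None.
    chosen = {k: v for k, v in selectors.items() if v is not None}
    if not chosen:
        raise ValueError(
            "one of elementId, name, nameContains, or automationId is required"
        )
    args = {"windowHandle": window_handle, **chosen}
    if control_type is not None:
        args["controlType"] = control_type  # optional filter
    return args


args = ui_click_args("123456", name="Save", control_type="Button")
print(args)
```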
ui_type: Type text into edit controls and text fields.
| Parameter | Description | Required |
|---|---|---|
| windowHandle | Target window handle | Yes |
| text | Text to type | Yes |
| elementId | Element ID from ui_find | No* |
| name / nameContains | Element name/partial match | No* |
| automationId | Automation ID | No* |
| controlType | Control type (default: Edit) | No |
| clearFirst | Clear existing text before typing | No (default: true) |
ui_read: Read text from elements using UI Automation or OCR.
| Parameter | Description | Required |
|---|---|---|
| windowHandle | Target window handle | Yes |
| name / nameContains | Element name/partial match | No |
| automationId | Automation ID | No |
| controlType | Control type filter | No |
| includeChildren | Include child element text | No (default: false) |
| language | OCR language code (e.g., 'en-US') | No |
file_save: Save files via the Save As dialog. Handles the entire save workflow: triggers save, waits for the dialog, fills in the path, and confirms. English Windows only (detects English dialog titles and button text).
| Parameter | Description | Required |
|---|---|---|
| windowHandle | Target window handle (the app window, not a dialog) | Yes |
| filePath | File path to save to (e.g., 'C:\Users\User\file.txt') | No |
Control mouse input on Windows with full multi-monitor and DPI awareness.
| Action | Description | Required Parameters |
|---|---|---|
| move | Move cursor to coordinates | x, y, target or monitorIndex |
| click | Left-click at coordinates | optional: x, y |
| double_click | Double-click at coordinates | optional: x, y |
| right_click | Right-click at coordinates | optional: x, y |
| middle_click | Middle-click at coordinates | optional: x, y |
| drag | Drag from current position to coordinates | x, y, endX, endY |
| scroll | Scroll at coordinates | direction, optional: x, y, amount |
| get_position | Get current cursor position with monitor context | none |
Monitor targeting: target='primary_screen' or 'secondary_screen'.

Foreground window check: expectedWindowTitle / expectedProcessName.

Control keyboard input on Windows with Unicode support.
| Action | Description | Required Parameters |
|---|---|---|
| type | Type text using Unicode input | text |
| press | Press and release a key (with optional modifiers) | key, optional modifiers |
| key_down | Hold a key down | key |
| key_up | Release a held key | key |
| sequence | Multiple keys in order | keys |
| release_all | Release all held keys | none |
| get_keyboard_layout | Query current layout | none |
| wait_for_idle | Wait for keyboard input to be processed | none |
f1 through f24
up, down, left, right, home, end, pageup, pagedown, insert, delete
enter, tab, escape, space, backspace
ctrl, shift, alt, win
volumemute, volumedown, volumeup, mediaplaypause, medianexttrack, mediaprevtrack, mediastop
copilot (Windows 11 Copilot+ PCs)
browserback, browserforward, browserrefresh, browserstop, browsersearch, browserfavorites, browserhome
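The named keys listed above can be collected into a lookup set so that a client catches typos before sending a press or key_down call. This is a minimal sketch under stated assumptions: `NAMED_KEYS` and `is_named_key` are hypothetical, the case-insensitive comparison is an assumption, and the set covers only the named keys listed here (single-character keys such as 's' are not included).

```python
# Named keys copied from the lists above; letters and digits are not included.
NAMED_KEYS = frozenset(
    [f"f{i}" for i in range(1, 25)]  # f1 through f24
    + "up down left right home end pageup pagedown insert delete".split()
    + "enter tab escape space backspace".split()
    + "ctrl shift alt win".split()
    + ("volumemute volumedown volumeup mediaplaypause "
       "medianexttrack mediaprevtrack mediastop").split()
    + ["copilot"]
    + ("browserback browserforward browserrefresh browserstop "
       "browsersearch browserfavorites browserhome").split()
)


def is_named_key(key):
    """Return True if `key` is one of the named keys listed above.

    Comparison is case-insensitive, which is an assumption of this sketch.
    """
    return key.lower() in NAMED_KEYS


print(is_named_key("f24"), is_named_key("f25"))
```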
Use press with the modifiers parameter for hotkeys: press(key='s', modifiers='ctrl') for Ctrl+S.

Use clearFirst to select all (Ctrl+A) before typing new text.

Control windows on the Windows desktop. Use the app tool to launch applications, then use this tool to manage the windows.
| Action | Description | Required Parameters |
|---|---|---|
| list | List all visible windows | none |
| find | Find windows by title or process name | title or processName |
| activate | Bring window to foreground | handle |
| get_foreground | Get current foreground window | none |
| get_state | Get current window state (normal, minimized, maximized, hidden) | handle |
| minimize | Minimize window | handle |
| maximize | Maximize window | handle |
| restore | Restore window from min/max | handle |
| close | Close window (sends WM_CLOSE) | handle, optional discardChanges |
| move | Move window to position | handle, x, y |
| resize | Resize window | handle, width, height |
| set_bounds | Move and resize atomically | handle, x, y, width, height |
| wait_for | Wait for window to appear | title |
| wait_for_state | Wait for window to reach a specific state | handle, state, timeoutMs |
| move_to_monitor | Move window to a specific monitor | handle, target or monitorIndex |
| move_and_activate | Move to monitor and activate atomically | handle, target or monitorIndex |
| ensure_visible | Ensure window is visible (restore if minimized, activate) | handle |
Use discardChanges=true to automatically dismiss “Save?” dialogs when closing:
window_management(action='close', handle='123456', discardChanges=true)
English Windows only — detects English button text like “Don’t Save”.
Capture screenshots on Windows with LLM-optimized defaults. By default, screenshots include annotated element overlays with numbered labels and structured element data — perfect for UI discovery.
| Action | Description | Required Parameters |
|---|---|---|
| capture | Capture screenshot (with element annotations by default) | target or app |
| list_monitors | List all connected monitors | none |
| Target | Description | Additional Parameters |
|---|---|---|
| app | Recommended. Capture specific app window by name | Partial title match |
| primary_screen | Capture primary monitor (default) | none |
| secondary_screen | Capture secondary monitor (2-monitor setups) | none |
| monitor | Capture specific monitor | monitorIndex |
| window | Capture specific window by handle | windowHandle |
| region | Capture rectangular region | regionX, regionY, regionWidth, regionHeight |
| all_monitors | Composite of all displays | none |
| Parameter | Type | Default | Description |
|---|---|---|---|
| app | string | null | Recommended. Application name (partial title match). Auto-finds and activates window. |
| annotate | boolean | true | Include numbered element overlays and structured element data |
| includeCursor | boolean | false | Include mouse cursor in capture |
| imageFormat | string | "jpeg" | Output format: "jpeg", "png" |
| quality | integer | 60 | Compression quality for JPEG (1-100) |
| outputMode | string | "inline" | "inline" (base64) or "file" (save to disk) |
| outputPath | string | null | Custom file path when using file output mode |
When annotate=true (default), the response includes structured element data. Image is omitted by default (includeImage=false) to save ~100K+ tokens:
{
"success": true,
"annotated_elements": [
{ "index": 1, "element_id": "...", "name": "File", "control_type": "MenuItem", "clickable_point": { "x": 50, "y": 30 } },
{ "index": 2, "element_id": "...", "name": "Edit", "control_type": "MenuItem", "clickable_point": { "x": 100, "y": 30 } }
],
"element_count": 25
}
Use case: When you don’t know element names, capture an annotated screenshot first. The numbered labels in the image correspond to the structured element data, making it easy to identify what to click.
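Once the annotated response arrives, a client can look up an element by name and reuse its clickable point for a coordinate-based click. The sketch below assumes the response shape shown above. The `find_clickable_point` helper is hypothetical, and the element IDs in the sample are placeholders.

```python
def find_clickable_point(response, name):
    """Return (x, y) for the first annotated element with this name, or None."""
    for element in response.get("annotated_elements", []):
        if element.get("name") == name:
            point = element["clickable_point"]
            return point["x"], point["y"]
    return None


# Sample data mirroring the response above; element_id values are placeholders.
response = {
    "success": True,
    "annotated_elements": [
        {"index": 1, "element_id": "...", "name": "File",
         "control_type": "MenuItem", "clickable_point": {"x": 50, "y": 30}},
        {"index": 2, "element_id": "...", "name": "Edit",
         "control_type": "MenuItem", "clickable_point": {"x": 100, "y": 30}},
    ],
    "element_count": 25,
}

point = find_clickable_point(response, "Edit")
if point is not None:
    # Arguments for a mouse_control click at the element's clickable point.
    click_args = {"action": "click", "x": point[0], "y": point[1]}
    print(click_args)
```

When the element also has a usable name or automation ID, ui_click remains the more robust choice; the coordinate path is a fallback.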
For simple screenshots without element discovery:
{
"action": "capture",
"windowHandle": "123456",
"annotate": false
}
Use window_management(action='find', title='...') to get a handle, then pass it to screenshot_control.

Related parameter: maxWidth: 0.

The server handles common Windows security scenarios:
| Error Code | Description |
|---|---|
| ElevatedWindowActive | Target window is running as Administrator |
| SecureDesktopActive | UAC prompt or lock screen is active |
| InvalidKey | Unrecognized key name |
| InputBlocked | Input was blocked by UIPI |
| Timeout | Operation timed out |
| OperationTimeout | Operation timed out (with configured timeout duration) |
| InvalidMonitorIndex | Monitor index out of range |
| InvalidWindowHandle | Window handle is invalid or window no longer exists |
| MissingRequiredParameter | A required parameter was not provided |
| CoordinatesOutOfBounds | Coordinates are outside monitor boundaries |
| WindowMinimized | Cannot capture minimized window |
| WindowNotVisible | Window is not visible |
| InvalidRegion | Capture region has invalid dimensions |
| CaptureFailed | Screenshot capture operation failed |
| SizeLimitExceeded | Requested capture exceeds maximum allowed size |
| WrongTargetWindow | Foreground window doesn't match expected title/process (use expectedWindowTitle/expectedProcessName) |
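A client may want to react differently to these codes. The grouping below is a hypothetical client-side policy, not documented server behavior. It assumes timeouts are worth retrying, that privilege-boundary errors need a person to intervene, and that everything else calls for changing the request.

```python
# Hypothetical grouping of the error codes above; the policy is an assumption
# of this sketch, not something the server prescribes.
RETRYABLE = frozenset({"Timeout", "OperationTimeout"})
NEEDS_USER = frozenset({"ElevatedWindowActive", "SecureDesktopActive", "InputBlocked"})


def classify_error(error_code):
    """Map an error code to a suggested client action."""
    if error_code in RETRYABLE:
        return "retry"
    if error_code in NEEDS_USER:
        return "ask_user"
    # Remaining codes (invalid handle, bad region, missing parameter, ...)
    # usually mean the request itself should be corrected.
    return "fix_request"


for code in ("Timeout", "SecureDesktopActive", "InvalidKey"):
    print(code, "->", classify_error(code))
```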
| Variable | Default | Description |
|---|---|---|
| MCP_WINDOWS_KEYBOARD_CHUNK_DELAY_MS | 10 | Delay between text chunks |
| MCP_WINDOWS_KEYBOARD_KEY_DELAY_MS | 10 | Delay between key presses |
| MCP_WINDOWS_KEYBOARD_SEQUENCE_DELAY_MS | 50 | Delay between sequence keys |
| MCP_WINDOWS_MOUSE_MOVE_DELAY_MS | 10 | Delay after mouse move |
| MCP_WINDOWS_MOUSE_CLICK_DELAY_MS | 50 | Delay after mouse click |
| MCP_WINDOWS_WINDOW_TIMEOUT_MS | 5000 | Default window operation timeout |
| MCP_WINDOWS_WINDOW_WAITFOR_TIMEOUT_MS | 30000 | Default wait_for timeout |
| MCP_WINDOWS_WINDOW_PROPERTY_TIMEOUT_MS | 100 | Timeout for querying window properties |
| MCP_WINDOWS_WINDOW_POLLING_INTERVAL_MS | 250 | Polling interval for wait_for |
| MCP_WINDOWS_WINDOW_ACTIVATION_MAX_RETRIES | 3 | Max retries for window activation |
| MCP_WINDOWS_SCREENSHOT_TIMEOUT_MS | 5000 | Screenshot operation timeout |
| MCP_WINDOWS_SCREENSHOT_MAX_PIXELS | 33177600 | Maximum capture size in pixels (default 8K, 7680 × 4320) |
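To illustrate how these variables and defaults fit together, here is a minimal sketch that resolves each setting from the environment. The `load_settings` helper is hypothetical and the fallback on unparsable values is an assumption of the sketch. The server's own parsing is not described here and may differ.

```python
import os

# Defaults copied from the table above.
DEFAULTS = {
    "MCP_WINDOWS_KEYBOARD_CHUNK_DELAY_MS": 10,
    "MCP_WINDOWS_KEYBOARD_KEY_DELAY_MS": 10,
    "MCP_WINDOWS_KEYBOARD_SEQUENCE_DELAY_MS": 50,
    "MCP_WINDOWS_MOUSE_MOVE_DELAY_MS": 10,
    "MCP_WINDOWS_MOUSE_CLICK_DELAY_MS": 50,
    "MCP_WINDOWS_WINDOW_TIMEOUT_MS": 5000,
    "MCP_WINDOWS_WINDOW_WAITFOR_TIMEOUT_MS": 30000,
    "MCP_WINDOWS_WINDOW_PROPERTY_TIMEOUT_MS": 100,
    "MCP_WINDOWS_WINDOW_POLLING_INTERVAL_MS": 250,
    "MCP_WINDOWS_WINDOW_ACTIVATION_MAX_RETRIES": 3,
    "MCP_WINDOWS_SCREENSHOT_TIMEOUT_MS": 5000,
    "MCP_WINDOWS_SCREENSHOT_MAX_PIXELS": 33177600,  # 7680 * 4320
}


def load_settings(env=None):
    """Resolve every setting from `env` (default: os.environ), using table defaults."""
    env = os.environ if env is None else env
    settings = {}
    for name, default in DEFAULTS.items():
        raw = env.get(name)
        try:
            settings[name] = default if raw is None else int(raw)
        except ValueError:
            # Assumption: an unparsable value falls back to the default.
            settings[name] = default
    return settings


settings = load_settings({"MCP_WINDOWS_WINDOW_WAITFOR_TIMEOUT_MS": "60000"})
print(settings["MCP_WINDOWS_WINDOW_WAITFOR_TIMEOUT_MS"])
```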
Input is sent through SendInput, the standard Windows API for simulating input.