Complete Feature Reference

10 specialized tools for comprehensive Windows automation

🎯 The Approach: Semantic First, Fallback When Needed

Windows MCP uses the Windows UI Automation API as the primary interaction method. This gives AI agents semantic understanding of applications — finding elements by name, type, and state rather than parsing screenshots.

Token Optimization

All tool responses are designed for LLM efficiency, minimizing token usage while preserving information:

Optimization	Description	Token Savings
Short Property Names	`ok` instead of `success`, `h` instead of `handle`, `ec` instead of `errorCode`	~40%
Omitted Null Values	Null/empty fields are not included in responses	~15%
Compact Element Data	UI elements use `n` (name), `t` (type), `id` (elementId), `c` (coordinates)	~30%
JPEG Screenshots	Default JPEG at 60% quality instead of PNG	~70% smaller
Auto-Scaling	Screenshots auto-scale to 1568px width (vision model native limit)	~50% smaller

Example response comparison:

// Standard JSON (~180 tokens)
{ "success": true, "errorCode": "success", "message": "Clicked element", "element": { "name": "Save", "controlType": "Button", "handle": "123" } }

// Optimized JSON (~60 tokens)
{ "ok": true, "ec": "success", "msg": "Clicked", "el": { "n": "Save", "t": "Button", "h": "123" } }

This reduces LLM costs by ~60% and improves response times when processing tool results.

LLM Testing & Validation

Every tool is tested with real AI models using pytest-aitest to ensure LLMs understand tool descriptions and use them correctly.

Test Suite	Tests	Models	Pass Rate
Window Management	8	GPT-4.1, GPT-5.2	100%
Notepad UI Operations	10	GPT-4.1, GPT-5.2	100%
Paint UI Operations	16	GPT-4.1, GPT-5.2	100%
File Dialog Handling	6	GPT-4.1, GPT-5.2	100%
Screenshot Capture	6	GPT-4.1, GPT-5.2	100%
Keyboard & Mouse	8	GPT-4.1, GPT-5.2	100%
Real-World Workflows	8	GPT-4.1, GPT-5.2	100%

Why LLM testing matters:

Tool descriptions must be LLM-friendly — If the AI misunderstands a parameter, it fails silently
Response formats affect reasoning — Structured hints guide the LLM to correct next steps
Edge cases surface quickly — Real models find ambiguities that unit tests miss

LLM tests run as part of every release. See CONTRIBUTING.md for how to run them yourself.

When to Use Each Tool

Scenario	Tool	Why
Discover UI elements	`ui_find`	Find elements by name, type, or ID (with timeout/retry)
Click a button by name	`ui_click`	Semantic, works at any DPI/theme
Type text into a field	`ui_type`	Direct text input with clear option
Read text from elements	`ui_read`	Get text via UIA or OCR
Wait for windows	`window_management`	Use `wait_for` action for new windows
Save files	`file_save`	Handle Save As dialogs automatically
Discover UI visually	`screenshot_control`	Annotated screenshots with element data
Press hotkeys (Ctrl+S)	`keyboard_control`	Direct keyboard input
Custom controls / games	`mouse_control`	Coordinate-based fallback
Find/move windows	`window_management`	Window lifecycle control

Tools Overview

Tool	Description
`app`	Launch applications
`ui_find`	Find UI elements by name, type, or ID (with timeout/retry via `timeoutMs`)
`ui_click`	Click buttons, tabs, checkboxes
`ui_type`	Type text into edit controls
`ui_read`	Read text from elements (UIA + OCR)
`file_save`	Save files via Save As dialog (English Windows only)
`screenshot_control`	Annotated screenshots for discovery + fallback
`keyboard_control`	Keyboard input and hotkeys
`mouse_control`	Coordinate-based mouse input (fallback)
`window_management`	Window control and management

� App (`app`)

Launch applications and get their window handles for subsequent operations.

Parameters

Parameter	Description	Required
`programPath`	Program to launch (e.g., ‘notepad.exe’, ‘C:\Program Files\…\app.exe’)	Yes
`arguments`	Command-line arguments	No
`workingDirectory`	Working directory for the process	No
`waitForWindow`	Wait for window to appear (default: true)	No

Capabilities

Launch applications by name or full path
Automatic window detection after launch
Returns window handle for use with other tools
Configurable startup parameters

Example

app(programPath='notepad.exe') → handle='123456'
ui_type(windowHandle='123456', text='Hello World')

�🔍 UI Find (`ui_find`)

Find and discover UI elements by name, type, or automation ID.

Parameters

Parameter	Description	Required
`windowHandle`	Target window handle	Yes
`name`	Exact element name	No
`nameContains`	Partial name match	No
`namePattern`	Regex pattern for name	No
`automationId`	Automation ID (most reliable)	No
`controlType`	Control type (Button, Edit, CheckBox, etc.)	No
`maxResults`	Maximum elements to return	No
`sortByProminence`	Sort by bounding box area	No

Capabilities

Find elements by name, control type, or automation ID
Partial name matching with nameContains
Regex pattern matching with namePattern
Sort results by prominence (largest first) for disambiguation
Returns element IDs for use with other ui_* tools
Electron app support (VS Code, Teams, Slack)

🖱️ UI Click (`ui_click`)

Click buttons, tabs, checkboxes, and other interactive elements.

Parameters

Parameter	Description	Required
`windowHandle`	Target window handle	Yes
`elementId`	Element ID from ui_find	No*
`name` / `nameContains`	Element name/partial match	No*
`automationId`	Automation ID	No*
`controlType`	Control type filter	No

*One of elementId, name, nameContains, or automationId required.

Capabilities

Click buttons, tabs, menu items
Toggle checkboxes and toggle buttons
Handles various control patterns automatically
Falls back to coordinate-based click if pattern fails

⌨️ UI Type (`ui_type`)

Type text into edit controls and text fields.

Parameters

Parameter	Description	Required
`windowHandle`	Target window handle	Yes
`text`	Text to type	Yes
`elementId`	Element ID from ui_find	No*
`name` / `nameContains`	Element name/partial match	No*
`automationId`	Automation ID	No*
`controlType`	Control type (default: Edit)	No
`clearFirst`	Clear existing text before typing	No (default: true)

Capabilities

Type text into any editable control
Clear existing content before typing (clearFirst)
Append text to existing content
Unicode support for any language

📖 UI Read (`ui_read`)

Read text from elements using UI Automation or OCR.

Parameters

Parameter	Description	Required
`windowHandle`	Target window handle	Yes
`name` / `nameContains`	Element name/partial match	No
`automationId`	Automation ID	No
`controlType`	Control type filter	No
`includeChildren`	Include child element text	No (default: false)
`language`	OCR language code (e.g., ‘en-US’)	No

Capabilities

Extract text from any UI element
Automatic OCR fallback for custom-rendered text
Windows.Media.Ocr for local text recognition
Language support for international text

💾 File Save (`file_save`)

Save files via Save As dialog. Handles the entire save workflow: triggers save, waits for dialog, fills path, confirms. English Windows only (detects English dialog titles and button text).

Parameters

Parameter	Description	Required
`windowHandle`	Target window handle (the app window, not a dialog)	Yes
`filePath`	File path to save to (e.g., ‘C:\Users\User\file.txt’)	No

Capabilities

Trigger Ctrl+S to save
Auto-detect Save As dialog appearance
Fill in filename automatically
Handle overwrite confirmation dialogs
Works with Office apps, Notepad, and more

🖱️ Mouse Control

Control mouse input on Windows with full multi-monitor and DPI awareness.

Actions

Action	Description	Required Parameters
`move`	Move cursor to coordinates	`x`, `y`, `target` or `monitorIndex`
`click`	Left-click at coordinates	optional: `x`, `y`
`double_click`	Double-click at coordinates	optional: `x`, `y`
`right_click`	Right-click at coordinates	optional: `x`, `y`
`middle_click`	Middle-click at coordinates	optional: `x`, `y`
`drag`	Drag from current position to coordinates	`x`, `y`, `endX`, `endY`
`scroll`	Scroll at coordinates	`direction`, optional: `x`, `y`, `amount`
`get_position`	Get current cursor position with monitor context	none

Capabilities

Click, double-click, right-click, middle-click
Move cursor to absolute coordinates
Drag operations with hold/release
Scroll up/down/left/right
Multi-monitor support with DPI awareness
Easy targeting with target='primary_screen' or 'secondary_screen'
Modifier key support (Ctrl+click, Shift+click, etc.)
Wrong window detection with expectedWindowTitle / expectedProcessName

⌨️ Keyboard Control

Control keyboard input on Windows with Unicode support.

Actions

Action	Description	Required Parameters
`type`	Type text using Unicode input	`text`
`press`	Press and release a key (with optional modifiers)	`key`, optional `modifiers`
`key_down`	Hold a key down	`key`
`key_up`	Release a held key	`key`
`sequence`	Multiple keys in order	`keys`
`release_all`	Release all held keys	none
`get_keyboard_layout`	Query current layout	none
`wait_for_idle`	Wait for keyboard input to be processed	none

Supported Keys

Function Keys

f1 through f24

up, down, left, right, home, end, pageup, pagedown, insert, delete

Control

enter, tab, escape, space, backspace

Modifiers

ctrl, shift, alt, win

Media

volumemute, volumedown, volumeup, mediaplaypause, medianexttrack, mediaprevtrack, mediastop

Special

copilot (Windows 11 Copilot+ PCs)

Browser

browserback, browserforward, browserrefresh, browserstop, browsersearch, browserfavorites, browserhome

Capabilities

Unicode text typing (layout-independent) - type any character in any language
Virtual key presses - Enter, Tab, Escape, F1-F24, navigation keys
Key combinations - Use press with modifiers parameter: press(key='s', modifiers='ctrl') for Ctrl+S
Key sequences - multi-key macros with configurable timing
Hold/release keys - for Shift-select and other hold operations
Special keys - Copilot key (Windows 11), media controls, browser keys
Layout detection - query current keyboard layout (BCP-47 format)
Clear before typing - use clearFirst to select all (Ctrl+A) before typing new text
Wait for idle - wait for keyboard input to be processed before continuing

🪟 Window Management

Control windows on the Windows desktop. Use app tool to launch applications, then use this tool to manage the windows.

Actions

Action	Description	Required Parameters
`list`	List all visible windows	none
`find`	Find windows by title or process name	`title` or `processName`
`activate`	Bring window to foreground	`handle`
`get_foreground`	Get current foreground window	none
`get_state`	Get current window state (normal, minimized, maximized, hidden)	`handle`
`minimize`	Minimize window	`handle`
`maximize`	Maximize window	`handle`
`restore`	Restore window from min/max	`handle`
`close`	Close window (sends WM_CLOSE)	`handle`, optional `discardChanges`
`move`	Move window to position	`handle`, `x`, `y`
`resize`	Resize window	`handle`, `width`, `height`
`set_bounds`	Move and resize atomically	`handle`, `x`, `y`, `width`, `height`
`wait_for`	Wait for window to appear	`title`
`wait_for_state`	Wait for window to reach a specific state	`handle`, `state`, `timeoutMs`
`move_to_monitor`	Move window to a specific monitor	`handle`, `target` or `monitorIndex`
`move_and_activate`	Move to monitor and activate atomically	`handle`, `target` or `monitorIndex`
`ensure_visible`	Ensure window is visible (restore if minimized, activate)	`handle`

Close with discardChanges

Use discardChanges=true to automatically dismiss “Save?” dialogs when closing:

window_management(action='close', handle='123456', discardChanges=true)

English Windows only — detects English button text like “Don’t Save”.

Capabilities

List all visible top-level windows with titles, handles, process info, and bounds
Locate windows by title (substring or regex matching)
Bring windows to foreground with focus
Minimize, maximize, restore, and close windows
Position and size windows with move, resize, or set_bounds
Wait for a window to appear with configurable timeout
Move windows between monitors
Full multi-monitor support with DPI awareness
Proper UWP/Store app detection and handling
Cloaking detection to filter out virtual desktop and shell-managed windows

📸 Screenshot Capture

Capture screenshots on Windows with LLM-optimized defaults. By default, screenshots include annotated element overlays with numbered labels and structured element data — perfect for UI discovery.

Actions

Action	Description	Required Parameters
`capture`	Capture screenshot (with element annotations by default)	`target` or `app`
`list_monitors`	List all connected monitors	none

Capture Targets

Target	Description	Additional Parameters
`app`	Recommended — Capture specific app window by name	Partial title match
`primary_screen`	Capture primary monitor (default)	none
`secondary_screen`	Capture secondary monitor (2-monitor setups)	none
`monitor`	Capture specific monitor	`monitorIndex`
`window`	Capture specific window by handle	`windowHandle`
`region`	Capture rectangular region	`regionX`, `regionY`, `regionWidth`, `regionHeight`
`all_monitors`	Composite of all displays	none

Parameters

Parameter	Type	Default	Description
`app`	string	`null`	Recommended. Application name (partial title match). Auto-finds and activates window.
`annotate`	boolean	`true`	Include numbered element overlays and structured element data
`includeCursor`	boolean	`false`	Include mouse cursor in capture
`imageFormat`	string	`"jpeg"`	Output format: “jpeg”, “png”
`quality`	integer	`60`	Compression quality for JPEG (1-100)
`outputMode`	string	`"inline"`	“inline” (base64) or “file” (save to disk)
`outputPath`	string	`null`	Custom file path when using file output mode

Annotated Screenshot Response

When annotate=true (default), the response includes structured element data. Image is omitted by default (includeImage=false) to save ~100K+ tokens:

{
  "success": true,
  "annotated_elements": [
    { "index": 1, "element_id": "...", "name": "File", "control_type": "MenuItem", "clickable_point": { "x": 50, "y": 30 } },
    { "index": 2, "element_id": "...", "name": "Edit", "control_type": "MenuItem", "clickable_point": { "x": 100, "y": 30 } }
  ],
  "element_count": 25
}

Use case: When you don’t know element names, capture an annotated screenshot first. The numbered labels in the image correspond to the structured element data, making it easy to identify what to click.

Plain Screenshot (No Annotations)

For simple screenshots without element discovery:

{
  "action": "capture",
  "windowHandle": "123456",
  "annotate": false
}

Capabilities

Annotated by Default - Screenshots include numbered element overlays and structured data for UI discovery
LLM-Optimized - JPEG format, auto-scaling to 1568px, quality 60 for minimal token usage
Easy targeting - Use window_management(action='find', title='...') to get a handle, then pass to screenshot_control
Capture any monitor - Screenshot any connected display by index
Capture windows - Screenshot a specific window (even if partially obscured)
Capture regions - Screenshot an arbitrary rectangular area
Capture all monitors - Composite screenshot of entire virtual desktop
Format options - JPEG (default) or PNG with configurable quality (1-100)
Auto-scaling - Defaults to 1568px width (LLM vision model native limit); disable with maxWidth: 0
Output modes - Inline base64 (default) or file path for zero-overhead file workflows
Cursor inclusion - Optionally include mouse cursor in captures
Multi-monitor aware - Supports extended desktop configurations
DPI aware - Correct pixel dimensions on high-DPI displays

Error Handling

The server handles common Windows security scenarios:

Error Code	Description
`ElevatedWindowActive`	Target window is running as Administrator
`SecureDesktopActive`	UAC prompt or lock screen is active
`InvalidKey`	Unrecognized key name
`InputBlocked`	Input was blocked by UIPI
`Timeout`	Operation timed out
`OperationTimeout`	Operation timed out (with configured timeout duration)
`InvalidMonitorIndex`	Monitor index out of range
`InvalidWindowHandle`	Window handle is invalid or window no longer exists
`MissingRequiredParameter`	A required parameter was not provided
`CoordinatesOutOfBounds`	Coordinates are outside monitor boundaries
`WindowMinimized`	Cannot capture minimized window
`WindowNotVisible`	Window is not visible
`InvalidRegion`	Capture region has invalid dimensions
`CaptureFailed`	Screenshot capture operation failed
`SizeLimitExceeded`	Requested capture exceeds maximum allowed size
`WrongTargetWindow`	Foreground window doesn’t match expected title/process (use expectedWindowTitle/expectedProcessName)

Configuration

Environment Variables

Variable	Default	Description
`MCP_WINDOWS_KEYBOARD_CHUNK_DELAY_MS`	`10`	Delay between text chunks
`MCP_WINDOWS_KEYBOARD_KEY_DELAY_MS`	`10`	Delay between key presses
`MCP_WINDOWS_KEYBOARD_SEQUENCE_DELAY_MS`	`50`	Delay between sequence keys
`MCP_WINDOWS_MOUSE_MOVE_DELAY_MS`	`10`	Delay after mouse move
`MCP_WINDOWS_MOUSE_CLICK_DELAY_MS`	`50`	Delay after mouse click
`MCP_WINDOWS_WINDOW_TIMEOUT_MS`	`5000`	Default window operation timeout
`MCP_WINDOWS_WINDOW_WAITFOR_TIMEOUT_MS`	`30000`	Default wait_for timeout
`MCP_WINDOWS_WINDOW_PROPERTY_TIMEOUT_MS`	`100`	Timeout for querying window properties
`MCP_WINDOWS_WINDOW_POLLING_INTERVAL_MS`	`250`	Polling interval for wait_for
`MCP_WINDOWS_WINDOW_ACTIVATION_MAX_RETRIES`	`3`	Max retries for window activation
`MCP_WINDOWS_SCREENSHOT_TIMEOUT_MS`	`5000`	Screenshot operation timeout
`MCP_WINDOWS_SCREENSHOT_MAX_PIXELS`	`33177600`	Maximum capture size (default 8K)

Known Limitations

UAC & Elevated Processes

Windows security prevents any non-elevated process from interacting with UAC prompts or elevated (Administrator) windows. This is a fundamental Windows security boundary that no MCP server can bypass.

Scenario	Behavior	Workaround
`winget install` (or similar) triggers UAC prompt	Command may report success, but UAC blocks the installer until user approves	Run terminal as Administrator before invoking winget
Target app is running as Administrator	`ui_click`, `ui_type`, `keyboard_control` return `ElevatedWindowActive` error	Run the MCP server elevated, or launch the app without elevation
UAC prompt appears mid-workflow	AI cannot see or interact with the secure desktop	User must manually approve the UAC prompt
Secure desktop (lock screen, Ctrl+Alt+Del)	All input methods blocked	User must unlock manually

Why This Matters

Many common operations trigger elevation:

Installing software (winget, choco, MSI installers)
Modifying system settings
Apps that “Run as Administrator”
Antivirus, device manager, service management tools

When building automation workflows, plan for elevation boundaries:

Pre-install required software before running AI workflows
Use per-user installers when available (e.g., VS Code user installer)
Run the MCP server elevated if you need to automate admin tasks (with appropriate caution)

Security Considerations

UIPI: Windows User Interface Privilege Isolation blocks input to elevated windows from non-elevated processes
Secure Desktop: Input cannot be sent during UAC prompts or lock screen
Input Simulation: The server uses SendInput which is the standard Windows API for simulating input

This site is open source. Improve this page.

Complete Feature Reference

🎯 The Approach: Semantic First, Fallback When Needed

Token Optimization

LLM Testing & Validation

When to Use Each Tool

Tools Overview

� App (app)

Parameters

Capabilities

Example

�🔍 UI Find (ui_find)

Parameters

Capabilities

🖱️ UI Click (ui_click)

Parameters

Capabilities

⌨️ UI Type (ui_type)

Parameters

Capabilities

📖 UI Read (ui_read)

Parameters

Capabilities

💾 File Save (file_save)

Parameters

Capabilities

🖱️ Mouse Control

Actions

Capabilities

⌨️ Keyboard Control

Actions

Supported Keys

Function Keys

Navigation

Control

Modifiers

Media

Special

Browser

Capabilities

🪟 Window Management

Actions

Close with discardChanges

Capabilities

📸 Screenshot Capture

Actions

Capture Targets

Parameters

Annotated Screenshot Response

Plain Screenshot (No Annotations)

Capabilities

Error Handling

Configuration

Environment Variables

Known Limitations

UAC & Elevated Processes

Why This Matters

Security Considerations

� App (`app`)

�🔍 UI Find (`ui_find`)

🖱️ UI Click (`ui_click`)

⌨️ UI Type (`ui_type`)

📖 UI Read (`ui_read`)

💾 File Save (`file_save`)