AI assistants can talk but can't act
Nearly every AI assistant in 2025 works the same way: you type, it responds with text. But what if you want it to actually do things on your computer? Open apps, fill forms, navigate interfaces, execute multi-step workflows?
The gap between "AI that answers questions" and "AI that does work" is enormous. Bridging it requires an agent that can see, understand, and act in a real desktop environment.
See the screen, control the machine
Friday is an autonomous agent that operates your Mac the way a human would: by looking at the screen and using the keyboard and mouse.
- Computer vision pipeline: captures screen, identifies UI elements, reads text via OCR
- macOS Accessibility API: programmatic control of the UI tree of any application that exposes one
- Multi-LLM reasoning: uses language models for planning, decomposition, and decision-making
- Full desktop autonomy: clicks, types, scrolls, drags, switches apps, handles dialogs
Give Friday a natural language instruction and it figures out the steps, navigates the interface, and completes the task — handling errors and unexpected states along the way.
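At its core this is an instruct → perceive → plan → act loop. A minimal sketch, with hypothetical names (`perceive`, `plan`, `execute` stand in for the real vision, LLM, and input-synthesis layers):

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str            # "click", "type", "key", "scroll"
    target: tuple = ()   # screen coordinates for pointer actions
    text: str = ""       # payload for typing actions

def run_task(instruction, perceive, plan, execute, max_steps=20):
    """Drive the perceive -> plan -> act loop until the planner reports done."""
    history = []
    for _ in range(max_steps):
        state = perceive()                           # screenshot + OCR + UI tree
        action = plan(instruction, state, history)   # LLM decides the next step
        if action is None:                           # planner signals completion
            return history
        execute(action)                              # synthesize input events
        history.append(action)
    raise TimeoutError("step budget exhausted")
```

The `max_steps` budget matters in practice: an agent that loops on a stuck UI state should fail loudly rather than click forever.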
What Friday can do
- Navigate complex UIs across any macOS application
- Execute multi-step workflows (file management, data entry, web browsing)
- Recover from errors by detecting unexpected popups, loading states, and failures
- Chain actions across multiple applications to complete high-level goals
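Chaining across applications amounts to decomposing a high-level goal into ordered per-app sub-goals, each run to completion before the next begins. A hypothetical sketch (not Friday's actual planner interface):

```python
def chain(subgoals, run_subgoal):
    """Complete a high-level goal as an ordered list of (app, goal) sub-goals."""
    completed = []
    for app, goal in subgoals:
        # e.g. switch focus to the app, then run the agent loop on this sub-goal
        if not run_subgoal(app, goal):
            raise RuntimeError(f"sub-goal failed in {app}: {goal}")
        completed.append((app, goal))
    return completed
```

Failing fast on a sub-goal lets the planner re-decompose the remaining work instead of pressing on from a broken state.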
The bridge to physical robots
Friday isn't just a productivity tool. It's a proof of concept for a deeper idea: if an AI can perceive a visual environment, understand context, plan actions, and execute them through physical interfaces — that's exactly what a robot needs to do in the real world.
The perception-action loop in Friday (screen → understanding → plan → mouse/keyboard) maps directly to the loop a robot needs (cameras → understanding → plan → actuators). Building Friday was a stepping stone toward 20n.
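One way to make the mapping concrete: the loop itself is fixed, and only the sensing and acting endpoints change. A conceptual sketch, not Friday's actual interfaces:

```python
from abc import ABC, abstractmethod

class Embodiment(ABC):
    """Desktop and robot differ only at the sensor/effector boundary."""
    @abstractmethod
    def sense(self): ...           # screenshot vs. camera frame
    @abstractmethod
    def act(self, command): ...    # mouse/keyboard vs. motor actuators

def step(embodiment, understand, plan):
    observation = embodiment.sense()
    context = understand(observation)   # vision model / OCR
    command = plan(context)             # LLM planner
    embodiment.act(command)
    return command

class Desktop(Embodiment):
    def __init__(self):
        self.log = []                   # record of synthesized input events
    def sense(self):
        return "screenshot"
    def act(self, command):
        self.log.append(command)
```

Swapping `Desktop` for a `Robot` subclass leaves `step`, `understand`, and `plan` untouched, which is the whole point of the argument above.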
How it works
- Screen capture layer: real-time screenshots at configurable intervals
- Vision module: element detection, text extraction (OCR), layout understanding
- Accessibility bridge: macOS AX API for precise UI tree traversal
- Planning engine: LLM-powered task decomposition and action sequencing
- Execution layer: synthetic input events (mouse moves, clicks, keystrokes)
- Error recovery: state verification after each action, retry logic, alternative paths
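The verify-and-retry idea in the last bullet can be sketched as follows (names are hypothetical; in Friday the verifier compares the observed screen state against the outcome the plan expected):

```python
import time

def execute_with_recovery(action, verify, alternatives=(), retries=2, delay=0.0):
    """Run an action, confirm the resulting state, retry or fall back on failure."""
    candidates = [action, *alternatives]       # primary path, then alternatives
    for candidate in candidates:
        for _ in range(retries + 1):
            candidate()                        # synthesize the input event(s)
            if verify():                       # did the screen reach the expected state?
                return True
            time.sleep(delay)                  # wait out loading states before retrying
    return False                               # escalate to the planner for a new plan
```

Returning `False` rather than raising keeps the decision with the planner, which can re-perceive the screen and choose an entirely different route to the goal.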