AI assistants can talk but can't act
Nearly every AI assistant in 2025 works the same way: you type, it responds with text. But what if you want it to actually do things on your computer? Open apps, fill forms, navigate interfaces, execute multi-step workflows?
The gap between "AI that answers questions" and "AI that does work" is enormous. Bridging it requires an agent that can see, understand, and act in a real desktop environment.
See the screen, control the machine
Friday is an autonomous agent that operates your Mac the way a human would: by looking at the screen and using the keyboard and mouse.
- Computer vision pipeline: captures screen, identifies UI elements, reads text via OCR
- macOS Accessibility API: programmatic control of the UI tree of any application that exposes one
- Multi-LLM reasoning: uses language models for planning, decomposition, and decision-making
- Full desktop autonomy: clicks, types, scrolls, drags, switches apps, handles dialogs
Give Friday a natural language instruction and it figures out the steps, navigates the interface, and completes the task — handling errors and unexpected states along the way.
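At its core this is an instruct → perceive → plan → act loop. A minimal sketch, with hypothetical names (`perceive`, `plan`, `execute` stand in for the real vision, LLM, and input-synthesis layers):

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str            # "click", "type", "key", "scroll"
    target: tuple = ()   # screen coordinates for pointer actions
    text: str = ""       # payload for typing actions

def run_task(instruction, perceive, plan, execute, max_steps=20):
    """Drive the perceive -> plan -> act loop until the planner reports done."""
    history = []
    for _ in range(max_steps):
        state = perceive()                           # screenshot + OCR + UI tree
        action = plan(instruction, state, history)   # LLM decides the next step
        if action is None:                           # planner signals completion
            return history
        execute(action)                              # synthesize input events
        history.append(action)
    raise TimeoutError("step budget exhausted")
```

The `max_steps` budget matters in practice: an agent that loops on a stuck UI state should fail loudly rather than click forever.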
What Friday can do
- Navigate complex UIs across any macOS application
- Execute multi-step workflows (file management, data entry, web browsing)
- Recover from errors by detecting unexpected popups, loading states, and failures
- Chain actions across multiple applications to complete high-level goals
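Chaining across applications amounts to decomposing a high-level goal into ordered per-app sub-goals, each run to completion before the next begins. A hypothetical sketch (not Friday's actual planner interface):

```python
def chain(subgoals, run_subgoal):
    """Complete a high-level goal as an ordered list of (app, goal) sub-goals."""
    completed = []
    for app, goal in subgoals:
        # e.g. switch focus to the app, then run the agent loop on this sub-goal
        if not run_subgoal(app, goal):
            raise RuntimeError(f"sub-goal failed in {app}: {goal}")
        completed.append((app, goal))
    return completed
```

Failing fast on a sub-goal lets the planner re-decompose the remaining work instead of pressing on from a broken state.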
The bridge to physical robots
Friday isn't just a productivity tool. It's a proof of concept for a deeper idea: if an AI can perceive a visual environment, understand context, plan actions, and execute them through physical interfaces — that's exactly what a robot needs to do in the real world.
The perception-action loop in Friday (screen → understanding → plan → mouse/keyboard) maps directly to the loop a robot needs (cameras → understanding → plan → actuators). Building Friday was a stepping stone toward 20n.
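One way to make the mapping concrete: the loop itself is fixed, and only the sensing and acting endpoints change. A conceptual sketch, not Friday's actual interfaces:

```python
from abc import ABC, abstractmethod

class Embodiment(ABC):
    """Desktop and robot differ only at the sensor/effector boundary."""
    @abstractmethod
    def sense(self): ...           # screenshot vs. camera frame
    @abstractmethod
    def act(self, command): ...    # mouse/keyboard vs. motor actuators

def step(embodiment, understand, plan):
    observation = embodiment.sense()
    context = understand(observation)   # vision model / OCR
    command = plan(context)             # LLM planner
    embodiment.act(command)
    return command

class Desktop(Embodiment):
    def __init__(self):
        self.log = []                   # record of synthesized input events
    def sense(self):
        return "screenshot"
    def act(self, command):
        self.log.append(command)
```

Swapping `Desktop` for a `Robot` subclass leaves `step`, `understand`, and `plan` untouched, which is the whole point of the argument above.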
How it works
- Screen capture layer: real-time screenshots at configurable intervals
- Vision module: element detection, text extraction (OCR), layout understanding
- Accessibility bridge: macOS AX API for precise UI tree traversal
- Planning engine: LLM-powered task decomposition and action sequencing
- Execution layer: synthetic input events (mouse moves, clicks, keystrokes)
- Error recovery: state verification after each action, retry logic, alternative paths
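The verify-and-retry idea in the last bullet can be sketched as follows (names are hypothetical; in Friday the verifier compares the observed screen state against the outcome the plan expected):

```python
import time

def execute_with_recovery(action, verify, alternatives=(), retries=2, delay=0.0):
    """Run an action, confirm the resulting state, retry or fall back on failure."""
    candidates = [action, *alternatives]       # primary path, then alternatives
    for candidate in candidates:
        for _ in range(retries + 1):
            candidate()                        # synthesize the input event(s)
            if verify():                       # did the screen reach the expected state?
                return True
            time.sleep(delay)                  # wait out loading states before retrying
    return False                               # escalate to the planner for a new plan
```

Returning `False` rather than raising keeps the decision with the planner, which can re-perceive the screen and choose an entirely different route to the goal.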