turn old iPhones into AI agents
give it a goal in plain English. it reads the screen, thinks, taps, and repeats — via iOS accessibility APIs.
$ swift run Agent.swift
enter your goal: open music and play "lofi hip hop"
--- step 1/30 ---
think: i'm on the home screen. launching music.
action: launchApp (842ms)
--- step 2/30 ---
think: music is open. tapping search.
action: tap (623ms)
--- step 3/30 ---
think: search field focused.
action: type "lofi hip hop" (501ms)
--- step 4/30 ---
action: pressReturn (389ms)
--- step 5/30 ---
think: playlist showing. done.
action: done (412ms)
⚙️ perceive, reason, act, adapt
every step runs the same loop: dump the accessibility tree, filter the elements, send them to an LLM, execute the chosen action via the Accessibility APIs. a minimal sketch of the loop follows the four steps below.
1. perceive
checks accessibility permission (AXIsProcessTrusted), then captures the accessibility hierarchy: labels, values, frames.
2. reason
sends screen state + goal to an LLM (optimized for iOS UI). returns think, plan, action.
3. act
executes via XCUIRemote / CGEvent (Mac) or private APIs – tap, type, swipe, Siri, home button.
4. adapt
stuck recovery after 3 unchanged steps; falls back to the Vision framework if the accessibility tree comes back empty.
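roughly, the loop looks like this in Swift (illustrative names only, not the actual Agent.swift types):

import Foundation
import CoreGraphics

// minimal sketch of the per-step loop; real element filtering, prompts and actions live elsewhere
struct UIElement { let label: String; let value: String?; let frame: CGRect }
struct Decision  { let think: String; let action: String }

protocol Perceiver { func snapshot() -> [UIElement] }                               // accessibility dump
protocol Reasoner  { func decide(goal: String, screen: [UIElement]) -> Decision }   // LLM call
protocol Actor     { func perform(_ decision: Decision) }                           // tap / type / swipe / launchApp

func runAgent(goal: String, perceive: Perceiver, reason: Reasoner, act: Actor,
              maxSteps: Int = 30, stuckThreshold: Int = 3) {
    var lastLabels: [String] = []
    var unchangedSteps = 0

    for step in 1...maxSteps {
        // 1. perceive: dump and filter the accessibility tree
        let screen = perceive.snapshot()

        // 4. adapt: count steps where the screen did not change and nudge the model when stuck
        let labels = screen.map(\.label)
        unchangedSteps = (labels == lastLabels) ? unchangedSteps + 1 : 0
        lastLabels = labels
        let hint = unchangedSteps >= stuckThreshold ? " (stuck: try a different action)" : ""

        // 2. reason: send goal + screen state to the LLM, get think / action back
        let decision = reason.decide(goal: goal + hint, screen: screen)
        print("--- step \(step)/\(maxSteps) ---")
        print("think: \(decision.think)")
        print("action: \(decision.action)")

        // 3. act: execute the chosen action on the device, or stop when the goal is met
        if decision.action == "done" { return }
        act.perform(decision)
    }
}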
🔁 interactive, workflows, or flows
interactive
describe what you want. the agent figures out the rest.
$ swift run Agent.swift
enter your goal: send "running late" to Mom on whatsapp
workflows (JSON + AI)
chain goals across apps; the agent handles popups & UI changes.
{
  "name": "weather to messages",
  "steps": [
    { "bundleId": "com.apple.weather",
      "goal": "check chennai weather" },
    { "goal": "share to Mom in Messages" }
  ]
}
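the schema above maps naturally onto Codable. a minimal sketch, assuming the field names shown in the example (the real Workflow.swift may differ):

import Foundation

struct Workflow: Codable {
    struct Step: Codable {
        let bundleId: String?   // optional: app to launch before working on the goal
        let goal: String        // plain-English goal handed to the agent loop
    }
    let name: String
    let steps: [Step]
}

// path is illustrative
let url = URL(fileURLWithPath: "workflows/weather-to-messages.json")
do {
    let workflow = try JSONDecoder().decode(Workflow.self, from: Data(contentsOf: url))
    for step in workflow.steps {
        print("run goal: \(step.goal)")   // each goal runs through the same perceive/reason/act loop
    }
} catch {
    print("could not load workflow: \(error)")
}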
flows (YAML, no AI)
fixed taps & types, instant execution.
appId: com.apple.MobileSMS
name: Send iMessage
- launchApp
- tap: "Mom"
- type: "hello from iClaw"
- tap: "Send"
|                    | workflows (AI)     | flows (deterministic) |
| ------------------ | ------------------ | --------------------- |
| format             | json + llm         | yaml, no llm          |
| adapt to UI change | ✅ yes             | ❌ breaks             |
| speed              | slower (llm calls) | instant               |
| best for           | complex multi-app  | simple repeatable     |
🚀 what you can build
📱 delegate to on‑device AI — use ChatGPT, Perplexity, Shortcuts as tools. no extra API keys.
🌐 remote control via Tailscale/SSH — control your iPhone from anywhere, cron workflows.
🔄 old devices, always on — turn that drawer iPhone into a standup poster, flight checker, digest sender.
⚡ things it can do right now
💬 messaging
- ✉️ iMessage to contacts
- 📨 reply to latest SMS
- 📧 compose via Mail
- 💬 Telegram / Slack posts
🔍 research
- 🌐 Safari search & collect
- 🤖 ask ChatGPT / Perplexity
- 📊 weather, stocks, flights
- 🏷️ price comparison
📱 social
- 📸 Instagram, X posts
- ❤️ like / comment
- 📈 engagement metrics
📋 productivity
- ☀️ morning briefing
- 📅 calendar events
- 📝 Apple Notes capture
- 🔔 triage notifications
🎵 lifestyle
- 🍔 food delivery
- 🚗 Uber booking
- 🎧 Apple Music playlist
- 🧘 toggle Focus mode
⚙️ device control
- 📶 toggle Wi‑Fi, Bluetooth
- 🔊 volume / brightness
- 📲 install / uninstall apps
- ⚡ run Shortcuts
🧪 what works & what doesn’t
✅ works well
native Apple apps, multi‑app workflows, system settings via Shortcuts, stuck recovery, Vision fallback.
⚠️ unreliable
games (Metal rendering), webviews, drag & drop, notification interaction, clipboard on some iOS versions.
❌ can't do
banking apps with secure text, Face/Touch ID, bypass lock screen, other apps' private data, camera stream, pinch‑to‑zoom.
⚡ getting started
1. install
curl -fsSL https://iclaw.ai/install.sh | sh
or manually:
xcode-select --install
git clone https://github.com/unitedbyai/iclaw.git
cd iclaw && swift build
cp .env.example .env
2. configure LLM
LLM_PROVIDER=groq
GROQ_API_KEY=gsk_your_key
or ollama (local), openrouter, openai, bedrock.
3. connect iPhone
instruments -s devices
# grant accessibility when prompted
swift run Agent.swift
| provider   | cost         | vision |
| ---------- | ------------ | ------ |
| groq       | free         | no     |
| ollama     | free (local) | yes*   |
| openrouter | per token    | yes    |
| openai     | per token    | yes    |
| bedrock    | per token    | yes    |
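providers sit behind one small interface. a sketch of what a provider could look like, using Groq's OpenAI-compatible chat completions endpoint (illustrative only; the real LLMProviders.swift may be structured differently):

import Foundation

protocol LLMProvider {
    func complete(prompt: String) async throws -> String
}

struct GroqProvider: LLMProvider {
    let apiKey: String
    let model = "llama-3.1-8b-instant"   // illustrative model name

    func complete(prompt: String) async throws -> String {
        var request = URLRequest(url: URL(string: "https://api.groq.com/openai/v1/chat/completions")!)
        request.httpMethod = "POST"
        request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
        request.setValue("application/json", forHTTPHeaderField: "Content-Type")
        let body: [String: Any] = [
            "model": model,
            "messages": [["role": "user", "content": prompt]]
        ]
        request.httpBody = try JSONSerialization.data(withJSONObject: body)

        let (data, _) = try await URLSession.shared.data(for: request)
        // pull choices[0].message.content out of the response
        let json = try JSONSerialization.jsonObject(with: data) as? [String: Any]
        let choices = json?["choices"] as? [[String: Any]]
        let message = choices?.first?["message"] as? [String: Any]
        return message?["content"] as? String ?? ""
    }
}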
optional tuning
| key             | default  | what                                  |
| --------------- | -------- | ------------------------------------- |
| MAX_STEPS       | 30       | steps before giving up                |
| STEP_DELAY      | 2        | seconds between actions               |
| STUCK_THRESHOLD | 3        | unchanged steps before stuck recovery |
| VISION_MODE     | fallback | off / fallback / always               |
| MAX_ELEMENTS    | 40       | UI elements sent to the LLM           |
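these knobs come from the environment (.env). a minimal sketch of how Config.swift might read them, assuming the defaults in the table:

import Foundation

struct Config {
    let maxSteps: Int
    let stepDelay: TimeInterval
    let stuckThreshold: Int
    let visionMode: String
    let maxElements: Int

    static func fromEnvironment() -> Config {
        let env = ProcessInfo.processInfo.environment
        return Config(
            maxSteps:       Int(env["MAX_STEPS"] ?? "") ?? 30,
            stepDelay:      TimeInterval(env["STEP_DELAY"] ?? "") ?? 2,
            stuckThreshold: Int(env["STUCK_THRESHOLD"] ?? "") ?? 3,
            visionMode:     env["VISION_MODE"] ?? "fallback",
            maxElements:    Int(env["MAX_ELEMENTS"] ?? "") ?? 40
        )
    }
}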
📦 35 workflows + 5 flows
messaging (10) — slack-standup, imessage-broadcast, telegram-send…
social (4) — social-media-post, instagram-check…
productivity (8) — morning-briefing, github-check-prs…
research (6) — weather-to-imessage, price-comparison…
lifestyle (8) — food-order, apple-music-playlist…
flows (5) — send-imessage, safari-search, toggle-wifi…
📁 10 files in src/
Agent.swift
Actions.swift
Skills.swift
Workflow.swift
Flow.swift
LLMProviders.swift
Sanitizer.swift
Config.swift
Constants.swift
Logger.swift