PhysiClaw crab mascot

PhysiClaw

The AI that physically uses your phone.

Order food, hail rides, check your bank, reply to messages.
Any app, any phone, no APIs, no OAuth.

Claude Desktop

Pizza for dinner on Uber Eats

On it. I'll open Uber Eats on your phone, find a pizza place nearby, and place the order.

screenshot_top()
move("down-right", "large")
tap() — Uber Eats icon
screenshot_top()
searching for pizza...
Robot arm touching phone screen

The entire flow runs autonomously. You just watch.

the loop
 Top Camera ──→ AI Agent ──→ 3-Axis Arm ──→ Side Camera ──→ Aligned?
 (read screen)  (decide)     (move stylus)   (check pos)     │
      ▲                                                  Yes │ No
      │                                                   │  │
      │         Touch Phone ◄─────────────────────────────┘  │
      │              │                                       │
      └──────────────┘ (next action)         adjust & retry ◄┘

How it works

1

Park & Screenshot

Stylus retracts out of frame. Top camera takes a clean screenshot. The AI sees exactly what's on your phone screen.

2

AI Decides

The agent (Claude, GPT, etc.) analyzes the screen and picks a direction and distance — like "move down-right, large" — no pixel coordinates needed.

3

Move & Verify

The arm moves the stylus. Side camera checks alignment from 45°. Not aligned? The AI adjusts. Aligned? It taps.

4

Touch & Repeat

Stylus touches the screen, then retracts. Top camera verifies the result. The loop continues for the next action.
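The four steps above can be sketched as a single control loop. Everything below is a hypothetical stand-in: `FakeArm` just records calls instead of driving hardware, and the agent callbacks are placeholders — only the tool names (`park`, `screenshot_top`, `screenshot_side`, `move`, `tap`) mirror the real MCP server.

```python
# Hypothetical sketch of the park -> screenshot -> decide -> move ->
# verify -> tap loop. FakeArm records calls instead of moving hardware.

class FakeArm:
    def __init__(self):
        self.log = []

    def park(self):            self.log.append("park")
    def screenshot_top(self):  self.log.append("screenshot_top"); return "top-frame"
    def screenshot_side(self): self.log.append("screenshot_side"); return "side-frame"
    def tap(self):             self.log.append("tap")

    def move(self, direction, size):
        self.log.append(f"move {direction} {size}")

def run_loop(decide, aligned, arm, max_steps=20):
    """One iteration per on-screen action; `decide` returns None when done."""
    for _ in range(max_steps):
        arm.park()                                 # 1. stylus out of frame
        action = decide(arm.screenshot_top())      # 2. agent picks a move
        if action is None:
            return
        direction, size = action
        arm.move(direction, size)                  # 3. move, then verify
        while not aligned(arm.screenshot_side()):  #    from the side camera
            arm.move(direction, "small")           #    nudge and re-check
        arm.tap()                                  # 4. touch, then loop
```

Running it with a scripted one-action agent shows the same order of operations as the diagram: park, read, move, side-check, tap, then park and read again before the agent declares the task done.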

📷

Top Camera

Looks straight down at the screen — reads what's on it

📐

Side Camera

Views from ~45° — checks stylus tip position before tapping

✏️

Capacitive Stylus

Moves X/Y to reach any point, up/down (Z) to touch or release
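At some point the agent's coarse commands ("move down-right, large") have to become stage coordinates for the X/Y gantry. One plausible mapping, sketched below — the millimetre step sizes are invented for illustration; real values would be calibrated per phone and belong in config.py.

```python
# Hypothetical mapping from coarse agent commands to X/Y deltas in mm.
# STEP_MM values are invented; real ones would be calibrated per setup.
STEP_MM = {"small": 2.0, "medium": 8.0, "large": 25.0}

DIRS = {
    "up": (0, -1),        "down": (0, 1),
    "left": (-1, 0),      "right": (1, 0),
    "up-left": (-1, -1),  "up-right": (1, -1),
    "down-left": (-1, 1), "down-right": (1, 1),
}

def move_delta(direction: str, size: str) -> tuple[float, float]:
    """Resolve a coarse command into a signed (dx, dy) in mm."""
    ux, uy = DIRS[direction]
    return ux * STEP_MM[size], uy * STEP_MM[size]
```

Keeping the vocabulary this small is what lets the agent skip pixel coordinates entirely: eight directions times three sizes is easy for a vision model to pick reliably, and the verify-and-nudge loop absorbs the remaining error.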


Why PhysiClaw

Today's AI agents can control your computer — but they hit walls everywhere.

The Problem

Want to order food? Need a delivery API + OAuth.
Want to check your bank? Blocked by data walls.
Want to book a ride? Another service integration.
Every new app = new OAuth, new API, new setup.

The PhysiClaw Way

One setup. Every app. Any phone.
No API keys. No OAuth. No integrations.
Invisible to apps — just a finger on glass.
iOS and Android. No jailbreak needed.

The insight: Instead of building software bridges to every service, give the AI agent physical presence. A camera sees the screen. A robotic finger taps it. No app can detect or block it — because to the phone, it's just a finger.


System Architecture

architecture.txt
┌───────────────────────────────────────┐
│           AI Agent (Brain)            │
│  Claude Desktop / OpenClaw / etc.     │
│  Sees screen → decides → calls tools  │
└──────────────────┬────────────────────┘
                   │ MCP Protocol
                   ▼
┌───────────────────────────────────────┐
│     PhysiClaw MCP Server (Python)     │
│                                       │
│  Tools:                               │
│   · screenshot_top   (top camera)     │
│   · screenshot_side  (side camera)    │
│   · move             (X/Y plane)      │
│   · tap / swipe      (Z down + move)  │
│   · park             (retract)        │
└──────────┬────────────────┬───────────┘
           │                │
     USB Cameras      USB Serial (GRBL)
           │                │
           ▼                ▼
    ┌────────────┐   ┌───────────────┐
    │ Top Camera │   │ GRBL Board    │
    │ (screen)   │   │ (embedded)    │
    ├────────────┤   │ X/Y gantry    │
    │ Side Camera│   │ Z stylus      │
    │ (stylus)   │   └──────┬────────┘
    └────────────┘          │ touch
                            ▼
                   ┌─────────────────┐
                   │  Phone          │
                   │  (unlocked)     │
                   └─────────────────┘
Python 3.11+ MCP Protocol GRBL/grblHAL OpenCV pyserial USB Serial
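The five tools in the diagram boil down to a thin dispatch layer over the hand (GRBL serial) and eyes (USB cameras). Here is that surface sketched with the MCP plumbing replaced by a plain dict; `arm` and `cams` are assumed wrapper objects, and the real server would register these through the `mcp` package instead.

```python
# Sketch of the MCP server's tool surface, protocol plumbing replaced by
# a plain dict. `arm` and `cams` are assumed objects wrapping hand.py
# (GRBL serial) and eyes.py (USB cameras).
def make_tools(arm, cams):
    return {
        "screenshot_top":  lambda: cams.grab("top"),     # read the screen
        "screenshot_side": lambda: cams.grab("side"),    # check stylus tip
        "move":  lambda direction, size: arm.move(direction, size),
        "tap":   lambda: arm.tap(),                      # Z down + up
        "park":  lambda: arm.park(),                     # retract from frame
    }
```

The point of the split is that the agent only ever sees these five verbs; everything camera- and motion-specific stays behind them, so swapping the arm or cameras never changes the agent-facing contract.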

Hardware

Everything you need to build one. Total cost: ~$127.

Component        Price
GRBL Arm         ~$80
Top Camera       ~$14
Side Camera      ~$14
Stylus           ~$1.50
Camera Mounts    ~$4
Phone Mount      ~$1.20
USB Hub          ~$13

Quick Start

Clone the repo
git clone https://github.com/echosprint/PhysiClaw.git
cd PhysiClaw
Install dependencies
pip install pyserial opencv-python mcp
Project structure
physiclaw/
├── physiclaw_server.py   # MCP Server entry point
├── hand.py               # Serial G-code control
├── eyes.py               # USB camera capture
└── config.py             # Configuration constants

What it can do

Any app, any phone, iOS or Android.

🍜
Order food delivery
Meituan, Uber Eats
🚗
Hail a ride
Didi, Uber
🛒
Browse & shop
Taobao, Amazon
📊
Check weather, news, stocks
Any app
💬
Read & reply to messages
WeChat, WhatsApp
📱
Scroll social media
TikTok, Instagram
✅
App daily check-ins
Collect rewards
⏰
Set alarms & reminders
Clock, Calendar
📸
Take & send screenshots
Any app

Gestures

Single Tap

Move to target → down 50-100ms → up

G0 → M3 S12 → G4 P0.05 → M3 S0

Long Press

Move to target → down 800ms → up

G0 → M3 S12 → G4 P0.8 → M3 S0

Swipe

Down at start → linear move → up at end

M3 S12 → G1 F3000 → M3 S0

Double Tap

Two taps with <300ms interval

tap → G4 P0.1 → tap
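The recipes above translate directly into small G-code builders. The M3 S12 / M3 S0 pen-down/pen-up values, the dwell times, and the F3000 feed come straight from the tables; the X/Y coordinate handling is an assumption for illustration.

```python
# The gesture recipes above as G-code builders. M3 S12 / M3 S0 (stylus
# down/up), the G4 dwells and the F3000 feed are taken from the tables;
# coordinate handling is an assumption.
PEN_DOWN, PEN_UP = "M3 S12", "M3 S0"

def tap(x, y, dwell=0.05):
    """Single tap: move, press ~50 ms, release."""
    return [f"G0 X{x} Y{y}", PEN_DOWN, f"G4 P{dwell}", PEN_UP]

def long_press(x, y):
    """Same as a tap, but held down for 800 ms."""
    return tap(x, y, dwell=0.8)

def swipe(x0, y0, x1, y1, feed=3000):
    """Press at the start point, linear move while down, release at end."""
    return [f"G0 X{x0} Y{y0}", PEN_DOWN, f"G1 X{x1} Y{y1} F{feed}", PEN_UP]

def double_tap(x, y):
    """Two taps separated by a 100 ms dwell (well under 300 ms)."""
    return tap(x, y) + ["G4 P0.1"] + tap(x, y)
```

Because each gesture is just a list of lines, they compose with whatever streams G-code to the board, and new gestures (pinch on a dual-stylus rig, say) stay one function away.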