Leading Through Ambiguity: Lessons From Building With AI ‘Autonomously’
TL;DR:
New Claude Code plugins are finally good enough to change how designers think about autonomy
Automation loops break less on tech and more on human expectation
Building Trip Genie taught me why “almost working” tools will reshape how designers work and think
Early in 2025, I began experimenting with IDEs like Cursor, returning to “coding” after years spent in UX. A year ago, an engineer friend urged me to suspend disbelief and trust that the tools would eventually let designers code like pros. Those early versions of Cursor weren’t very user-friendly. I was able to get a start but quickly grew frustrated with manual constraints. A short year later, everything is dramatically better, though not perfect. In recent weeks, Claude Code’s plugins brought us one step closer to autonomous design and development, and it’s becoming obvious where the future of design is going.
Learning by Doing
I recently applied an agentic workflow, built on Claude Code and Codex, to a frustration I kept running into with ChatGPT. The problem is simple: today you can create a travel plan (or any document), but it ends up buried in the chat history, lost forever in the scroll. That’s a real UX gap in today’s chat-based AI. Travel is also deeply visual, and plain text interfaces have zero panache. I wanted to change that. The result was an AI-powered trip advisor I dubbed Trip Genie, and it quickly revealed how far tools like Cursor and Claude can be pushed and how powerful they’ve become.
This visual itinerary is built with a React frontend, styled with custom CSS, and backed by Supabase for authentication and saved trips. The chat flow uses OpenAI’s API to generate and update itineraries, while Google Maps renders the map view and pins. The UI is split into a chat panel and itinerary panel so people can iterate on plans in real time, and each itinerary version is stored so edits and refreshes feel safe. Destination photos courtesy of Unsplash. Yes, I did that without writing a line of code.
I envisioned an itinerary that is both informative and visually fun
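For readers who want to picture the “each itinerary version is stored” idea, here is a minimal TypeScript sketch. It is not Trip Genie’s actual code: the names `ItineraryStop`, `ItineraryVersion`, and `ItineraryHistory` are hypothetical, and it keeps everything in memory where the real app persists to Supabase.

```typescript
// Hypothetical shapes; the real Trip Genie schema is not shown in this post.
interface ItineraryStop {
  day: number;
  title: string;
}

interface ItineraryVersion {
  version: number; // 1-based, increases with every save
  savedAt: string; // ISO timestamp
  stops: ItineraryStop[];
}

// In-memory stand-in for a persisted table (assumption: the app stores this remotely).
class ItineraryHistory {
  private versions: ItineraryVersion[] = [];

  // Append a new version instead of overwriting the previous one.
  save(stops: ItineraryStop[]): ItineraryVersion {
    const next: ItineraryVersion = {
      version: this.versions.length + 1,
      savedAt: new Date().toISOString(),
      // Copy the stops so later edits cannot mutate stored history.
      stops: stops.map((s) => ({ ...s })),
    };
    this.versions.push(next);
    return next;
  }

  // The latest version is what an itinerary panel would render.
  latest(): ItineraryVersion | undefined {
    return this.versions[this.versions.length - 1];
  }

  // Restoring re-saves an old version as the newest, so nothing is lost.
  restore(version: number): ItineraryVersion {
    const found = this.versions.find((v) => v.version === version);
    if (!found) {
      throw new Error(`No itinerary version ${version}`);
    }
    return this.save(found.stops);
  }
}
```

Because every save appends rather than overwrites, an edit or a page refresh can never destroy an earlier plan, which is what makes iterating feel safe.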
The Concept
You ask AI for an itinerary for a trip you’re planning. It gives you something great. New places and restaurants to discover. You feel a spark of excitement. Then the itinerary just dies. It lives in a chat thread, maybe gets pasted into a doc, and never evolves. There’s no continuity. No memory. No way to keep working with it like a living thing.
I kept thinking: why can’t the itinerary itself be the thing you keep talking to? Why shouldn’t it remember preferences, accept updates, and change as you go?
That question became Trip Genie.
Yes, tools like this probably already exist. But one of the strange side effects of the current AI moment is realizing you can build very specific tools for your own brain. Once you see that, it’s hard to justify using tools designed for everyone else.
So I decided to build it. Here’s what I learned.
I reversed the roles of user and system by letting the system lead with questions. This works because it’s a narrow use case.
Workflow Learnings
1. Make it real as fast as possible
Getting used to IDE workflows, without the niceties of Figma, can be an uphill climb. The tools themselves are evolving quickly, and even while testing locally and going in circles with Claude, I followed a rule I’ve lived by most of my career:
Make it real as fast as possible, and you can test, learn and iterate from there. With AI that’s easier than ever, and it’s freeing up designers to bring their special sauce. More and more that’s our taste, intuition and our ability to facilitate decision making through design. And anyway, I figured building something I really, really want out there is a great way to test the boundaries of the tools.
I described the key user tasks in the prompt, focusing on the user’s journey from point A to point B, which served as my hypothesis. I mocked up a placeholder logo and prompted the first iterations of Trip Genie. Within minutes, the scaffolding of what I envisioned existed. The project stopped feeling like an experiment and started feeling real. The energy changed. I cared more. I pushed further.
That’s when I knew I was committed.
2. Accept that the tools are powerful and flawed at the same time
My initial plan was simple and, in hindsight, slightly flawed.
I wanted to give Claude a UX plan, hand over the work, and let it run. In my head, it looked like a mostly autonomous loop. I’d step away, come back later, and find meaningful progress waiting for me.
This markdown file contains my dreams for what the system should be creating. Dream big, or go home!
In reality, the chat agents, Claude Code, and the more recent Claude Chrome extension may share some data inside Cursor, but they don’t share everything, and their activation paths are different and not always obvious. Knowing which tool is “listening,” when, and in what context isn’t intuitive. That gap matters more than I expected.
Cursor is flawed but the answers are not far away. The chat assistant is embedded directly in the UI, which makes everything feel askable. Stuck on a terminal command? Ask Claude. Unsure what a file does? Ask Claude. The boundary between the problem and the solution mostly disappears.
These three panels are my new design tools
Early on, the visual editor wasn’t there yet. You couldn’t really edit designs using tokens, grids, or structured UI primitives like you can now. It’s still early days though, so the tools are still changing rapidly. But that evolution is the point. Design and engineering are being pulled closer together, and Cursor is actively collapsing that gap.
Knowing that led to a more important realization.
These tools aren’t built for mass consumption yet. They’re powerful under the hood, but rough at the edges. Too much responsibility still falls on the user to understand how pieces connect, when context carries over, and when it doesn’t. Good design will eventually smooth those seams. For example, Cursor still uses teeny, tiny, undiscoverable icons to change the view in the visual browser. For now, the magic is real, but it still demands patience, curiosity, and a tolerance for friction.
These itty-bitty icons will change the way you design.
3. When you’re stuck, rethink, research, or re‑imagine
Anthropic’s new plugin ecosystem made it possible to run longer automation loops inside Cursor, so I experimented with /ralph-wiggum:ralph-loop expecting steady, hands‑off progress. I eventually got it running, but in practice the loop kept stalling or drifting, and I’d return to a blank screen instead of momentum. It became obvious that long loops only shine when the task is stable and tightly defined, and my workflow was anything but. The ideas were still forming, the direction kept shifting, and each iteration changed what I actually wanted to build.
So I pivoted. I used Claude’s plan mode to sketch the roadmap, then moved into a phased approach using the chat and the Chrome plugin. That gave me far more control over pacing, let me fold in the unexpected ideas Claude surfaced, and helped me spot technical limits before sinking time into the wrong path. Once I stopped forcing the loop and started working in deliberate phases, the project finally moved forward.
And of course, this is just one way to work. Some folks may thrive on fully automated runs; others prefer hands-on steering. I landed somewhere in the middle: enough structure to keep things moving, enough control to shape the outcome.
4. I have to remind myself regularly that I am the human‑in‑the‑loop
At one point, I realized I’d quietly become the passive message bus between systems, giving free rides between GPTs, stitching together plans, prompts, and partial executions. The overhead wasn’t thinking about the product anymore; it was managing coordination. Hours slipped by circling the same technical constraints.
The Claude Chrome plugin was a perfect example. At first, I couldn’t even tell if it was working. Cursor offered almost no feedback, so I kept firing prompts into the void, wondering whether anything was actually happening. That’s when I stopped, did some research to figure out what the browser control actually looks like, and restarted Claude Chrome and tried again. Then, suddenly, my browser glowed, and Claude began interacting directly with my product inside the browser. Watching the system poke at the interface, run a self‑test, and respond to real state changes felt like a tiny moment of magic.
That’s when I stopped asking, “Why isn’t this fully autonomous?” and started asking better questions: Am I actually moving faster, or am I just managing the AI? What can I change in my setup or my expectations to get unstuck?
Those questions helped me take back the reins, accept the system’s limits, and work with it instead of around it. Different people will want different levels of control in these hybrid workflows. I’m learning to live in the middle, where the system feels alive but I still get to decide the pace.
5. What is happening under the hood isn’t always obvious
My original idea was for Trip Genie to initiate the conversation, not the user. “Where are you traveling?” it might ask. Within a narrow use case like travel planning, intent could be assumed. The system could lead.
What I didn’t fully realize was that I wasn’t building a conversation at all. I was writing a script that was being hardcoded. When I went off-script during testing, the system broke. Because I wasn’t deeply inside the code, it wasn’t always clear what was hardcoded versus what was actually connected to the system. The experience felt conversational on the surface, but underneath it was brittle.
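To make that brittleness concrete: a fixed script asks question one, then two, then three, and falls over when someone answers out of order. One common alternative is to track which details are still missing and ask about those. The sketch below is a minimal TypeScript illustration of that idea, not what Trip Genie actually runs; `TripDetails`, `QUESTIONS`, and `nextQuestion` are hypothetical names, and the question wording beyond “Where are you traveling?” is invented for the example.

```typescript
// Hypothetical trip details the assistant needs before drafting an itinerary.
interface TripDetails {
  destination?: string;
  dates?: string;
  travelers?: number;
}

// Questions keyed by the detail they collect. The wording is illustrative.
const QUESTIONS: Record<keyof TripDetails, string> = {
  destination: "Where are you traveling?",
  dates: "When are you going?",
  travelers: "How many people are traveling?",
};

// Pick the next question from what is still missing, not from a fixed step number.
// Returns null once every detail is known.
function nextQuestion(details: TripDetails): string | null {
  const order: (keyof TripDetails)[] = ["destination", "dates", "travelers"];
  for (const key of order) {
    if (details[key] === undefined) {
      return QUESTIONS[key];
    }
  }
  return null;
}
```

If a traveler volunteers the destination and group size in their first message, this version simply skips ahead to the dates instead of breaking.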
When I routed everything through the OpenAI API, the conversation immediately felt cleaner and more fluid. But that change exposed a new problem. The conversation and the itinerary drifted apart. Data wasn’t being passed reliably, so user responses never actually impacted the revised itinerary. In that state, Claude still happily narrated success.
Once I accepted that I couldn’t always trust the narrator or tell what was actually connected to the system and what was just appearances, I redesigned the human-in-the-loop workflow to surface that distinction as early as possible.
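One way to picture “not trusting the narrator” is a check that ignores what the assistant says it did and compares the itinerary before and after each turn. This is a hedged TypeScript sketch under my own assumptions, not the actual Trip Genie implementation; `Itinerary`, `AssistantTurn`, and `checkSync` are hypothetical names.

```typescript
// Hypothetical shapes for illustration only.
interface Itinerary {
  stops: string[];
}

interface AssistantTurn {
  message: string; // what the assistant says it did
  updatedItinerary: Itinerary; // what it actually returned
}

type SyncCheck =
  | { status: "changed"; added: string[]; removed: string[] }
  | { status: "unchanged" };

// Compare the itinerary before and after a turn, ignoring the narration.
// Note: a pure reordering of the same stops counts as "unchanged" here.
function checkSync(before: Itinerary, turn: AssistantTurn): SyncCheck {
  const previous = new Set(before.stops);
  const current = new Set(turn.updatedItinerary.stops);
  const added = turn.updatedItinerary.stops.filter((s) => !previous.has(s));
  const removed = before.stops.filter((s) => !current.has(s));
  if (added.length === 0 && removed.length === 0) {
    return { status: "unchanged" };
  }
  return { status: "changed", added, removed };
}
```

Showing an “unchanged” result right in the interface lets the human in the loop catch a cheerful “Done!” that never touched the plan.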
What I actually learned
Autonomy isn’t a switch you flip. It’s shaped by the human in charge. Models can reason endlessly, and sometimes their responses are misleading. Iteration is easier than ever, but with all the options available, the human is the key decision-engine.
The real bottleneck wasn’t intelligence or creativity. It was coordination, in a reality that is constantly evolving. And building something real, even when it’s messy or half-working, taught me more about the state of the tools than any prompt tuning could.
Where Trip Genie is now
Trip Genie is live as an MVP, and it’s pretty basic. It runs. It mostly works. It still needs better preference memory and cleaner data flow between conversation and itinerary, and it probably has more bugs than I’ve discovered. But it feels alive, and it can be shown to actual humans for feedback, insights and innovation. This human connection is why I do UX, and it’s a joy to spend time iterating on immediate data from real users.
Most importantly, I now understand where to push automation and where to step in without fighting the shifting state of the tools. When the tools are constantly changing, learning by experimenting is key. As I type this, Claude is controlling my Chrome browser in the background, adding WayGenie to Google Search Console and optimizing SEO, freeing me to do the stuff I like to do, like writing this blog.
That understanding is the real product I walked away with.