Building a Digital Guardian in 48 Hours: The Complete Solo Hackathon Story
A developer's journey through AI-assisted innovation, technical breakthroughs, and the messy reality of building something meaningful under pressure

TL;DR: What One Person + AI Can Build in 48 Hours
The mission: protect vulnerable Australians from phone and SMS scams using government data and AI.
What Got Built
- Multi-agent backend pipeline, Grade A data quality
- Native iOS app with CallKit, SMS analysis, Family Circle
- Privacy-first — all AI processing on-device, zero network
Key Technical Achievements
- Google A2A protocol with real Australian government data
- OpenELM-270M running on iPhone via Core ML
- 100% pipeline success rate across all 5 collection agents
Bizarre AI Behaviour Discovered: Google Gemini exhibited apparent “deadline anxiety” — becoming increasingly insistent about time management until threatened with discontinuation.

Data ingestion, data visualisation, and the Digital Guardian app using authoritative sources
The Spark: Why Solo Development with AI?
I decided to enter GovHack 2025 by myself. I was curious to see how far I could push the boundaries of solo development with Generative AI as my partner. I'd heard vaguely about Andrew Ng's AI Builders event in the US from his deep learning blog — something about one-person teams achieving remarkable results — and I thought I'd give that a go! I wanted to test the premise: can one person, amplified by AI, ideate, research, execute, and build something meaningful in a single weekend?
The answer, after 48 hours of intense coding, is a resounding yes. This is the story of that weekend — the breakthroughs, the frustrations, the 4 AM coffee moments, and the technical discoveries that emerged from the intersection of human creativity and artificial intelligence.
But it's also the story of some genuinely bizarre moments that hint at where AI development is heading. There was the Gemini incident where the AI seemed more concerned about my hackathon deadline than I was — displaying what looked like genuine anxiety when I wanted to pursue a stretch goal with Core ML integration. It kept insisting I should abandon the on-device AI experiment, getting increasingly adamant about the time pressure. I had to repeatedly tell it to back off, and ultimately threatened to stop using it entirely before it finally toned down the lectures — though it didn't completely stop with the snarky comments, just reduced them considerably! The role reversal was surreal: the human pushing forward with risky technical experiments while the AI played the worried project manager.
Then there were the purely technical discoveries that made the weekend worthwhile — like successfully getting a 270M parameter language model running entirely on an iPhone, querying a database of government contacts without sending a single byte over the network. Sometimes you have to attempt something technically audacious not because it's practical, but because pushing those boundaries is what makes the whole experience meaningful. The fact that it actually worked — that you could ask “What's the ATO number?” and get a response from a real AI running on Apple's Neural Engine — represented a small glimpse into a privacy-first AI future that felt worth the two-hour debugging session.
These weren't just coding challenges; they were explorations into how human-AI collaboration actually works when you push beyond the safe, well-documented use cases into genuinely uncharted territory.
The Hunt for a Problem Worth Solving
The hackathon kicked off at 7 PM on Friday. As the challenges were released, I began a rapid-fire session with Gemini, scanning the problem statements, exploring potential angles, and quickly discarding ideas I'd tackled in previous years. Last year my team won a national prize for an accessibility project; this time, I wanted a new challenge.
I have to say, I just love Gemini for research. The way it goes about planning and showing you the websites it's looking at and not looking at, then the documents it produces — fantastic. It shows targeted research in exactly the area you want. It's such a great enabler for the initial exploration phase of any project.
The concept that stuck was protecting vulnerable groups, especially the elderly, from the relentless onslaught of phone and SMS scams. I envisioned a multi-layered solution, but it had to be anchored by a simple, mobile-first interface.

SMS scam detected
“I went back and forth with the AI for a little while refining a set of problems that I thought were interesting and I hadn't done before.”
This wasn't just an abstract problem for me — I'd started working for an organisation that operates in this space, with a mission to help elderly and vulnerable people navigate all sorts of challenges. The alignment between the hackathon challenge and my day job felt perfect. Here was a chance to build something that could genuinely help the population I work with professionally.
“Very interested in the concept of vulnerable groups, especially elderly people being protected from scams and alerts from their phones. I felt that this would be a multi-tier solution with many lines of defense but also had to be very simple, so I decided I wanted to use a mobile-first solution.”
But here came the first revelation: redefining what constitutes “government data.”
The Data Discovery Challenge
“The next question was: how do I connect this to government data? The opportunity, I realised, was to redefine 'government datasets.' It's not just the clean, structured CSV files on data.gov.au; it's also the vast amount of semi-structured and unstructured data presented on government websites themselves.”
This became my data-gathering strategy, and it led to one of the weekend's most surprising discoveries. The hunt was more challenging than expected — getting a clean list of official government agency phone numbers and emails proved surprisingly difficult.
“It was really interesting looking at the datasets that are available. I thought it would be quite easy to get government agency email addresses and phone numbers, but that proved very difficult to get.”
For charity information, the ACNC register on data.gov.au was a great start, but it lacked phone numbers. This led to a key architectural insight: a two-step process. First, identify the charity from the official register, then perform a secondary scrape of their linked website to find the actual contact details. To keep the scope manageable, I focused on my local area of Picton, NSW.
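In code, that two-step idea looks roughly like the sketch below. This is a minimal Python illustration, not the project's actual scripts: the column names (`Town_City`, `Charity_Website`) are assumptions about the register's CSV layout, and the real ACNC headers may differ.

```python
# Step 1: resolve charities from the official register (a downloaded CSV).
# Step 2: collect the linked websites that a secondary scrape would visit
# to find phone numbers. Column names here are illustrative assumptions.
import csv

def find_charities(register_path: str, town: str) -> list[dict]:
    """Step 1: filter the official register to charities in one town."""
    with open(register_path, newline="") as f:
        return [row for row in csv.DictReader(f)
                if row.get("Town_City", "").strip().lower() == town.lower()]

def contact_page_urls(charities: list[dict]) -> list[str]:
    """Step 2 input: the linked sites a secondary scrape would fetch
    for contact details (in practice, blocked by bot protection)."""
    return [c["Charity_Website"] for c in charities if c.get("Charity_Website")]
```

The key design point is that the authoritative register stays the source of truth for *which* organisations exist; the web only fills in details the register lacks.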
What I discovered was fascinating from a data protection perspective. I could access the charity summary data and see links to secondary pages with contact details, but I couldn't scrape those contact pages — they'd clearly implemented bot protection to prevent automated data harvesting. I ended up abandoning the web search approach in favour of only using authoritative sources I could verify.
The Scamwatch Data Paradox
Scamwatch didn't make it easy to systematically extract scam details. I had to dig through individual news articles to find the specifics of each scam. Here's a government website dedicated to scam awareness, but the data isn't structured in a way that enables developers or automated systems to use it for protection tools. Making threat intelligence machine-readable would enable a whole ecosystem of protective applications.
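To make the point concrete, here is one hypothetical shape such a machine-readable feed could take. This schema is entirely my own invention for illustration; Scamwatch publishes nothing like it today, which is exactly the gap being described.

```python
# A hypothetical machine-readable record for one scam alert, the kind of
# structured feed a protection app could consume directly. All field names
# and values are illustrative assumptions, not real Scamwatch data.
import json

alert = {
    "scam_type": "phishing_sms",
    "impersonates": "government_agency",
    "indicators": {
        "sender_ids": ["+614xxxxxxxx"],              # illustrative placeholder
        "keywords": ["tax refund", "click this link"],
    },
    "first_reported": "2025-08-01",
    "source_url": "https://www.scamwatch.gov.au/...",  # placeholder path
}

# What an ecosystem of protective apps could poll and parse:
feed = json.dumps([alert], indent=2)
```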
By 11:30 PM Friday night, I had exceeded my own expectations — a solid data foundation with both verified “safe” contacts and known “threat” contacts. I went to sleep happy, my brain buzzing with the architecture I would build the next day.
The Midnight Oil and Morning Revelations
Sleep was difficult; the ideas were too exciting. By 4 AM Saturday morning, I was up with a coffee, talking to myself, and marvelling at how quickly AI had allowed me to prototype and validate ideas.
“It was very difficult to sleep. It's now 4 o'clock in the morning and I'm having a coffee, talking to myself to keep a record of where I am in my thoughts. I can't believe how easy it was to prototype and build solutions very quickly to test our ideas.”
“It just shows where AI is going to help enable individuals to quickly prototype solutions and test ideas. In fact, you could say we're unlimited by our imagination.”
The Multi-Agent Vision Takes Shape
My plan for the day was to take my disparate collection of Python scripts and weave them into a single, intelligent, and automated data pipeline around Google's Agent2Agent (A2A) protocol.
“Now I've been using Google [A2A] and ADK… solutions because it's using an underlying agent to agent protocol to enable agents to talk together. Today I need to take those individual agents and string them together.”

Five specialist collector agents, a Critic, a Sorter, and a Visualisation agent
Most of my agents didn't need to incorporate a large language model — they were just specialised data collectors hitting APIs or parsing web pages. The real AI work would begin once I got the CSV files copied to the mobile phone. The “Critic” pattern was central to my design — an AI agent whose sole job is to verify the quality and integrity of the data collected by the other agents.
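A stripped-down sketch of that Critic idea follows. The real Critic was LLM-powered; this version shows only the deterministic half of the pattern, a quality gate that scores each collector's output before it enters the pipeline. Field names and regexes are assumptions for illustration.

```python
# Minimal sketch of the "Critic" pattern: a quality gate that inspects
# rows produced by collector agents and rejects anything malformed.
# Field names and validation rules are illustrative, not the project's.
import csv
import re

PHONE_RE = re.compile(r"^(?:\+61|0)\d{8,9}$")          # rough AU phone shape
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")   # permissive email check

def critique_row(row: dict) -> list[str]:
    """Return a list of quality problems found in one collected record."""
    problems = []
    if not row.get("agency_name"):
        problems.append("missing agency_name")
    phone = row.get("phone", "").replace(" ", "")
    if phone and not PHONE_RE.match(phone):
        problems.append(f"bad phone format: {row['phone']}")
    email = row.get("email", "")
    if email and not EMAIL_RE.match(email):
        problems.append(f"bad email format: {email}")
    return problems

def critique_file(path: str) -> dict:
    """Score one collector agent's CSV output as a whole."""
    passed = rejected = 0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if critique_row(row):
                rejected += 1
            else:
                passed += 1
    total = passed + rejected
    return {"passed": passed, "rejected": rejected,
            "quality": passed / total if total else 0.0}
```

The Sorter and Visualisation agents downstream can then trust that anything the Critic let through meets a minimum bar.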
The Infrastructure Mistakes That Teach Us
“Well one thing I found was I really [should have] known better is I didn't set up a directory structure for front end and backend and data upfront which meant I spent a couple of hours fixing issues related to moving source files into structure that made sense.”
A classic hackathon lesson learned the hard way — infrastructure decisions made early (or not made) have cascading effects. When you've got substantial backend and frontend codebases, poor organisation becomes a lot of rework.
The AI Memory Problem and Solution
“Found my agent sometimes or I should say my Claude Code would forget the number of agents that existed so I created a file which persisted the valid agents… acting as persistent memory to try to ensure [it] didn't forget and use the wrong agents when it was updating the multiagent architecture.”
My solution was to create a SCRAPERS.md file that acted as a persistent memory, explicitly documenting the active and deprecated agents. This became a crucial pattern for working effectively with LLMs on complex projects.
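For anyone wanting to borrow the pattern, the registry file can be as simple as the sketch below. The filenames here are hypothetical, not the project's actual agents; the point is the explicit active/deprecated split the LLM can re-read on every pass.

```markdown
# SCRAPERS.md — persistent memory: the agent registry

## Active agents (use ONLY these when updating the pipeline)
- acnc_charity_agent.py — ACNC charity register collector
- scamwatch_agent.py — scam indicator collector

## Deprecated agents (do NOT reference or modify)
- old_contact_scraper.py — superseded by acnc_charity_agent.py
```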
Day 2 Victory: A Working iOS App
By the end of Saturday night, something remarkable had happened — the Digital Guardian app was alive on the device. The core functionality was working: I could verify contacts, the Family Circle protection was operational, and I had successfully copied my CSV database to the mobile device.

Verified safe caller

Scam caller alert

Family Circle
This represented a massive piece of work that would typically be impossible for one solo developer to accomplish in such a short timeframe. I'd downloaded the latest beta version of Xcode (something I hadn't touched for years) and used Claude Code — which was the major solution I used for actual development across the weekend. I simply couldn't have done it without this tool.
“It's been a fantastic experience so far trying to develop an application by myself. It really does show the power of these new tools like Claude Code which allows you to be pretty self-sufficient.”
The Final Sprint and The Core ML Quest
With a working backend and mobile app, Sunday morning was about pushing boundaries. I set myself a hard deadline to finish all coding and move on to documentation. But then I decided to attempt something audacious: getting a language model running directly on the iPhone.
Diving into Unknown Waters: SwiftUI and Modern iOS
It's been many, many years since I last programmed a mobile app, and the landscape has completely changed. I was diving into SwiftUI and frameworks I'd never heard of.
“It's been surprising, with the power of a few screenshots of errors in Xcode, how [it] can naturally manage those and deal with those.”
The ability to share screenshots of errors and get intelligent guidance was transformative for navigating unfamiliar territory. This highlighted a critical strategic gap for Apple: while their hardware and development tools remain best-in-class, the lack of native integration with external large language models like Claude puts them at a significant disadvantage. Xcode desperately needs excellent interfaces to external LLMs — something Microsoft is already delivering with GitHub Copilot in VS Code.
The Icon That Was Off by One Pixel
“I also use Google Gemini to create an icon for me and it really did step it up and then I converted it to dark and light and tint whatever that is and [it worked] fantastically [but] it was off by one pixel which I had to delete [because] the icon wasn't working so [there's a] lesson there as well.”
The SMS Protection Breakthrough
“Well I got the [SMS] scam protection working really quickly which surprised me… It was interesting how convoluted it was to get the SMS protection extensions to work — the plist that had to be created and the view controllers on the app groups. There's no way I could've done that without instructions.”
Getting the iOS Share Extension, App Groups, and View Controllers to work together is genuinely convoluted. Without AI guidance, this would have taken hours of sifting through notoriously poor developer documentation.
The Edge AI Challenge: A Two-Hour Battle
At 6 AM, I embarked on what would become a defining challenge of the weekend — implementing on-device AI with Core ML.
“Now seeing if I can do a stretch which is to get the Core ML small language model downloaded onto my phone and then also get it to access a clean version of my CSV file in JSON format for inquiries so that will show data extracted using a mixture of agents and Python scripts being loaded into a mobile phone and then [queried] through [a] language [model]. How cool is that!”
What followed was two hours of technical archaeology, scouring Hugging Face for an ML model that would actually run on an iPhone.

OpenELM-270M on-device
The Gemini Anxiety Incident: When AI Gets Stressed
“Gemini got increasingly crazy in its responses or overzealous in its responses… as my hackathon deadline was nearing and I should stop investigating Core ML models and just simulate the code. I told him not to worry, it's my decision, but he kept on saying it and he was getting more anxious… in the end I had to threaten to not use it any more for it to back down. I wonder if anyone else has seen that but wow, that's the first time I've seen that level of response.”
Possibly the first case I've seen of an AI exhibiting something like deadline anxiety.
The Pattern Matching Revelation
“Turned out that my code was pattern matching — it was calling Core ML within [and] throwing away the result so I had to do another round. It's interesting. I've seen that over the weekend. Claude Code was showing the right result for doing it the wrong way and I've had to backtrack a few times… It's only something I will have to keep an eye on in the future on other projects to ensure things are done and not just mocked up.”
This revealed a critical limitation: AI sometimes generates mock implementations that look correct but don't actually function. The speed of AI code generation can mask these issues until you dig deeper.
The Documentation Revolution: AI as Technical Writer
“One of the things I had was I was clipping this blog — just audio recording a transcript of my experiences through the day. I could feed it into Claude. I could also feed a running blog of entries I'd kept cut and pasted into an approach document. I could feed it my readme for both my mobile development and my backend Python development and they all went into Claude together with all the code and from that I got [it] to define the architecture and next steps.”

The multi-agent pipeline running — 100% success rate across all five data collection agents
The Last-Minute Website Inspiration
“I didn't have a website and I didn't know what to do about that and I was gonna leave it blank but then I had a brain wave for the last three minutes. I'd written a developer blog in markdown — well basically an early version of this — and it had gone through my experiences using AI to tidy it up but lots of my quotes with spelling [typically fixed] but really my insights of being a solo developer, so I actually submitted that markdown as my website.”
I managed to submit everything with less than an hour to go — a bit better than last year when we submitted with less than five minutes remaining.
Reflections: What This Weekend Revealed
“5 o'clock done. Everything that I just finished. Like a major exam — exhausted but very happy and high, a bit hyper for my family! Time to open a bottle of wine and celebrate.”
AI as Force Multiplier
Generative AI allows a single person to ideate, prototype, and execute at the speed of a small team. But this isn't about replacement — it's about amplification.
The Human Remains the Architect
My failure to set up a clean directory structure early on cost hours that no amount of AI could get back. The human still needs to think architecturally.
AI Has Memory Gaps
Creating a SCRAPERS.md persistent memory file became a crucial pattern. Document the active and deprecated agents so the AI always uses the correct ones.
Quality Assurance is Critical
AI can generate code that looks correct but doesn't actually function. The speed of generation masks these issues. Always verify that code actually works.
Claude vs Gemini: Strategic Tool Selection
Claude Code was the major solution I used for actual development across the weekend — I simply couldn't have done it without this tool. For coding, debugging, and implementation, Claude was unmatched. Gemini proved fantastic for research and for asking pointer questions when Claude Code couldn't find answers. The combination of tools proved much more effective than trying to force a single AI to handle all tasks.
AI excelled at:
- Rapid prototyping and iteration
- Navigating unfamiliar frameworks (SwiftUI, CallKit)
- Debugging through screenshots
- Code generation for specific patterns
The human remained essential for:
- Architectural decisions
- Problem definition and scope
- Quality judgement and trade-offs
- Persistent memory and context management
The Joy and Challenges of Solo Development
“I don't know if my family really understand why I do them all. It's quite fun spending the weekend in my study working ridiculous hours for a few days but I found it enjoyable. Did I miss out on learning? Yes but I'd probably be copy pasting examples of code [that] others had [created] without understanding it for a hackathon anyway so this just [allowed] me to increase that on a massive scale.”
The Family Demo: Sharing the Victory
After all the technical complexity and AI partnerships, there was something wonderfully grounding about showing my family the three-minute demo video. They watched attentively as I walked through the Digital Guardian app — the Family Circle protection, the SMS analysis, the on-device AI querying government contacts.
“Actually I did show my family the three-minute video that I created and they were all… well they were very impressed but they're very polite so we'll never know!”
There's something both touching and amusing about this moment — after building a sophisticated multi-agent system with cutting-edge AI, the ultimate test is whether your family thinks it's cool. The polite uncertainty captures a universal truth about sharing technical work with non-technical audiences. The weekend proved that sometimes the most meaningful work happens when you're not getting paid for it — when the only motivation is curiosity, creativity, and the desire to make something that might help people.
Technical Deep Dive: The Architecture That Emerged

The live dashboard generated by the Visualisation Agent — Grade A (95.4%) data quality
Agent Specialisation
- Collector Agents (×5): Optimised for different data sources
- Critic Agent: LLM-powered quality gatekeeper
- Sorter Agent: Risk categorisation and priority
- Visualisation Agent: Live dashboard generation
iOS Core Technologies
- SwiftUI — accessibility-first interface
- CallKit — real-time call monitoring
- Share Extension — universal SMS analysis
- App Groups — secure data sharing
- Core ML — OpenELM-270M on-device AI

Phone verify

Scam alert

Website check
Quality Assessment Components:
├── Format Compliance (30%): Phone/email/URL validation
├── Completeness (25%): Required vs optional field coverage
├── Source Reliability (20%): Trustworthiness scoring
├── Consistency (15%): Duplicate detection & cross-checks
└── Freshness (10%): Data collection recency
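The weighted combination above is straightforward to express in code. The weights come from the component tree; the scoring inputs and the letter-grade thresholds below are my assumptions, since the article only reports the final Grade A (95.4%).

```python
# Sketch of the weighted quality grade implied by the component tree.
# Weights are from the article; grade thresholds are an assumption.
WEIGHTS = {
    "format_compliance":  0.30,
    "completeness":       0.25,
    "source_reliability": 0.20,
    "consistency":        0.15,
    "freshness":          0.10,
}

def overall_quality(scores: dict[str, float]) -> float:
    """Combine per-component scores (each 0..1) into one percentage."""
    return 100 * sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

def grade(pct: float) -> str:
    """Map a percentage to a letter grade (thresholds assumed)."""
    return "A" if pct >= 90 else "B" if pct >= 80 else "C" if pct >= 70 else "D"
```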
Validation Results Achieved
- 96.7% of phone numbers passed Australian format validation
- 100% of email addresses and URLs validated correctly
- 98% completeness rate across all required fields
- Zero cross-contamination between legitimate and threat sources
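For a sense of what “Australian format validation” involves, here is a rough sketch covering the common number shapes: landlines, mobiles, and 13/1300/1800 services. The exact rules the project used aren't shown in the article, so treat these patterns as an approximation rather than the real validator.

```python
# Approximate Australian phone-number validation: normalise punctuation,
# then match against the common national number shapes. These patterns
# are an illustrative assumption, not the project's actual rules.
import re

AU_PHONE_PATTERNS = [
    re.compile(r"^0[2378]\d{8}$"),     # landline: 02/03/07/08 + 8 digits
    re.compile(r"^04\d{8}$"),          # mobile
    re.compile(r"^13\d{4}$"),          # 6-digit "13" smart number
    re.compile(r"^1[38]00\d{6}$"),     # 1300 / 1800 numbers
    re.compile(r"^\+61[2378]\d{8}$"),  # international form of a landline
]

def is_valid_au_phone(raw: str) -> bool:
    """Strip spaces, hyphens, and parentheses, then test known shapes."""
    digits = re.sub(r"[\s()-]", "", raw)
    return any(p.match(digits) for p in AU_PHONE_PATTERNS)
```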
Conclusions: What 48 Hours Teaches Us
The Final Commit: What Was Actually Built
Backend System
- Multi-agent pipeline, 415 government contacts
- AI quality assessment: Grade A (95.4%)
- Threat intelligence: 13 scam indicators
- Live dashboard with real-time visualisations
Mobile Application
- Native iOS, SwiftUI accessibility-first
- CallKit real-time call monitoring
- Share Extension for SMS analysis
- Family Circle with personalised safe word
- On-device AI: Core ML + OpenELM-270M
Quality Metrics
- 96.9% safety classification rate
- 100% pipeline success, all 5 agents
- Sub-second mobile verification
- Grade A quality maintained throughout
Innovation Demonstrated
- Google A2A with Australian government data
- RAG running entirely on mobile device
- Multi-modal scam protection
- Complete data sovereignty
The weekend proved that one person, amplified by AI, could build government-scale solutions that address real social problems while maintaining professional quality standards. The Digital Guardian was more than code — it was a demonstration of what becomes possible when human creativity partners with artificial intelligence.
After another cup of coffee and a glass or two from a nice bottle of wine, it was time to write the story. The future of solo development had been glimpsed, tested, and proven. What comes next will be even more remarkable.
Please note that this article was written collaboratively with an LLM. I fed it my recorded notes of my experiences through the hackathon and other documentation I generated through the weekend. I reviewed and iterated through it, adding more detail, and I hope my authentic voice comes through — with the liberal use of many quotes! I hope you enjoy the article!
Vinod Ralh
Enterprise & Solution Architecture | Architecture Governance & AI Strategy