At the University of Sydney, the CS society (SYNCS) runs an annual hackathon where you have ~24 hours to build an open-ended project.
In 2022, I competed in the hackathon in a team of four to build an AI image generator into Canva via their extension API. We won first place, yay! 🎉
The extension was not publicly released during the hackathon (and there is no point now, for reasons discussed later 😂). However, you can still view the Devpost and source code.
I am quite an avid follower of AI-related news, so I was notified when Stability AI released Stable Diffusion, their first open source generative AI image model, on 22nd August 2022. This was essentially the first high quality AI image generator that was accessible via an API, so I figured it would be the perfect base for a hackathon project. The hackathon was less than a week later, so it was basically cutting-edge tech at the time (i.e. a bonus WOW factor during the presentation).
For context, DALL-E 2 had been released in April 2022 and was of comparable quality, but its API would not be released until early November.
The particular idea we landed on was a Canva extension, which is outlined below in our demo video. I won't take full credit for this though, as it was heavily inspired by a discussion on tech Twitter about theoretical use cases for image-based generative AI within Google Slides.
It's possible we were called team non-compete because I had recently signed a contract to intern at Canva over the summer holidays of 2022. But it's ok, because surely Canva would not try to compete in that space for a while...
Anyway, Canva released a nearly identical beta feature in their application just a few weeks later. The idea was probably obvious to many people; we just executed first thanks to the hackathon.
While the core idea was relatively low-hanging fruit (AI-generated images in presentation software), I do think there was some novelty in the execution. In particular, the use of a sidebar filled with predefined modifier controls was not something I had seen Canva or others implement just yet.
At the time, AI image generation was relatively new and so required a new kind of skill called "prompting". While prompting, I expect the average user (myself included) would not be creative enough to think of adding "in the style of Van Gogh" to the end of their prompt unless they had previously seen the technique.
We figured that adding a sidebar filled with controls like an "Image style" dropdown would guide the user towards possible modifications for their image after they type the core objects into the prompt box (e.g. a cat at the Colosseum). On the backend, we simply append strings to their prompt based on the controls they select and then hand it off to the AI.
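To make that concrete, here is a minimal TypeScript sketch of that string-appending step. The control names and fragment wording are illustrative assumptions, not our exact hackathon code:

```typescript
// A minimal sketch of the prompt-building step. The control names and
// appended fragments are illustrative; the real extension had its own set.

interface ModifierControls {
  imageStyle?: string; // e.g. "oil painting"
  artist?: string;     // e.g. "Van Gogh"
  lighting?: string;   // e.g. "golden hour"
}

// Take the user's core prompt and append a fragment for each control they set.
function buildPrompt(corePrompt: string, controls: ModifierControls): string {
  const fragments = [corePrompt.trim()];
  if (controls.imageStyle) fragments.push(`${controls.imageStyle} style`);
  if (controls.artist) fragments.push(`in the style of ${controls.artist}`);
  if (controls.lighting) fragments.push(`${controls.lighting} lighting`);
  return fragments.join(", ");
}

// "a cat at the Colosseum" plus two sidebar selections becomes:
// "a cat at the Colosseum, oil painting style, in the style of Van Gogh"
const prompt = buildPrompt("a cat at the Colosseum", {
  imageStyle: "oil painting",
  artist: "Van Gogh",
});
```

The final prompt then gets sent off to the image model as normal; the user never has to know the modifier vocabulary themselves.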
In my opinion, this UX layer between the prompt and the user was an obvious next step for generative AI. What I did not predict was that this layer could be very effectively replaced by an LLM, which we later saw when DALL-E 3 was integrated with GPT-4 in October 2023. I am not as sure anymore what the future interface of AI image generation looks like (possibly something like this), but I sure am excited to see.