Posted by Terence Zhang – Developer Relations Engineer and Kristi Bradford – Product Manager

Google Pixel’s Recorder app allows people to record, transcribe, save, and share audio. To make it easier for users to manage and revisit their recordings, Recorder’s developers turned to Gemini Nano, a powerful on-device large language model (LLM). This integration introduces an AI-powered audio summarization feature to help users more easily find the right recordings and quickly grasp key points.

Earlier this month, Gemini Nano got a power boost with the introduction of the new Gemini Nano with Multimodality model. The Recorder app is already leveraging this upgrade to summarize longer voice recordings, with improved processing for grammar and nuance.

Meeting user needs with on-device AI

Recorder developers initially experimented with a cloud-based solution, achieving impressive levels of performance and quality. However, to prioritize accessibility and privacy for their users, they sought an on-device solution. The development of Gemini Nano presented a perfect opportunity to build the concise audio summaries users were looking for, all while keeping data processing on the device.

Gemini Nano is Google’s most efficient model for on-device tasks. “Having the LLM on-device is beneficial to users because it provides them with more privacy, less latency, and it works wherever they need since there’s no internet required,” said Kristi Bradford, the product manager for Pixel’s essential apps.

To achieve better results, Recorder also fine-tuned the model using data that matches its use case. This is done using low order rank adaptation (LoRA), which enables Gemini Nano to consistently output three-bullet point descriptions of the transcript that include any speaker names, key takeaways, and themes.

AICore, an Android system service that centralizes runtime, delivery, and critical safety components for LLMs, significantly streamlined Recorder’s adoption of Gemini Nano. The availability of a developer SDK for running GenAI workloads allowed the team to build the transcription summary feature in just four months, with only four developers. This efficiency was achieved by eliminating the need for maintaining in-house models.

Since its release, Recorder users have been using the new AI-powered summarization feature averaging 2 to 5 times daily, and the number of overall saved recordings increased by 24%. This feature has contributed to a significant increase in app engagement and user retention overall. The Recorder team also noted that feedback about the new feature has been positive, with many users citing the time the new AI-powered summarization feature saves them.

“We were surprised by how truly capable the model was… before and after LoRA tuning.” — Kristi Bradford, product manager for Pixel’s essential apps

The next big evolution: Gemini Nano with multimodality

Recorder developers also implemented the latest Gemini Nano model, known as Gemini Nano with multimodality, to further improve its summarization feature on Pixel 9 devices. The new model is significantly larger than the previous one on Pixel 8 devices, and it’s more capable, accurate, and scalable. The new model also has expanded token support that lets Recorder summarize much longer transcripts than before. Gemini Nano with multimodality is currently only available on Pixel 9 devices.

Integrating Gemini Nano with multimodality required another round of fine-tuning. However, Recorder developers were able to use the original Gemini Nano model’s fine-tuning dataset as a foundation, streamlining the development process.

To fully leverage the new model’s capabilities, Recorder developers expanded their dataset with support for longer voice recordings, implemented refined evaluation methods, and established launch criteria metrics focused on grammar and nuance. The inclusion of grammar as a new metric for assessing inference quality was made possible solely by the enhanced capabilities of Gemini Nano with Multimodality.

UI example

Doing more with on-device AI

“Given the novelty of GenAI, the whole team had fun learning how to use it,” said Kristi. “Now, we’re empowered to push the boundaries of what we can accomplish while meeting emerging user needs and opportunities. It’s truly brought a new level of creativity to problem-solving and experimentation. We’ve already demoed at least two more GenAI features that help people get time back internally for early feedback, and we’re excited about the possibilities ahead.”

Get started

Learn more about how to bring the benefits of on-device AI with Gemini Nano to your apps.



Source link

By admin

Related Post