Posted by Thomas Ezan – Developer Relations Engineer, Rebecca Franks – Developer Relations Engineer, and Avneet Singh – Product Manager
We’re bringing back Androidify later this year, this time powered by Google AI, so you can customize your very own Android bot and share your creativity with the world. Today, we’re releasing a new open source demo app for Androidify as a great example of how Google is using its Gemini AI models to enhance app experiences.
In this post, we’ll dive into how the Androidify app uses Gemini models and Imagen via the Firebase AI Logic SDK, and we’ll share some insights learned along the way to help you incorporate Gemini and AI into your own projects. Read more about the Androidify demo app.
App flow
The overall app functions as follows, with various parts of it using Gemini and Firebase along the way:

Gemini and image validation
To get started with Androidify, take a photo or choose an image on your device. The app needs to make sure that the image you upload is suitable for creating an avatar.
Gemini 2.5 Flash via Firebase helps with this by verifying that the image contains a person, that the person is in focus, and that the image is safe, including whether it contains abusive content.
// Schema for the structured response: a required "success" flag and an optional "error" message.
val jsonSchema = Schema.obj(
    properties = mapOf("success" to Schema.boolean(), "error" to Schema.string()),
    optionalProperties = listOf("error"),
)

val generativeModel = Firebase.ai(backend = GenerativeBackend.googleAI())
    .generativeModel(
        modelName = "gemini-2.5-flash-preview-04-17",
        generationConfig = generationConfig {
            // Ask for JSON that conforms to the schema above.
            responseMimeType = "application/json"
            responseSchema = jsonSchema
        },
        safetySettings = listOf(
            SafetySetting(HarmCategory.HARASSMENT, HarmBlockThreshold.LOW_AND_ABOVE),
            SafetySetting(HarmCategory.HATE_SPEECH, HarmBlockThreshold.LOW_AND_ABOVE),
            SafetySetting(HarmCategory.SEXUALLY_EXPLICIT, HarmBlockThreshold.LOW_AND_ABOVE),
            SafetySetting(HarmCategory.DANGEROUS_CONTENT, HarmBlockThreshold.LOW_AND_ABOVE),
            SafetySetting(HarmCategory.CIVIC_INTEGRITY, HarmBlockThreshold.LOW_AND_ABOVE),
        ),
    )

val response = generativeModel.generateContent(
    content {
        text("You are to analyze the provided image and determine if it is acceptable and appropriate based on specific criteria.... (more details see the full sample)")
        image(image)
    },
)

// response.text is nullable, so it is asserted non-null before parsing.
val jsonResponse = Json.parseToJsonElement(response.text!!)
val isSuccess = jsonResponse.jsonObject["success"]?.jsonPrimitive?.booleanOrNull == true
val error = jsonResponse.jsonObject["error"]?.jsonPrimitive?.content
In the snippet above, we’re leveraging the structured output capabilities of the model by defining the schema of the response. We’re passing a Schema object via the responseSchema param in the generationConfig.
We want to validate that the image has enough information to generate a nice Android avatar. So we ask the model to return a JSON object with success = true/false and an optional error message explaining why the image doesn’t have enough information.
Structured output is a powerful feature that enables smoother integration of LLMs into your app by controlling the format of their output, similar to an API response.
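One way to consume a structured response like this in app code is to map its two fields onto a small result type, so that callers never touch raw JSON. The sketch below is only an illustration: `ValidationResult`, `toValidationResult`, and the fallback message are hypothetical names and are not part of the Androidify sample or the Firebase AI Logic SDK.

```kotlin
// Hypothetical result type for the image validation step (not from the sample app).
sealed interface ValidationResult {
    object Valid : ValidationResult
    data class Rejected(val reason: String) : ValidationResult
}

// Maps the parsed "success" flag and the optional "error" field to a typed result.
// A missing or blank error falls back to a generic message (an assumption of this sketch).
fun toValidationResult(isSuccess: Boolean, error: String?): ValidationResult =
    if (isSuccess) {
        ValidationResult.Valid
    } else {
        ValidationResult.Rejected(
            error?.takeIf { it.isNotBlank() } ?: "The image could not be validated.",
        )
    }

fun main() {
    // Example inputs standing in for values parsed from the model's JSON response.
    println(toValidationResult(isSuccess = true, error = null) is ValidationResult.Valid)
    println(toValidationResult(isSuccess = false, error = "No person detected"))
}
```

Keeping the mapping in a pure function like this makes the validation outcome easy to unit test without calling the model.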
Image captioning with Gemini Flash
Once it’s established that the image contains sufficient information to generate an Android avatar, it’s captioned using Gemini 2.5 Flash with structured output.
// Schema for the caption response: a required "success" flag and an optional description.
val jsonSchema = Schema.obj(
    properties = mapOf(
        "success" to Schema.boolean(),
        "user_description" to Schema.string(),
    ),
    optionalProperties = listOf("user_description"),
)
val generativeModel = createGenerativeTextModel(jsonSchema)

val prompt = "You are to create a VERY detailed description of the main person in the given image. This description will be translated into a prompt for a generative image model..."

val response = generativeModel.generateContent(
    content {
        text(prompt)
        image(image)
    })

val jsonResponse = Json.parseToJsonElement(response.text!!)
val isSuccess = jsonResponse.jsonObject["success"]?.jsonPrimitive?.booleanOrNull == true
val userDescription = jsonResponse.jsonObject["user_description"]?.jsonPrimitive?.content
The other option in the app is to start with a text prompt. You can enter details about your accessories, hairstyle, and clothing, and let Imagen be a bit more creative.
Android generation via Imagen
We’ll use this detailed description of your image to enrich the prompt used for image generation. We’ll add extra details about what we want to generate and include the bot color selection as part of this too, including the skin tone selected by the user.
val imagenPrompt = "A 3D rendered cartoonish Android mascot in a photorealistic style, the pose is relaxed and straightforward, facing directly forward [...] The bot looks as follows $userDescription [...]"
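The assembly step described above could be factored into a small helper. This is a sketch under assumptions: `buildImagenPrompt`, its `botColor` parameter, and the sentence wording are hypothetical and do not reproduce the prompt text elided in this post. Only the idea of combining a fixed style preamble with the color choice and the generated description comes from the article.

```kotlin
// Hypothetical helper: joins a style preamble, the user's color choice, and the
// Gemini-generated description into a single prompt string.
fun buildImagenPrompt(stylePreamble: String, botColor: String, userDescription: String): String =
    listOf(
        stylePreamble.trim(),
        "The bot color is $botColor.", // placeholder wording, not the app's actual prompt
        "The bot looks as follows: ${userDescription.trim()}",
    ).filter { it.isNotEmpty() }.joinToString(separator = " ")

fun main() {
    // Illustrative inputs only.
    val prompt = buildImagenPrompt(
        stylePreamble = "A 3D rendered Android mascot.",
        botColor = "green",
        userDescription = " wearing a red scarf ",
    )
    println(prompt)
}
```

Building the prompt in one place keeps user-controlled text (the description and color) separate from the fixed style instructions, which makes the prompt easier to review and adjust.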
We then call the Imagen model to create the bot. Using this new prompt, we create a model and call generateImages:
// we supply our own fine-tuned model here but you can use "imagen-3.0-generate-002"
val generativeModel = Firebase.ai(backend = GenerativeBackend.googleAI()).imagenModel(
    "imagen-3.0-generate-002",
    safetySettings = ImagenSafetySettings(
        ImagenSafetyFilterLevel.BLOCK_LOW_AND_ABOVE,
        personFilterLevel = ImagenPersonFilterLevel.ALLOW_ALL,
    ),
)

val response = generativeModel.generateImages(imagenPrompt)
// Take the first generated image and convert it to a Bitmap for display.
val image = response.images.first().asBitmap()
And that’s it! The Imagen model generates a bitmap that we can display on the user’s screen.
Fine-tuning the Imagen model
The Imagen 3 model was fine-tuned using Low-Rank Adaptation (LoRA). LoRA is a fine-tuning technique designed to reduce the computational burden of training large models. Instead of updating the entire model, LoRA adds smaller, trainable “adapters” that make small changes to the model’s behavior. We ran a fine-tuning pipeline on the generally available Imagen 3 model, using Android bot assets in different color combinations along with different assets for enhanced cuteness and fun. We generated text captions for the training images, and the image-text pairs were used to fine-tune the model effectively.
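To make the saving concrete: in the standard LoRA formulation, a frozen weight matrix of shape d × k is left untouched, and its update is expressed as the product of two small matrices of shapes d × r and r × k, where the rank r is much smaller than d and k. The sketch below compares trainable parameter counts for one such matrix. The dimensions and rank are illustrative assumptions; this post does not state the values used for Imagen 3.

```kotlin
// Trainable parameters when the whole d x k weight matrix is updated.
fun fullFineTuneParams(d: Long, k: Long): Long = d * k

// Trainable parameters for a rank-r LoRA adapter: one d x r matrix plus one r x k matrix.
fun loraParams(d: Long, k: Long, r: Long): Long = r * (d + k)

fun main() {
    // Illustrative values only, not taken from Imagen 3.
    val d = 4096L
    val k = 4096L
    val r = 8L

    val full = fullFineTuneParams(d, k) // 16,777,216
    val lora = loraParams(d, k, r)      // 65,536
    println("full=$full lora=$lora reduction=${full / lora}x") // full=16777216 lora=65536 reduction=256x
}
```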
The current sample app uses a standard Imagen model, so the results may look a bit different from the visuals in this post. However, the app using the fine-tuned model and a custom version of the Firebase AI Logic SDK was demoed at Google I/O. This app will be released later this year, and we are also planning to add support for fine-tuned models to the Firebase AI Logic SDK later in the year.

ML Kit
The app also uses the ML Kit Pose Detection SDK to detect a person in the camera view, which triggers the capture button and adds visual indicators.
To do this, we add the SDK to the app and use PoseDetection.getClient(). Then, using the poseDetector, we look at the detectedLandmarks in the streaming image coming from the camera, and we set _uiState.detectedPose to true if a nose and shoulders are visible:
private suspend fun runPoseDetection() {
    PoseDetection.getClient(
        PoseDetectorOptions.Builder()
            .setDetectorMode(PoseDetectorOptions.STREAM_MODE)
            .build(),
    ).use { poseDetector ->
        // Since image analysis is processed by ML Kit asynchronously in its own thread pool,
        // we can run this directly from the calling coroutine scope instead of pushing this
        // work to a background dispatcher.
        cameraImageAnalysisUseCase.analyze { imageProxy ->
            imageProxy.image?.let { image ->
                val poseDetected = poseDetector.detectPersonInFrame(image, imageProxy.imageInfo)
                _uiState.update { it.copy(detectedPose = poseDetected) }
            }
        }
    }
}

private suspend fun PoseDetector.detectPersonInFrame(
    image: Image,
    imageInfo: ImageInfo,
): Boolean {
    val results = process(InputImage.fromMediaImage(image, imageInfo.rotationDegrees)).await()
    val landmarkResults = results.allPoseLandmarks
    // Collect the types of all landmarks that are likely to be inside the frame.
    val detectedLandmarks = mutableListOf<Int>()
    for (landmark in landmarkResults) {
        if (landmark.inFrameLikelihood > 0.7) {
            detectedLandmarks.add(landmark.landmarkType)
        }
    }

    // A person counts as detected only if the nose and both shoulders are in frame.
    return detectedLandmarks.containsAll(
        listOf(PoseLandmark.NOSE, PoseLandmark.LEFT_SHOULDER, PoseLandmark.RIGHT_SHOULDER),
    )
}
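The landmark check above can also be isolated as a pure function so it can be unit tested without a camera or the ML Kit dependency. The sketch below is a minimal illustration: `LandmarkType`, `Landmark`, and `isPersonInFrame` are hypothetical stand-ins for the ML Kit types, and the default threshold simply mirrors the 0.7 value used in the snippet.

```kotlin
// Hypothetical stand-ins for ML Kit's landmark types (not the real SDK classes).
enum class LandmarkType { NOSE, LEFT_SHOULDER, RIGHT_SHOULDER, LEFT_HIP }

data class Landmark(val type: LandmarkType, val inFrameLikelihood: Float)

// Landmarks that must all be confidently in frame for a person to count as detected.
val requiredLandmarks = setOf(
    LandmarkType.NOSE,
    LandmarkType.LEFT_SHOULDER,
    LandmarkType.RIGHT_SHOULDER,
)

// Returns true when every required landmark has a likelihood strictly above the threshold.
fun isPersonInFrame(landmarks: List<Landmark>, threshold: Float = 0.7f): Boolean {
    val confidentTypes = landmarks
        .filter { it.inFrameLikelihood > threshold }
        .map { it.type }
        .toSet()
    return confidentTypes.containsAll(requiredLandmarks)
}

fun main() {
    val fullyVisible = listOf(
        Landmark(LandmarkType.NOSE, 0.95f),
        Landmark(LandmarkType.LEFT_SHOULDER, 0.9f),
        Landmark(LandmarkType.RIGHT_SHOULDER, 0.85f),
    )
    // Same landmarks, but the right shoulder is unlikely to be in frame.
    val shoulderCutOff = fullyVisible.dropLast(1) + Landmark(LandmarkType.RIGHT_SHOULDER, 0.3f)

    println(isPersonInFrame(fullyVisible))   // true
    println(isPersonInFrame(shoulderCutOff)) // false
}
```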

Get started with AI on Android
The Androidify app makes extensive use of Gemini 2.5 Flash to validate the image and generate a detailed description that is then used to generate the image. It also leverages the specifically fine-tuned Imagen 3 model to generate images of Android bots. Gemini and Imagen models are easily integrated into the app via the Firebase AI Logic SDK. In addition, the ML Kit Pose Detection SDK controls the capture button, enabling it only when a person is present in front of the camera.
To get started with AI on Android, go to the Gemini and Imagen documentation for Android.
Explore this announcement and all Google I/O 2025 updates on io.google starting May 22.