How Androidify leverages Gemini, Firebase and ML Kit

May 27, 2025

Posted by Thomas Ezan – Developer Relations Engineer, Rebecca Franks – Developer Relations Engineer, and Avneet Singh – Product Manager

We’re bringing back Androidify later this year, this time powered by Google AI, so you can customize your very own Android bot and share your creativity with the world. Today, we’re releasing a new open source demo app for Androidify as a great example of how Google is using its Gemini AI models to enhance app experiences.

In this post, we’ll dive into how the Androidify app uses Gemini models and Imagen via the Firebase AI Logic SDK, and we’ll share some insights learned along the way to help you incorporate Gemini and AI into your own projects. Read more about the Androidify demo app.

App flow

The overall app functions as follows, with various parts of it using Gemini and Firebase along the way:

flow chart demonstrating Androidify app flow

Gemini and image validation

To get started with Androidify, take a photo or choose an image on your device. The app needs to make sure that the image you upload is suitable for creating an avatar.

Gemini 2.5 Flash via Firebase helps with this by verifying that the image contains a person, that the person is in focus, and that the image is safe, including whether it contains abusive content.

// Schema for the structured response: "success" is required, "error" is optional.
val jsonSchema = Schema.obj(
    properties = mapOf("success" to Schema.boolean(), "error" to Schema.string()),
    optionalProperties = listOf("error"),
)

val generativeModel = Firebase.ai(backend = GenerativeBackend.googleAI())
    .generativeModel(
        modelName = "gemini-2.5-flash-preview-04-17",
        generationConfig = generationConfig {
            responseMimeType = "application/json"
            responseSchema = jsonSchema
        },
        safetySettings = listOf(
            SafetySetting(HarmCategory.HARASSMENT, HarmBlockThreshold.LOW_AND_ABOVE),
            SafetySetting(HarmCategory.HATE_SPEECH, HarmBlockThreshold.LOW_AND_ABOVE),
            SafetySetting(HarmCategory.SEXUALLY_EXPLICIT, HarmBlockThreshold.LOW_AND_ABOVE),
            SafetySetting(HarmCategory.DANGEROUS_CONTENT, HarmBlockThreshold.LOW_AND_ABOVE),
            SafetySetting(HarmCategory.CIVIC_INTEGRITY, HarmBlockThreshold.LOW_AND_ABOVE),
        ),
    )

val response = generativeModel.generateContent(
    content {
        text("You are to analyze the provided image and determine if it is acceptable and appropriate based on specific criteria.... (more details see the full sample)")
        image(image)
    },
)

// response.text is nullable, so assert it is present before parsing.
val jsonResponse = Json.parseToJsonElement(response.text!!)
val isSuccess = jsonResponse.jsonObject["success"]?.jsonPrimitive?.booleanOrNull == true
val error = jsonResponse.jsonObject["error"]?.jsonPrimitive?.content

In the snippet above, we’re leveraging the structured output capabilities of the model by defining the schema of the response. We pass a Schema object via the responseSchema parameter in the generationConfig.

We want to validate that the image has enough information to generate a nice Android avatar, so we ask the model to return a JSON object with success = true/false and an optional error message explaining why the image doesn't have enough information.

Structured output is a powerful feature that enables smoother integration of LLMs into your app by controlling the format of their output, similar to an API response.
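Because the response follows a known schema, it can be mapped onto a typed result before the rest of the app sees it. The following is a minimal sketch of that idea, not code from the Androidify sample: ValidationResult and parseValidation are hypothetical names, and it assumes the kotlinx.serialization JSON library that the snippet above already uses for parsing.

```kotlin
import kotlinx.serialization.json.Json
import kotlinx.serialization.json.booleanOrNull
import kotlinx.serialization.json.contentOrNull
import kotlinx.serialization.json.jsonObject
import kotlinx.serialization.json.jsonPrimitive

// Hypothetical result type for illustration; not part of the Androidify sample.
sealed interface ValidationResult {
    object Valid : ValidationResult
    data class Invalid(val reason: String) : ValidationResult
}

// Maps the raw model text onto a typed result, treating anything unexpected as a failure.
fun parseValidation(raw: String?): ValidationResult {
    if (raw.isNullOrBlank()) return ValidationResult.Invalid("Empty model response")
    return runCatching {
        val obj = Json.parseToJsonElement(raw).jsonObject
        val success = obj["success"]?.jsonPrimitive?.booleanOrNull == true
        if (success) {
            ValidationResult.Valid
        } else {
            // "error" is optional in the schema, so fall back to a generic message.
            val reason = obj["error"]?.jsonPrimitive?.contentOrNull ?: "Image rejected"
            ValidationResult.Invalid(reason)
        }
    }.getOrElse { ValidationResult.Invalid("Malformed model response") }
}
```

Calling parseValidation(response.text) would then give the UI a single value to branch on, with the null and malformed cases handled in one place.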

Image captioning with Gemini Flash

Once it's established that the image contains sufficient information to generate an Android avatar, it’s captioned using Gemini 2.5 Flash with structured output.

val jsonSchema = Schema.obj(
    properties = mapOf(
        "success" to Schema.boolean(),
        "user_description" to Schema.string(),
    ),
    optionalProperties = listOf("user_description"),
)
val generativeModel = createGenerativeTextModel(jsonSchema)

val prompt = "You are to create a VERY detailed description of the main person in the given image. This description will be translated into a prompt for a generative image model..."

val response = generativeModel.generateContent(
    content {
        text(prompt)
        image(image)
    },
)

val jsonResponse = Json.parseToJsonElement(response.text!!)
val isSuccess = jsonResponse.jsonObject["success"]?.jsonPrimitive?.booleanOrNull == true

val userDescription = jsonResponse.jsonObject["user_description"]?.jsonPrimitive?.content

The other option in the app is to start with a text prompt. You can enter details about your accessories, hairstyle, and clothing, and let Imagen be a bit more creative.

Android generation via Imagen

We’ll use this detailed description of your image to enrich the prompt used for image generation. We’ll add extra details about what we would like to generate and include the bot color selection as part of this too, along with the skin tone selected by the user.

val imagenPrompt = "A 3D rendered cartoonish Android mascot in a photorealistic style, the pose is relaxed and straightforward, facing directly forward [...] The bot looks as follows $userDescription [...]"

We then call the Imagen model to create the bot. Using this new prompt, we create a model and call generateImages:

// we supply our own fine-tuned model here but you can use "imagen-3.0-generate-002"
val generativeModel = Firebase.ai(backend = GenerativeBackend.googleAI()).imagenModel(
    "imagen-3.0-generate-002",
    safetySettings = ImagenSafetySettings(
        ImagenSafetyFilterLevel.BLOCK_LOW_AND_ABOVE,
        personFilterLevel = ImagenPersonFilterLevel.ALLOW_ALL,
    ),
)

val response = generativeModel.generateImages(imagenPrompt)

val image = response.images.first().asBitmap()

And that’s it! The Imagen model generates a bitmap that we can display on the user’s screen.

Fine-tuning the Imagen model

The Imagen 3 model was fine-tuned using Low-Rank Adaptation (LoRA). LoRA is a fine-tuning technique designed to reduce the computational burden of training large models. Instead of updating the entire model, LoRA adds smaller, trainable “adapters” that make small changes to the model’s behavior. We ran a fine-tuning pipeline on the generally available Imagen 3 model, using Android bot assets in different color combinations and with different accessories for enhanced cuteness and fun. We generated text captions for the training images, and the resulting image-text pairs were used to fine-tune the model.
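This post doesn't describe the adapter configuration used for Imagen, so what follows is only the general LoRA formulation, with symbols chosen here for illustration. A pretrained weight matrix W_0 of size d × k stays frozen, and only two small matrices B and A of rank r are trained, with α a scaling hyperparameter:

```latex
W' = W_0 + \Delta W = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)
```

The adapter has r(d + k) trainable parameters instead of dk. As an illustrative example with assumed sizes d = k = 4096 and r = 8, that is 65,536 trainable parameters against 16,777,216 in the full matrix, or roughly 0.4%.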

The current sample app uses a standard Imagen model, so the results may look a bit different from the visuals in this post. However, the app using the fine-tuned model and a custom version of the Firebase AI Logic SDK was demoed at Google I/O. This app will be released later this year, and we are also planning to add support for fine-tuned models to the Firebase AI Logic SDK later in the year.

moving image of Androidify app demo turning a selfie image of a bearded man wearing a black tshirt and sunglasses, with a blue back pack into a green 3D bearded droid wearing a black tshirt and sunglasses with a blue backpack

The original image… and Androidifi-ed image

ML Kit

The app also uses the ML Kit Pose Detection SDK to detect a person in the camera view, which enables the capture button and adds visual indicators.

To do this, we add the SDK to the app and use PoseDetection.getClient(). Then, using the poseDetector, we look at the detectedLandmarks in the streaming image coming from the camera, and we set _uiState.detectedPose to true if a nose and shoulders are visible:

private suspend fun runPoseDetection() {
    PoseDetection.getClient(
        PoseDetectorOptions.Builder()
            .setDetectorMode(PoseDetectorOptions.STREAM_MODE)
            .build(),
    ).use { poseDetector ->
        // Since image analysis is processed by ML Kit asynchronously in its own thread pool,
        // we can run this directly from the calling coroutine scope instead of pushing this
        // work to a background dispatcher.
        cameraImageAnalysisUseCase.analyze { imageProxy ->
            imageProxy.image?.let { image ->
                val poseDetected = poseDetector.detectPersonInFrame(image, imageProxy.imageInfo)
                _uiState.update { it.copy(detectedPose = poseDetected) }
            }
        }
    }
}

private suspend fun PoseDetector.detectPersonInFrame(
    image: Image,
    imageInfo: ImageInfo,
): Boolean {
    val results = process(InputImage.fromMediaImage(image, imageInfo.rotationDegrees)).await()
    val landmarkResults = results.allPoseLandmarks
    // Collect the types of all landmarks that are likely to be inside the frame.
    val detectedLandmarks = mutableListOf<Int>()
    for (landmark in landmarkResults) {
        if (landmark.inFrameLikelihood > 0.7) {
            detectedLandmarks.add(landmark.landmarkType)
        }
    }

    return detectedLandmarks.containsAll(
        listOf(PoseLandmark.NOSE, PoseLandmark.LEFT_SHOULDER, PoseLandmark.RIGHT_SHOULDER),
    )
}

The camera shutter button is activated when a person (or a bot!) enters the frame.
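The landmark check itself is plain filtering logic, so it can be exercised without a camera or the Android SDK. The sketch below isolates it under a few assumptions: Landmark, the three integer constants, and isPersonInFrame are hypothetical stand-ins for ML Kit's PoseLandmark type and constants, and the 0.7 threshold is taken from the snippet above.

```kotlin
// Hypothetical stand-in for ML Kit's PoseLandmark, so the logic runs off-device.
data class Landmark(val type: Int, val inFrameLikelihood: Float)

// Placeholder identifiers; a real app would use PoseLandmark.NOSE and friends.
const val NOSE = 0
const val LEFT_SHOULDER = 11
const val RIGHT_SHOULDER = 12

val REQUIRED_LANDMARKS = setOf(NOSE, LEFT_SHOULDER, RIGHT_SHOULDER)

// Returns true only when every required landmark is confidently inside the frame.
fun isPersonInFrame(landmarks: List<Landmark>, threshold: Float = 0.7f): Boolean {
    val confidentTypes = landmarks
        .filter { it.inFrameLikelihood > threshold }
        .map { it.type }
        .toSet()
    return confidentTypes.containsAll(REQUIRED_LANDMARKS)
}
```

With a nose at 0.95, a left shoulder at 0.9 and a right shoulder at 0.4, isPersonInFrame returns false, because the right shoulder falls below the threshold. Keeping the rule in a pure function like this makes it straightforward to unit test and to tune the threshold.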

Get started with AI on Android

The Androidify app makes extensive use of Gemini 2.5 Flash to validate the image and to generate the detailed description used for image generation. It also leverages a specifically fine-tuned Imagen 3 model to generate images of Android bots. Gemini and Imagen models are easily integrated into the app via the Firebase AI Logic SDK. In addition, the ML Kit Pose Detection SDK controls the capture button, enabling it only when a person is present in front of the camera.

To get started with AI on Android, go to the Gemini and Imagen documentation for Android.

Explore this announcement and all Google I/O 2025 updates on io.google starting May 22.
