Building an Agent for Color It Daily, Part 2
Introduction
In Part #1 of this series on how to build and deploy a complex agent that generates unique coloring pages, we built The Creative Director. In this second part, we are tackling The Stylist. This agent acts as the prompt engineer: it takes the output of the previous agent and produces a detailed, optimized text prompt (we rely on both a positive and a negative prompt, which we will combine during generation). The output of this agent will then be used by The Generator to create and optimize the coloring page from the text prompt. Below you will find the expected output of our agent. It adds two fields (positive_prompt and negative_prompt) to the input. We do this because we rely on a SequentialAgent, which automatically flows the output of one agent into the input of the next.
We need to hold on to the full context, since everything must be persisted at the end. Another approach would have been to persist the data early and then update the state at each step; this would require generating a unique ID (referenced by all agents) to update the state. For this project, I decided to persist data only at the end. We will include a critic agent to review the assets and loop back to the generation step if the quality or guidelines are not met. This means that for each final asset, we might loop through multiple attempts before reaching the result. Persisting early would have complicated our logging, and I believe that complexity is unnecessary for this project. The downside is that we lose tracking of the intermediate generations.
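To make the generate/critique loop concrete, here is a minimal, library-free sketch of the cycle described above. The names (`generate_asset`, `critique`, `MAX_ATTEMPTS`) and the quality scoring are illustrative placeholders, not the project's actual implementation; in the real system these roles will be played by agents.

```python
# Illustrative sketch of the generate -> critique -> retry loop described
# above. The names and the scoring logic are hypothetical placeholders.

MAX_ATTEMPTS = 3


def generate_asset(prompt: str, attempt: int) -> dict:
    # Stand-in for The Generator: pretend quality improves with each attempt.
    return {"prompt": prompt, "quality": 0.4 + 0.3 * attempt}


def critique(asset: dict, threshold: float = 0.8) -> bool:
    # Stand-in for the critic agent: approve only if quality meets the bar.
    return asset["quality"] >= threshold


def generate_with_review(prompt: str) -> dict:
    asset = None
    for attempt in range(MAX_ATTEMPTS):
        asset = generate_asset(prompt, attempt)
        if critique(asset):
            break
    # Only the final asset is persisted; intermediate attempts are discarded,
    # which is the trade-off mentioned above.
    return asset
```

The point of the sketch is the shape of the flow: several attempts may happen internally, but only the asset that finally passes the critic (or the last attempt) leaves the loop and gets persisted.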
```json
{
  "title": "...",
  "description": "...",
  "visual_tags": [...],
  "mood": "...",
  "target_audience": "...",
  "positive_prompt": "Detailed natural language description...",
  "negative_prompt": "Forbidden elements..."
}
```
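Since The Stylist simply extends the payload it receives, the flow can be illustrated with a few lines of pure Python. The field names mirror the JSON above; the `stylist_output` function is a stand-in for the agent, not actual project code.

```python
# Pure-Python illustration of how The Stylist extends The Creative Director's
# payload. The function is a stand-in for the agent, not the real thing.

def stylist_output(creative_director_output: dict) -> dict:
    # The Stylist keeps every input field and adds the two prompt fields.
    return {
        **creative_director_output,
        "positive_prompt": "Detailed natural language description...",
        "negative_prompt": "Forbidden elements...",
    }


concept = {
    "title": "Winter Fox",
    "description": "A fox sitting and smiling.",
    "visual_tags": ["fox", "winter", "simple"],
    "mood": "Energetic",
    "target_audience": "child",
}

result = stylist_output(concept)
assert set(result) == set(concept) | {"positive_prompt", "negative_prompt"}
```

This is exactly what the SequentialAgent gives us for free: the downstream agent sees the full upstream payload and only has to append its own fields.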
Update reference architecture
The first thing we will do is update our reference architecture to make sure it still reflects our finalized implementation for The Creative Director agent (especially important because the output of the first agent is the input for this one).
As you can see in the output below, this is an important step to ensure that the next agent we build is grounded in the updated end-to-end architecture. By updating our reference architecture, we improve the quality of the agents we generate and ensure they fit into our overall design.
Instead of referencing our architecture document, we could reference specific files or let Gemini CLI try to guess what is relevant, but as always, it is better to be explicit. Additionally, even though we are working on a specific agent here, our reference documents provide the context for the full end-to-end project.
Specifically:
- Rotate Composition: Updated to match the specific names in the instructions (Character Sticker, Full Scene, Collection, Action Shot).
- De-duplication: Removed the specific '30 days' constraint and generalized it to 'past content' to align with the vector search implementation.
- Tools: Clarified that get_calendar_events returns seasons and observances, and that search_past_concepts uses Firestore for vector search.
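To show what those two tool contracts might look like, here are illustrative stubs. The real get_calendar_events reads seasons and observances from a calendar source, and the real search_past_concepts runs a vector search against Firestore; the bodies, parameters, and return shapes below are assumptions for demonstration only.

```python
# Illustrative stubs for the two tools mentioned above. The bodies and
# return shapes are assumptions; the real implementations query a calendar
# source and a Firestore vector search index.

def get_calendar_events(current_date: str) -> dict:
    # Returns the season and any observances for the given date.
    return {"season": "winter", "observances": ["New Year's Day"]}


def search_past_concepts(query: str, top_k: int = 5) -> list[dict]:
    # Returns past concepts most similar to the query, with a similarity
    # score, so the agent can avoid repeating recent content.
    return [{"title": "Winter Fox", "score": 0.91}][:top_k]
```

Keeping the tool signatures this explicit in the reference architecture is what lets the next agent's instructions be grounded in the actual contracts rather than guesses.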
Initial system instructions
Now that we have updated and reviewed our instructions, we are ready to generate the system instructions for the second agent, The Stylist.
For these system instructions, I had to iterate a lot. For example, the model started by generating comma-separated prompts, which I know are not optimal for Nano (detailed, natural-language prompts are better). Therefore, I asked for the following change:
Additionally, I felt there wasn't enough variance for the micro style, so I asked for more to be added.
These are only examples; it's important to read what you get and iterate until you are satisfied. This is the key step of the creative process.
Too many people assume that the first answer you get from an LLM is the right one. Even if valid, it may lack details, miss edge cases, or lack creativity.
Always make sure that your system instructions set precise expectations for the input and output of the agents, then complement them with rules and constraints that will tell your agent how to go from input to output.
Finally, remember that if a detail is not in your instructions, the LLM will assume what you meant. This usually results in average output for a similar context. If you are doing something unique or creative, you won't get what you want. Don't settle for the average: provide the details. This is where creativity comes into play.
These are only examples of what might go wrong. This is part of the process; it's similar to working with a team. With your team, you do pull request reviews to improve the quality of the work, you hold brainstorming sessions, you challenge yourself, and you review assumptions. You don't work alone. It's the same when you build with AI: you are the task giver, reviewer, and final stakeholder of the vision you are trying to build. The process changes, but ultimately what you care about is the same: solving a problem with a technological solution. Therefore, ask clarifying questions if you don't understand something. Always go back to what you are trying to build and make sure that the solution being built before your eyes is aligned with your goals. Don't assume it is.
Use few-shot examples
To improve your system instructions and the consistency of the output, you can include examples of inputs and expected outputs. It's similar to the saying that a picture is worth a thousand words. The examples you include are those "pictures" for Gemini. Combined with detailed system instructions, they will reduce the risk of hallucinations greatly. For reference, you can find below the few-shot examples that were included in my system instructions:
### EXAMPLES

**Scenario A: The "Bold Sticker" Style**
*Input triggers:* `target_audience="child"`, `mood="Energetic"`, `tags=["fox", "simple"]`

**Input:**

```json
{
  "title": "Winter Fox",
  "description": "A fox sitting and smiling.",
  "visual_tags": ["fox", "winter", "simple"],
  "mood": "Energetic",
  "target_audience": "child"
}
```

**Output:**

```json
{
  "title": "Winter Fox",
  "description": "A fox sitting and smiling.",
  "visual_tags": ["fox", "winter", "simple"],
  "mood": "Energetic",
  "target_audience": "child",
  "positive_prompt": "A die-cut sticker design of a happy fox. The fox is depicted with an energetic expression, sitting upright. The image features ultra-thick, uniform black outer contours that completely isolate the character from the white background. There are no background elements, no snowflakes, and no scenery. The interior lines are simple and bold, designed for easy coloring.",
  "negative_prompt": "background, scenery, trees, snowflakes, thin lines, complex details, shading, grayscale, texture, sketchy, small parts"
}
```
Final system instructions
You will find below the final system instructions for our Stylist agent.
Since our agent will be part of a SequentialAgent, I recommend you follow these guidelines:
1. Define the input and output payload; be consistent in how you define types. I typically use TypeScript-looking annotations.
2. Define the steps the agent should follow to get from the input to the output.
3. If you have constraints or rules the agent needs to follow, make sure to include them in the appropriate step.
4. When possible, provide few-shot examples detailing an input and the associated output.
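Putting those guidelines together, an instruction skeleton might look like the following. The payload fields mirror this project's JSON, but the exact wording, step breakdown, and TypeScript-looking annotations are illustrative, not the project's actual instructions.

```python
# A skeleton showing how the four guidelines above can structure a system
# instruction. The wording and annotations are illustrative only.

INSTRUCTIONS_SKELETON = """
### INPUT
{ title: string, description: string, visual_tags: string[],
  mood: string, target_audience: "child" | "adult" }

### OUTPUT
The input payload plus:
{ positive_prompt: string, negative_prompt: string }

### STEPS
1. Analyze the concept and pick a line-art style for the audience.
2. Write a detailed, natural-language positive_prompt.
3. List forbidden elements in negative_prompt.
   - Constraint: no shading, no grayscale, no thin lines.

### EXAMPLES
(few-shot input/output pairs go here)
"""
```

Notice how each of the four guidelines maps to a section: typed input/output contracts first, then ordered steps, with constraints attached to the step they govern, and examples last.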
Run locally
Now that our second agent is complete, we can compose our agents in a sequential flow (SequentialAgent) and test.
```python
import json
import asyncio
from datetime import datetime

from google.adk.agents import LlmAgent, SequentialAgent
from google.adk.models import Gemini
from google.adk.runners import InMemoryRunner

from .instructions import INSTRUCTIONS_V1
from ..app_configs import configs

stylist = LlmAgent(
    name="Stylist",
    instruction=INSTRUCTIONS_V1,
    model=Gemini(model=configs.llm_model),
)


async def main():
    current_date_str = datetime.now().strftime("%Y-%m-%d")

    from ..creative_director.agent import creative_director

    sequential_agent = SequentialAgent(
        name="SequentialAgent",
        sub_agents=[creative_director, stylist],
    )

    runner = InMemoryRunner(agent=sequential_agent)

    user_request = {
        "current_date": current_date_str,
    }

    await runner.run_debug(
        json.dumps(user_request),
        verbose=True,
    )


if __name__ == "__main__":
    asyncio.run(main())
```
When running this, I received the following output from the first agent, The Creative Director (the one we built in Part #1):
Then, this JSON output was forwarded to our next agent, The Stylist. I received the following:
If you don't get what you want, iterate on the system instructions of the agent. You may even need to go back and update the instructions of the first agent to make sure they work well together. You are building a team; it needs to work efficiently, combining different skills into the output you expect. If you make any changes, don't forget to go back to your reference architecture and keep it in sync with your updates.
Our Stylist agent is now complete. We now have a rich payload containing a description, tags, a positive prompt, and more, which will be the foundation of all of our next steps.
In the next part of this series, we will create the next agent, The Generator, which will use the Gemini API to generate an image and save it to cloud storage. It will then call an additional tool to upscale the image so that it is optimal for high-quality printing.