Building an Agent for Color It Daily, Part 2
Introduction
In Part #1 of this series on how to build and deploy a complex agent that generates unique coloring pages, we built The Creative Director. In this second part, we are tackling The Stylist. This agent acts as the prompt engineer: it takes the output of the previous agent and produces a detailed, optimized text prompt (we rely on both a positive and a negative prompt, which we will combine during generation). The output of this agent will then be used by The Generator to create and optimize the coloring page from the text prompt. Below you will find the expected output of our agent. It adds two fields (positive_prompt and negative_prompt) to the input. We do this because we rely on a SequentialAgent, which automatically flows the output of one agent into the input of the next.
We need to hold on to the full context, since everything must be persisted at the end. Another approach would have been to persist the data early and then update the state at each step; this would require generating a unique ID (referenced by all agents) to update the state. For this project, I decided to persist data only at the end. We will include a critic agent to review the assets and loop back to the generation step if the quality or guidelines are not met. This means that for each final asset, we might loop through multiple attempts before reaching the result. Persisting early would have complicated our logging, and I believe that complexity is unnecessary for this project. The downside is that we lose tracking of the intermediate generations.
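To make the generate/critique loop concrete, here is a minimal, library-free sketch of the cycle described above. The names (`generate_asset`, `critique`, `MAX_ATTEMPTS`) and the quality scoring are illustrative placeholders, not the project's actual implementation; in the real system these roles will be played by agents.

```python
# Illustrative sketch of the generate -> critique -> retry loop described
# above. The names and the scoring logic are hypothetical placeholders.

MAX_ATTEMPTS = 3


def generate_asset(prompt: str, attempt: int) -> dict:
    # Stand-in for The Generator: pretend quality improves with each attempt.
    return {"prompt": prompt, "quality": 0.4 + 0.3 * attempt}


def critique(asset: dict, threshold: float = 0.8) -> bool:
    # Stand-in for the critic agent: approve only if quality meets the bar.
    return asset["quality"] >= threshold


def generate_with_review(prompt: str) -> dict:
    asset = None
    for attempt in range(MAX_ATTEMPTS):
        asset = generate_asset(prompt, attempt)
        if critique(asset):
            break
    # Only the final asset is persisted; intermediate attempts are discarded,
    # which is the trade-off mentioned above.
    return asset
```

The point of the sketch is the shape of the flow: several attempts may happen internally, but only the asset that finally passes the critic (or the last attempt) leaves the loop and gets persisted.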
```json
{
  "title": "...",
  "description": "...",
  "visual_tags": [...],
  "mood": "...",
  "target_audience": "...",
  "positive_prompt": "Detailed natural language description...",
  "negative_prompt": "Forbidden elements..."
}
```
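Since The Stylist simply extends the payload it receives, the flow can be illustrated with a few lines of pure Python. The field names mirror the JSON above; the `stylist_output` function is a stand-in for the agent, not actual project code.

```python
# Pure-Python illustration of how The Stylist extends The Creative Director's
# payload. The function is a stand-in for the agent, not the real thing.

def stylist_output(creative_director_output: dict) -> dict:
    # The Stylist keeps every input field and adds the two prompt fields.
    return {
        **creative_director_output,
        "positive_prompt": "Detailed natural language description...",
        "negative_prompt": "Forbidden elements...",
    }


concept = {
    "title": "Winter Fox",
    "description": "A fox sitting and smiling.",
    "visual_tags": ["fox", "winter", "simple"],
    "mood": "Energetic",
    "target_audience": "child",
}

result = stylist_output(concept)
assert set(result) == set(concept) | {"positive_prompt", "negative_prompt"}
```

This is exactly what the SequentialAgent gives us for free: the downstream agent sees the full upstream payload and only has to append its own fields.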
Update reference architecture
The first thing we will do is update our reference architecture to make sure it still reflects our finalized implementation for The Creative Director agent (especially important because the output of the first agent is the input for this one).
As you can see in the output below, this is an important step to ensure that the next agent we build is grounded in the updated end-to-end architecture. By updating our reference architecture, we improve the quality of the agents we generate and ensure they fit into our overall design.
Instead of referencing our architecture document, we could reference specific files or let Gemini CLI try to guess what is relevant, but as always, it is better to be explicit. Additionally, even though we are working on a specific agent here, our reference documents provide the context for the full end-to-end project.
Specifically:
- Rotate Composition: Updated to match the specific names in the instructions (Character Sticker, Full Scene, Collection, Action Shot).
- De-duplication: Removed the specific '30 days' constraint and generalized it to 'past content' to align with the vector search implementation.
- Tools: Clarified that get_calendar_events returns seasons and observances, and that search_past_concepts uses Firestore for vector search.
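To show what those two tool contracts might look like, here are illustrative stubs. The real get_calendar_events reads seasons and observances from a calendar source, and the real search_past_concepts runs a vector search against Firestore; the bodies, parameters, and return shapes below are assumptions for demonstration only.

```python
# Illustrative stubs for the two tools mentioned above. The bodies and
# return shapes are assumptions; the real implementations query a calendar
# source and a Firestore vector search index.

def get_calendar_events(current_date: str) -> dict:
    # Returns the season and any observances for the given date.
    return {"season": "winter", "observances": ["New Year's Day"]}


def search_past_concepts(query: str, top_k: int = 5) -> list[dict]:
    # Returns past concepts most similar to the query, with a similarity
    # score, so the agent can avoid repeating recent content.
    return [{"title": "Winter Fox", "score": 0.91}][:top_k]
```

Keeping the tool signatures this explicit in the reference architecture is what lets the next agent's instructions be grounded in the actual contracts rather than guesses.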
Initial system instructions
Now that we have updated and reviewed our instructions, we are ready to generate the system instructions for the second agent, The Stylist.
For these system instructions, I had to iterate a lot. For example, the model started by generating comma-separated prompts, which I know are not optimal for Nano (detailed, natural-language prompts are better). Therefore, I asked for the following change:
Additionally, I felt there wasn't enough variance for the micro style, so I asked for more to be added.
These are only examples; it's important to read what you get and iterate until you are satisfied. This is the key step of the creative process.
Too many people assume that the first answer you get from an LLM is the right one. Even if valid, it may lack details, miss edge cases, or lack creativity.
Always make sure that your system instructions set precise expectations for the input and output of the agents, then complement them with rules and constraints that will tell your agent how to go from input to output.
Finally, remember that if a detail is not in your instructions, the LLM will assume what you meant. This usually results in average output for a similar context. If you are doing something unique or creative, you won't get what you want. Don't settle for the average: provide the details. This is where creativity comes into play.
These are only examples of what might go wrong. This is part of the process; it's similar to working with a team. With your team, you do pull request reviews to improve the quality of the work, you hold brainstorming sessions, you challenge yourself, and you review assumptions. You don't work alone. It's the same when you build with AI: you are the task giver, reviewer, and final stakeholder of the vision you are trying to build. The process changes, but ultimately what you care about is the same: solving a problem with a technological solution. Therefore, ask clarifying questions if you don't understand something. Always go back to what you are trying to build and make sure that the solution being built before your eyes is aligned with your goals. Don't assume it is.
Use few-shot examples
To improve your system instructions and the consistency of the output, you can include examples of inputs and expected outputs. It's similar to the saying that a picture is worth a thousand words. The examples you include are those "pictures" for Gemini. Combined with detailed system instructions, they will reduce the risk of hallucinations greatly. For reference, you can find below the few-shot examples that were included in my system instructions:
### EXAMPLES

**Scenario A: The "Bold Sticker" Style**
*Input triggers:* `target_audience="child"`, `mood="Energetic"`, `tags=["fox", "simple"]`

**Input:**

```json
{
  "title": "Winter Fox",
  "description": "A fox sitting and smiling.",
  "visual_tags": ["fox", "winter", "simple"],
  "mood": "Energetic",
  "target_audience": "child"
}
```

**Output:**

```json
{
  "title": "Winter Fox",
  "description": "A fox sitting and smiling.",
  "visual_tags": ["fox", "winter", "simple"],
  "mood": "Energetic",
  "target_audience": "child",
  "positive_prompt": "A die-cut sticker design of a happy fox. The fox is depicted with an energetic expression, sitting upright. The image features ultra-thick, uniform black outer contours that completely isolate the character from the white background. There are no background elements, no snowflakes, and no scenery. The interior lines are simple and bold, designed for easy coloring.",
  "negative_prompt": "background, scenery, trees, snowflakes, thin lines, complex details, shading, grayscale, texture, sketchy, small parts"
}
```
Final system instructions
You will find below the final system instructions for our Stylist agent.
Since our agent will be part of a SequentialAgent, I recommend you follow these guidelines:
1. Define the input and output payload; be consistent in how you define types. I typically use TypeScript-looking annotations.
2. Define the steps the agent should follow to get from the input to the output.
3. If you have constraints or rules the agent needs to follow, make sure to include them in the appropriate step.
4. When possible, provide few-shot examples detailing an input and the associated output.
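Putting those guidelines together, an instruction skeleton might look like the following. The payload fields mirror this project's JSON, but the exact wording, step breakdown, and TypeScript-looking annotations are illustrative, not the project's actual instructions.

```python
# A skeleton showing how the four guidelines above can structure a system
# instruction. The wording and annotations are illustrative only.

INSTRUCTIONS_SKELETON = """
### INPUT
{ title: string, description: string, visual_tags: string[],
  mood: string, target_audience: "child" | "adult" }

### OUTPUT
The input payload plus:
{ positive_prompt: string, negative_prompt: string }

### STEPS
1. Analyze the concept and pick a line-art style for the audience.
2. Write a detailed, natural-language positive_prompt.
3. List forbidden elements in negative_prompt.
   - Constraint: no shading, no grayscale, no thin lines.

### EXAMPLES
(few-shot input/output pairs go here)
"""
```

Notice how each of the four guidelines maps to a section: typed input/output contracts first, then ordered steps, with constraints attached to the step they govern, and examples last.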
Run locally
Now that our second agent is complete, we can compose our agents in a sequential flow (SequentialAgent) and test.
```python
import json
import asyncio
from datetime import datetime

from google.adk.agents import LlmAgent, SequentialAgent
from google.adk.models import Gemini
from google.adk.runners import InMemoryRunner

from .instructions import INSTRUCTIONS_V1
from ..app_configs import configs

stylist = LlmAgent(
    name="Stylist",
    instruction=INSTRUCTIONS_V1,
    model=Gemini(model=configs.llm_model),
)


async def main():
    current_date_str = datetime.now().strftime("%Y-%m-%d")

    from ..creative_director.agent import creative_director

    sequential_agent = SequentialAgent(
        name="SequentialAgent",
        sub_agents=[creative_director, stylist],
    )

    runner = InMemoryRunner(agent=sequential_agent)

    user_request = {
        "current_date": current_date_str,
    }

    await runner.run_debug(
        json.dumps(user_request),
        verbose=True,
    )


if __name__ == "__main__":
    asyncio.run(main())
```
When running this, I received the following output from the first agent, The Creative Director (the one we built in Part #1):
Then, this JSON output was forwarded to our next agent, The Stylist. I received the following:
If you don't get what you want, iterate on the system instructions of the agent. You may even need to go back and update the instructions of the first agent to make sure they work well together. You are building a team; it needs to work efficiently, combining different skills into the output you expect. If you make any changes, don't forget to go back to your reference architecture and keep it in sync with your updates.
Our Stylist agent is now complete. We now have a rich payload containing a description, tags, a positive prompt, and more, which will be the foundation of all of our next steps.
In the next part of this series, we will create the next agent, The Generator, which will use the Gemini API to generate an image and save it to cloud storage. It will then call an additional tool to upscale the image so that it is optimal for high-quality printing.