Source URL: https://cloud.google.com/blog/topics/developers-practitioners/learn-how-to-create-an-ai-agent-for-trip-planning-with-gemini-1-5-pro/
Source: Cloud Blog
Title: Build an AI agent for trip planning with Gemini 1.5 Pro: A step-by-step guide
Feedly Summary: Gemini 1.5 Pro is creating new possibilities for developers to build AI agents that streamline the customer experience. In this post, we’ll focus on a practical application that has emerged in the travel industry – building an AI-powered trip planning agent. You’ll learn how to connect your agent to external data sources like event APIs, enabling it to generate personalized travel itineraries based on real-time information.
Understanding the core concepts
Function calling: Allows developers to connect Gemini models (all Gemini models except Gemini 1.0 Pro Vision) with external systems, APIs, and data sources. This enables the AI to retrieve real-time information and perform actions, making it more dynamic and versatile.
Grounding: Enhances Gemini models’ ability to access and process information from external sources like documents, knowledge bases, and the web, leading to more accurate and up-to-date responses.
By combining these features, we can create an AI agent that can understand user requests, retrieve relevant information from the web, and provide personalized recommendations.
Step-by-step: Function calling with grounding
Let’s run through a scenario:
Suppose you’re an AI engineer tasked with creating an AI agent that helps users plan trips by finding local events and potential hotels to stay at. Your company has given you full creative freedom to build a minimum viable product using Google’s generative AI products, so you’ve chosen to use Gemini 1.5 Pro and loop in other external APIs.
The first step is to define potential queries that any user might enter into the Gemini chat. This will help clarify development requirements and ensure the final product meets the standards of both users and stakeholders. Here are some examples:
“I’m bored, what is there to do today?”
“I would like to take me and my two kids somewhere warm because spring break starts next week. Where should I take them?”
“My friend will be moving to Atlanta soon for a job. What fun events do they have going on during the weekends?”
From these sample queries, it looks like we’ll need to use an events API and a hotels API for localized information. Next, let’s set up our development environment.
Notebook setup
To use Gemini 1.5 Pro for development, you’ll need to either create a project in Google Cloud or use an existing one. Follow the official Google Cloud project setup instructions before continuing. Working in a Jupyter notebook environment is one of the easiest ways to get started developing with Gemini 1.5 Pro. You can either use Google Colab or follow along in your own local environment.
First, you’ll need to install the latest version of the Vertex AI SDK for Python, import the necessary modules, and initialize the Gemini model:
1. Add a code cell to install the necessary libraries. This demo notebook requires the use of the google-cloud-aiplatform>=1.52 Python module.
    !pip3 install --upgrade --user "google-cloud-aiplatform>=1.52"
    !pip3 install vertexai
2. Add another code cell to import the necessary Python packages.
    import vertexai
    from vertexai.preview.generative_models import GenerativeModel, FunctionDeclaration, Tool, HarmCategory, HarmBlockThreshold, Content, Part

    import requests
    import os
    from datetime import date
3. Now initialize Vertex AI with your project ID and location. Enter both values between the quotes so the variables can be reused later. Uncomment the gcloud authentication commands if necessary.
    PROJECT_ID = "" #@param {type:"string"}
    LOCATION = "" #@param {type:"string"}

    # !gcloud auth login
    # !gcloud config set project $PROJECT_ID
    # !gcloud auth application-default login

    vertexai.init(project=PROJECT_ID, location=LOCATION)
API key configuration
For this demo, we will also use an additional API to retrieve event and hotel information. We’ll use SerpApi, a third-party service that exposes Google search results through an API, for both, so be sure to create an account and select a subscription plan that fits your needs. This demo can be completed using their free tier. Once that’s done, you’ll find your unique API key in your account dashboard.
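Note that the two options below apply to the Google AI Python SDK (the genai module) and its Gemini API key, not to the Vertex AI SDK used in this notebook, which authenticates through your Google Cloud project. With that SDK, you can pass the key in one of two ways:
Put the key in the GOOGLE_API_KEY environment variable, where the SDK will pick it up automatically
Pass the key using genai.configure(api_key = . . .)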
Navigate to https://serpapi.com and replace the contents of the variable below between the quotes with your specific API key:
    # Read the key from the SERP_API_KEY environment variable, falling back to the placeholder string.
    SERP_API_KEY = os.environ.get("SERP_API_KEY", "your_serp_api_key")
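The fallback string above means a missing key only surfaces later, as a failed API request. A minimal sketch of a stricter loader follows, assuming the key is stored in an environment variable named SERP_API_KEY. Both that variable name and the load_api_key helper are illustrative choices, not part of SerpApi or the Vertex AI SDK.

```python
import os


def load_api_key(env_var: str = "SERP_API_KEY") -> str:
    """Return the API key stored in env_var, or raise if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail at setup time rather than on the first API request.
        raise RuntimeError(
            f"Set the {env_var} environment variable to your SerpApi key."
        )
    return key
```

In the notebook, an assignment such as SERP_API_KEY = load_api_key() could then stand in for the line above.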
Defining custom functions for function calling
In this step, you’ll define custom functions in order to pass them to Gemini 1.5 Pro and incorporate the API outputs back into the model for more accurate responses. We’ll first define a function for the events API.
To use function calling, pass a list of tools to the tools parameter when creating a generative model. The model uses each function’s name, description, and parameter schema to decide whether it needs the function to best answer a prompt. The Python functions defined here perform the actual API requests.
    def event_api(query: str, htichips: str = "date:today"):
        # Query SerpApi's Google Events engine and return the list of events.
        URL = f"https://serpapi.com/search.json?api_key={SERP_API_KEY}&engine=google_events&q={query}&htichips={htichips}&hl=en&gl=us"
        response = requests.get(URL).json()
        return response["events_results"]
Now we will follow the same format to define a function for the hotels API.
    def hotel_api(query: str, check_in_date: str, check_out_date: str, hotel_class: int = 3, adults: int = 2):
        # Dates are YYYY-MM-DD strings; numeric arguments are cast to int before use.
        URL = f"https://serpapi.com/search.json?api_key={SERP_API_KEY}&engine=google_hotels&q={query}&check_in_date={check_in_date}&check_out_date={check_out_date}&adults={int(adults)}&hotel_class={int(hotel_class)}&currency=USD&gl=us&hl=en"
        response = requests.get(URL).json()

        return response["properties"]
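Both functions interpolate the user's query directly into the URL, so characters such as spaces, commas, or ampersands rely on the HTTP library to be escaped correctly. As a sketch of a more defensive approach, the request URL can be assembled with the standard library's urlencode. The build_serpapi_url helper is hypothetical; the endpoint and parameter names are the ones already used above.

```python
from urllib.parse import urlencode

SERPAPI_ENDPOINT = "https://serpapi.com/search.json"


def build_serpapi_url(engine: str, api_key: str, **search_params) -> str:
    """Build a SerpApi search URL with every value percent-encoded."""
    params = {"engine": engine, "api_key": api_key, "hl": "en", "gl": "us"}
    # Skip parameters that were not supplied so they are not sent as "None".
    params.update({k: v for k, v in search_params.items() if v is not None})
    return f"{SERPAPI_ENDPOINT}?{urlencode(params)}"


url = build_serpapi_url(
    "google_events", "demo_key", q="Events in Austin, TX", htichips="date:today"
)
print(url)
```

Passing the same dictionary as requests.get(SERPAPI_ENDPOINT, params=params) should give an equivalent result, since requests encodes query parameters itself.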
Declare the custom function as a tool
The function declaration below describes the function for the events API. It lets the Gemini model know this API retrieves event information based on a query and optional filters.
    event_function = FunctionDeclaration(
        name = "event_api",
        description = "Retrieves event information based on a query and optional filters.",
        parameters = {
            "type":"object",
            "properties": {
                "query":{
                    "type":"string",
                    "description":"The query you want to search for (e.g., 'Events in Austin, TX')."
                },
                "htichips":{
                    "type":"string",
                    "description":"""Optional filters used for search. Default: 'date:today'.

                    Options:
                    - 'date:today' - Today's events
                    - 'date:tomorrow' - Tomorrow's events
                    - 'date:week' - This week's events
                    - 'date:weekend' - This weekend's events
                    - 'date:next_week' - Next week's events
                    - 'date:month' - This month's events
                    - 'date:next_month' - Next month's events
                    - 'event_type:Virtual-Event' - Online events
                    """,
                }
            },
            "required": [
                "query"
            ]
        },
    )
Again, we will follow the same format for the hotels API.
    hotel_function = FunctionDeclaration(
        name="hotel_api",
        description="Retrieves hotel information based on location, dates, and optional preferences.",
        parameters= {
            "type":"object",
            "properties": {
                "query":{
                    "type":"string",
                    "description":"Parameter defines the search query. You can use anything that you would use in a regular Google Hotels search."
                },
                "check_in_date":{
                    "type":"string",
                    "description":"Check-in date in YYYY-MM-DD format (e.g., '2024-04-30')."
                },
                "check_out_date":{
                    "type":"string",
                    "description":"Check-out date in YYYY-MM-DD format (e.g., '2024-05-01')."
                },
                "hotel_class":{
                    "type":"integer",
                    "description":"""hotel class.

                    Options:
                    - 2: 2-star
                    - 3: 3-star
                    - 4: 4-star
                    - 5: 5-star

                    For multiple classes, separate with commas (e.g., '2,3,4')."""
                },
                "adults":{
                    "type": "integer",
                    "description": "Number of adults. Only integers, no decimals or floats (e.g., 1 or 2)"
                }
            },
            "required": [
                "query",
                "check_in_date",
                "check_out_date"
            ]
        },
    )
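The declarations tell the model which arguments to supply, but nothing in the notebook checks what actually comes back. Numeric arguments in particular may arrive as floats, which is presumably why hotel_api wraps adults and hotel_class in int(). The following sketch shows one way to check required keys and cast values against a declaration's schema before calling the real function. The coerce_args helper and the trimmed hotel_schema dictionary are illustrative and not part of the Vertex AI SDK.

```python
def coerce_args(schema: dict, args: dict) -> dict:
    """Check required keys and cast values to the types the schema declares."""
    missing = [name for name in schema.get("required", []) if name not in args]
    if missing:
        raise ValueError(f"Missing required arguments: {missing}")

    casts = {"string": str, "integer": int, "number": float}
    cleaned = {}
    for name, value in args.items():
        declared = schema["properties"].get(name)
        if declared is None:
            continue  # Ignore arguments the schema does not declare.
        cleaned[name] = casts.get(declared["type"], lambda v: v)(value)
    return cleaned


# Trimmed copy of the hotel_api parameter schema, for illustration only.
hotel_schema = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "check_in_date": {"type": "string"},
        "check_out_date": {"type": "string"},
        "adults": {"type": "integer"},
    },
    "required": ["query", "check_in_date", "check_out_date"],
}

args = coerce_args(
    hotel_schema,
    {"query": "Midtown Atlanta", "check_in_date": "2024-04-30",
     "check_out_date": "2024-05-01", "adults": 2.0},
)
print(args["adults"])  # 2
```

Keeping the casting in one place would let the API functions drop their own int() calls, at the cost of one extra step in the agent loop.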
Consider configuring safety settings for the model
Safety settings in Gemini exist to prevent the generation of harmful or unsafe content. They act as filters that analyze the generated output and block or flag anything that might be considered inappropriate, offensive, or dangerous. Configuring them is good practice whenever you develop with generative AI. The same cell also sets the generation parameters (output length, temperature, and top_p) that are passed to the model.
    generation_config = {
        "max_output_tokens": 128,
        "temperature": 0.5,
        "top_p": 0.3,
    }

    safety_settings = {
        HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_ONLY_HIGH,
        HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_ONLY_HIGH,
        HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_ONLY_HIGH,
        HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_ONLY_HIGH,
    }
Pass the tool and start a chat
Here we pass both function declarations to the model as a single tool and start a chat with Gemini. With chat.send_message("..."), you can send messages to the model in a conversation-like structure.
    tools = Tool(function_declarations=[event_function, hotel_function])

    model = GenerativeModel(
        model_name = 'gemini-1.5-pro-001',
        generation_config = generation_config,
        safety_settings = safety_settings,
        tools = [tools])
    chat = model.start_chat()
    response = chat.send_message("Hello")
    print(response.text)
Build the agent
Next we will create a dictionary that maps each tool name to its Python function so that the function can be called from within the agent. We will also use prompt engineering (a mission prompt) to guide how the model handles user inputs and to give the model the current date.
    # Map each declared tool name to the Python function that implements it.
    CallableFunctions = {
        "event_api": event_api,
        "hotel_api": hotel_api
    }

    today = date.today()

    def mission_prompt(prompt:str):
        return f"""
        Thought: I need to understand the user's request and determine if I need to use any tools to assist them.
        Action:

        - If the user's request needs one of the available APIs (event, hotel) and I have all the required parameters, call the corresponding API.
        - Otherwise, if I need more information to call an API, I will ask the user for it.
        - If the user's request doesn't need an API call or I don't have enough information to call one, respond to the user directly using the chat history.
        - Respond with the final answer only

        [QUESTION]
        {prompt}

        [DATETIME]
        {today}

        """.strip()


    def Agent(user_prompt):
        prompt = mission_prompt(user_prompt)
        response = chat.send_message(prompt)
        function_calls = response.candidates[0].function_calls
        # Keep calling tools and returning their results until the model answers in text.
        while function_calls:
            for call in function_calls:
                function_res = CallableFunctions[call.name](**call.args)
                response = chat.send_message(Content(role="function_response",parts=[Part.from_function_response(name=call.name, response={"result": function_res})]))
            function_calls = response.candidates[0].function_calls
        return response.text
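The control flow of the agent can be tried out without any Vertex AI calls. The sketch below replays a scripted conversation through the same request, dispatch, and respond loop, and adds two guards the notebook version lacks: a cap on tool-calling rounds and a fallback for unknown tool names. FakeReply, FakeChat, and run_agent are stand-ins invented for this illustration; they only mimic the shape of the SDK objects and are not the SDK's API.

```python
from dataclasses import dataclass, field


@dataclass
class FakeReply:
    """Stand-in for a model response: either tool calls or final text."""
    text: str = ""
    function_calls: list = field(default_factory=list)  # [(name, args), ...]


class FakeChat:
    """Scripted chat session that returns canned replies in order."""

    def __init__(self, replies):
        self._replies = list(replies)
        self.sent = []  # Every message passed to send_message, for inspection.

    def send_message(self, message):
        self.sent.append(message)
        return self._replies.pop(0)


def run_agent(chat, prompt, functions, max_rounds=5):
    """Call requested tools and feed results back until the model answers."""
    reply = chat.send_message(prompt)
    rounds = 0
    while reply.function_calls:
        if rounds == max_rounds:
            raise RuntimeError("Model kept requesting tools; giving up.")
        results = []
        for name, args in reply.function_calls:
            if name in functions:
                results.append({"name": name, "result": functions[name](**args)})
            else:
                results.append({"name": name, "error": "unknown tool"})
        # Send all results for this round back in a single message.
        reply = chat.send_message(results)
        rounds += 1
    return reply.text


fake_chat = FakeChat([
    FakeReply(function_calls=[("event_api", {"query": "Events in Atlanta"})]),
    FakeReply(text="There is a jazz night downtown."),
])
answer = run_agent(
    fake_chat, "What is on in Atlanta?", {"event_api": lambda query: ["Jazz night"]}
)
print(answer)  # There is a jazz night downtown.
```

The round cap is a design choice for this sketch: without it, a model that keeps requesting tools would keep the loop (and the API billing) running indefinitely.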
Test the agent
Below are some sample queries you can try to test the chat capabilities of the agent. Don’t forget to test out a query of your own!
    response1 = Agent("Hello")
    print(response1)

    response2 = Agent("What events are there to do in Atlanta, Georgia?")
    print(response2)

    response3 = Agent("Are there any hotels available in Midtown Atlanta for this weekend?")
    print(response3)
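The hotel query leaves it to the model to turn "this weekend" into the check_in_date and check_out_date strings that hotel_api requires, using only the [DATETIME] value in the mission prompt. If that turns out to be unreliable, one option is to compute the dates in Python and include them in the prompt. The upcoming_weekend helper below is a hypothetical sketch that assumes a Saturday check-in and a Sunday check-out.

```python
from datetime import date, timedelta


def upcoming_weekend(today: date) -> tuple[str, str]:
    """Return the next Saturday on or after today, and the Sunday after it."""
    # date.weekday(): Monday is 0, Saturday is 5, Sunday is 6.
    days_until_saturday = (5 - today.weekday()) % 7
    saturday = today + timedelta(days=days_until_saturday)
    sunday = saturday + timedelta(days=1)
    return saturday.isoformat(), sunday.isoformat()


check_in, check_out = upcoming_weekend(date(2024, 4, 30))  # a Tuesday
print(check_in, check_out)  # 2024-05-04 2024-05-05
```

Note that when called on a Sunday this sketch returns the following weekend, which may or may not match what a user means by "this weekend".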
Wrapping up
That’s all! Gemini 1.5 Pro’s function calling and grounding features enhance its capabilities, enabling developers to connect to external tools and improve model results. This integration enables Gemini models to provide up-to-date information while minimizing hallucinations.
If you’re looking for more hands-on tutorials and code examples, check out some of Google’s Codelabs (such as How to Interact with APIs Using Function Calling in Gemini) to guide you through examples of building a beginner function calling application.
AI Summary and Description: Yes
Summary: The text details the functionalities of Gemini 1.5 Pro, particularly focusing on its application in creating AI-powered trip planning agents through function calling and grounding. It provides insights for developers into leveraging the model with external APIs, enhancing the customer experience in domains like travel.
Detailed Description:
The text focuses on the capabilities unlocked by Gemini 1.5 Pro for developers to create AI agents aimed at improving customer service, especially in the travel sector. It particularly emphasizes the importance of function calling and grounding functionalities in enabling real-time responsiveness and personalized recommendations.
Key Points:
– **Gemini 1.5 Pro Overview**: The model offers developers tools to build AI agents, particularly useful in industries like travel for streamlining customer interactions.
– **Function Calling**: This feature allows AI models to interact with external APIs and systems, thereby expanding their utility. It enables dynamic responses based on real-time data.
– **Grounding**: This enhances the model’s ability to fetch relevant information from various external sources, increasing the accuracy and reliability of its responses.
– **Real-World Application**: A scenario is described where an AI engineer builds a trip planning agent, utilizing event and hotel APIs to provide tailored recommendations based on user queries.
– **Development Environment Setup**: The text outlines necessary steps for setting up the development environment, including:
  – Installing necessary Python libraries.
  – Configuring API keys for accessing external data sources.
– **Safety Settings**: The importance of incorporating safety settings in the model to filter out harmful or inappropriate outputs is mentioned, supporting a responsible AI development approach.
– **Testing and Implementation**: Examples of user queries are provided to highlight how the AI agent can interact with users, showcasing the practical application of function calling in generating relevant information in response to user inquiries.
The overall analysis underlines Gemini 1.5 Pro’s ability to facilitate the creation of responsive, contextually aware AI applications, emphasizing its significance for developers and organizations aiming to enhance customer interactions and satisfaction within specific domains like travel.