Simon Willison’s Weblog: First impressions of the new Amazon Nova LLMs (via a new llm-bedrock plugin)

Source URL: https://simonwillison.net/2024/Dec/4/amazon-nova/
Source: Simon Willison’s Weblog
Title: First impressions of the new Amazon Nova LLMs (via a new llm-bedrock plugin)

Feedly Summary: Amazon released three new Large Language Models yesterday at their AWS re:Invent conference. The new model family is called Amazon Nova and comes in three sizes: Micro, Lite and Pro.
I built a new LLM plugin called llm-bedrock for accessing the models in the terminal via boto3 and the Amazon Bedrock API.
My initial impressions from trying out the models are that they’re mainly competitive with the Google Gemini family. They are extremely inexpensive – Nova Micro slightly undercuts even the previously cheapest model, Gemini 1.5 Flash-8B – they can handle quite large contexts, and the two larger models can handle images, video and PDFs.

Pricing and capabilities
Using Nova from the terminal with LLM
Multi-modal prompting with images and documents
Closing thoughts: GOAAM

Pricing and capabilities
Amazon list their pricing in price per 1,000 input tokens. Almost everyone else uses price per million, so I’ve done the conversion.
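The conversion itself is trivial; here’s a quick sketch (the example figure is Nova Micro’s listed input price, $0.000035 per 1,000 tokens – the helper name is mine):

```python
def per_1k_dollars_to_per_million_cents(price: float) -> float:
    """Convert a price in dollars per 1,000 tokens to cents per million tokens."""
    # 1,000x more tokens, then 100 cents per dollar
    return price * 1000 * 100

# Nova Micro input: $0.000035 per 1K tokens -> 3.5 cents per million
print(per_1k_dollars_to_per_million_cents(0.000035))
```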
Here’s a table comparing the least expensive models from the largest providers:

| Provider  | Model               | Cents per million input | Cents per million output |
|-----------|---------------------|-------------------------|--------------------------|
| OpenAI    | GPT-4o Mini         | 15                      | 60                       |
| Anthropic | Claude 3 Haiku      | 25                      | 125                      |
| Anthropic | Claude 3.5 Haiku    | 100                     | 500                      |
| Google    | Gemini 1.5 Flash-8B | 3.75                    | 15                       |
| Google    | Gemini 1.5 Flash    | 7.5                     | 30                       |
| Amazon    | Nova Micro          | 3.5                     | 14                       |
| Amazon    | Nova Lite           | 6                       | 24                       |

As you can see, Nova Micro is 0.25 cents cheaper per million input tokens and 1 cent cheaper per million output tokens than the previous champion, Gemini 1.5 Flash-8B.
And the more expensive models:

| Provider  | Model             | Cents per million input | Cents per million output |
|-----------|-------------------|-------------------------|--------------------------|
| OpenAI    | GPT-4o            | 250                     | 1000                     |
| OpenAI    | o1-mini           | 300                     | 1200                     |
| OpenAI    | o1-preview        | 1500                    | 6000                     |
| Anthropic | Claude 3.5 Sonnet | 300                     | 1500                     |
| Anthropic | Claude 3 Opus     | 1500                    | 7500                     |
| Google    | Gemini 1.5 Pro    | 125                     | 500                      |
| Amazon    | Nova Pro          | 80                      | 320                      |

So Nova Pro undercuts even the cheapest of those top-tier models, Gemini 1.5 Pro. The best model in the Nova family is still to come – from their post:

Amazon Nova Premier – Our most capable multimodal model for complex reasoning tasks and for use as the best teacher for distilling custom models. Amazon Nova Premier is still in training. We’re targeting availability in early 2025.

And from the press release, which presumably is talking about Nova Premier as well:
In early 2025, Amazon will support context length of over 2M input tokens.
Comparing models based on token pricing doesn’t capture the entire picture because models use different tokenizers and hence may use up different numbers of tokens for the same input, especially with images and video added to the mix. I’d like to implement a good way to compare those counts in the future.
In terms of capabilities: all of the above models can handle image inputs now, with the exception of Amazon Nova Micro, o1-mini and o1-preview, which are text-only. Video support is currently unique to the Gemini and Nova models – Anthropic and OpenAI don’t have any models that handle video yet. See Amazon Nova video understanding limitations for more details.
Nova models can’t handle audio, and in fact ignore audio in video that they analyze (treating it as a sequence of images). Gemini models have audio input, as does OpenAI’s gpt-4o-audio-preview model ($100/$200 for input/output of those audio tokens).
Nova Micro has a 128,000 input token limit, while Lite and Pro are both 300,000 tokens. This isn’t quite in the same league as Gemini (2 million for Pro and Flash, 1 million for Flash-8B) but beats the standard models from both OpenAI and Anthropic. Anthropic have an "Enterprise" tier with a 500,000 token limit, but I’ve not had a chance to try that myself.
Using Nova from the terminal with LLM
Install LLM, then install the plugin like this:
llm install llm-bedrock
Next comes the hard part: you’ll need AWS credentials that have permission to access Bedrock models on your behalf.
If you’ve previously configured the AWS CLI you may be able to use a shortcut: request access to the Bedrock models (Nova models are approved instantly) and the tool should be able to pick up your existing credentials.
I got stuck on this for a while, then Nils Durner came to the rescue with step-by-step instructions (17 steps in total, screenshots included) for requesting access to the models, creating an IAM user, creating a JSON IAM policy that allows access to the bedrock: actions, assigning that policy to the new user and then creating and storing an access key and access secret for calling the API.
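For reference, a minimal version of that IAM policy might look something like this – the exact action list here is my assumption, so defer to Nils Durner’s write-up for the specifics:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "*"
    }
  ]
}
```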
Now that you have those credentials, you can provide them to LLM by running:
llm keys set bedrock
And then pasting in the access_key:secret_access_key – the two keys with a colon in the middle.
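If you’re curious what happens to that combined string: presumably the plugin just splits on the first colon to recover the two values – something along these lines (a sketch, not the plugin’s actual code):

```python
def split_bedrock_key(combined: str) -> tuple[str, str]:
    """Split an "access_key:secret_access_key" string into its two parts."""
    access_key, _, secret = combined.partition(":")
    if not secret:
        raise ValueError("Expected access_key:secret_access_key")
    return access_key, secret

# Dummy values, not real credentials
print(split_bedrock_key("AKIAEXAMPLE:abc123secret"))
# ('AKIAEXAMPLE', 'abc123secret')
```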
Run llm models to confirm that the models are installed correctly – you should see output that ends like this:
Bedrock: us.amazon.nova-micro-v1:0 (aliases: nova-micro)
Bedrock: us.amazon.nova-lite-v1:0 (aliases: nova-lite)
Bedrock: us.amazon.nova-pro-v1:0 (aliases: nova-pro)

And now you can run prompts! A simple text prompt looks like this – I’m setting an optional system prompt too:
llm -m nova-micro 'Three ways to impress a pelican' -s 'short sentences'

Offer a fish.
Dance like a clown.
Wear bright colors.

That’s using nova-micro, the cheapest and fastest model. Trying that again with -m nova-pro (the best model) gave me:

Offer fresh fish.
Perform mimicry of pelican calls.
Create a calm, respectful environment.

Multi-modal prompting with images and documents
My plugin supports LLM attachments, so you can attach images, video and PDF files to a prompt using the -a option which accepts either a URL or a file path.
Trying against my standard low-resolution pelican group photo:

llm -m nova-lite 'describe image' \
  -a https://static.simonwillison.net/static/2024/pelicans.jpg -u
The -u option there outputs token usage. I got back:

The image depicts a large gathering of birds on a rocky cliff adjacent to a body of water, likely a sea or a large lake. The birds, primarily pelicans, appear to be resting and socializing in a tightly packed group. Some of the birds are standing while others are perched on the rocks, and a few are near the edge of the cliff, possibly preparing to dive into the water below. The cliff is rocky and has a rugged appearance, with some areas showing signs of erosion or weathering. The water beneath the birds is calm and appears to be a deep blue color, reflecting the clear sky above. The birds’ feathers are visible in various shades of brown, gray, and white, adding to the natural and picturesque scene.
Token usage: 1,668 input, 148 output

Using my LLM pricing calculator (now upgraded to support Nova) I get 0.0136 cents – that’s just over 1/100th of a cent!
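The arithmetic behind that figure is straightforward – here’s a sketch (not the calculator’s actual code) using the per-million-token prices from the tables above:

```python
def prompt_cost_cents(input_tokens, output_tokens,
                      input_cents_per_m, output_cents_per_m):
    """Cost of a single prompt in cents, given per-million-token prices."""
    return (input_tokens * input_cents_per_m +
            output_tokens * output_cents_per_m) / 1_000_000

# Nova Lite: 1,668 input at 6 cents/M, 148 output at 24 cents/M
lite = prompt_cost_cents(1668, 148, 6, 24)    # ~0.0136 cents

# Nova Pro: 1,668 input at 80 cents/M, 305 output at 320 cents/M
pro = prompt_cost_cents(1668, 305, 80, 320)   # ~0.231 cents
```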
For the same prompt using -m nova-pro for the more expensive Nova Pro model:

The image depicts a large gathering of birds on a rocky shore or beach, adjacent to a body of water. The scene is bustling with a variety of birds, primarily pelicans, which are identifiable by their large size, long necks, and distinctive beaks. The pelicans are positioned both standing and sitting on the rocks, with some standing tall and others hunched over.
In addition to pelicans, there are numerous smaller birds, likely seagulls, characterized by their gray and white plumage. These birds are densely packed together, creating a vivid scene of avian life. The seagulls are scattered throughout the rocky area, some standing, others sitting, and a few engaged in various activities such as preening their feathers or searching for food.
The rocky shore is uneven and covered with a mix of pebbles and larger stones, providing a natural habitat for the birds. The body of water in the background appears calm, with a reflection of the sky and the rocky shore. The water’s surface is smooth, indicating little to no wind.
The sky above is clear, suggesting fair weather conditions. The lighting in the image suggests it is daytime, likely in the early morning or late afternoon, given the soft light and the absence of harsh shadows.
In summary, the image captures a lively scene of wildlife, with a focus on pelicans and seagulls congregating on a rocky shore next to a calm body of water under a clear sky. The diverse bird population and their interactions with the environment create a dynamic and engaging visual experience.
Token usage: 1,668 input, 305 output

That came in at a whopping 0.231 cents – nearly a quarter of a cent!
Let’s try something absurdly ambitious. I ran some random Google searches for government military PDFs and stumbled across this one. Let’s convert that to Markdown:
llm -u -m nova-pro 'convert this document to markdown' \
  -a https://www.darpa.mil/attachments/DARPA_FY18_Presidents_Budget_Request.pdf
You can see the result here. It stopped around page 9 and missed some of the tables, but this was an absurdly challenging document so I’m not surprised it didn’t manage a perfect conversion. It cost me 176,386 input tokens and 4,283 output tokens, which comes to a full 15.4814 cents.
Finally, let’s see if it can code:
llm -m nova-micro 'python function to download a CSV file from a URL, create a corresponding table in SQLite and import the data'
24 input, 616 output = 0.0087 cents.
Full output here, but this is the code it wrote:
import requests
import csv
import sqlite3
import os
from io import StringIO

def download_csv_and_import_to_sqlite(url, db_name, table_name):
    # Download the CSV file from the URL
    response = requests.get(url)
    response.raise_for_status()  # Ensure we notice bad responses

    # Read the CSV data
    csv_data = response.content.decode('utf-8')
    csv_reader = csv.reader(StringIO(csv_data))

    # Get the header (column names) from the CSV
    header = next(csv_reader)

    # Create the SQLite database and connect to it
    conn = sqlite3.connect(db_name)
    cursor = conn.cursor()

    # Create the table
    create_table_query = f"CREATE TABLE IF NOT EXISTS {table_name} ("
    create_table_query += ",".join([f"{col} TEXT" for col in header])
    create_table_query += ")"
    cursor.execute(create_table_query)

    # Insert the data into the table
    insert_query = f"INSERT INTO {table_name} VALUES ({','.join(['?' for _ in header])})"
    for row in csv_reader:
        cursor.execute(insert_query, row)

    # Commit the changes and close the connection
    conn.commit()
    conn.close()

    print(f"Data successfully imported into {db_name} in table {table_name}.")

# Example usage
url = "https://datasette.io/content/stats.csv"
db_name = "example.db"
table_name = "data_table"

download_csv_and_import_to_sqlite(url, db_name, table_name)
That’s spot on. I may have to retire that benchmark question; it’s not enough of a challenge for modern models.
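One nitpick: the generated SQL interpolates the table and column names directly into the query string, which breaks on CSV headers containing spaces or SQL reserved words (and is unsafe for untrusted input). A minimal fix is to quote the identifiers – a sketch of the idea:

```python
import sqlite3

def quote_ident(name: str) -> str:
    """Quote an SQL identifier, escaping any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'

# A header with a space and a reserved word, which the unquoted
# version above would choke on:
header = ["file name", "order"]
sql = "CREATE TABLE IF NOT EXISTS {} ({})".format(
    quote_ident("data_table"),
    ", ".join(f"{quote_ident(col)} TEXT" for col in header),
)

conn = sqlite3.connect(":memory:")
conn.execute(sql)
conn.execute(
    "INSERT INTO {} VALUES (?, ?)".format(quote_ident("data_table")),
    ("report.csv", "first"),
)
print(conn.execute("SELECT * FROM {}".format(quote_ident("data_table"))).fetchall())
```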
Can they produce an SVG of a pelican on a bicycle? Not very well.
Closing thoughts: GOAAM
My experiments here aren’t particularly comprehensive – I look forward to hearing from people who have better challenges and a more disciplined way of applying them. As a general initial vibe check, though, these Nova models pass with flying colors.

Google Gemini now has competition in terms of pricing. This is a relief to me: Gemini is so cheap that I’ve been nervous they would jack the prices up given the lack of competition!
These appear to be good multi-modal models – their image handling feels solid and it’s neat to have a new video-input model (even if it’s quite limited compared to Gemini)
Anthropic’s decision to raise the price for Claude 3.5 Haiku isn’t looking great, given the continued competition at the bottom end of the market. Just a few months ago Claude 3 Haiku was the cheapest model available, now it’s hardly worth considering, undercut by even GPT-4o mini.
Gaining API access to anything on AWS remains horrifyingly difficult – seriously, 17 steps! Some day a PM at AWS is going to figure out that fixing this will have a material impact on Amazon’s bottom line.

From the Nova press release in the what’s next section:

Amazon will introduce an Amazon Nova speech-to-speech model in the first quarter of 2025. […] Amazon is also developing a novel model that can take text, images, audio, and video as input, and generate outputs in any of these modalities. This Amazon Nova model with native multimodal-to-multimodal – or “any-to-any” modality capabilities – will be introduced mid-year 2025.

This reminded me that OpenAI’s GPT-4o was meant to be able to provide image output, which they demonstrated at their launch and as far as I’m aware have not mentioned since. Maybe this will encourage them to finally ship that feature!

The cost of multi-modal models – first Gemini, now Nova – continues to leave me almost speechless. At 0.0136 cents per image I could generate a description of every photo in my 67,771-photo personal library using Nova Lite for $9.21.

With this release I think Amazon may have earned a spot among the top tier of model providers. Maybe we need a new FAANG acronym that covers OpenAI, Anthropic, Google, Meta and Amazon. I like GOAAM.
Tags: amazon, projects, releases, ai, openai, generative-ai, llms, llm, anthropic, gemini, vision-llms, llm-pricing

AI Summary and Description: Yes

Summary: The text discusses Amazon’s launch of three new Large Language Models (LLMs), called Amazon Nova, at the AWS re:Invent conference. It highlights their competitive pricing and capabilities, particularly in handling multi-modal inputs such as images and videos. This information is essential for professionals in the AI and cloud domains, especially those focused on security, compliance, and innovative uses of LLMs.

Detailed Description: The newly announced Amazon Nova family of LLMs is positioned as a significant competitor to existing models from industry leaders like Google and OpenAI. Below are the critical insights from the text:

– **Model Lineup**:
– The Nova family includes three models: Micro, Lite, and Pro, each designed to cater to different use cases and pricing categories.

– **Competitive Pricing**:
– Nova Micro is priced under the previous low-cost benchmark set by Google’s Gemini 1.5 Flash-8B.
– The provided pricing table shows Amazon’s aggressive positioning against competitors like OpenAI and Anthropic, especially in input and output token pricing.

– **Multi-modal Capabilities**:
– All models, barring the Micro version, can handle diverse input types—including images, videos, and PDFs—signifying a shift towards more versatile AI applications.
– Video support is explicitly highlighted as currently unique to the Nova and Gemini models among major providers.

– **Access and Integration**:
– The text outlines the detailed process of integrating and accessing the Nova models, including AWS credential management and specific command-line interactions using the new LLM plugin.

– **Performance and Usage**:
– Initial testing shows the Nova models producing solid multi-modal results – image description, PDF-to-Markdown conversion, and code generation – at very low cost.

– **Future Developments**:
– Amazon plans to release a more capable multi-modal model, Nova Premier, targeting early 2025, which is expected to offer even better performance and contextual capabilities.

– **Implications for Cloud and AI Professionals**:
– The release of the Nova models introduces new considerations for security and compliance as organizations may adopt these advanced tools for various applications.
– The competitive pricing could also lead to shifts in market positioning, forcing a reevaluation of existing partnerships and technology stacks.

In summary, this text not only informs about a significant advancement in LLM offerings but also provides critical insights that can guide strategic decisions for professionals managing AI and cloud security. The ongoing development and competitive landscape will likely influence compliance standards and governance models in the AI domain moving forward.