Amazon Web Services (AWS), the cloud computing arm of Amazon, has introduced a new tool to address AI "hallucinations"—instances where AI models provide unreliable or incorrect responses.
Unveiled at the AWS re:Invent 2024 conference in Las Vegas, the feature, called Automated Reasoning Checks, cross-references customer-supplied information to validate an AI model’s outputs. AWS describes it as the first safeguard of its kind against hallucinations, though similar tools already exist. Microsoft, for example, launched a comparable Correction feature earlier in 2024, and Google’s Vertex AI platform also offers grounding mechanisms that rely on external data sources, custom datasets, or Google Search.
Automated Reasoning Checks is part of AWS’ Bedrock service, specifically within the Guardrails toolset. It analyzes how a model generates responses and evaluates their accuracy. By referencing user-provided data, it creates a set of rules that can be applied to the model. When a potential hallucination is detected, the tool draws on this "ground truth" to generate a correct response, presenting both the likely incorrect output and the accurate information for comparison.
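To make the idea concrete, here is a minimal Python sketch of rule-based output validation. It is not AWS' implementation and does not use the Bedrock API; the `Rule` class, the `RULES` list, the `validate` function, and the refund-policy example are all hypothetical. It also assumes the claims in a model's answer have already been extracted into a dictionary, a step the sketch leaves out.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """One hypothetical rule derived from customer-supplied 'ground truth'."""
    name: str
    check: Callable[[dict], bool]  # True when the claims are consistent with the rule
    correction: str                # the accurate statement to show on a violation


# Illustrative rules, as if derived from a customer's refund policy document.
RULES = [
    Rule(
        name="refund_window",
        check=lambda claims: claims.get("refund_days", 30) <= 30,
        correction="Refunds are accepted within 30 days of purchase.",
    ),
    Rule(
        name="receipt_required",
        check=lambda claims: claims.get("receipt_required", True),
        correction="A receipt is required for all refunds.",
    ),
]


def validate(claims: dict, model_answer: str) -> dict:
    """Check extracted claims against every rule and pair the model's
    answer with corrections for any rule it violates."""
    violations = [rule for rule in RULES if not rule.check(claims)]
    return {
        "model_answer": model_answer,
        "valid": not violations,
        "corrections": [rule.correction for rule in violations],
    }


result = validate({"refund_days": 60}, "You can return items within 60 days.")
print(result["valid"])        # False
print(result["corrections"])  # ['Refunds are accepted within 30 days of purchase.']
```

The returned dictionary keeps the model's original answer next to the corrections, mirroring the side-by-side comparison described above.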
AWS notes that companies like PwC are already leveraging this feature to create AI assistants. Swami Sivasubramanian, AWS’ VP of AI and data, emphasized that such innovations are helping to address critical challenges in deploying generative AI. He also highlighted that Bedrock’s customer base has grown 4.7x over the past year.
However, eliminating hallucinations is difficult. Experts note that generative AI models do not "know" anything; they predict patterns based on the data they’ve been trained on, meaning their responses are probabilistic guesses rather than definitive answers. While AWS asserts that Automated Reasoning Checks uses “logically accurate” and “verifiable reasoning,” it has provided no performance data to substantiate the tool’s reliability.
In related announcements, AWS introduced Model Distillation, a feature that transfers the capabilities of large AI models (e.g., Llama 405B) to smaller, more cost-effective models (e.g., Llama 8B). This process allows users to experiment with models without incurring significant costs. Customers supply sample prompts, and Bedrock handles the fine-tuning, generating additional training data if necessary.
However, Model Distillation has limitations: it only supports Bedrock-hosted models from Anthropic and Meta, requires the large and small models to be from the same family, and leads to a slight loss of accuracy (under 2%, according to AWS). The tool is currently available in preview.
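The data-preparation side of distillation can be sketched in a few lines of Python. This is an illustration only, not how Bedrock does it: `build_distillation_set`, `to_jsonl`, and `toy_teacher` are hypothetical names, the teacher is a stand-in function rather than a real large model, and the fine-tuning of the smaller model is left out entirely.

```python
from __future__ import annotations

import json
from typing import Callable, Iterable


def build_distillation_set(
    prompts: Iterable[str], teacher: Callable[[str], str]
) -> list[dict]:
    """Pair each sample prompt with the large (teacher) model's response.
    These pairs are the examples a smaller (student) model would be tuned on."""
    return [{"prompt": p, "completion": teacher(p)} for p in prompts]


def to_jsonl(records: list[dict]) -> str:
    """Serialize the pairs as JSON Lines, a common fine-tuning data format."""
    return "\n".join(json.dumps(record) for record in records)


def toy_teacher(prompt: str) -> str:
    """Stand-in for a large model: it simply upper-cases the prompt."""
    return prompt.upper()


records = build_distillation_set(["summarize the q3 report"], toy_teacher)
print(to_jsonl(records))
# {"prompt": "summarize the q3 report", "completion": "SUMMARIZE THE Q3 REPORT"}
```

In a real workflow, the teacher call would go to the larger model and the resulting file would feed a fine-tuning job for the smaller one.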
AWS also introduced multi-agent collaboration as part of Bedrock Agents. This feature enables AI agents to handle subtasks within a larger project, with a "supervisor agent" coordinating efforts. The supervisor can distribute tasks, grant access to necessary information, and synthesize outputs from specialized AI agents.
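The supervisor pattern described here can be illustrated with a small Python sketch. It makes no claim about how Bedrock Agents works internally; the `Supervisor` class, the role names, and the lambda "agents" are hypothetical, and each agent is reduced to a plain function from a task string to a result string.

```python
from __future__ import annotations

from typing import Callable

# Assumption for this sketch: an agent is just a function from task to result.
Agent = Callable[[str], str]


class Supervisor:
    """Routes each subtask to a specialist agent and combines the results."""

    def __init__(self, specialists: dict[str, Agent]):
        self.specialists = specialists

    def run(self, subtasks: list[tuple[str, str]]) -> str:
        outputs = []
        for role, task in subtasks:
            agent = self.specialists.get(role)
            if agent is None:
                raise KeyError(f"no specialist registered for role {role!r}")
            # Delegate the subtask and label the output with the agent's role.
            outputs.append(f"[{role}] {agent(task)}")
        # Synthesis here is simple concatenation; a real supervisor would
        # likely use a model to merge the specialists' outputs.
        return "\n".join(outputs)


supervisor = Supervisor({
    "research": lambda task: f"notes on {task}",
    "writer": lambda task: f"draft of {task}",
})
print(supervisor.run([("research", "pricing"), ("writer", "summary")]))
# [research] notes on pricing
# [writer] draft of summary
```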
Both Automated Reasoning Checks and Model Distillation are now available in preview, expanding AWS’ suite of tools for improving and deploying AI solutions.