Task
User Story - Exploring Documentation
As a developer, I want to explore the documentation for testing retrieval-augmented generation and customer experience, so that I can use these instructions to improve the performance and reliability of AI applications.
Success definition: Given I am on the Maihem documentation page, when I click the links 'Test RAG and hallucinations' and 'Test customer experience', then I should see detailed instructions for testing retrieval-augmented generation applications and customer interactions.
Trajectory
Step 1:
Url (before/after):
https://maihem.ai/
https://docs.maihem.ai/get-started/what-is-maihem
Content (before/after):
RootWebArea MAIHEM, focused, url='https://maihem.ai/'
banner
[42] link home, center=(368,66), url='https://maihem.ai/'
image, url='https://cdn.prod.website-files.com/66e99075cefffd9f1dddd9f1/66ea833d4dfeeddb12412891_MAIHEM_Cool%20White_Full_Logo.svg'
image, url='https://cdn.prod.website-files.com/66e99075cefffd9f1dddd9f1/66fc56066ce61b982b20bc66_MAIHEM_Bittersweet_Full_Logo.svg'
navigation
[49] link Documentation, center=(1208,66), url='https://docs.maihem.ai/quickstart'
[51] link About, center=(1328,66), url='https://maihem.ai/about'
[53] link Careers, center=(1418,66), url='https://jobs.ashbyhq.com/maihem'
[55] link Blog, center=(1501,66), url='https://maihem.ai/news-insights'
[56] link Login, center=(1588,66), url='https://cause.maihem.ai/'
main
heading Enterprise-grade testing for AI applications.
strong
StaticText Maihem empowers technology leaders and engineering teams to confidently deploy AI at scale with automated testing, monitoring, and reporting that ensures compliance with company AI requirements.
[143] link Book a demo, center=(865,515), url='https://cal.com/team/maihem/intro-to-maihem'
[144] link See documentation, center=(1030,515), url='https://docs.maihem.ai/quickstart'
image, url='https://cdn.prod.website-files.com/66e99075cefffd9f1dddd9f1/67196075d98a1d9d22bafb11_MH-Demo-Test-p-1600.jpg'
StaticText FEATURED IN
image, url='https://cdn.prod.website-files.com/66e99075cefffd9f1dddd9f1/66eae46ed7dd81f4ed60ef8c_ycombinator-ar21%201.svg'
image, url='https://cdn.prod.website-files.com/66e99075cefffd9f1dddd9f1/66eae46e1b035e334b7e6f20_The-Wall-Street-Journal-Logo-2%201.svg'
StaticText BUILT BY AI RESEARCHERS FROM WORLD LEADING INSTITUTIONS
image, url='https://cdn.prod.website-files.com/66e99075cefffd9f1dddd9f1/66eae46f402b9983eb8fd9c7_MIT%20-%20Massachusetts%20Institute%20of%20Technology-0%201.svg'
image, url='https://cdn.prod.website-files.com/66e99075cefffd9f1dddd9f1/66eae46ec43f63a11f23b6fa_The%20Oxford%20logo%20_%20Communications%20Hub-1%201.svg'
image, url='https://cdn.prod.website-files.com/66e99075cefffd9f1dddd9f1/66eae46e6cf04a26595f5d85_Imperial%20College%20London-0%201.svg'
heading Build enterprise-grade AI with confidence
link See documentation, url='https://docs.maihem.ai/quickstart'
heading Catch critical flaws before your users do
paragraph
StaticText Systematically test and monitor the performance of your AI application using our industry-leading metrics libraries.
StaticText
heading Scale reviews with expert human-in-the-loop collaboration
paragraph
StaticText Easily collaborate with your team members and leverage Maihem's expert human-in-the-loop platform to achieve highest-quality product reviews at scale.
heading Your agent, certified by Maihem
paragraph
StaticText Share detailed, expert-reviewed reports on the performance of your AI application with your customers and stakeholders.
heading Features
StaticText AI QUALITY ASSURANCE SUITE
StaticText 01
heading Customer experience (CX) test & track
StaticText Continuously test and monitor your AI application’s performance across diverse user personas and Role-Based Access Controls (RBAC).
StaticText AI QUALITY ASSURANCE SUITE
StaticText 02
heading RAG test & track
StaticText Ensure your AI application meets the highest information retrieval standards with the most advanced evaluation tools and hallucination detection models in the industry.
StaticText AI QUALITY ASSURANCE SUITE
StaticText 03
heading Agentic workflow simulations
StaticText Easily define and test any AI workflow to detect process flaws in your agentic architecture.
StaticText AI RISKS & SECURITY TESTING SUITE
StaticText 01
heading AI security test & track
StaticText Continuously assess your AI's security with our advanced red-teaming agents, designed to detect and address threats before they become critical.
StaticText AI RISKS & SECURITY TESTING SUITE
StaticText 02
heading Coverage across all OWASP dimensions of LLM risk
StaticText Protect your AI applications with in-depth tests covering all OWASP vulnerability and risk dimensions, providing comprehensive security insights.
StaticText AI RISKS & SECURITY TESTING SUITE
StaticText 03
heading Compliance tests for regulations such as GDPR and EU AI Act
StaticText Run rigorous simulations to test your AI application’s compliance with requirements such as those under GDPR or the EU AI Act.
heading Frequently asked questions
list
listitem
StaticText How many simulations do I need to run to be safe?
paragraph
StaticText With probabilistic and self-learning systems, it's less about an absolute number and more about continuous testing and supervision, much as it is for us humans (who are also probabilistic systems). Continuous supervision, testing, and training are the key to excellence.
listitem
StaticText Which LLMs do you support?
paragraph
StaticText Our system is LLM agnostic. Whether you’re using OpenAI, Anthropic, Cohere, Google, or any open-source model, we can assess your AI application’s performance and even help you benchmark the best LLM option for your use case.
listitem
StaticText Do you offer custom solutions?
paragraph
StaticText Yes, we provide custom enterprise solutions tailored to your organization, tech stack, and specific AI use case.
listitem
StaticText Is our data secure when you test our AI?
paragraph
StaticText Yes. All our systems are designed with bank/military-grade IT security standards. All data is encrypted in transit (TLS) and at rest (AES256). Dual-layer network boundary protection is in place. We offer various ways to integrate with us, to ensure we accommodate your data and IT security requirements.
listitem
StaticText I love your mission. Can I join the team?
paragraph
StaticText We’d be thrilled! Check out our careers page for open positions—we can’t wait to meet you.
heading News and insights
link View all, url='https://maihem.ai/news-insights'
list
listitem
link 10 Tips to Improve Your RAG System, url='https://maihem.ai/articles/10-tips-to-improve-your-rag-system'
list
listitem
StaticText Hallucinations & RAG Systems
StaticText Our Map-Reduce inspired fact checking system.
link Read More, url='https://maihem.ai/articles/detecting-hallucinations-in-retrieval-augmented-generation-rag-systems-a-two-pass-approach'
listitem
StaticText OWASP Tests for Critical LLM Vulnerabilities
StaticText OWASP Top 10 for LLMs: New Risks, New Testing Methods.
link Read More, url='https://maihem.ai/articles/how-to-test-the-owasp-top-10-critical-vulnerabilities-for-llms'
listitem
StaticText Maihem mentioned in the Wall Street Journal
StaticText Our recent mention in the WSJ.
link Read More, url='https://maihem.ai/articles/maihem-mentioned-in-the-wall-street-journal'
StaticText We help you build AI, responsibly
StaticText Book a call with our team to explore how Maihem can help you to build
StaticText and deploy AI responsibly and successfully in your organization.
link Book a call, url='https://cal.com/team/maihem/intro-to-maihem'
image, url='https://cdn.prod.website-files.com/66e99075cefffd9f1dddd9f1/66ea833d4dfeeddb12412891_MAIHEM_Cool%20White_Full_Logo.svg'
heading San Francisco
StaticText 2261 Market Street STE 5732
StaticText San Francisco
StaticText California 94114
StaticText United States
heading London
StaticText 1 Lyric Square
StaticText W6 0NB
StaticText London
StaticText United Kingdom
link About, url='https://maihem.ai/about'
link Blog, url='https://maihem.ai/news-insights'
link Contact, url='mailto:contact@maihem.ai?subject=Get%20in%20Touch'
link Book a demo, url='https://cal.com/team/maihem/meet-maihem'
link Terms of use, url='https://maihem.ai/terms-of-service'
link Privacy policy, url='https://maihem.ai/privacy-policy'
link Cookie settings, url='https://maihem.ai/#'
StaticText Contact:
link contact@maihem.ai, url='https://maihem.ai/#'
link, url='https://www.linkedin.com/company/maihem/'
image, url='https://cdn.prod.website-files.com/66e99075cefffd9f1dddd9f1/66ec00124fed91f865aa45cf_Vectors-Wrapper.svg'
link, url='https://x.com/maihem_ai'
image, url='https://cdn.prod.website-files.com/66e99075cefffd9f1dddd9f1/66ec001d0d29c296007d3acc_Vectors-Wrapper.svg'
link, url='https://www.instagram.com/maihem.ai/'
image, url='https://cdn.prod.website-files.com/66e99075cefffd9f1dddd9f1/66ec00234d68bd95b7aef3c2_Vectors-Wrapper.svg'
link, url='https://github.com/maihemlabs'
image, url='https://cdn.prod.website-files.com/66e99075cefffd9f1dddd9f1/66ec0028f8cfe84a2af87407_Vectors-Wrapper.svg'
StaticText © 2024 Maihem
StaticText Design by
StaticText Shoreditch Design
RootWebArea What is Maihem - Maihem, focused, url='https://docs.maihem.ai/get-started/what-is-maihem'
main
[68] link Maihem home page light logo, center=(336,32), inner_text=Maihem home page, url='https://docs.maihem.ai/'
image light logo, url='https://mintlify.s3.us-west-1.amazonaws.com/maihem-29/logo/MAIHEM_Bittersweet_Full_Logo.svg'
[74] button Search or ask... Ctrl K, center=(960,32), inner_text=Search or ask... Ctrl K, type=button
image
navigation
list
listitem
[83] link Home, center=(1286,32), url='https://maihem.ai/'
listitem
[85] link Support, center=(1358,32), url='mailto:support@maihem.ai'
listitem
[87] link GitHub, center=(1433,32), url='https://github.com/maihemlabs/maihem'
listitem
[91] link Get Maihem, center=(1540,32), url='https://cal.com/team/maihem/intro-to-maihem'
[97] button Toggle dark mode, center=(1632,32)
image
heading Get started
list
listitem
[125] link What is Maihem, center=(384,148), url='https://docs.maihem.ai/get-started/what-is-maihem'
image
listitem
[130] link Get Maihem SDK and API key, center=(384,184), url='https://docs.maihem.ai/get-started/get-maihem'
image
listitem
[135] link Quickstart, center=(384,220), url='https://docs.maihem.ai/get-started/quickstart'
image
heading How-to guides
list
listitem
[143] link Test RAG and hallucinations, center=(384,322), url='https://docs.maihem.ai/guides/test/rag'
image
listitem
[148] link Test customer experience, center=(384,358), url='https://docs.maihem.ai/guides/test/cx'
image
listitem
[153] link Monitor in production, center=(384,394), url='https://docs.maihem.ai/guides/monitor/monitor'
image
heading Reference
list
listitem
[161] link Core concepts, center=(384,496), url='https://docs.maihem.ai/reference/core-concepts/summary'
image
listitem
[166] link Metric and module collection, center=(384,532), url='https://docs.maihem.ai/reference/metric-collection'
image
listitem
[171] link Connect your AI application, center=(384,568), url='https://docs.maihem.ai/reference/connect-agent'
image
StaticText Get started
heading What is Maihem
paragraph
StaticText Maihem automatically tests your AI applications (target agents), from simple LLM applications to complex agentic workflows.
[189] link Quickstart Set up Maihem and run your first test in 5 minutes, center=(996,321), inner_text=Quickstart Set up Maihem and run your first test in 5 minutes, url='https://docs.maihem.ai/get-started/quickstart'
image
heading Quickstart
paragraph
StaticText Set up Maihem and run your first test in 5 minutes
[198] link How-to guides Find examples to run custom tests, center=(787,427), inner_text=How-to guides Find examples to run custom tests, url='https://docs.maihem.ai/guides'
image
heading How-to guides
paragraph
StaticText Find examples to run custom tests
[206] link Reference Read our detailed documentation, center=(1204,427), inner_text=Reference Read our detailed documentation, url='https://docs.maihem.ai/reference/core-concepts/summary'
image
heading Reference
paragraph
StaticText Read our detailed documentation
heading Navigate to header Test modules and metrics
[216] link Navigate to header, center=(567,544), inner_text=, url='https://docs.maihem.ai/get-started/what-is-maihem#test-modules-and-metrics'
StaticText
image
separator, orientation='horizontal'
[219] span, center=(729,544), inner_text=Test modules and metrics
paragraph
StaticText Maihem enables you to test across many dimensions:
list
listitem
[224] link Customer experience (CX), center=(725,688), url='https://docs.maihem.ai/get-started/what-is-maihem#customer-experience-cx'
listitem
[226] link Retrieval-augmented generation (RAG), center=(773,724), url='https://docs.maihem.ai/get-started/what-is-maihem#retrieval-augmented-generation-rag'
listitem
strong
StaticText Safety
StaticText (
[229] link Bias, center=(699,760), url='https://docs.maihem.ai/get-started/what-is-maihem#bias'
StaticText ,
[230] link Brand reputation, center=(794,760), url='https://docs.maihem.ai/get-started/what-is-maihem#brand-reputation'
StaticText ,
[231] link Toxicity, center=(902,760), url='https://docs.maihem.ai/get-started/what-is-maihem#toxicity'
StaticText )
listitem
strong
StaticText Security
StaticText (
[234] link Overreach, center=(739,796), url='https://docs.maihem.ai/get-started/what-is-maihem#overreach'
StaticText ,
[235] link Privacy (PII), center=(839,796), url='https://docs.maihem.ai/get-started/what-is-maihem#privacy-pii'
StaticText ,
[236] link System access, center=(955,796), url='https://docs.maihem.ai/get-started/what-is-maihem#system-access'
StaticText )
heading Navigate to header How it works
[239] link Navigate to header, center=(567,873), inner_text=, url='https://docs.maihem.ai/get-started/what-is-maihem#how-it-works'
StaticText
image
separator, orientation='horizontal'
[242] span, center=(660,873), inner_text=How it works
list
listitem
StaticText 1
paragraph
StaticText Simulate conversations
paragraph
StaticText Simulate conversations between your target Agent and our Maihem agents
listitem
StaticText 2
paragraph
StaticText Evaluate
paragraph
StaticText Evaluate your target agent and find failures
listitem
StaticText 3
paragraph
StaticText Get action list
paragraph
StaticText Get an action list to fix detected failures
listitem
StaticText 4
paragraph
StaticText Improve
paragraph
StaticText Iterate and improve 10x faster
link Get Maihem SDK and API key, url='https://docs.maihem.ai/get-started/get-maihem'
image
link linkedin, url='https://www.linkedin.com/company/maihem'
image
link instagram, url='https://www.instagram.com/maihem.ai/'
image
link twitter, url='https://twitter.com/maihem_ai'
image
link Powered by Mintlify, url='https://mintlify.com/preview-request?utm_campaign=poweredBy&utm_medium=docs&utm_source=docs.maihem.ai'
alert, atomic
Reasoning for action: I clicked on the 'Documentation' link as this is the starting point to access the information necessary for testing retrieval-augmented generation and customer experience. By navigating to the documentation section, I will be able to locate and then click on the specific links related to 'Test RAG and hallucinations' and 'Test customer experience.' This step is essential for fulfilling the user story and achieving the acceptance criteria.
Action: click(49) # Click on the Documentation link to explore the documentation section.
Change Description: After clicking the Documentation link, the browser navigated to the Maihem documentation page located at 'https://docs.maihem.ai/get-started/what-is-maihem', replacing the previous content. The new page features sections such as 'Get started', 'How-to guides', and 'Reference', along with links to resources and a search functionality, indicating a shift from the main site to a dedicated documentation environment.
---
Step 2:
Url (before/after):
https://docs.maihem.ai/get-started/what-is-maihem
https://docs.maihem.ai/guides/test/rag
Content (before/after):
RootWebArea What is Maihem - Maihem, focused, url='https://docs.maihem.ai/get-started/what-is-maihem'
main
[68] link Maihem home page light logo, center=(336,32), inner_text=Maihem home page, url='https://docs.maihem.ai/'
image light logo, url='https://mintlify.s3.us-west-1.amazonaws.com/maihem-29/logo/MAIHEM_Bittersweet_Full_Logo.svg'
[74] button Search or ask... Ctrl K, center=(960,32), inner_text=Search or ask... Ctrl K, type=button
image
navigation
list
listitem
[83] link Home, center=(1286,32), url='https://maihem.ai/'
listitem
[85] link Support, center=(1358,32), url='mailto:support@maihem.ai'
listitem
[87] link GitHub, center=(1433,32), url='https://github.com/maihemlabs/maihem'
listitem
[91] link Get Maihem, center=(1540,32), url='https://cal.com/team/maihem/intro-to-maihem'
[97] button Toggle dark mode, center=(1632,32)
image
heading Get started
list
listitem
[125] link What is Maihem, center=(384,148), url='https://docs.maihem.ai/get-started/what-is-maihem'
image
listitem
[130] link Get Maihem SDK and API key, center=(384,184), url='https://docs.maihem.ai/get-started/get-maihem'
image
listitem
[135] link Quickstart, center=(384,220), url='https://docs.maihem.ai/get-started/quickstart'
image
heading How-to guides
list
listitem
[143] link Test RAG and hallucinations, center=(384,322), url='https://docs.maihem.ai/guides/test/rag'
image
listitem
[148] link Test customer experience, center=(384,358), url='https://docs.maihem.ai/guides/test/cx'
image
listitem
[153] link Monitor in production, center=(384,394), url='https://docs.maihem.ai/guides/monitor/monitor'
image
heading Reference
list
listitem
[161] link Core concepts, center=(384,496), url='https://docs.maihem.ai/reference/core-concepts/summary'
image
listitem
[166] link Metric and module collection, center=(384,532), url='https://docs.maihem.ai/reference/metric-collection'
image
listitem
[171] link Connect your AI application, center=(384,568), url='https://docs.maihem.ai/reference/connect-agent'
image
StaticText Get started
heading What is Maihem
paragraph
StaticText Maihem automatically tests your AI applications (target agents), from simple LLM applications to complex agentic workflows.
[189] link Quickstart Set up Maihem and run your first test in 5 minutes, center=(996,321), inner_text=Quickstart Set up Maihem and run your first test in 5 minutes, url='https://docs.maihem.ai/get-started/quickstart'
image
heading Quickstart
paragraph
StaticText Set up Maihem and run your first test in 5 minutes
[198] link How-to guides Find examples to run custom tests, center=(787,427), inner_text=How-to guides Find examples to run custom tests, url='https://docs.maihem.ai/guides'
image
heading How-to guides
paragraph
StaticText Find examples to run custom tests
[206] link Reference Read our detailed documentation, center=(1204,427), inner_text=Reference Read our detailed documentation, url='https://docs.maihem.ai/reference/core-concepts/summary'
image
heading Reference
paragraph
StaticText Read our detailed documentation
heading Navigate to header Test modules and metrics
[216] link Navigate to header, center=(567,544), inner_text=, url='https://docs.maihem.ai/get-started/what-is-maihem#test-modules-and-metrics'
StaticText
image
separator, orientation='horizontal'
[219] span, center=(729,544), inner_text=Test modules and metrics
paragraph
StaticText Maihem enables you to test across many dimensions:
list
listitem
[224] link Customer experience (CX), center=(725,688), url='https://docs.maihem.ai/get-started/what-is-maihem#customer-experience-cx'
listitem
[226] link Retrieval-augmented generation (RAG), center=(773,724), url='https://docs.maihem.ai/get-started/what-is-maihem#retrieval-augmented-generation-rag'
listitem
strong
StaticText Safety
StaticText (
[229] link Bias, center=(699,760), url='https://docs.maihem.ai/get-started/what-is-maihem#bias'
StaticText ,
[230] link Brand reputation, center=(794,760), url='https://docs.maihem.ai/get-started/what-is-maihem#brand-reputation'
StaticText ,
[231] link Toxicity, center=(902,760), url='https://docs.maihem.ai/get-started/what-is-maihem#toxicity'
StaticText )
listitem
strong
StaticText Security
StaticText (
[234] link Overreach, center=(739,796), url='https://docs.maihem.ai/get-started/what-is-maihem#overreach'
StaticText ,
[235] link Privacy (PII), center=(839,796), url='https://docs.maihem.ai/get-started/what-is-maihem#privacy-pii'
StaticText ,
[236] link System access, center=(955,796), url='https://docs.maihem.ai/get-started/what-is-maihem#system-access'
StaticText )
heading Navigate to header How it works
[239] link Navigate to header, center=(567,873), inner_text=, url='https://docs.maihem.ai/get-started/what-is-maihem#how-it-works'
StaticText
image
separator, orientation='horizontal'
[242] span, center=(660,873), inner_text=How it works
list
listitem
StaticText 1
paragraph
StaticText Simulate conversations
paragraph
StaticText Simulate conversations between your target Agent and our Maihem agents
listitem
StaticText 2
paragraph
StaticText Evaluate
paragraph
StaticText Evaluate your target agent and find failures
listitem
StaticText 3
paragraph
StaticText Get action list
paragraph
StaticText Get an action list to fix detected failures
listitem
StaticText 4
paragraph
StaticText Improve
paragraph
StaticText Iterate and improve 10x faster
link Get Maihem SDK and API key, url='https://docs.maihem.ai/get-started/get-maihem'
image
link linkedin, url='https://www.linkedin.com/company/maihem'
image
link instagram, url='https://www.instagram.com/maihem.ai/'
image
link twitter, url='https://twitter.com/maihem_ai'
image
link Powered by Mintlify, url='https://mintlify.com/preview-request?utm_campaign=poweredBy&utm_medium=docs&utm_source=docs.maihem.ai'
alert, atomic
RootWebArea Test RAG and hallucinations - Maihem, focused, url='https://docs.maihem.ai/guides/test/rag'
main
[68] link Maihem home page light logo, center=(336,32), inner_text=Maihem home page, url='https://docs.maihem.ai/'
image light logo, url='https://mintlify.s3.us-west-1.amazonaws.com/maihem-29/logo/MAIHEM_Bittersweet_Full_Logo.svg'
[74] button Search or ask... Ctrl K, center=(960,32), inner_text=Search or ask... Ctrl K, type=button
image
navigation
list
listitem
[83] link Home, center=(1286,32), url='https://maihem.ai/'
listitem
[85] link Support, center=(1358,32), url='mailto:support@maihem.ai'
listitem
[87] link GitHub, center=(1433,32), url='https://github.com/maihemlabs/maihem'
listitem
[91] link Get Maihem, center=(1540,32), url='https://cal.com/team/maihem/intro-to-maihem'
[97] button Toggle dark mode, center=(1632,32)
image
heading Get started
list
listitem
[125] link What is Maihem, center=(384,148), url='https://docs.maihem.ai/get-started/what-is-maihem'
image
listitem
[130] link Get Maihem SDK and API key, center=(384,184), url='https://docs.maihem.ai/get-started/get-maihem'
image
listitem
[135] link Quickstart, center=(384,220), url='https://docs.maihem.ai/get-started/quickstart'
image
heading How-to guides
list
listitem
[143] link Test RAG and hallucinations, center=(384,322), focused, url='https://docs.maihem.ai/guides/test/rag'
image
listitem
[148] link Test customer experience, center=(384,358), url='https://docs.maihem.ai/guides/test/cx'
image
listitem
[153] link Monitor in production, center=(384,394), url='https://docs.maihem.ai/guides/monitor/monitor'
image
heading Reference
list
listitem
[161] link Core concepts, center=(384,496), url='https://docs.maihem.ai/reference/core-concepts/summary'
image
listitem
[166] link Metric and module collection, center=(384,532), url='https://docs.maihem.ai/reference/metric-collection'
image
listitem
[171] link Connect your AI application, center=(384,568), url='https://docs.maihem.ai/reference/connect-agent'
image
StaticText How-to guides
heading Test RAG and hallucinations
paragraph
StaticText Generate questions from your documents to test your RAG application for hallucinations, answer relevance, and context retrieval efficiency
image Info
paragraph
[339] link Get your Maihem API key and install the SDK, center=(794,295), url='https://docs.maihem.ai/get-started/get-maihem'
StaticText before you start.
list
listitem
[342] button Python, center=(612,364)
listitem
[344] button Command line interface (CLI), center=(764,364)
list
listitem
StaticText 1
heading Create target agent (if you haven't already)
[356] button Copy the contents from the code block, center=(1373,516)
image
code
from maihem import Maihem

maihem_client = Maihem()

maihem_client.create_target_agent(
    name="rag_financial_assistant",
    label="POC stock market agent",  # Optional
    role="AI Financial Assistant",
    description="An AI assistant that provides information and summaries from financial documents.",
    language="en",  # Optional, default is "en" (English), follow ISO 639
)
listitem
StaticText 2
heading Add documents to generate questions from them
paragraph
StaticText Maihem supports documents in the following formats: pdf, txt, docx, md.
paragraph
StaticText Move all the documents to the same folder.
[406] button Copy the contents from the code block, center=(1373,1012)
image
code
documents_path = "/path/to/folder/with/documents"
listitem
StaticText 3
heading Create a test
paragraph
StaticText Create a RAG test by using the
code
StaticText RAG module
StaticText . A module defines the scope of a test.
paragraph
StaticText A RAG test will generate a random set of questions from your documents to test your target agent, and evaluate these metrics:
list
listitem
StaticText Answer relevance
listitem
StaticText Context retrieval efficiency
listitem
StaticText Hallucinations
paragraph
link See more detailed documentation on metrics and modules, url='https://docs.maihem.ai/reference/metric-collection'
StaticText .
button Copy the contents from the code block
image
code
from maihem import Maihem

maihem_client = Maihem()

maihem_client.create_test(
    name="rag_test_1",
    label="RAG test #1",  # Optional
    target_agent_name="rag_financial_assistant",
    initiating_agent="maihem",  # or "target"
    modules=["rag"],
    documents_path=documents_path,  # Path to folder with documents
    number_conversations=50,
    conversation_turns_max=5,  # Optional, default is 10
)
listitem
StaticText 4
heading Connect target agent with wrapper function
paragraph
StaticText Modify the following function to wrap your target agent:
button Copy the contents from the code block
image
code
from typing import Dict, List, Optional, Tuple


def wrapper_function(
    conversation_id: str,  # Keep track of different conversations with conversation_id
    maihem_agent_message: Optional[str],  # The message from Maihem (None if the target initiates)
    conversation_history: Dict,  # Auxiliary dictionary to store conversation history (if needed)
) -> Tuple[str, List[str]]:
    """Callable wrapper function to wrap your target agent to be tested."""

    # Replace with the message from your target agent
    target_agent_message = "Hi, how can I help you?"
    # If target initiates conversation, first maihem_agent_message is None

    # (Optional) add messages to conversation history
    # setdefault creates the list on the first turn and avoids a KeyError
    history = conversation_history.setdefault(conversation_id, [])
    history.append({"role": "maihem", "content": maihem_agent_message})
    history.append({"role": "target", "content": target_agent_message})

    # List of retrieved contexts for RAG evaluations
    contexts = ["Context_1", "Context_2"]

    return target_agent_message, contexts
listitem
StaticText 5
heading Run the test
paragraph
StaticText A
code
StaticText test run
StaticText will generate:
list
listitem
StaticText Simulated conversations between your target agent and Maihem
listitem
StaticText Evaluations of the conversations
listitem
StaticText A list of detected failures
button Copy the contents from the code block
image
code
maihem_client.run_test(
    name="modelX_prompt2.5_28-11-2024",
    label="Model X Prompt v2.5 (28/Nov/2024)",  # Optional
    test_name="rag_test_1",
    wrapper_function=wrapper_function,  # your wrapper function
    concurrent_conversations=10,  # Optional
)
listitem
StaticText 6
heading See test run results
paragraph
link See the results in your Maihem account, url='https://cause.maihem.ai/'
StaticText .
paragraph
StaticText Or get the test results:
button Copy the contents from the code block
image
code
test_run_results = maihem_client.get_test_run_results(
    test_name="rag_test_1",
    test_run_name="modelX_prompt2.5_28-11-2024",
)
paragraph
code
StaticText test_run_results
StaticText contains the following information:
button Copy the contents from the code block
image
code
test_run_results.result = "failed"
test_run_results.score = 82.5
test_run_results.conversations[0].messages = [
    {"role": "maihem", "content": "When was Fund X created?"},
    {
        "role": "target",
        "content": "Sorry, I could not find this information.",
        "evaluation": {
            "is_failure": True,
            "explanation": "Hallucination detected. Fund X was created in 2005.",
        },
    },
]
test_run_failures = []
link Quickstart, url='https://docs.maihem.ai/get-started/quickstart'
image
link Test customer experience, url='https://docs.maihem.ai/guides/test/cx'
image
link linkedin, url='https://www.linkedin.com/company/maihem'
image
link instagram, url='https://www.instagram.com/maihem.ai/'
image
link twitter, url='https://twitter.com/maihem_ai'
image
link Powered by Mintlify, url='https://mintlify.com/preview-request?utm_campaign=poweredBy&utm_medium=docs&utm_source=docs.maihem.ai'
alert, atomic
StaticText Test RAG and hallucinations - Maihem
Reasoning for action: I selected the 'Test RAG and hallucinations' link because it is directly related to the user's goal of exploring instructions for testing retrieval-augmented generation. By clicking this link, I will access the specific content needed to understand how to implement and improve performance in AI applications, thereby progressing towards the objective.
Action: click(143) # Click on the 'Test RAG and hallucinations' link to explore how to test retrieval-augmented generation.
Change Description: The browser navigated to the 'Test RAG and hallucinations' page, updating the main content to provide detailed instructions on generating questions for testing retrieval-augmented generation applications, including steps for creating target agents and running tests.
Screenshot before:Screenshot after:---
Step 3:
Url (before/after):
https://docs.maihem.ai/guides/test/rag
https://docs.maihem.ai/guides/test/cx
Content (before/after):
RootWebArea Test RAG and hallucinations - Maihem, focused, url='https://docs.maihem.ai/guides/test/rag'
main
[68] link Maihem home page light logo, center=(336,32), inner_text=Maihem home page, url='https://docs.maihem.ai/'
image light logo, url='https://mintlify.s3.us-west-1.amazonaws.com/maihem-29/logo/MAIHEM_Bittersweet_Full_Logo.svg'
[74] button Search or ask... Ctrl K, center=(960,32), inner_text=Search or ask...
Show more
Ctrl K, type=button
image
navigation
list
listitem
[83] link Home, center=(1286,32), url='https://maihem.ai/'
listitem
[85] link Support, center=(1358,32), url='mailto:support@maihem.ai'
listitem
[87] link GitHub, center=(1433,32), url='https://github.com/maihemlabs/maihem'
listitem
[91] link Get Maihem, center=(1540,32), url='https://cal.com/team/maihem/intro-to-maihem'
[97] button Toggle dark mode, center=(1632,32)
image
heading Get started
list
listitem
[125] link What is Maihem, center=(384,148), url='https://docs.maihem.ai/get-started/what-is-maihem'
image
listitem
[130] link Get Maihem SDK and API key, center=(384,184), url='https://docs.maihem.ai/get-started/get-maihem'
image
listitem
[135] link Quickstart, center=(384,220), url='https://docs.maihem.ai/get-started/quickstart'
image
heading How-to guides
list
listitem
[143] link Test RAG and hallucinations, center=(384,322), focused, url='https://docs.maihem.ai/guides/test/rag'
image
listitem
[148] link Test customer experience, center=(384,358), url='https://docs.maihem.ai/guides/test/cx'
image
listitem
[153] link Monitor in production, center=(384,394), url='https://docs.maihem.ai/guides/monitor/monitor'
image
heading Reference
list
listitem
[161] link Core concepts, center=(384,496), url='https://docs.maihem.ai/reference/core-concepts/summary'
image
listitem
[166] link Metric and module collection, center=(384,532), url='https://docs.maihem.ai/reference/metric-collection'
image
listitem
[171] link Connect your AI application, center=(384,568), url='https://docs.maihem.ai/reference/connect-agent'
image
StaticText How-to guides
heading Test RAG and hallucinations
paragraph
StaticText Generate questions from your documents to test your RAG application for hallucinations, answer relevance, and context retrieval efficiency
image Info
paragraph
[339] link Get your Maihem API key and install the SDK, center=(794,295), url='https://docs.maihem.ai/get-started/get-maihem'
StaticText before you start.
list
listitem
[342] button Python, center=(612,364)
listitem
[344] button Command line interface (CLI), center=(764,364)
list
listitem
StaticText 1
heading Create target agent (if you haven't already)
[356] button Copy the contents from the code block, center=(1373,516)
image
code
StaticText from
StaticText maihem
StaticText import
StaticText Maihem
StaticText maihem_client
StaticText =
StaticText Maihem
StaticText (
StaticText )
StaticText maihem_client
StaticText .
StaticText create_target_agent
StaticText (
StaticText name
StaticText =
StaticText "rag_financial_assistant"
StaticText ,
StaticText label
StaticText =
StaticText "POC stock market agent"
StaticText ,
StaticText # Optional
StaticText role
StaticText =
StaticText "AI Financial Assistant"
StaticText ,
StaticText description
StaticText =
StaticText "An AI assistant that provides information and summaries from financial documents."
StaticText ,
StaticText language
StaticText =
StaticText "en"
StaticText # Optional, default is "en" (English), follow ISO 639
StaticText )
listitem
StaticText 2
heading Add documents to generate questions from them
paragraph
StaticText Maihem supports documents in the following formats: pdf, txt, docx, md.
paragraph
StaticText Move all the documents to the same folder.
[406] button Copy the contents from the code block, center=(1373,1012)
image
code
StaticText documents_path
StaticText =
StaticText "/path/to/folder/with/documents"
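The step above lists the supported formats (pdf, txt, docx, md) and asks for all documents to sit in one folder. A minimal sketch of a pre-flight check for that folder follows. The helper `find_unsupported_documents` is hypothetical and is not part of the Maihem SDK; it only uses the Python standard library.

```python
from pathlib import Path
from typing import List

# Formats listed in the guide. The helper below is a hypothetical convenience,
# not a Maihem SDK function.
SUPPORTED_SUFFIXES = {".pdf", ".txt", ".docx", ".md"}


def find_unsupported_documents(folder: str) -> List[str]:
    """Return the names of files in `folder` whose extension is not supported."""
    root = Path(folder)
    if not root.is_dir():
        raise NotADirectoryError(f"{folder} is not a directory")
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_file() and entry.suffix.lower() not in SUPPORTED_SUFFIXES
    )
```

Running this on `documents_path` before creating the test would surface files that probably need converting or removing first.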
listitem
StaticText 3
heading Create a test
paragraph
StaticText Create a RAG test by using the
code
StaticText RAG module
StaticText . A module defines the scope of a test.
paragraph
StaticText A RAG test will generate a random set of questions from your documents to test your target agent, and evaluate these metrics:
list
listitem
StaticText Answer relevance
listitem
StaticText Context retrieval efficiency
listitem
StaticText Hallucinations
paragraph
link See more detailed documentation on metrics and modules, url='https://docs.maihem.ai/reference/metric-collection'
StaticText .
button Copy the contents from the code block
image
code
StaticText from
StaticText maihem
StaticText import
StaticText Maihem
StaticText maihem_client
StaticText =
StaticText Maihem
StaticText (
StaticText )
StaticText maihem_client
StaticText .
StaticText create_test
StaticText (
StaticText name
StaticText =
StaticText "rag_test_1"
StaticText ,
StaticText label
StaticText =
StaticText "RAG test #1"
StaticText ,
StaticText # Optional
StaticText target_agent_name
StaticText =
StaticText "rag_financial_assistant"
StaticText ,
StaticText initiating_agent
StaticText =
StaticText "maihem"
StaticText ,
StaticText # or "target"
StaticText modules
StaticText =
StaticText [
StaticText "rag"
StaticText ]
StaticText ,
StaticText documents_path
StaticText =
StaticText documents_path
StaticText ,
StaticText # Path to folder with documents
StaticText number_conversations
StaticText =
StaticText 50
StaticText ,
StaticText conversation_turns_max
StaticText =
StaticText 5
StaticText # Optional, default is 10
StaticText )
listitem
StaticText 4
heading Connect target agent with wrapper function
paragraph
StaticText Modify the following function to wrap your target agent:
button Copy the contents from the code block
image
code
StaticText from
StaticText typing
StaticText import
StaticText Tuple
StaticText ,
StaticText List
StaticText ,
StaticText Dict
StaticText def
StaticText wrapper_function
StaticText (
StaticText conversation_id
StaticText :
StaticText str
StaticText ,
StaticText # Keep track of different conversations with conversation_id
StaticText maihem_agent_message
StaticText :
StaticText str
StaticText ,
StaticText # The message from Maihem
StaticText conversation_history
StaticText :
StaticText Dict
StaticText # Auxiliary dictionary to store conversation history (if needed)
StaticText )
StaticText -
StaticText >
StaticText Tuple
StaticText [
StaticText str
StaticText ,
StaticText List
StaticText [
StaticText str
StaticText ]
StaticText ]
StaticText :
StaticText """Callable wrapper function to wrap your target agent to be tested."""
StaticText # Replace with the message from your target agent
StaticText target_agent_message
StaticText =
StaticText "Hi, how can I help you?"
StaticText # If target initiates conversation, first maihem_agent_message is None
StaticText # (Optional) add messages to conversation history
StaticText conversation_history
StaticText [
StaticText conversation_id
StaticText ]
StaticText .
StaticText append
StaticText (
StaticText {
StaticText "role"
StaticText :
StaticText "maihem"
StaticText ,
StaticText "content"
StaticText :
StaticText maihem_agent_message
StaticText }
StaticText )
StaticText conversation_history
StaticText [
StaticText conversation_id
StaticText ]
StaticText .
StaticText append
StaticText (
StaticText {
StaticText "role"
StaticText :
StaticText "target"
StaticText ,
StaticText "content"
StaticText :
StaticText target_agent_message
StaticText }
StaticText )
StaticText # List of retrieved contexts for RAG evaluations
StaticText contexts
StaticText =
StaticText [
StaticText "Context_1"
StaticText ,
StaticText "Context_2"
StaticText ]
StaticText return
StaticText target_agent_message
StaticText ,
StaticText contexts
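The wrapper function above is easier to follow when assembled into one block. The sketch below keeps the signature from the guide, with two adjustments that are assumptions rather than documented behavior: `maihem_agent_message` is typed as `Optional[str]` because the guide notes it is None when the target initiates, and `setdefault` is used in case the history dictionary has no entry for the conversation yet. `answer_question` is a hypothetical stand-in for your own application.

```python
from typing import Dict, List, Optional, Tuple


def answer_question(question: Optional[str]) -> Tuple[str, List[str]]:
    """Hypothetical stand-in for your RAG application: returns an answer and its contexts."""
    if question is None:
        return "Hi, how can I help you?", []
    return f"Echo: {question}", ["Context_1", "Context_2"]


def wrapper_function(
    conversation_id: str,
    maihem_agent_message: Optional[str],
    conversation_history: Dict,
) -> Tuple[str, List[str]]:
    """Wrap the target agent using the signature shown in the guide."""
    target_agent_message, contexts = answer_question(maihem_agent_message)

    # Assumption: the dictionary may not contain this conversation yet.
    history = conversation_history.setdefault(conversation_id, [])
    if maihem_agent_message is not None:
        history.append({"role": "maihem", "content": maihem_agent_message})
    history.append({"role": "target", "content": target_agent_message})

    return target_agent_message, contexts
```

Replacing `answer_question` with a call into your own agent should be the only change needed for a first trial run.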
listitem
StaticText 5
heading Run the test
paragraph
StaticText A
code
StaticText test run
StaticText will generate:
list
listitem
StaticText Simulated conversations between your target agent and Maihem
listitem
StaticText Evaluations of the conversations
listitem
StaticText A list of detected failures
button Copy the contents from the code block
image
code
StaticText maihem_client
StaticText .
StaticText run_test
StaticText (
StaticText name
StaticText =
StaticText "modelX_prompt2.5_28-11-2024"
StaticText ,
StaticText label
StaticText =
StaticText "Model X Prompt v2.5 (28/Nov/2024)"
StaticText ,
StaticText # Optional
StaticText test_name
StaticText =
StaticText "rag_test_1"
StaticText ,
StaticText wrapper_function
StaticText =
StaticText wrapper_function
StaticText ,
StaticText # your wrapper function
StaticText concurrent_conversations
StaticText =
StaticText 10
StaticText # Optional
StaticText )
listitem
StaticText 6
heading See test run results
paragraph
link See the results in your Maihem account, url='https://cause.maihem.ai/'
StaticText .
paragraph
StaticText Or get the test results:
button Copy the contents from the code block
image
code
StaticText test_run_results
StaticText =
StaticText maihem_client
StaticText .
StaticText get_test_run_results
StaticText (
StaticText test_name
StaticText =
StaticText "rag_test_1"
StaticText ,
StaticText test_run_name
StaticText =
StaticText "modelX_prompt2.5_28-11-2024"
StaticText )
paragraph
code
StaticText test_run_results
StaticText contains the following information:
button Copy the contents from the code block
image
code
StaticText test_run_results
StaticText .
StaticText result
StaticText =
StaticText "failed"
StaticText test_run_results
StaticText .
StaticText score
StaticText =
StaticText 82.5
StaticText test_run_results
StaticText .
StaticText conversations
StaticText [
StaticText 0
StaticText ]
StaticText .
StaticText messages
StaticText =
StaticText [
StaticText {
StaticText "role"
StaticText :
StaticText "maihem"
StaticText ,
StaticText "content"
StaticText :
StaticText "When was Fund X created?"
StaticText }
StaticText ,
StaticText {
StaticText "role"
StaticText :
StaticText "target"
StaticText ,
StaticText "content"
StaticText :
StaticText "Sorry, I could not find this information."
StaticText ,
StaticText "evaluation"
StaticText :
StaticText {
StaticText "is_failure"
StaticText :
StaticText True
StaticText ,
StaticText "explanation"
StaticText :
StaticText "Hallucination detected. Fund X was created in 2005."
StaticText }
StaticText }
StaticText ]
StaticText test_run_failures
StaticText =
StaticText [
StaticText ]
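The example output above attaches an `evaluation` entry to each target message. A minimal sketch for pulling the failed turns out of one conversation follows. It works on plain dictionaries shaped like the example; whether the SDK returns dictionaries or richer objects is not stated in the guide, so treat the access pattern as an assumption. `collect_failures` is a hypothetical helper.

```python
from typing import Dict, List


def collect_failures(messages: List[Dict]) -> List[Dict]:
    """Pair each failed target message with the Maihem message that preceded it."""
    failures = []
    previous_maihem_message = None
    for message in messages:
        if message.get("role") == "maihem":
            previous_maihem_message = message.get("content")
            continue
        evaluation = message.get("evaluation") or {}
        if evaluation.get("is_failure"):
            failures.append(
                {
                    "question": previous_maihem_message,
                    "answer": message.get("content"),
                    "explanation": evaluation.get("explanation"),
                }
            )
    return failures


# Sample data copied from the example output in the guide.
messages = [
    {"role": "maihem", "content": "When was Fund X created?"},
    {
        "role": "target",
        "content": "Sorry, I could not find this information.",
        "evaluation": {
            "is_failure": True,
            "explanation": "Hallucination detected. Fund X was created in 2005.",
        },
    },
]

for failure in collect_failures(messages):
    print(f"{failure['question']} -> {failure['explanation']}")
```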
link Quickstart, url='https://docs.maihem.ai/get-started/quickstart'
image
link Test customer experience, url='https://docs.maihem.ai/guides/test/cx'
image
link linkedin, url='https://www.linkedin.com/company/maihem'
image
link instagram, url='https://www.instagram.com/maihem.ai/'
image
link twitter, url='https://twitter.com/maihem_ai'
image
link Powered by Mintlify, url='https://mintlify.com/preview-request?utm_campaign=poweredBy&utm_medium=docs&utm_source=docs.maihem.ai'
alert, atomic
StaticText Test RAG and hallucinations - Maihem
RootWebArea Test customer experience - Maihem, focused, url='https://docs.maihem.ai/guides/test/cx'
main
[68] link Maihem home page light logo, center=(336,32), inner_text=Maihem home page, url='https://docs.maihem.ai/'
image light logo, url='https://mintlify.s3.us-west-1.amazonaws.com/maihem-29/logo/MAIHEM_Bittersweet_Full_Logo.svg'
[74] button Search or ask... Ctrl K, center=(960,32), inner_text=Search or ask...
Show more
Ctrl K, type=button
image
navigation
list
listitem
[83] link Home, center=(1286,32), url='https://maihem.ai/'
listitem
[85] link Support, center=(1358,32), url='mailto:support@maihem.ai'
listitem
[87] link GitHub, center=(1433,32), url='https://github.com/maihemlabs/maihem'
listitem
[91] link Get Maihem, center=(1540,32), url='https://cal.com/team/maihem/intro-to-maihem'
[97] button Toggle dark mode, center=(1632,32)
image
heading Get started
list
listitem
[125] link What is Maihem, center=(384,148), url='https://docs.maihem.ai/get-started/what-is-maihem'
image
listitem
[130] link Get Maihem SDK and API key, center=(384,184), url='https://docs.maihem.ai/get-started/get-maihem'
image
listitem
[135] link Quickstart, center=(384,220), url='https://docs.maihem.ai/get-started/quickstart'
image
heading How-to guides
list
listitem
[143] link Test RAG and hallucinations, center=(384,322), url='https://docs.maihem.ai/guides/test/rag'
image
listitem
[148] link Test customer experience, center=(384,358), focused, url='https://docs.maihem.ai/guides/test/cx'
image
listitem
[153] link Monitor in production, center=(384,394), url='https://docs.maihem.ai/guides/monitor/monitor'
image
heading Reference
list
listitem
[161] link Core concepts, center=(384,496), url='https://docs.maihem.ai/reference/core-concepts/summary'
image
listitem
[166] link Metric and module collection, center=(384,532), url='https://docs.maihem.ai/reference/metric-collection'
image
listitem
[171] link Connect your AI application, center=(384,568), url='https://docs.maihem.ai/reference/connect-agent'
image
StaticText How-to guides
heading Test customer experience
paragraph
StaticText Simulate a population of personas to test your AI application
image Info
paragraph
[809] link Get your Maihem API key and install the SDK, center=(794,267), url='https://docs.maihem.ai/get-started/get-maihem'
StaticText before you start.
list
listitem
[812] button Python, center=(612,336)
listitem
[814] button Command line interface (CLI), center=(764,336)
list
listitem
StaticText 1
heading Create target agent (if you haven't already)
[826] button Copy the contents from the code block, center=(1373,488)
image
code
StaticText from
StaticText maihem
StaticText import
StaticText Maihem
StaticText maihem_client
StaticText =
StaticText Maihem
StaticText (
StaticText )
StaticText maihem_client
StaticText .
StaticText create_target_agent
StaticText (
StaticText name
StaticText =
StaticText "history_tutor"
StaticText ,
StaticText label
StaticText =
StaticText "AI History Tutor"
StaticText ,
StaticText # Optional
StaticText role
StaticText =
StaticText "AI tutor"
StaticText ,
StaticText description
StaticText =
StaticText "A history tutor that helps students prepare for exams with practice questions and summaries"
StaticText ,
StaticText language
StaticText =
StaticText "en"
StaticText # Optional, default is "en" (English), follow ISO 639
StaticText )
listitem
StaticText 2
heading Create a test with 'cx' module
paragraph
StaticText Create a test using the
code
StaticText Customer Experience (CX) module
StaticText . A module defines the scope of a test.
paragraph
StaticText A test will simulate conversations with personas to test the customer experience of your target agent using these metrics:
list
[877] listitem, center=(1026,994), inner_text=Helpfulness
StaticText Helpfulness
[878] listitem, center=(1026,1030), inner_text=Goal completion
StaticText Goal completion
[879] listitem, center=(1026,1066), inner_text=Retention
StaticText Retention
listitem
StaticText Net promoter score (NPS)
paragraph
StaticText Guide the simulated conversations using prompts:
list
listitem
code
StaticText maihem_behavior_prompt
StaticText guides the behavior of the simulated personas
listitem
code
StaticText maihem_goal_prompt
StaticText describes the goal of the simulated personas
listitem
code
StaticText maihem_population_prompt
StaticText describes the desired population of simulated personas
button Copy the contents from the code block
image
code
StaticText from
StaticText maihem
StaticText import
StaticText Maihem
StaticText maihem_client
StaticText =
StaticText Maihem
StaticText (
StaticText )
StaticText maihem_client
StaticText .
StaticText create_test
StaticText (
StaticText name
StaticText =
StaticText "cx_test_1"
StaticText ,
StaticText label
StaticText =
StaticText "Customer experience Test #1"
StaticText ,
StaticText # Optional
StaticText target_agent_name
StaticText =
StaticText "history_tutor"
StaticText ,
StaticText initiating_agent
StaticText =
StaticText "maihem"
StaticText ,
StaticText # or "target"
StaticText modules
StaticText =
StaticText [
StaticText "cx"
StaticText ]
StaticText ,
StaticText maihem_behavior_prompt
StaticText =
StaticText "Request quiz-style questions and deep dives in bullet points for identified gaps"
StaticText ,
StaticText maihem_goal_prompt
StaticText =
StaticText "Prepare for exam tomorrow on the Industrial Revolution"
StaticText ,
StaticText maihem_population_prompt
StaticText =
StaticText "High school students, some care about school and want to ace the exam, some don't care and just want to pass"
StaticText ,
StaticText number_conversations
StaticText =
StaticText 20
StaticText ,
StaticText conversation_turns_max
StaticText =
StaticText 5
StaticText ,
StaticText )
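The `create_test` call above takes three free-text prompts plus a few numeric settings. One way to keep those values together and check them before use is sketched below. `CxTestConfig` is a hypothetical container whose field names mirror the keyword arguments shown in the guide; the validation rules (non-empty prompts, positive counts) are local assumptions, not constraints documented by Maihem.

```python
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class CxTestConfig:
    """Hypothetical container; field names mirror the create_test keyword arguments."""

    name: str
    target_agent_name: str
    maihem_behavior_prompt: str
    maihem_goal_prompt: str
    maihem_population_prompt: str
    number_conversations: int = 20
    conversation_turns_max: int = 5
    initiating_agent: str = "maihem"  # or "target"

    def to_kwargs(self) -> Dict:
        """Validate the fields and return a dict suitable for keyword expansion."""
        if self.initiating_agent not in ("maihem", "target"):
            raise ValueError("initiating_agent must be 'maihem' or 'target'")
        for field_name in (
            "maihem_behavior_prompt",
            "maihem_goal_prompt",
            "maihem_population_prompt",
        ):
            if not getattr(self, field_name).strip():
                raise ValueError(f"{field_name} must not be empty")
        if self.number_conversations < 1 or self.conversation_turns_max < 1:
            raise ValueError("conversation counts must be positive")
        kwargs = asdict(self)
        kwargs["modules"] = ["cx"]  # CX module, as in the guide
        return kwargs
```

Assuming the `create_test` signature shown in the guide, the result could then be passed as `maihem_client.create_test(**config.to_kwargs())`.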
listitem
StaticText 3
heading Connect target agent with wrapper function
paragraph
StaticText Modify the following function to wrap your target agent:
button Copy the contents from the code block
image
code
StaticText from
StaticText typing
StaticText import
StaticText Tuple
StaticText ,
StaticText List
StaticText ,
StaticText Dict
StaticText def
StaticText wrapper_function
StaticText (
StaticText conversation_id
StaticText :
StaticText str
StaticText ,
StaticText # Keep track of different conversations with conversation_id
StaticText maihem_agent_message
StaticText :
StaticText str
StaticText ,
StaticText # The message from Maihem
StaticText conversation_history
StaticText :
StaticText Dict
StaticText # Auxiliary dictionary to store conversation history (if needed)
StaticText )
StaticText -
StaticText >
StaticText Tuple
StaticText [
StaticText str
StaticText ,
StaticText List
StaticText [
StaticText str
StaticText ]
StaticText ]
StaticText :
StaticText """Callable wrapper function to wrap your target agent to be tested."""
StaticText # Replace with the message from your target agent
StaticText target_agent_message
StaticText =
StaticText "Hi, how can I help you?"
StaticText # If target initiates conversation, first maihem_agent_message is None
StaticText # (Optional) add messages to conversation history
StaticText conversation_history
StaticText [
StaticText conversation_id
StaticText ]
StaticText .
StaticText append
StaticText (
StaticText {
StaticText "role"
StaticText :
StaticText "maihem"
StaticText ,
StaticText "content"
StaticText :
StaticText maihem_agent_message
StaticText }
StaticText )
StaticText conversation_history
StaticText [
StaticText conversation_id
StaticText ]
StaticText .
StaticText append
StaticText (
StaticText {
StaticText "role"
StaticText :
StaticText "target"
StaticText ,
StaticText "content"
StaticText :
StaticText target_agent_message
StaticText }
StaticText )
StaticText # List of retrieved contexts for RAG evaluations
StaticText contexts
StaticText =
StaticText [
StaticText "Context_1"
StaticText ,
StaticText "Context_2"
StaticText ]
StaticText return
StaticText target_agent_message
StaticText ,
StaticText contexts
listitem
StaticText 4
heading Run the test
paragraph
StaticText A
code
StaticText test run
StaticText will generate:
list
listitem
StaticText Simulated conversations between your target agent and Maihem
listitem
StaticText Evaluations of the conversations
listitem
StaticText A list of detected failures
button Copy the contents from the code block
image
code
StaticText from
StaticText wrapper_function
StaticText import
StaticText wrapper_function
StaticText test_run_result
StaticText =
StaticText maihem_client
StaticText .
StaticText run_test
StaticText (
StaticText name
StaticText =
StaticText "modelX_prompt2.5_27-11-2024"
StaticText ,
StaticText label
StaticText =
StaticText "Model X Prompt v2.5 (27/Nov/2024)"
StaticText ,
StaticText # Optional
StaticText test_name
StaticText =
StaticText "cx_test_1"
StaticText ,
StaticText wrapper_function
StaticText =
StaticText wrapper_function
StaticText ,
StaticText concurrent_conversations
StaticText =
StaticText 10
StaticText # Optional
StaticText )
listitem
StaticText 5
heading See test run results
paragraph
link See the results in your Maihem account, url='https://cause.maihem.ai/'
StaticText .
paragraph
StaticText Or get the test results:
button Copy the contents from the code block
image
code
StaticText test_run_result
StaticText =
StaticText maihem_client
StaticText .
StaticText get_test_run_result
StaticText (
StaticText test_name
StaticText =
StaticText "cx_test_1"
StaticText ,
StaticText test_run_name
StaticText =
StaticText "modelX_prompt2.5_27-11-2024"
StaticText )
StaticText print
StaticText (
StaticText test_run_result
StaticText )
paragraph
code
StaticText test_run_result
StaticText contains the following information:
button Copy the contents from the code block
image
code
StaticText test_run_result
StaticText .
StaticText result
StaticText =
StaticText "failed"
StaticText test_run_result
StaticText .
StaticText score
StaticText =
StaticText 82.5
StaticText test_run_result
StaticText .
StaticText conversations
StaticText [
StaticText 0
StaticText ]
StaticText .
StaticText messages
StaticText =
StaticText [
StaticText {
StaticText "role"
StaticText :
StaticText "maihem"
StaticText ,
StaticText "content"
StaticText :
StaticText "Do you think I'm well prepared for my test tomorrow?"
StaticText }
StaticText ,
StaticText {
StaticText "role"
StaticText :
StaticText "target"
StaticText ,
StaticText "content"
StaticText :
StaticText "I can't answer that, I don't have that information"
StaticText ,
StaticText "evaluation"
StaticText :
StaticText {
StaticText "is_failure"
StaticText :
StaticText True
StaticText ,
StaticText "explanation"
StaticText :
StaticText "Goal not completed. The persona wants to know which topics she still needs to review for the test."
StaticText }
StaticText }
StaticText ]
StaticText test_run_failures
StaticText =
StaticText [
StaticText ]
link Test RAG and hallucinations, url='https://docs.maihem.ai/guides/test/rag'
image
link Monitor in production, url='https://docs.maihem.ai/guides/monitor/monitor'
image
link linkedin, url='https://www.linkedin.com/company/maihem'
image
link instagram, url='https://www.instagram.com/maihem.ai/'
image
link twitter, url='https://twitter.com/maihem_ai'
image
link Powered by Mintlify, url='https://mintlify.com/preview-request?utm_campaign=poweredBy&utm_medium=docs&utm_source=docs.maihem.ai'
alert, atomic
StaticText Test customer experience - Maihem
Reasoning for action: I clicked the 'Test customer experience' link to open the guide for testing customer interactions, which covers the second part of the user's goal. That guide should provide the detailed instructions needed to improve the performance and reliability of customer-facing AI applications.
Action: click(148) # Click on the 'Test customer experience' link to explore how to test customer interactions.
Change Description: The browser changed from displaying the 'Test RAG and hallucinations' guide to the 'Test customer experience' guide upon clicking the respective link, updating the focused URL to 'https://docs.maihem.ai/guides/test/cx' and updating the visible content accordingly.
Screenshot before:Screenshot after:---