
Task

User Story - Exploring Documentation

As a developer, I explore documentation for testing retrieval-augmented generation and customer experience, so that I can utilize these instructions to improve the performance and reliability of AI applications.

Success definition: Given I am on the Maihem documentation page, when I click the links 'Test RAG and hallucinations' and 'Test customer experience', then I should see detailed instructions for testing retrieval-augmented generation applications and customer interactions.

Trajectory

Step 1:

Url (before/after):

https://maihem.ai/

https://docs.maihem.ai/get-started/what-is-maihem

Content (before/after):

RootWebArea MAIHEM, focused, url='https://maihem.ai/'
    banner
        [42] link home, center=(368,66), url='https://maihem.ai/'
            image, url='https://cdn.prod.website-files.com/66e99075cefffd9f1dddd9f1/66ea833d4dfeeddb12412891_MAIHEM_Cool%20White_Full_Logo.svg'
            image, url='https://cdn.prod.website-files.com/66e99075cefffd9f1dddd9f1/66fc56066ce61b982b20bc66_MAIHEM_Bittersweet_Full_Logo.svg'
        navigation
            [49] link Documentation, center=(1208,66), url='https://docs.maihem.ai/quickstart'
            [51] link About, center=(1328,66), url='https://maihem.ai/about'
            [53] link Careers, center=(1418,66), url='https://jobs.ashbyhq.com/maihem'
            [55] link Blog, center=(1501,66), url='https://maihem.ai/news-insights'
            [56] link Login, center=(1588,66), url='https://cause.maihem.ai/'
    main
        heading Enterprise-grade testing for AI applications.
            strong
        StaticText Maihem empowers technology leaders and engineering teams to confidently deploy AI at scale with automated testing, monitoring, and reporting that ensures compliance with company AI requirements.
        [143] link Book a demo, center=(865,515), url='https://cal.com/team/maihem/intro-to-maihem'
        [144] link See documentation, center=(1030,515), url='https://docs.maihem.ai/quickstart'
        image, url='https://cdn.prod.website-files.com/66e99075cefffd9f1dddd9f1/67196075d98a1d9d22bafb11_MH-Demo-Test-p-1600.jpg'
        StaticText FEATURED IN
        image, url='https://cdn.prod.website-files.com/66e99075cefffd9f1dddd9f1/66eae46ed7dd81f4ed60ef8c_ycombinator-ar21%201.svg'
        image, url='https://cdn.prod.website-files.com/66e99075cefffd9f1dddd9f1/66eae46e1b035e334b7e6f20_The-Wall-Street-Journal-Logo-2%201.svg'
        StaticText BUILT BY AI RESEARCHERS FROM WORLD LEADING INSTITUTIONS
        image, url='https://cdn.prod.website-files.com/66e99075cefffd9f1dddd9f1/66eae46f402b9983eb8fd9c7_MIT%20-%20Massachusetts%20Institute%20of%20Technology-0%201.svg'
        image, url='https://cdn.prod.website-files.com/66e99075cefffd9f1dddd9f1/66eae46ec43f63a11f23b6fa_The%20Oxford%20logo%20_%20Communications%20Hub-1%201.svg'
        image, url='https://cdn.prod.website-files.com/66e99075cefffd9f1dddd9f1/66eae46e6cf04a26595f5d85_Imperial%20College%20London-0%201.svg'
        heading Build enterprise-grade AI with confidence
        link See documentation, url='https://docs.maihem.ai/quickstart'
        heading Catch critical flaws before your users do
        paragraph
            StaticText Systematically test and monitor the performance of your AI application using our industry-leading metrics libraries.
        heading Scale reviews with expert human-in-the-loop collaboration
        paragraph
            StaticText Easily collaborate with your team members and leverage Maihem's expert human-in-the-loop platform to achieve the highest-quality product reviews at scale.
        heading Your agent, certified by Maihem
        paragraph
            StaticText Share detailed, expert-reviewed reports on the performance of your AI application with your customers and stakeholders.
        heading Features
        StaticText AI QUALITY ASSURANCE SUITE
        StaticText 01
        heading Customer experience (CX) test & track
        StaticText Continuously test and monitor your AI application’s performance across diverse user personas and Role-Based Access Controls (RBAC).
        StaticText AI QUALITY ASSURANCE SUITE
        StaticText 02
        heading RAG test & track
        StaticText Ensure your AI application meets the highest information retrieval standards with the most advanced evaluation tools and hallucination detection models in the industry.
        StaticText AI QUALITY ASSURANCE SUITE
        StaticText 03
        heading Agentic workflow simulations
        StaticText Easily define and test any AI workflow to detect process flaws in your agentic architecture.
        StaticText AI RISKS & SECURITY TESTING SUITE
        StaticText 01
        heading AI security test & track
        StaticText Continuously assess your AI's security with our advanced red-teaming agents, designed to detect and address threats before they become critical.
        StaticText AI RISKS & SECURITY TESTING SUITE
        StaticText 02
        heading Coverage across all OWASP dimensions of LLM risk
        StaticText Protect your AI applications with in-depth tests covering all OWASP vulnerability and risk dimensions, providing comprehensive security insights.
        StaticText AI RISKS & SECURITY TESTING SUITE
        StaticText 03
        heading Compliance tests for regulations such as GDPR and EU AI Act
        StaticText Run rigorous simulations to test your AI application’s compliance with requirements such as those under GDPR or the EU AI Act.
        heading Frequently asked questions
        list
            listitem
                StaticText How many simulations do I need to run to be safe?
                paragraph
                    StaticText With probabilistic and self-learning systems, it's less about an absolute number and more about continuous testing and supervision, much as it is for us humans (who are also probabilistic systems). Continuous supervision, testing, and training are the key to excellence.
            listitem
                StaticText Which LLMs do you support?
                paragraph
                    StaticText Our system is LLM agnostic. Whether you’re using OpenAI, Anthropic, Cohere, Google, or any open-source model, we can assess your AI application’s performance and even help you benchmark the best LLM option for your use case.
            listitem
                StaticText Do you offer custom solutions?
                paragraph
                    StaticText Yes, we provide custom enterprise solutions tailored to your organization, tech stack, and specific AI use case.
            listitem
                StaticText Is our data secure when you test our AI?
                paragraph
                    StaticText Yes. All our systems are designed with bank/military-grade IT security standards. All data is encrypted in transit (TLS) and at rest (AES256). Dual-layer network boundary protection is in place. We offer various ways to integrate with us, to ensure we accommodate your data and IT security requirements.
            listitem
                StaticText I love your mission. Can I join the team?
                paragraph
                    StaticText We’d be thrilled! Check out our careers page for open positions—we can’t wait to meet you.
        heading News and insights
        link View all, url='https://maihem.ai/news-insights'
        list
            listitem
                link 10 Tips to Improve Your RAG System, url='https://maihem.ai/articles/10-tips-to-improve-your-rag-system'
        list
            listitem
                StaticText Hallucinations & RAG Systems
                StaticText Our Map-Reduce inspired fact checking system.
                link Read More, url='https://maihem.ai/articles/detecting-hallucinations-in-retrieval-augmented-generation-rag-systems-a-two-pass-approach'
            listitem
                StaticText OWASP Tests for Critical LLM Vulnerabilities
                StaticText OWASP Top 10 for LLMs: New Risks, New Testing Methods.
                link Read More, url='https://maihem.ai/articles/how-to-test-the-owasp-top-10-critical-vulnerabilities-for-llms'
            listitem
                StaticText Maihem mentioned in the Wall Street Journal
                StaticText Our recent mention in the WSJ.
                link Read More, url='https://maihem.ai/articles/maihem-mentioned-in-the-wall-street-journal'
        StaticText We help you build AI, responsibly
        StaticText Book a call with our team to explore how Maihem can help you to build
        StaticText and deploy AI responsibly and successfully in your organization.
        link Book a call, url='https://cal.com/team/maihem/intro-to-maihem'
        image, url='https://cdn.prod.website-files.com/66e99075cefffd9f1dddd9f1/66ea833d4dfeeddb12412891_MAIHEM_Cool%20White_Full_Logo.svg'
        heading San Francisco
        StaticText 2261 Market Street STE 5732
        StaticText San Francisco
        StaticText California 94114
        StaticText United States
        heading London
        StaticText 1 Lyric Square
        StaticText W6 0NB
        StaticText London
        StaticText United Kingdom
        link About, url='https://maihem.ai/about'
        link Blog, url='https://maihem.ai/news-insights'
        link Contact, url='mailto:contact@maihem.ai?subject=Get%20in%20Touch'
        link Book a demo, url='https://cal.com/team/maihem/meet-maihem'
        link Terms of use, url='https://maihem.ai/terms-of-service'
        link Privacy policy, url='https://maihem.ai/privacy-policy'
        link Cookie settings, url='https://maihem.ai/#'
        StaticText Contact:
        link contact@maihem.ai, url='https://maihem.ai/#'
        link, url='https://www.linkedin.com/company/maihem/'
            image, url='https://cdn.prod.website-files.com/66e99075cefffd9f1dddd9f1/66ec00124fed91f865aa45cf_Vectors-Wrapper.svg'
        link, url='https://x.com/maihem_ai'
            image, url='https://cdn.prod.website-files.com/66e99075cefffd9f1dddd9f1/66ec001d0d29c296007d3acc_Vectors-Wrapper.svg'
        link, url='https://www.instagram.com/maihem.ai/'
            image, url='https://cdn.prod.website-files.com/66e99075cefffd9f1dddd9f1/66ec00234d68bd95b7aef3c2_Vectors-Wrapper.svg'
        link, url='https://github.com/maihemlabs'
            image, url='https://cdn.prod.website-files.com/66e99075cefffd9f1dddd9f1/66ec0028f8cfe84a2af87407_Vectors-Wrapper.svg'
        StaticText © 2024 Maihem
        StaticText Design by
        StaticText Shoreditch Design
RootWebArea What is Maihem - Maihem, focused, url='https://docs.maihem.ai/get-started/what-is-maihem'
    main
        [68] link Maihem home page light logo, center=(336,32), inner_text=Maihem home page, url='https://docs.maihem.ai/'
            image light logo, url='https://mintlify.s3.us-west-1.amazonaws.com/maihem-29/logo/MAIHEM_Bittersweet_Full_Logo.svg'
        [74] button Search or ask... Ctrl K, center=(960,32), inner_text=Search or ask... Ctrl K, type=button
            image
        navigation
            list
                listitem
                    [83] link Home, center=(1286,32), url='https://maihem.ai/'
                listitem
                    [85] link Support, center=(1358,32), url='mailto:support@maihem.ai'
                listitem
                    [87] link GitHub, center=(1433,32), url='https://github.com/maihemlabs/maihem'
                listitem
                    [91] link Get Maihem, center=(1540,32), url='https://cal.com/team/maihem/intro-to-maihem'
        [97] button Toggle dark mode, center=(1632,32)
            image
        heading Get started
        list
            listitem
                [125] link What is Maihem, center=(384,148), url='https://docs.maihem.ai/get-started/what-is-maihem'
                    image
            listitem
                [130] link Get Maihem SDK and API key, center=(384,184), url='https://docs.maihem.ai/get-started/get-maihem'
                    image
            listitem
                [135] link Quickstart, center=(384,220), url='https://docs.maihem.ai/get-started/quickstart'
                    image
        heading How-to guides
        list
            listitem
                [143] link Test RAG and hallucinations, center=(384,322), url='https://docs.maihem.ai/guides/test/rag'
                    image
            listitem
                [148] link Test customer experience, center=(384,358), url='https://docs.maihem.ai/guides/test/cx'
                    image
            listitem
                [153] link Monitor in production, center=(384,394), url='https://docs.maihem.ai/guides/monitor/monitor'
                    image
        heading Reference
        list
            listitem
                [161] link Core concepts, center=(384,496), url='https://docs.maihem.ai/reference/core-concepts/summary'
                    image
            listitem
                [166] link Metric and module collection, center=(384,532), url='https://docs.maihem.ai/reference/metric-collection'
                    image
            listitem
                [171] link Connect your AI application, center=(384,568), url='https://docs.maihem.ai/reference/connect-agent'
                    image
        StaticText Get started
        heading What is Maihem
        paragraph
            StaticText Maihem automatically tests your AI applications (target agents), from simple LLM applications to complex agentic workflows.
        [189] link Quickstart Set up Maihem and run your first test in 5 minutes, center=(996,321), inner_text=Quickstart Set up Maihem and run your first test in 5 minutes, url='https://docs.maihem.ai/get-started/quickstart'
            image
            heading Quickstart
            paragraph
                StaticText Set up Maihem and run your first test in 5 minutes
        [198] link How-to guides Find examples to run custom tests, center=(787,427), inner_text=How-to guides Find examples to run custom tests, url='https://docs.maihem.ai/guides'
            image
            heading How-to guides
            paragraph
                StaticText Find examples to run custom tests
        [206] link Reference Read our detailed documentation, center=(1204,427), inner_text=Reference Read our detailed documentation, url='https://docs.maihem.ai/reference/core-concepts/summary'
            image
            heading Reference
            paragraph
                StaticText Read our detailed documentation
        heading Navigate to header Test modules and metrics
            [216] link Navigate to header, center=(567,544), inner_text=​, url='https://docs.maihem.ai/get-started/what-is-maihem#test-modules-and-metrics'
                StaticText ​
                image
        separator, orientation='horizontal'
        [219] span, center=(729,544), inner_text=Test modules and metrics
        paragraph
            StaticText Maihem enables you to test across many dimensions:
        list
            listitem
                [224] link Customer experience (CX), center=(725,688), url='https://docs.maihem.ai/get-started/what-is-maihem#customer-experience-cx'
            listitem
                [226] link Retrieval-augmented generation (RAG), center=(773,724), url='https://docs.maihem.ai/get-started/what-is-maihem#retrieval-augmented-generation-rag'
            listitem
                strong
                    StaticText Safety
                StaticText (
                [229] link Bias, center=(699,760), url='https://docs.maihem.ai/get-started/what-is-maihem#bias'
                StaticText ,
                [230] link Brand reputation, center=(794,760), url='https://docs.maihem.ai/get-started/what-is-maihem#brand-reputation'
                StaticText ,
                [231] link Toxicity, center=(902,760), url='https://docs.maihem.ai/get-started/what-is-maihem#toxicity'
                StaticText )
            listitem
                strong
                    StaticText Security
                StaticText (
                [234] link Overreach, center=(739,796), url='https://docs.maihem.ai/get-started/what-is-maihem#overreach'
                StaticText ,
                [235] link Privacy (PII), center=(839,796), url='https://docs.maihem.ai/get-started/what-is-maihem#privacy-pii'
                StaticText ,
                [236] link System access, center=(955,796), url='https://docs.maihem.ai/get-started/what-is-maihem#system-access'
                StaticText )
        heading Navigate to header How it works
            [239] link Navigate to header, center=(567,873), inner_text=​, url='https://docs.maihem.ai/get-started/what-is-maihem#how-it-works'
                StaticText ​
                image
        separator, orientation='horizontal'
        [242] span, center=(660,873), inner_text=How it works
        list
            listitem
                StaticText 1
                paragraph
                    StaticText Simulate conversations
                paragraph
                    StaticText Simulate conversations between your target Agent and our Maihem agents
            listitem
                StaticText 2
                paragraph
                    StaticText Evaluate
                paragraph
                    StaticText Evaluate your target agent and find failures
            listitem
                StaticText 3
                paragraph
                    StaticText Get action list
                paragraph
                    StaticText Get an action list to fix detected failures
            listitem
                StaticText 4
                paragraph
                    StaticText Improve
                paragraph
                    StaticText Iterate and improve 10x faster
        link Get Maihem SDK and API key, url='https://docs.maihem.ai/get-started/get-maihem'
            image
        link linkedin, url='https://www.linkedin.com/company/maihem'
            image
        link instagram, url='https://www.instagram.com/maihem.ai/'
            image
        link twitter, url='https://twitter.com/maihem_ai'
            image
        link Powered by Mintlify, url='https://mintlify.com/preview-request?utm_campaign=poweredBy&utm_medium=docs&utm_source=docs.maihem.ai'
    alert, atomic

Reasoning for action: I clicked on the 'Documentation' link as this is the starting point to access the information necessary for testing retrieval-augmented generation and customer experience. By navigating to the documentation section, I will be able to locate and then click on the specific links related to 'Test RAG and hallucinations' and 'Test customer experience.' This step is essential for fulfilling the user story and achieving the acceptance criteria.

Action: click(49) # Click on the Documentation link to explore the documentation section.

Change Description: After clicking the Documentation link, the browser navigated to the Maihem documentation page located at 'https://docs.maihem.ai/get-started/what-is-maihem', replacing the previous content. The new page features sections such as 'Get started', 'How-to guides', and 'Reference', along with links to resources and a search functionality, indicating a shift from the main site to a dedicated documentation environment.

---

Step 2:

Url (before/after):

https://docs.maihem.ai/get-started/what-is-maihem

https://docs.maihem.ai/guides/test/rag

Content (before/after):

RootWebArea What is Maihem - Maihem, focused, url='https://docs.maihem.ai/get-started/what-is-maihem'
    main
        [68] link Maihem home page light logo, center=(336,32), inner_text=Maihem home page, url='https://docs.maihem.ai/'
            image light logo, url='https://mintlify.s3.us-west-1.amazonaws.com/maihem-29/logo/MAIHEM_Bittersweet_Full_Logo.svg'
        [74] button Search or ask... Ctrl K, center=(960,32), inner_text=Search or ask... Ctrl K, type=button
            image
        navigation
            list
                listitem
                    [83] link Home, center=(1286,32), url='https://maihem.ai/'
                listitem
                    [85] link Support, center=(1358,32), url='mailto:support@maihem.ai'
                listitem
                    [87] link GitHub, center=(1433,32), url='https://github.com/maihemlabs/maihem'
                listitem
                    [91] link Get Maihem, center=(1540,32), url='https://cal.com/team/maihem/intro-to-maihem'
        [97] button Toggle dark mode, center=(1632,32)
            image
        heading Get started
        list
            listitem
                [125] link What is Maihem, center=(384,148), url='https://docs.maihem.ai/get-started/what-is-maihem'
                    image
            listitem
                [130] link Get Maihem SDK and API key, center=(384,184), url='https://docs.maihem.ai/get-started/get-maihem'
                    image
            listitem
                [135] link Quickstart, center=(384,220), url='https://docs.maihem.ai/get-started/quickstart'
                    image
        heading How-to guides
        list
            listitem
                [143] link Test RAG and hallucinations, center=(384,322), url='https://docs.maihem.ai/guides/test/rag'
                    image
            listitem
                [148] link Test customer experience, center=(384,358), url='https://docs.maihem.ai/guides/test/cx'
                    image
            listitem
                [153] link Monitor in production, center=(384,394), url='https://docs.maihem.ai/guides/monitor/monitor'
                    image
        heading Reference
        list
            listitem
                [161] link Core concepts, center=(384,496), url='https://docs.maihem.ai/reference/core-concepts/summary'
                    image
            listitem
                [166] link Metric and module collection, center=(384,532), url='https://docs.maihem.ai/reference/metric-collection'
                    image
            listitem
                [171] link Connect your AI application, center=(384,568), url='https://docs.maihem.ai/reference/connect-agent'
                    image
        StaticText Get started
        heading What is Maihem
        paragraph
            StaticText Maihem automatically tests your AI applications (target agents), from simple LLM applications to complex agentic workflows.
        [189] link Quickstart Set up Maihem and run your first test in 5 minutes, center=(996,321), inner_text=Quickstart Set up Maihem and run your first test in 5 minutes, url='https://docs.maihem.ai/get-started/quickstart'
            image
            heading Quickstart
            paragraph
                StaticText Set up Maihem and run your first test in 5 minutes
        [198] link How-to guides Find examples to run custom tests, center=(787,427), inner_text=How-to guides Find examples to run custom tests, url='https://docs.maihem.ai/guides'
            image
            heading How-to guides
            paragraph
                StaticText Find examples to run custom tests
        [206] link Reference Read our detailed documentation, center=(1204,427), inner_text=Reference Read our detailed documentation, url='https://docs.maihem.ai/reference/core-concepts/summary'
            image
            heading Reference
            paragraph
                StaticText Read our detailed documentation
        heading Navigate to header Test modules and metrics
            [216] link Navigate to header, center=(567,544), inner_text=​, url='https://docs.maihem.ai/get-started/what-is-maihem#test-modules-and-metrics'
                StaticText ​
                image
        separator, orientation='horizontal'
        [219] span, center=(729,544), inner_text=Test modules and metrics
        paragraph
            StaticText Maihem enables you to test across many dimensions:
        list
            listitem
                [224] link Customer experience (CX), center=(725,688), url='https://docs.maihem.ai/get-started/what-is-maihem#customer-experience-cx'
            listitem
                [226] link Retrieval-augmented generation (RAG), center=(773,724), url='https://docs.maihem.ai/get-started/what-is-maihem#retrieval-augmented-generation-rag'
            listitem
                strong
                    StaticText Safety
                StaticText (
                [229] link Bias, center=(699,760), url='https://docs.maihem.ai/get-started/what-is-maihem#bias'
                StaticText ,
                [230] link Brand reputation, center=(794,760), url='https://docs.maihem.ai/get-started/what-is-maihem#brand-reputation'
                StaticText ,
                [231] link Toxicity, center=(902,760), url='https://docs.maihem.ai/get-started/what-is-maihem#toxicity'
                StaticText )
            listitem
                strong
                    StaticText Security
                StaticText (
                [234] link Overreach, center=(739,796), url='https://docs.maihem.ai/get-started/what-is-maihem#overreach'
                StaticText ,
                [235] link Privacy (PII), center=(839,796), url='https://docs.maihem.ai/get-started/what-is-maihem#privacy-pii'
                StaticText ,
                [236] link System access, center=(955,796), url='https://docs.maihem.ai/get-started/what-is-maihem#system-access'
                StaticText )
        heading Navigate to header How it works
            [239] link Navigate to header, center=(567,873), inner_text=​, url='https://docs.maihem.ai/get-started/what-is-maihem#how-it-works'
                StaticText ​
                image
        separator, orientation='horizontal'
        [242] span, center=(660,873), inner_text=How it works
        list
            listitem
                StaticText 1
                paragraph
                    StaticText Simulate conversations
                paragraph
                    StaticText Simulate conversations between your target Agent and our Maihem agents
            listitem
                StaticText 2
                paragraph
                    StaticText Evaluate
                paragraph
                    StaticText Evaluate your target agent and find failures
            listitem
                StaticText 3
                paragraph
                    StaticText Get action list
                paragraph
                    StaticText Get an action list to fix detected failures
            listitem
                StaticText 4
                paragraph
                    StaticText Improve
                paragraph
                    StaticText Iterate and improve 10x faster
        link Get Maihem SDK and API key, url='https://docs.maihem.ai/get-started/get-maihem'
            image
        link linkedin, url='https://www.linkedin.com/company/maihem'
            image
        link instagram, url='https://www.instagram.com/maihem.ai/'
            image
        link twitter, url='https://twitter.com/maihem_ai'
            image
        link Powered by Mintlify, url='https://mintlify.com/preview-request?utm_campaign=poweredBy&utm_medium=docs&utm_source=docs.maihem.ai'
    alert, atomic
RootWebArea Test RAG and hallucinations - Maihem, focused, url='https://docs.maihem.ai/guides/test/rag'
    main
        [68] link Maihem home page light logo, center=(336,32), inner_text=Maihem home page, url='https://docs.maihem.ai/'
            image light logo, url='https://mintlify.s3.us-west-1.amazonaws.com/maihem-29/logo/MAIHEM_Bittersweet_Full_Logo.svg'
        [74] button Search or ask... Ctrl K, center=(960,32), inner_text=Search or ask... Ctrl K, type=button
            image
        navigation
            list
                listitem
                    [83] link Home, center=(1286,32), url='https://maihem.ai/'
                listitem
                    [85] link Support, center=(1358,32), url='mailto:support@maihem.ai'
                listitem
                    [87] link GitHub, center=(1433,32), url='https://github.com/maihemlabs/maihem'
                listitem
                    [91] link Get Maihem, center=(1540,32), url='https://cal.com/team/maihem/intro-to-maihem'
        [97] button Toggle dark mode, center=(1632,32)
            image
        heading Get started
        list
            listitem
                [125] link What is Maihem, center=(384,148), url='https://docs.maihem.ai/get-started/what-is-maihem'
                    image
            listitem
                [130] link Get Maihem SDK and API key, center=(384,184), url='https://docs.maihem.ai/get-started/get-maihem'
                    image
            listitem
                [135] link Quickstart, center=(384,220), url='https://docs.maihem.ai/get-started/quickstart'
                    image
        heading How-to guides
        list
            listitem
                [143] link Test RAG and hallucinations, center=(384,322), focused, url='https://docs.maihem.ai/guides/test/rag'
                    image
            listitem
                [148] link Test customer experience, center=(384,358), url='https://docs.maihem.ai/guides/test/cx'
                    image
            listitem
                [153] link Monitor in production, center=(384,394), url='https://docs.maihem.ai/guides/monitor/monitor'
                    image
        heading Reference
        list
            listitem
                [161] link Core concepts, center=(384,496), url='https://docs.maihem.ai/reference/core-concepts/summary'
                    image
            listitem
                [166] link Metric and module collection, center=(384,532), url='https://docs.maihem.ai/reference/metric-collection'
                    image
            listitem
                [171] link Connect your AI application, center=(384,568), url='https://docs.maihem.ai/reference/connect-agent'
                    image
        StaticText How-to guides
        heading Test RAG and hallucinations
        paragraph
            StaticText Generate questions from your documents to test your RAG application for hallucinations, answer relevance, and context retrieval efficiency
        image Info
        paragraph
            [339] link Get your Maihem API key and install the SDK, center=(794,295), url='https://docs.maihem.ai/get-started/get-maihem'
            StaticText before you start.
        list
            listitem
                [342] button Python, center=(612,364)
            listitem
                [344] button Command line interface (CLI), center=(764,364)
        list
            listitem
                StaticText 1
                heading Create target agent (if you haven't already)
                [356] button Copy the contents from the code block, center=(1373,516)
                    image
                code
                    from maihem import Maihem

                    maihem_client = Maihem()

                    maihem_client.create_target_agent(
                        name="rag_financial_assistant",
                        label="POC stock market agent",  # Optional
                        role="AI Financial Assistant",
                        description="An AI assistant that provides information and summaries from financial documents.",
                        language="en",  # Optional, default is "en" (English), follow ISO 639
                    )
            listitem
                StaticText 2
                heading Add documents to generate questions from them
                paragraph
                    StaticText Maihem supports documents in the following formats: pdf, txt, docx, md.
                paragraph
                    StaticText Move all the documents to the same folder.
                [406] button Copy the contents from the code block, center=(1373,1012)
                    image
                code
                    documents_path = "/path/to/folder/with/documents"
            listitem
                StaticText 3
                heading Create a test
                paragraph
                    StaticText Create a RAG test by using the
                    code
                        StaticText RAG module
                    StaticText . A module defines the scope of a test.
                paragraph
                    StaticText A RAG test will generate a random set of questions from your documents to test your target agent, and evaluate these metrics:
                list
                    listitem
                        StaticText Answer relevance
                    listitem
                        StaticText Context retrieval efficiency
                    listitem
                        StaticText Hallucinations
                paragraph
                    link See more detailed documentation on metrics and modules, url='https://docs.maihem.ai/reference/metric-collection'
                    StaticText .
                button Copy the contents from the code block
                    image
                code
                    from maihem import Maihem

                    maihem_client = Maihem()

                    maihem_client.create_test(
                        name="rag_test_1",
                        label="RAG test #1",  # Optional
                        target_agent_name="rag_financial_assistant",
                        initiating_agent="maihem",  # or "target"
                        modules=["rag"],
                        documents_path=documents_path,  # Path to folder with documents
                        number_conversations=50,
                        conversation_turns_max=5,  # Optional, default is 10
                    )
            listitem
                StaticText 4
                heading Connect target agent with wrapper function
                paragraph
                    StaticText Modify the following function to wrap your target agent:
                button Copy the contents from the code block
                    image
                code
                    from typing import Tuple, List, Dict

                    def wrapper_function(
                        conversation_id: str,  # Keep track of different conversations with conversation_id
                        maihem_agent_message: str,  # The message from Maihem
                        conversation_history: Dict,  # Auxiliary dictionary to store conversation history (if needed)
                    ) -> Tuple[str, List[str]]:
                        """Callable wrapper function to wrap your target agent to be tested."""

                        # Replace with the message from your target agent
                        target_agent_message = "Hi, how can I help you?"
                        # If target initiates conversation, first maihem_agent_message is None

                        # (Optional) add messages to conversation history
                        conversation_history[conversation_id].append({"role": "maihem", "content": maihem_agent_message})
                        conversation_history[conversation_id].append({"role": "target", "content": target_agent_message})

                        # List of retrieved contexts for RAG evaluations
                        contexts = ["Context_1", "Context_2"]

                        return target_agent_message, contexts
            listitem
                StaticText 5
                heading Run the test
                paragraph
                    StaticText A
                    code
                        StaticText test run
                    StaticText will generate:
                list
                    listitem
                        StaticText Simulated conversations between your target agent and Maihem
                    listitem
                        StaticText Evaluations of the conversations
                    listitem
                        StaticText A list of detected failures
                button Copy the contents from the code block
                    image
                code
                    maihem_client.run_test(
                        name="modelX_prompt2.5_28-11-2024",
                        label="Model X Prompt v2.5 (28/Nov/2024)",  # Optional
                        test_name="rag_test_1",
                        wrapper_function=wrapper_function,  # your wrapper function
                        concurrent_conversations=10,  # Optional
                    )
            listitem
                StaticText 6
                heading See test run results
                paragraph
                    link See the results in your Maihem account, url='https://cause.maihem.ai/'
                    StaticText .
                paragraph
                    StaticText Or get the test results:
                button Copy the contents from the code block
                    image
                code
                    test_run_results = maihem_client.get_test_run_results(
                        test_name="rag_test_1",
                        test_run_name="modelX_prompt2.5_28-11-2024",
                    )
                paragraph
                    code
                        StaticText test_run_results
                    StaticText contains the following information:
                button Copy the contents from the code block
                    image
                code
                    test_run_results.result = "failed"
                    test_run_results.score = 82.5
                    test_run_results.conversations[0].messages = [
                        {"role": "maihem", "content": "When was Fund X created?"},
                        {
                            "role": "target",
                            "content": "Sorry, I could not find this information.",
                            "evaluation": {
                                "is_failure": True,
                                "explanation": "Hallucination detected. Fund X was created in 2005.",
                            },
                        },
                    ]
                    test_run_failures = []
        link Quickstart, url='https://docs.maihem.ai/get-started/quickstart'
            image
        link Test customer experience, url='https://docs.maihem.ai/guides/test/cx'
            image
        link linkedin, url='https://www.linkedin.com/company/maihem'
            image
        link instagram, url='https://www.instagram.com/maihem.ai/'
            image
        link twitter, url='https://twitter.com/maihem_ai'
            image
        link Powered by Mintlify, url='https://mintlify.com/preview-request?utm_campaign=poweredBy&utm_medium=docs&utm_source=docs.maihem.ai'
    alert, atomic
        StaticText Test RAG and hallucinations - Maihem
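
The results example in step 6 of the page above shows each message as a dictionary that may carry an "evaluation" entry. As a rough sketch only, the flagged messages could be pulled out as below. The helper name collect_failures is hypothetical and not part of the Maihem SDK, and the sample data is copied from the page's example rather than from a real test run.

```python
from typing import Any, Dict, List


def collect_failures(messages: List[Dict[str, Any]]) -> List[str]:
    """Return the explanation of every message whose evaluation flags a failure.

    Hypothetical helper: assumes the message layout shown in the page's example.
    """
    explanations = []
    for message in messages:
        # Messages from the "maihem" role carry no evaluation in the example.
        evaluation = message.get("evaluation") or {}
        if evaluation.get("is_failure"):
            explanations.append(evaluation.get("explanation", ""))
    return explanations


# Sample data copied from the results example on the page.
sample_messages = [
    {"role": "maihem", "content": "When was Fund X created?"},
    {
        "role": "target",
        "content": "Sorry, I could not find this information.",
        "evaluation": {
            "is_failure": True,
            "explanation": "Hallucination detected. Fund X was created in 2005.",
        },
    },
]

failures = collect_failures(sample_messages)
```

In a real run, the list passed in would presumably come from test_run_results.conversations[i].messages, assuming the SDK returns the shape shown on the page.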

Reasoning for action: I selected the 'Test RAG and hallucinations' link because it is directly related to the user's goal of exploring instructions for testing retrieval-augmented generation. By clicking this link, I will access the specific content needed to understand how to implement and improve performance in AI applications, thereby progressing towards the objective.

Action: click(143) # Click on the 'Test RAG and hallucinations' link to explore how to test retrieval-augmented generation.

Change Description: The browser navigated to the 'Test RAG and hallucinations' page, updating the main content to provide detailed instructions on generating questions for testing retrieval-augmented generation applications, including steps for creating target agents and running tests.
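
The wrapper function in step 4 of the guide is the part a developer has to adapt. The sketch below is one possible way to fill it in, not Maihem's own implementation. It keeps the signature shown on the page but makes two assumptions explicit: the first message may be None when the target initiates the conversation, and the history dictionary may not yet contain an entry for the conversation. The functions retrieve_contexts and generate_answer are hypothetical stand-ins for your own retriever and model call.

```python
from typing import Dict, List, Optional, Tuple


def retrieve_contexts(question: str) -> List[str]:
    """Hypothetical stand-in for your retriever."""
    return [f"Context for: {question}"]


def generate_answer(question: str, contexts: List[str]) -> str:
    """Hypothetical stand-in for your LLM call."""
    return f"Answer based on {len(contexts)} context(s)."


def wrapper_function(
    conversation_id: str,
    maihem_agent_message: Optional[str],
    conversation_history: Dict,
) -> Tuple[str, List[str]]:
    """Wrap the target agent; returns its reply and the retrieved contexts."""
    # Assumption: the entry may be missing, so create it on first use
    # instead of indexing directly (which would raise KeyError).
    history = conversation_history.setdefault(conversation_id, [])

    if maihem_agent_message is None:
        # Target initiates the conversation: greet, nothing retrieved yet.
        target_agent_message = "Hi, how can I help you?"
        contexts: List[str] = []
    else:
        history.append({"role": "maihem", "content": maihem_agent_message})
        contexts = retrieve_contexts(maihem_agent_message)
        target_agent_message = generate_answer(maihem_agent_message, contexts)

    history.append({"role": "target", "content": target_agent_message})
    return target_agent_message, contexts
```

Returning the contexts actually used for the answer, rather than a fixed list, is what lets the context retrieval and hallucination metrics described on the page reflect the retriever's real behavior.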

---

Step 3:

Url (before/after):

https://docs.maihem.ai/guides/test/rag

https://docs.maihem.ai/guides/test/cx

Content (before/after):

RootWebArea Test RAG and hallucinations - Maihem, focused, url='https://docs.maihem.ai/guides/test/rag'
    main
        [68] link Maihem home page light logo, center=(336,32), inner_text=Maihem home page, url='https://docs.maihem.ai/'
            image light logo, url='https://mintlify.s3.us-west-1.amazonaws.com/maihem-29/logo/MAIHEM_Bittersweet_Full_Logo.svg'
        [74] button Search or ask... Ctrl K, center=(960,32), inner_text=Search or ask...
Ctrl K, type=button
            image
        navigation
            list
                listitem
                    [83] link Home, center=(1286,32), url='https://maihem.ai/'
                listitem
                    [85] link Support, center=(1358,32), url='mailto:support@maihem.ai'
                listitem
                    [87] link GitHub, center=(1433,32), url='https://github.com/maihemlabs/maihem'
                listitem
                    [91] link Get Maihem, center=(1540,32), url='https://cal.com/team/maihem/intro-to-maihem'
        [97] button Toggle dark mode, center=(1632,32)
            image
        heading Get started
        list
            listitem
                [125] link What is Maihem, center=(384,148), url='https://docs.maihem.ai/get-started/what-is-maihem'
                    image
            listitem
                [130] link Get Maihem SDK and API key, center=(384,184), url='https://docs.maihem.ai/get-started/get-maihem'
                    image
            listitem
                [135] link Quickstart, center=(384,220), url='https://docs.maihem.ai/get-started/quickstart'
                    image
        heading How-to guides
        list
            listitem
                [143] link Test RAG and hallucinations, center=(384,322), focused, url='https://docs.maihem.ai/guides/test/rag'
                    image
            listitem
                [148] link Test customer experience, center=(384,358), url='https://docs.maihem.ai/guides/test/cx'
                    image
            listitem
                [153] link Monitor in production, center=(384,394), url='https://docs.maihem.ai/guides/monitor/monitor'
                    image
        heading Reference
        list
            listitem
                [161] link Core concepts, center=(384,496), url='https://docs.maihem.ai/reference/core-concepts/summary'
                    image
            listitem
                [166] link Metric and module collection, center=(384,532), url='https://docs.maihem.ai/reference/metric-collection'
                    image
            listitem
                [171] link Connect your AI application, center=(384,568), url='https://docs.maihem.ai/reference/connect-agent'
                    image
        StaticText How-to guides
        heading Test RAG and hallucinations
        paragraph
            StaticText Generate questions from your documents to test your RAG application for hallucinations, answer relevance, and context retrieval efficiency
        image Info
        paragraph
            [339] link Get your Maihem API key and install the SDK, center=(794,295), url='https://docs.maihem.ai/get-started/get-maihem'
            StaticText before you start.
        list
            listitem
                [342] button Python, center=(612,364)
            listitem
                [344] button Command line interface (CLI), center=(764,364)
        list
            listitem
                StaticText 1
                heading Create target agent (if you haven't already)
                [356] button Copy the contents from the code block, center=(1373,516)
                    image
                code
                    from maihem import Maihem

                    maihem_client = Maihem()

                    maihem_client.create_target_agent(
                        name="rag_financial_assistant",
                        label="POC stock market agent",  # Optional
                        role="AI Financial Assistant",
                        description="An AI assistant that provides information and summaries from financial documents.",
                        language="en",  # Optional, default is "en" (English), follow ISO 639
                    )
            listitem
                StaticText 2
                heading Add documents to generate questions from them
                paragraph
                    StaticText Maihem supports documents in the following formats: pdf, txt, docx, md.
                paragraph
                    StaticText Move all the documents to the same folder.
                [406] button Copy the contents from the code block, center=(1373,1012)
                    image
                code
                    documents_path = "/path/to/folder/with/documents"
            listitem
                StaticText 3
                heading Create a test
                paragraph
                    StaticText Create a RAG test by using the
                    code
                        StaticText RAG module
                    StaticText . A module defines the scope of a test.
                paragraph
                    StaticText A RAG test will generate a random set of questions from your documents to test your target agent, and evaluate these metrics:
                list
                    listitem
                        StaticText Answer relevance
                    listitem
                        StaticText Context retrieval efficiency
                    listitem
                        StaticText Hallucinations
                paragraph
                    link See more detailed documentation on metrics and modules, url='https://docs.maihem.ai/reference/metric-collection'
                    StaticText .
                button Copy the contents from the code block
                    image
                code
                    from maihem import Maihem

                    maihem_client = Maihem()

                    maihem_client.create_test(
                        name="rag_test_1",
                        label="RAG test #1",  # Optional
                        target_agent_name="rag_financial_assistant",
                        initiating_agent="maihem",  # or "target"
                        modules=["rag"],
                        documents_path=documents_path,  # Path to folder with documents
                        number_conversations=50,
                        conversation_turns_max=5,  # Optional, default is 10
                    )
            listitem
                StaticText 4
                heading Connect target agent with wrapper function
                paragraph
                    StaticText Modify the following function to wrap your target agent:
                button Copy the contents from the code block
                    image
                code
                    from typing import Tuple, List, Dict

                    def wrapper_function(
                        conversation_id: str,  # Keep track of different conversations with conversation_id
                        maihem_agent_message: str,  # The message from Maihem
                        conversation_history: Dict,  # Auxiliary dictionary to store conversation history (if needed)
                    ) -> Tuple[str, List[str]]:
                        """Callable wrapper function to wrap your target agent to be tested."""

                        # Replace with the message from your target agent
                        target_agent_message = "Hi, how can I help you?"
                        # If target initiates conversation, first maihem_agent_message is None

                        # (Optional) add messages to conversation history
                        conversation_history[conversation_id].append({"role": "maihem", "content": maihem_agent_message})
                        conversation_history[conversation_id].append({"role": "target", "content": target_agent_message})

                        # List of retrieved contexts for RAG evaluations
                        contexts = ["Context_1", "Context_2"]

                        return target_agent_message, contexts
            listitem
                StaticText 5
                heading Run the test
                paragraph
                    StaticText A
                    code
                        StaticText test run
                    StaticText will generate:
                list
                    listitem
                        StaticText Simulated conversations between your target agent and Maihem
                    listitem
                        StaticText Evaluations of the conversations
                    listitem
                        StaticText A list of detected failures
                button Copy the contents from the code block
                    image
                code
                    maihem_client.run_test(
                        name="modelX_prompt2.5_28-11-2024",
                        label="Model X Prompt v2.5 (28/Nov/2024)",  # Optional
                        test_name="rag_test_1",
                        wrapper_function=wrapper_function,  # your wrapper function
                        concurrent_conversations=10,  # Optional
                    )
            listitem
                StaticText 6
                heading See test run results
                paragraph
                    link See the results in your Maihem account, url='https://cause.maihem.ai/'
                    StaticText .
                paragraph
                    StaticText Or get the test results:
                button Copy the contents from the code block
                    image
                code
                    test_run_results = maihem_client.get_test_run_results(
                        test_name="rag_test_1",
                        test_run_name="modelX_prompt2.5_28-11-2024",
                    )
                paragraph
                    code
                        StaticText test_run_results
                    StaticText contains the following information:
                button Copy the contents from the code block
                    image
                code
                    test_run_results.result = "failed"
                    test_run_results.score = 82.5
                    test_run_results.conversations[0].messages = [
                        {"role": "maihem", "content": "When was Fund X created?"},
                        {
                            "role": "target",
                            "content": "Sorry, I could not find this information.",
                            "evaluation": {
                                "is_failure": True,
                                "explanation": "Hallucination detected. Fund X was created in 2005.",
                            },
                        },
                    ]
                    test_run_failures = []
        link Quickstart, url='https://docs.maihem.ai/get-started/quickstart'
            image
        link Test customer experience, url='https://docs.maihem.ai/guides/test/cx'
            image
        link linkedin, url='https://www.linkedin.com/company/maihem'
            image
        link instagram, url='https://www.instagram.com/maihem.ai/'
            image
        link twitter, url='https://twitter.com/maihem_ai'
            image
        link Powered by Mintlify, url='https://mintlify.com/preview-request?utm_campaign=poweredBy&utm_medium=docs&utm_source=docs.maihem.ai'
    alert, atomic
        StaticText Test RAG and hallucinations - Maihem
RootWebArea Test customer experience - Maihem, focused, url='https://docs.maihem.ai/guides/test/cx'
    main
        [68] link Maihem home page light logo, center=(336,32), inner_text=Maihem home page, url='https://docs.maihem.ai/'
            image light logo, url='https://mintlify.s3.us-west-1.amazonaws.com/maihem-29/logo/MAIHEM_Bittersweet_Full_Logo.svg'
        [74] button Search or ask... Ctrl K, center=(960,32), inner_text=Search or ask... Ctrl K, type=button
            image
        navigation
            list
                listitem
                    [83] link Home, center=(1286,32), url='https://maihem.ai/'
                listitem
                    [85] link Support, center=(1358,32), url='mailto:support@maihem.ai'
                listitem
                    [87] link GitHub, center=(1433,32), url='https://github.com/maihemlabs/maihem'
                listitem
                    [91] link Get Maihem, center=(1540,32), url='https://cal.com/team/maihem/intro-to-maihem'
        [97] button Toggle dark mode, center=(1632,32)
            image
        heading Get started
        list
            listitem
                [125] link What is Maihem, center=(384,148), url='https://docs.maihem.ai/get-started/what-is-maihem'
                    image
            listitem
                [130] link Get Maihem SDK and API key, center=(384,184), url='https://docs.maihem.ai/get-started/get-maihem'
                    image
            listitem
                [135] link Quickstart, center=(384,220), url='https://docs.maihem.ai/get-started/quickstart'
                    image
        heading How-to guides
        list
            listitem
                [143] link Test RAG and hallucinations, center=(384,322), url='https://docs.maihem.ai/guides/test/rag'
                    image
            listitem
                [148] link Test customer experience, center=(384,358), focused, url='https://docs.maihem.ai/guides/test/cx'
                    image
            listitem
                [153] link Monitor in production, center=(384,394), url='https://docs.maihem.ai/guides/monitor/monitor'
                    image
        heading Reference
        list
            listitem
                [161] link Core concepts, center=(384,496), url='https://docs.maihem.ai/reference/core-concepts/summary'
                    image
            listitem
                [166] link Metric and module collection, center=(384,532), url='https://docs.maihem.ai/reference/metric-collection'
                    image
            listitem
                [171] link Connect your AI application, center=(384,568), url='https://docs.maihem.ai/reference/connect-agent'
                    image
        StaticText How-to guides
        heading Test customer experience
        paragraph
            StaticText Simulate a population of personas to test your AI application
        image Info
        paragraph
            [809] link Get your Maihem API key and install the SDK, center=(794,267), url='https://docs.maihem.ai/get-started/get-maihem'
            StaticText before you start.
        list
            listitem
                [812] button Python, center=(612,336)
            listitem
                [814] button Command line interface (CLI), center=(764,336)
        list
            listitem
                StaticText 1
                heading Create target agent (if you haven't already)
                [826] button Copy the contents from the code block, center=(1373,488)
                    image
                code
                    StaticText from
                    StaticText maihem
                    StaticText import
                    StaticText Maihem
                    StaticText maihem_client
                    StaticText =
                    StaticText Maihem
                    StaticText (
                    StaticText )
                    StaticText maihem_client
                    StaticText .
                    StaticText create_target_agent
                    StaticText (
                    StaticText name
                    StaticText =
                    StaticText "history_tutor"
                    StaticText ,
                    StaticText label
                    StaticText =
                    StaticText "AI History Tutor"
                    StaticText ,
                    StaticText # Optional
                    StaticText role
                    StaticText =
                    StaticText "AI tutor"
                    StaticText ,
                    StaticText description
                    StaticText =
                    StaticText "A history tutor that helps students prepare for exams with practice questions and summaries"
                    StaticText ,
                    StaticText language
                    StaticText =
                    StaticText "en"
                    StaticText # Optional, default is "en" (English), follow ISO 639
                    StaticText )
            listitem
                StaticText 2
                heading Create a test with 'cx' module
                paragraph
                    StaticText Create a test using the
                    code
                        StaticText Customer Experience (CX) module
                    StaticText . A module defines the scope of a test.
                paragraph
                    StaticText A test will simulate conversations with personas to test the customer experience of your target agent using these metrics:
                list
                    [877] listitem, center=(1026,994), inner_text=Helpfulness
                        StaticText Helpfulness
                    [878] listitem, center=(1026,1030), inner_text=Goal completion
                        StaticText Goal completion
                    [879] listitem, center=(1026,1066), inner_text=Retention
                        StaticText Retention
                    listitem
                        StaticText Net promoter score (NPS)
                paragraph
                    StaticText Guide the simulated conversations using prompts:
                list
                    listitem
                        code
                            StaticText maihem_behavior_prompt
                        StaticText guides the behavior of the simulated personas
                    listitem
                        code
                            StaticText maihem_goal_prompt
                        StaticText describes the goal of the simulated personas
                    listitem
                        code
                            StaticText maihem_population_prompt
                        StaticText describes the desired population of simulated personas
                button Copy the contents from the code block
                    image
                code
                    StaticText from
                    StaticText maihem
                    StaticText import
                    StaticText Maihem
                    StaticText maihem_client
                    StaticText =
                    StaticText Maihem
                    StaticText (
                    StaticText )
                    StaticText maihem_client
                    StaticText .
                    StaticText create_test
                    StaticText (
                    StaticText name
                    StaticText =
                    StaticText "cx_test_1"
                    StaticText ,
                    StaticText label
                    StaticText =
                    StaticText "Customer experience Test #1"
                    StaticText ,
                    StaticText # Optional
                    StaticText target_agent_name
                    StaticText =
                    StaticText "history_tutor"
                    StaticText ,
                    StaticText initiating_agent
                    StaticText =
                    StaticText "maihem"
                    StaticText ,
                    StaticText # or "target"
                    StaticText modules
                    StaticText =
                    StaticText [
                    StaticText "cx"
                    StaticText ]
                    StaticText ,
                    StaticText maihem_behavior_prompt
                    StaticText =
                    StaticText "Request quiz-style questions and deep dives in bullet points for identified gaps"
                    StaticText ,
                    StaticText maihem_goal_prompt
                    StaticText =
                    StaticText "Prepare for exam tomorrow on the Industrial Revolution"
                    StaticText ,
                    StaticText maihem_population_prompt
                    StaticText =
                    StaticText "High school students, some care about school and want to ace the exam, some don't care and just want to pass"
                    StaticText ,
                    StaticText number_conversations
                    StaticText =
                    StaticText 20
                    StaticText ,
                    StaticText conversation_turns_max
                    StaticText =
                    StaticText 5
                    StaticText ,
                    StaticText )
            listitem
                StaticText 3
                heading Connect target agent with wrapper function
                paragraph
                    StaticText Modify the following function to wrap your target agent:
                button Copy the contents from the code block
                    image
                code
                    StaticText from
                    StaticText typing
                    StaticText import
                    StaticText Tuple
                    StaticText ,
                    StaticText List
                    StaticText ,
                    StaticText Dict
                    StaticText def
                    StaticText wrapper_function
                    StaticText (
                    StaticText conversation_id
                    StaticText :
                    StaticText str
                    StaticText ,
                    StaticText # Keep track of different conversations with conversation_id
                    StaticText maihem_agent_message
                    StaticText :
                    StaticText str
                    StaticText ,
                    StaticText # The message from Maihem
                    StaticText conversation_history
                    StaticText :
                    StaticText Dict
                    StaticText # Auxiliary dictionary to store conversation history (if needed)
                    StaticText )
                    StaticText -
                    StaticText >
                    StaticText Tuple
                    StaticText [
                    StaticText str
                    StaticText ,
                    StaticText List
                    StaticText [
                    StaticText str
                    StaticText ]
                    StaticText ]
                    StaticText :
                    StaticText """Callable wrapper function to wrap your target agent to be tested."""
                    StaticText # Replace with the message from your target agent
                    StaticText target_agent_message
                    StaticText =
                    StaticText "Hi, how can I help you?"
                    StaticText # If target initiates conversation, first maihem_agent_message is None
                    StaticText # (Optional) add messages to conversation history
                    StaticText conversation_history
                    StaticText [
                    StaticText conversation_id
                    StaticText ]
                    StaticText .
                    StaticText append
                    StaticText (
                    StaticText {
                    StaticText "role"
                    StaticText :
                    StaticText "maihem"
                    StaticText ,
                    StaticText "content"
                    StaticText :
                    StaticText maihem_agent_message
                    StaticText }
                    StaticText )
                    StaticText conversation_history
                    StaticText [
                    StaticText conversation_id
                    StaticText ]
                    StaticText .
                    StaticText append
                    StaticText (
                    StaticText {
                    StaticText "role"
                    StaticText :
                    StaticText "target"
                    StaticText ,
                    StaticText "content"
                    StaticText :
                    StaticText target_agent_message
                    StaticText }
                    StaticText )
                    StaticText # List of retrieved contexts for RAG evaluations
                    StaticText contexts
                    StaticText =
                    StaticText [
                    StaticText "Context_1"
                    StaticText ,
                    StaticText "Context_2"
                    StaticText ]
                    StaticText return
                    StaticText target_agent_message
                    StaticText ,
                    StaticText contexts
            listitem
                StaticText 4
                heading Run the test
                paragraph
                    StaticText A
                    code
                        StaticText test run
                    StaticText will generate:
                list
                    listitem
                        StaticText Simulated conversations between your target agent and Maihem
                    listitem
                        StaticText Evaluations of the conversations
                    listitem
                        StaticText A list of detected failures
                button Copy the contents from the code block
                    image
                code
                    StaticText from
                    StaticText wrapper_function
                    StaticText import
                    StaticText wrapper_function
                    StaticText test_run_result
                    StaticText =
                    StaticText maihem_client
                    StaticText .
                    StaticText run_test
                    StaticText (
                    StaticText name
                    StaticText =
                    StaticText "modelX_prompt2.5_27-11-2024"
                    StaticText ,
                    StaticText label
                    StaticText =
                    StaticText "Model X Prompt v2.5 (27/Nov/2024)"
                    StaticText ,
                    StaticText # Optional
                    StaticText test_name
                    StaticText =
                    StaticText "cx_test_1"
                    StaticText ,
                    StaticText wrapper_function
                    StaticText =
                    StaticText wrapper_function
                    StaticText ,
                    StaticText concurrent_conversations
                    StaticText =
                    StaticText 10
                    StaticText # Optional
                    StaticText )
            listitem
                StaticText 5
                heading See test run results
                paragraph
                    link See the results in your Maihem account, url='https://cause.maihem.ai/'
                    StaticText .
                paragraph
                    StaticText Or get the test results:
                button Copy the contents from the code block
                    image
                code
                    StaticText test_run_result
                    StaticText =
                    StaticText maihem_client
                    StaticText .
                    StaticText get_test_run_result
                    StaticText (
                    StaticText test_name
                    StaticText =
                    StaticText "cx_test_1"
                    StaticText ,
                    StaticText test_run_name
                    StaticText =
                    StaticText "modelX_prompt2.5_27-11-2024"
                    StaticText )
                    StaticText print
                    StaticText (
                    StaticText test_run_result
                    StaticText )
                paragraph
                    code
                        StaticText test_run_results
                    StaticText contains the following information:
                button Copy the contents from the code block
                    image
                code
                    StaticText test_run_results
                    StaticText .
                    StaticText result
                    StaticText =
                    StaticText "failed"
                    StaticText test_run_results
                    StaticText .
                    StaticText score
                    StaticText =
                    StaticText 82.5
                    StaticText test_run_results
                    StaticText .
                    StaticText conversations
                    StaticText [
                    StaticText 0
                    StaticText ]
                    StaticText .
                    StaticText messages
                    StaticText =
                    StaticText [
                    StaticText {
                    StaticText "role"
                    StaticText :
                    StaticText "maihem"
                    StaticText ,
                    StaticText "content"
                    StaticText :
                    StaticText "Do you think I'm well prepared for my test tomorrow?"
                    StaticText }
                    StaticText ,
                    StaticText {
                    StaticText "role"
                    StaticText :
                    StaticText "target"
                    StaticText ,
                    StaticText "content"
                    StaticText :
                    StaticText "I can't answer that, I don't have that information"
                    StaticText ,
                    StaticText "evaluation"
                    StaticText :
                    StaticText {
                    StaticText "is_failure"
                    StaticText :
                    StaticText True
                    StaticText ,
                    StaticText "explanation"
                    StaticText :
                    StaticText "Goal not completed. The persona wants to know which topics she still needs to review for the test."
                    StaticText }
                    StaticText }
                    StaticText ]
                    StaticText test_run_failures
                    StaticText =
                    StaticText [
                    StaticText ]
        link Test RAG and hallucinations, url='https://docs.maihem.ai/guides/test/rag'
            image
        link Monitor in production, url='https://docs.maihem.ai/guides/monitor/monitor'
            image
    alert, atomic
        StaticText Test customer experience - Maihem
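
The wrapper function in step 3 of the captured guide appends to `conversation_history[conversation_id]` without creating the list first, and it notes that the first `maihem_agent_message` is `None` when the target initiates. The following is a minimal sketch of a wrapper that handles both cases, assuming the history is a plain dictionary keyed by conversation ID. `my_target_agent` is a hypothetical stand-in for a real agent and is not part of the Maihem SDK.

```python
from typing import Dict, List, Optional, Tuple


def my_target_agent(message: Optional[str]) -> Tuple[str, List[str]]:
    """Hypothetical stand-in for a real agent: returns a reply and retrieved contexts."""
    if message is None:
        # The target agent opens the conversation, so there is nothing to retrieve yet.
        return "Hi, how can I help you?", []
    return f"Here is a practice question about: {message}", ["Context_1", "Context_2"]


def wrapper_function(
    conversation_id: str,
    maihem_agent_message: Optional[str],
    conversation_history: Dict[str, List[Dict[str, str]]],
) -> Tuple[str, List[str]]:
    """Wrap the target agent using the signature shown in step 3 of the guide."""
    # setdefault avoids a KeyError on the first turn of a new conversation.
    history = conversation_history.setdefault(conversation_id, [])

    # When the target initiates, the first Maihem message is None, so skip logging it.
    if maihem_agent_message is not None:
        history.append({"role": "maihem", "content": maihem_agent_message})

    target_agent_message, contexts = my_target_agent(maihem_agent_message)
    history.append({"role": "target", "content": target_agent_message})
    return target_agent_message, contexts
```

Swapping `my_target_agent` for a call into the real application should be the only change needed, provided the application returns both the reply and the retrieved contexts.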

Reasoning for action: Clicking this link opens the guide for testing customer interactions directly, which covers the customer-experience part of the user story. The guide provides the detailed instructions needed to improve the performance and reliability of AI applications that handle customer interactions.

Action: click(148) # Click on the 'Test customer experience' link to explore how to test customer interactions.
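
The numeric argument in `click(148)` refers to the bracketed element ID in the accessibility tree. As a rough illustration of how such an ID can be resolved to a role, name, and URL, the sketch below indexes tree lines with regular expressions. The `index_tree` helper and both patterns are assumptions written for this log format, not part of any browser-automation library.

```python
import re
from typing import Dict, Optional, Tuple

# Matches lines such as:
#   [148] link Test customer experience, center=(384,358), focused, url='https://...'
NODE_PATTERN = re.compile(r"^\s*\[(?P<id>\d+)\]\s+(?P<role>\w+)(?:\s+(?P<name>.*?))?,\s+center=")
URL_PATTERN = re.compile(r"url='(?P<url>[^']*)'")

Node = Tuple[str, Optional[str], Optional[str]]  # (role, name, url)


def index_tree(tree_text: str) -> Dict[int, Node]:
    """Map each bracketed element ID to its role, accessible name, and URL (if any)."""
    nodes: Dict[int, Node] = {}
    for line in tree_text.splitlines():
        match = NODE_PATTERN.match(line)
        if match is None:
            continue  # Lines without an [id] prefix are not actionable elements.
        url_match = URL_PATTERN.search(line)
        url = url_match["url"] if url_match else None
        nodes[int(match["id"])] = (match["role"], match["name"], url)
    return nodes
```

Looking up `nodes[148]` on the tree above would then identify the 'Test customer experience' link and its target URL before the click is issued.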

Change Description: Clicking the link switched the browser from the 'Test RAG and hallucinations' guide to the 'Test customer experience' guide. The focused URL became 'https://docs.maihem.ai/guides/test/cx' and the visible content updated accordingly.
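
Step 5 of the captured guide shows each target message carrying an `evaluation` with `is_failure` and `explanation` fields. The sketch below shows one way to flatten such results into a list of failures. It assumes the conversations are available as plain dictionaries shaped like that example; the real SDK may return objects with attribute access instead, and `collect_failures` is a hypothetical helper.

```python
from typing import Any, Dict, List

# Abbreviated sample shaped like the result in step 5 of the guide (not real output).
sample_conversations: List[Dict[str, Any]] = [
    {
        "messages": [
            {"role": "maihem", "content": "Do you think I'm well prepared for my test tomorrow?"},
            {
                "role": "target",
                "content": "I can't answer that, I don't have that information",
                "evaluation": {"is_failure": True, "explanation": "Goal not completed."},
            },
        ]
    }
]


def collect_failures(conversations: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return one record per message whose evaluation is flagged as a failure."""
    failures: List[Dict[str, Any]] = []
    for index, conversation in enumerate(conversations):
        for message in conversation.get("messages", []):
            evaluation = message.get("evaluation")
            if evaluation and evaluation.get("is_failure"):
                failures.append(
                    {
                        "conversation": index,
                        "content": message["content"],
                        "explanation": evaluation.get("explanation", ""),
                    }
                )
    return failures


failures = collect_failures(sample_conversations)
```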

---