Task
User Engagement with the Genie AI Introduction
As a potential user of the Cosine Genie software, I explore the capabilities and details of the Genie AI model, so that I can understand its performance benchmark and how it can help improve my software engineering tasks.
Success definition: Given I am on the Cosine homepage at httpscosine.sh When I click on the Genie link and then the YouTube video link Then I should see the Genie introduction video embedded on the page, and when I click on the SWE-Bench link, I should be redirected to the arXiv page for the SWE-bench paper's details.
Trajectory
Step 1:
Url (before/after):
https://cosine.sh/
https://cosine.sh/genie
Content (before/after):
RootWebArea Cosine AI - Human Reasoning Lab, focused, url='https://cosine.sh/'
navigation
[40] link Home, center=(702,56), inner_text=COSINE, url='https://cosine.sh/'
image C, url='https://cosine.sh/assets/cosine-c-black.png'
StaticText COSINE
Show more
[45] link Genie, center=(840,56), url='https://cosine.sh/genie'
[46] link Blog, center=(926,56), url='https://cosine.sh/blog'
[47] link About Us, center=(1026,56), url='https://cosine.sh/about'
[49] link Join Waitlist, center=(1198,56), url='https://cosine.sh/register'
[56] link ✨ Introducing Genie, an AI software engineering model, center=(960,128), url='https://cosine.sh/genie'
Canvas
Iframe YouTube video player
RootWebArea Introducing Genie - YouTube, url='https://www.youtube.com/embed/7XrTkhaI4zs?si=5ANRfSXbmbS31Lzp&vq=hd1080&enablejsapi=1&origin=https://cosine.sh'
[a31] link Photo image of Cosine, center=(264,1384)
[a39] link Introducing Genie, center=(951,1385), url='https://www.youtube.com/watch?v=7XrTkhaI4zs'
[a56] button Share, center=(1642,1382), hasPopup='menu'
image
[a92] button Play, center=(960,1762)
image
generic, atomic
[a320] link Watch on YouTube, center=(318,2142), url='https://www.youtube.com/watch?v=7XrTkhaI4zs&embeds_referring_euri=https%3A%2F%2Fcosine.sh%2F&embeds_referring_origin=https%3A%2F%2Fcosine.sh'
StaticText State of the art
heading Genie has the highest score in the world on SWE-Bench
paragraph
StaticText SWE Bench is a comprehensive benchmark designed to evaluate the coding abilities of large language models across various software engineering tasks
StaticText 30.08
StaticText %
StaticText Cosine Genie
StaticText 19.27
StaticText %
StaticText Factory Code Droid
StaticText 18.83
StaticText %
StaticText AutoCodeRover
StaticText 14.6
StaticText %
StaticText AppMap Navie
StaticText 13.82
StaticText %
StaticText Amazon Q Developer
StaticText 13.8
StaticText %
StaticText Devin (claimed)
StaticText 12.47
StaticText %
StaticText SWE-agent + GPT 4
StaticText 10.51
StaticText %
StaticText SWE-agent + Claude 3 Opus
StaticText 3.79
StaticText %
StaticText RAG + Claude 3 Opus
StaticText 3.44
StaticText %
StaticText Devin*
StaticText 1.96
StaticText %
StaticText RAG + Claude 2
StaticText 1.31
StaticText %
StaticText RAG + GPT 4
StaticText 0.7
StaticText %
StaticText RAG + SWE-Llama 13B
StaticText 0.17
StaticText %
StaticText RAG + ChatGPT 3.5
StaticText * Read about our findings and analysis in our
link technical report, url='https://cosine.sh/blog/genie-technical-report'
heading Powered by OpenAI Experimental Access Program
heading
list
listitem
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image
image C, url='https://cosine.sh/assets/cosine-c-black.png'
heading Data-Driven Autonomy
paragraph
StaticText Genie’s unparalleled software intuition comes from embedding human reasoning directly into its training data
link Learn More, url='https://cosine.sh/genie'
image
heading Access the power of Genie
paragraph
StaticText We’re excited to see all the creative ways you use Genie, if you’d like to try Genie on your own projects please tell us what you are building.
link Get Access, url='https://go.cosine.sh/get-genie'
heading Genie is the highest scoring software engineering model
link Technical Report, url='https://cosine.sh/blog/genie-technical-report'
image
contentinfo
image C, url='https://cosine.sh/assets/cosine-c-black.png'
StaticText COSINE
link Terms, url='https://cosine.sh/legal/terms'
button Contact Us
button Careers
link Careers, url='https://app.dover.io/jobs/cosine'
link Twitter, url='https://go.cosine.sh/twitter'
image Twitter, url='https://cosine.sh/_astro/x.BCpZt-mX.png'
RootWebArea Genie: SOTA Software engineering model | Cosine - Human Reasoning Lab, focused, url='https://cosine.sh/genie'
navigation
[41] link Home, center=(702,56), inner_text=COSINE, url='https://cosine.sh/'
image C, url='https://cosine.sh/assets/cosine-c-black.png'
StaticText COSINE
Show more
[46] link Genie, center=(840,56), url='https://cosine.sh/genie'
[47] link Blog, center=(926,56), url='https://cosine.sh/blog'
[48] link About Us, center=(1026,56), url='https://cosine.sh/about'
[50] link Join Waitlist, center=(1198,56), url='https://cosine.sh/register'
[57] link ✨ Introducing Genie, an AI software engineering model, center=(960,128), url='https://cosine.sh/genie'
heading World's best AI Software Engineer
paragraph
StaticText Genie is the best AI software engineer in the world by far - achieving a
StaticText 30% eval score
StaticText on the industry standard benchmark
[66] link SWE-Bench, center=(624,524), url='https://arxiv.org/abs/2310.06770'
StaticText .
paragraph
StaticText Genie is able to solve bugs, build features, refactor code, and everything in between either fully autonomously or paired with the user, like working with a colleague, not just a copilot.
paragraph
StaticText If you wish to talk to us about our model
[69] button feel free to reach out., center=(626,666)
[a] Iframe YouTube video player, center=(1344,515), title=YouTube video player
RootWebArea Introducing Genie - YouTube, url='https://www.youtube.com/embed/7XrTkhaI4zs?si=5ANRfSXbmbS31Lzp&vq=hd1080&enablejsapi=1&origin=https://cosine.sh'
[a31] link Photo image of Cosine, center=(1017,345)
[a39] link Introducing Genie, center=(1335,345), url='https://www.youtube.com/watch?v=7XrTkhaI4zs'
[a56] button Share, center=(1658,343), hasPopup='menu'
image
[a92] button Play, center=(1344,515)
image
generic, atomic
[a320] link Watch on YouTube, center=(1071,689), url='https://www.youtube.com/watch?v=7XrTkhaI4zs&embeds_referring_euri=https%3A%2F%2Fcosine.sh%2F&embeds_referring_origin=https%3A%2F%2Fcosine.sh'
StaticText State of the art
heading Genie has the highest score in the world on SWE-Bench
paragraph
StaticText SWE Bench is a comprehensive benchmark designed to evaluate the coding abilities of large language models across various software engineering tasks
StaticText 30.08
StaticText %
StaticText Cosine Genie
StaticText 19.27
StaticText %
StaticText Factory Code Droid
StaticText 18.83
StaticText %
StaticText AutoCodeRover
StaticText 14.6
StaticText %
StaticText AppMap Navie
StaticText 13.82
StaticText %
StaticText Amazon Q Developer
StaticText 13.8
StaticText %
StaticText Devin (claimed)
StaticText 12.47
StaticText %
StaticText SWE-agent + GPT 4
StaticText 10.51
StaticText %
StaticText SWE-agent + Claude 3 Opus
StaticText 3.79
StaticText %
StaticText RAG + Claude 3 Opus
StaticText 3.44
StaticText %
StaticText Devin*
StaticText 1.96
StaticText %
StaticText RAG + Claude 2
StaticText 1.31
StaticText %
StaticText RAG + GPT 4
StaticText 0.7
StaticText %
StaticText RAG + SWE-Llama 13B
StaticText 0.17
StaticText %
StaticText RAG + ChatGPT 3.5
StaticText * Read about our findings and analysis in our
link technical report, url='https://cosine.sh/blog/genie-technical-report'
heading Seamless GitHub Issue Integration
paragraph
StaticText Genie streamlines your workflow by directly integrating with your GitHub issue tracker. By importing issues, Genie ensures it fully understands the specific tasks and objectives. This integration allows Genie to create a detailed specification of work based on the issue's requirements, reducing manual input and enabling it to immediately tackle complex problems directly from your GitHub issues.
image
StaticText Github Issue
image
StaticText cosineai/monorepo
paragraph
StaticText Issue #2732
StaticText 26 Jul 2024
heading Create a new page under the website folder announcing our new model
heading Objective:
paragraph
StaticText Develop a new page on the website to showcase our new State-of-the-Art (SOTA) model. The page should be visually engaging and aligned with our brand aesthetics.
heading Key Features:
heading Header Section:
list
listitem
ListMarker •
heading Title & Description:
paragraph
StaticText Create a heading title to introduce the SOTA model.
paragraph
StaticText Add a brief description below the title to provide an overview of the model's capabilities and significance.
listitem
ListMarker •
heading YouTube Video Embed:
paragraph
StaticText Embed a YouTube video demo of the SOTA model on the right side of the header. Ensure the video is responsive and maintains aspect ratio across different devices.
heading Content Section:
list
listitem
ListMarker •
heading Featured Text Boxes:
paragraph
StaticText Create left and right-aligned text boxes that feature key highlights and benefits of the SOTA model.
paragraph
StaticText Each text box should have a corresponding animated React component beside it to enhance visual appeal and interactivity.
listitem
ListMarker •
heading Brand Gradient Background:
paragraph
StaticText Apply our brand gradient to the top side of the background, ensuring it appears behind the navigation bar. This gradient should seamlessly blend into the rest of the page background.
heading Data Driven Autonomy
paragraph
StaticText By training our models with a dataset that codifies human reasoning rather than simply prompting base models, we have seen that we're no longer just generating random code until some works, it's tackling problems like a human.
link Technical Report, url='https://cosine.sh/blog/genie-technical-report'
image
heading Access the power of Genie
paragraph
StaticText We’re excited to see all the creative ways you use Genie, if you’d like to try Genie on your own projects please tell us what you are building.
link Get Access, url='https://go.cosine.sh/get-genie'
heading Genie is the highest scoring software engineering model
link Technical Report, url='https://cosine.sh/blog/genie-technical-report'
image
contentinfo
image C, url='https://cosine.sh/assets/cosine-c-black.png'
StaticText COSINE
link Terms, url='https://cosine.sh/legal/terms'
button Contact Us
button Careers
link Careers, url='https://app.dover.io/jobs/cosine'
link Twitter, url='https://go.cosine.sh/twitter'
image Twitter, url='https://cosine.sh/_astro/x.BCpZt-mX.png'
Reasoning for action: I chose to click on the Genie link because it directly corresponds to the user's desire to explore the capabilities and details of the Genie AI model. This action takes me to the dedicated page for Genie, which will likely contain essential information such as performance benchmarks and features that can help improve software engineering tasks, aligning with the user's goals.
Action: click(45) # Click on the Genie link to explore more about the AI software engineering model.
Change Description: After clicking the Genie link, the browser navigated to the Genie page, which is focused on presenting the AI software engineering model, resulting in a new title update to 'Genie: SOTA Software engineering model' and a revised layout including headings and descriptive text about Genie, along with integrated YouTube video and various features related to the model.
Screenshot before:Screenshot after:---
Step 2:
Url (before/after):
https://cosine.sh/genie
https://cosine.sh/genie
Content (before/after):
RootWebArea Genie: SOTA Software engineering model | Cosine - Human Reasoning Lab, focused, url='https://cosine.sh/genie'
navigation
[41] link Home, center=(702,56), inner_text=COSINE, url='https://cosine.sh/'
image C, url='https://cosine.sh/assets/cosine-c-black.png'
StaticText COSINE
Show more
[46] link Genie, center=(840,56), url='https://cosine.sh/genie'
[47] link Blog, center=(926,56), url='https://cosine.sh/blog'
[48] link About Us, center=(1026,56), url='https://cosine.sh/about'
[50] link Join Waitlist, center=(1198,56), url='https://cosine.sh/register'
[57] link ✨ Introducing Genie, an AI software engineering model, center=(960,128), url='https://cosine.sh/genie'
heading World's best AI Software Engineer
paragraph
StaticText Genie is the best AI software engineer in the world by far - achieving a
StaticText 30% eval score
StaticText on the industry standard benchmark
[66] link SWE-Bench, center=(624,524), url='https://arxiv.org/abs/2310.06770'
StaticText .
paragraph
StaticText Genie is able to solve bugs, build features, refactor code, and everything in between either fully autonomously or paired with the user, like working with a colleague, not just a copilot.
paragraph
StaticText If you wish to talk to us about our model
[69] button feel free to reach out., center=(626,666)
[a] Iframe YouTube video player, center=(1344,515), title=YouTube video player
RootWebArea Introducing Genie - YouTube, url='https://www.youtube.com/embed/7XrTkhaI4zs?si=5ANRfSXbmbS31Lzp&vq=hd1080&enablejsapi=1&origin=https://cosine.sh'
[a31] link Photo image of Cosine, center=(1017,345)
[a39] link Introducing Genie, center=(1335,345), url='https://www.youtube.com/watch?v=7XrTkhaI4zs'
[a56] button Share, center=(1658,343), hasPopup='menu'
image
[a92] button Play, center=(1344,515)
image
generic, atomic
[a320] link Watch on YouTube, center=(1071,689), url='https://www.youtube.com/watch?v=7XrTkhaI4zs&embeds_referring_euri=https%3A%2F%2Fcosine.sh%2F&embeds_referring_origin=https%3A%2F%2Fcosine.sh'
StaticText State of the art
heading Genie has the highest score in the world on SWE-Bench
paragraph
StaticText SWE Bench is a comprehensive benchmark designed to evaluate the coding abilities of large language models across various software engineering tasks
StaticText 30.08
StaticText %
StaticText Cosine Genie
StaticText 19.27
StaticText %
StaticText Factory Code Droid
StaticText 18.83
StaticText %
StaticText AutoCodeRover
StaticText 14.6
StaticText %
StaticText AppMap Navie
StaticText 13.82
StaticText %
StaticText Amazon Q Developer
StaticText 13.8
StaticText %
StaticText Devin (claimed)
StaticText 12.47
StaticText %
StaticText SWE-agent + GPT 4
StaticText 10.51
StaticText %
StaticText SWE-agent + Claude 3 Opus
StaticText 3.79
StaticText %
StaticText RAG + Claude 3 Opus
StaticText 3.44
StaticText %
StaticText Devin*
StaticText 1.96
StaticText %
StaticText RAG + Claude 2
StaticText 1.31
StaticText %
StaticText RAG + GPT 4
StaticText 0.7
StaticText %
StaticText RAG + SWE-Llama 13B
StaticText 0.17
StaticText %
StaticText RAG + ChatGPT 3.5
StaticText * Read about our findings and analysis in our
link technical report, url='https://cosine.sh/blog/genie-technical-report'
heading Seamless GitHub Issue Integration
paragraph
StaticText Genie streamlines your workflow by directly integrating with your GitHub issue tracker. By importing issues, Genie ensures it fully understands the specific tasks and objectives. This integration allows Genie to create a detailed specification of work based on the issue's requirements, reducing manual input and enabling it to immediately tackle complex problems directly from your GitHub issues.
image
StaticText Github Issue
image
StaticText cosineai/monorepo
paragraph
StaticText Issue #2732
StaticText 26 Jul 2024
heading Create a new page under the website folder announcing our new model
heading Objective:
paragraph
StaticText Develop a new page on the website to showcase our new State-of-the-Art (SOTA) model. The page should be visually engaging and aligned with our brand aesthetics.
heading Key Features:
heading Header Section:
list
listitem
ListMarker •
heading Title & Description:
paragraph
StaticText Create a heading title to introduce the SOTA model.
paragraph
StaticText Add a brief description below the title to provide an overview of the model's capabilities and significance.
listitem
ListMarker •
heading YouTube Video Embed:
paragraph
StaticText Embed a YouTube video demo of the SOTA model on the right side of the header. Ensure the video is responsive and maintains aspect ratio across different devices.
heading Content Section:
list
listitem
ListMarker •
heading Featured Text Boxes:
paragraph
StaticText Create left and right-aligned text boxes that feature key highlights and benefits of the SOTA model.
paragraph
StaticText Each text box should have a corresponding animated React component beside it to enhance visual appeal and interactivity.
listitem
ListMarker •
heading Brand Gradient Background:
paragraph
StaticText Apply our brand gradient to the top side of the background, ensuring it appears behind the navigation bar. This gradient should seamlessly blend into the rest of the page background.
heading Data Driven Autonomy
paragraph
StaticText By training our models with a dataset that codifies human reasoning rather than simply prompting base models, we have seen that we're no longer just generating random code until some works, it's tackling problems like a human.
link Technical Report, url='https://cosine.sh/blog/genie-technical-report'
image
heading Access the power of Genie
paragraph
StaticText We’re excited to see all the creative ways you use Genie, if you’d like to try Genie on your own projects please tell us what you are building.
link Get Access, url='https://go.cosine.sh/get-genie'
heading Genie is the highest scoring software engineering model
link Technical Report, url='https://cosine.sh/blog/genie-technical-report'
image
contentinfo
image C, url='https://cosine.sh/assets/cosine-c-black.png'
StaticText COSINE
link Terms, url='https://cosine.sh/legal/terms'
button Contact Us
button Careers
link Careers, url='https://app.dover.io/jobs/cosine'
link Twitter, url='https://go.cosine.sh/twitter'
image Twitter, url='https://cosine.sh/_astro/x.BCpZt-mX.png'
RootWebArea Genie: SOTA Software engineering model | Cosine - Human Reasoning Lab, focused, url='https://cosine.sh/genie'
navigation
[41] link Home, center=(702,56), inner_text=COSINE, url='https://cosine.sh/'
image C, url='https://cosine.sh/assets/cosine-c-black.png'
StaticText COSINE
Show more
[46] link Genie, center=(840,56), url='https://cosine.sh/genie'
[47] link Blog, center=(926,56), url='https://cosine.sh/blog'
[48] link About Us, center=(1026,56), url='https://cosine.sh/about'
[50] link Join Waitlist, center=(1198,56), url='https://cosine.sh/register'
[57] link ✨ Introducing Genie, an AI software engineering model, center=(960,128), url='https://cosine.sh/genie'
heading World's best AI Software Engineer
paragraph
StaticText Genie is the best AI software engineer in the world by far - achieving a
StaticText 30% eval score
StaticText on the industry standard benchmark
[66] link SWE-Bench, center=(624,524), url='https://arxiv.org/abs/2310.06770'
StaticText .
paragraph
StaticText Genie is able to solve bugs, build features, refactor code, and everything in between either fully autonomously or paired with the user, like working with a colleague, not just a copilot.
paragraph
StaticText If you wish to talk to us about our model
[69] button feel free to reach out., center=(626,666)
[a] Iframe YouTube video player, center=(1344,515), title=YouTube video player
RootWebArea Introducing Genie - YouTube, url='https://www.youtube.com/embed/7XrTkhaI4zs?si=5ANRfSXbmbS31Lzp&vq=hd1080&enablejsapi=1&origin=https://cosine.sh'
[a31] link Photo image of Cosine, center=(1017,345)
[a39] link Introducing Genie, center=(1335,345), url='https://www.youtube.com/watch?v=7XrTkhaI4zs'
[a56] button Share, center=(1658,343), hasPopup='menu'
image
[a92] button Play, center=(1344,515)
image
generic, atomic
[a320] link Watch on YouTube, center=(1071,689), url='https://www.youtube.com/watch?v=7XrTkhaI4zs&embeds_referring_euri=https%3A%2F%2Fcosine.sh%2F&embeds_referring_origin=https%3A%2F%2Fcosine.sh'
StaticText State of the art
heading Genie has the highest score in the world on SWE-Bench
paragraph
StaticText SWE Bench is a comprehensive benchmark designed to evaluate the coding abilities of large language models across various software engineering tasks
StaticText 30.08
StaticText %
StaticText Cosine Genie
StaticText 19.27
StaticText %
StaticText Factory Code Droid
StaticText 18.83
StaticText %
StaticText AutoCodeRover
StaticText 14.6
StaticText %
StaticText AppMap Navie
StaticText 13.82
StaticText %
StaticText Amazon Q Developer
StaticText 13.8
StaticText %
StaticText Devin (claimed)
StaticText 12.47
StaticText %
StaticText SWE-agent + GPT 4
StaticText 10.51
StaticText %
StaticText SWE-agent + Claude 3 Opus
StaticText 3.79
StaticText %
StaticText RAG + Claude 3 Opus
StaticText 3.44
StaticText %
StaticText Devin*
StaticText 1.96
StaticText %
StaticText RAG + Claude 2
StaticText 1.31
StaticText %
StaticText RAG + GPT 4
StaticText 0.7
StaticText %
StaticText RAG + SWE-Llama 13B
StaticText 0.17
StaticText %
StaticText RAG + ChatGPT 3.5
StaticText * Read about our findings and analysis in our
link technical report, url='https://cosine.sh/blog/genie-technical-report'
heading Seamless GitHub Issue Integration
paragraph
StaticText Genie streamlines your workflow by directly integrating with your GitHub issue tracker. By importing issues, Genie ensures it fully understands the specific tasks and objectives. This integration allows Genie to create a detailed specification of work based on the issue's requirements, reducing manual input and enabling it to immediately tackle complex problems directly from your GitHub issues.
image
StaticText Github Issue
image
StaticText cosineai/monorepo
paragraph
StaticText Issue #2732
StaticText 26 Jul 2024
heading Create a new page under the website folder announcing our new model
heading Objective:
paragraph
StaticText Develop a new page on the website to showcase our new State-of-the-Art (SOTA) model. The page should be visually engaging and aligned with our brand aesthetics.
heading Key Features:
heading Header Section:
list
listitem
ListMarker •
heading Title & Description:
paragraph
StaticText Create a heading title to introduce the SOTA model.
paragraph
StaticText Add a brief description below the title to provide an overview of the model's capabilities and significance.
listitem
ListMarker •
heading YouTube Video Embed:
paragraph
StaticText Embed a YouTube video demo of the SOTA model on the right side of the header. Ensure the video is responsive and maintains aspect ratio across different devices.
heading Content Section:
list
listitem
ListMarker •
heading Featured Text Boxes:
paragraph
StaticText Create left and right-aligned text boxes that feature key highlights and benefits of the SOTA model.
paragraph
StaticText Each text box should have a corresponding animated React component beside it to enhance visual appeal and interactivity.
listitem
ListMarker •
heading Brand Gradient Background:
paragraph
StaticText Apply our brand gradient to the top side of the background, ensuring it appears behind the navigation bar. This gradient should seamlessly blend into the rest of the page background.
heading Data Driven Autonomy
paragraph
StaticText By training our models with a dataset that codifies human reasoning rather than simply prompting base models, we have seen that we're no longer just generating random code until some works, it's tackling problems like a human.
link Technical Report, url='https://cosine.sh/blog/genie-technical-report'
image
heading Access the power of Genie
paragraph
StaticText We’re excited to see all the creative ways you use Genie, if you’d like to try Genie on your own projects please tell us what you are building.
link Get Access, url='https://go.cosine.sh/get-genie'
heading Genie is the highest scoring software engineering model
link Technical Report, url='https://cosine.sh/blog/genie-technical-report'
image
contentinfo
image C, url='https://cosine.sh/assets/cosine-c-black.png'
StaticText COSINE
link Terms, url='https://cosine.sh/legal/terms'
button Contact Us
button Careers
link Careers, url='https://app.dover.io/jobs/cosine'
link Twitter, url='https://go.cosine.sh/twitter'
image Twitter, url='https://cosine.sh/_astro/x.BCpZt-mX.png'
Reasoning for action: By clicking on the YouTube video link (39), I am directly engaging with the resource that contains a detailed visual and auditory introduction to the Genie AI model. This supports the user in understanding the model's performance and functionalities, which is critical for their decision-making. The introduction video is likely to provide insights that complement the textual information already available on the Cosine homepage, thus advancing the user's exploration of Genie.
Action: click(39) # Click on the YouTube link to watch the introduction video about Genie.
Change Description: After clicking the YouTube link, the browser changed to display an embedded YouTube video player showcasing the introduction video about Genie, allowing the user to watch the video directly on the webpage.
Screenshot before:Screenshot after:---
Step 3:
Url (before/after):
https://cosine.sh/genie
https://arxiv.org/abs/2310.06770
Content (before/after):
RootWebArea Genie: SOTA Software engineering model | Cosine - Human Reasoning Lab, focused, url='https://cosine.sh/genie'
navigation
[41] link Home, center=(702,56), inner_text=COSINE, url='https://cosine.sh/'
image C, url='https://cosine.sh/assets/cosine-c-black.png'
StaticText COSINE
Show more
[46] link Genie, center=(840,56), url='https://cosine.sh/genie'
[47] link Blog, center=(926,56), url='https://cosine.sh/blog'
[48] link About Us, center=(1026,56), url='https://cosine.sh/about'
[50] link Join Waitlist, center=(1198,56), url='https://cosine.sh/register'
[57] link ✨ Introducing Genie, an AI software engineering model, center=(960,128), url='https://cosine.sh/genie'
heading World's best AI Software Engineer
paragraph
StaticText Genie is the best AI software engineer in the world by far - achieving a
StaticText 30% eval score
StaticText on the industry standard benchmark
[66] link SWE-Bench, center=(624,524), url='https://arxiv.org/abs/2310.06770'
StaticText .
paragraph
StaticText Genie is able to solve bugs, build features, refactor code, and everything in between either fully autonomously or paired with the user, like working with a colleague, not just a copilot.
paragraph
StaticText If you wish to talk to us about our model
[69] button feel free to reach out., center=(626,666)
[a] Iframe YouTube video player, center=(1344,515), title=YouTube video player
RootWebArea Introducing Genie - YouTube, url='https://www.youtube.com/embed/7XrTkhaI4zs?si=5ANRfSXbmbS31Lzp&vq=hd1080&enablejsapi=1&origin=https://cosine.sh'
[a31] link Photo image of Cosine, center=(1017,345)
[a39] link Introducing Genie, center=(1335,345), url='https://www.youtube.com/watch?v=7XrTkhaI4zs'
[a56] button Share, center=(1658,343), hasPopup='menu'
image
[a92] button Play, center=(1344,515)
image
generic, atomic
[a320] link Watch on YouTube, center=(1071,689), url='https://www.youtube.com/watch?v=7XrTkhaI4zs&embeds_referring_euri=https%3A%2F%2Fcosine.sh%2F&embeds_referring_origin=https%3A%2F%2Fcosine.sh'
StaticText State of the art
heading Genie has the highest score in the world on SWE-Bench
paragraph
StaticText SWE Bench is a comprehensive benchmark designed to evaluate the coding abilities of large language models across various software engineering tasks
StaticText 30.08
StaticText %
StaticText Cosine Genie
StaticText 19.27
StaticText %
StaticText Factory Code Droid
StaticText 18.83
StaticText %
StaticText AutoCodeRover
StaticText 14.6
StaticText %
StaticText AppMap Navie
StaticText 13.82
StaticText %
StaticText Amazon Q Developer
StaticText 13.8
StaticText %
StaticText Devin (claimed)
StaticText 12.47
StaticText %
StaticText SWE-agent + GPT 4
StaticText 10.51
StaticText %
StaticText SWE-agent + Claude 3 Opus
StaticText 3.79
StaticText %
StaticText RAG + Claude 3 Opus
StaticText 3.44
StaticText %
StaticText Devin*
StaticText 1.96
StaticText %
StaticText RAG + Claude 2
StaticText 1.31
StaticText %
StaticText RAG + GPT 4
StaticText 0.7
StaticText %
StaticText RAG + SWE-Llama 13B
StaticText 0.17
StaticText %
StaticText RAG + ChatGPT 3.5
StaticText * Read about our findings and analysis in our
link technical report, url='https://cosine.sh/blog/genie-technical-report'
heading Seamless GitHub Issue Integration
paragraph
StaticText Genie streamlines your workflow by directly integrating with your GitHub issue tracker. By importing issues, Genie ensures it fully understands the specific tasks and objectives. This integration allows Genie to create a detailed specification of work based on the issue's requirements, reducing manual input and enabling it to immediately tackle complex problems directly from your GitHub issues.
image
StaticText Github Issue
image
StaticText cosineai/monorepo
paragraph
StaticText Issue #2732
StaticText 26 Jul 2024
heading Create a new page under the website folder announcing our new model
heading Objective:
paragraph
StaticText Develop a new page on the website to showcase our new State-of-the-Art (SOTA) model. The page should be visually engaging and aligned with our brand aesthetics.
heading Key Features:
heading Header Section:
list
listitem
ListMarker •
heading Title & Description:
paragraph
StaticText Create a heading title to introduce the SOTA model.
paragraph
StaticText Add a brief description below the title to provide an overview of the model's capabilities and significance.
listitem
ListMarker •
heading YouTube Video Embed:
paragraph
StaticText Embed a YouTube video demo of the SOTA model on the right side of the header. Ensure the video is responsive and maintains aspect ratio across different devices.
heading Content Section:
list
listitem
ListMarker •
heading Featured Text Boxes:
paragraph
StaticText Create left and right-aligned text boxes that feature key highlights and benefits of the SOTA model.
paragraph
StaticText Each text box should have a corresponding animated React component beside it to enhance visual appeal and interactivity.
listitem
ListMarker •
heading Brand Gradient Background:
paragraph
StaticText Apply our brand gradient to the top side of the background, ensuring it appears behind the navigation bar. This gradient should seamlessly blend into the rest of the page background.
heading Data Driven Autonomy
paragraph
StaticText By training our models with a dataset that codifies human reasoning rather than simply prompting base models, we have seen that we're no longer just generating random code until some works, it's tackling problems like a human.
link Technical Report, url='https://cosine.sh/blog/genie-technical-report'
image
heading Access the power of Genie
paragraph
StaticText We’re excited to see all the creative ways you use Genie, if you’d like to try Genie on your own projects please tell us what you are building.
link Get Access, url='https://go.cosine.sh/get-genie'
heading Genie is the highest scoring software engineering model
link Technical Report, url='https://cosine.sh/blog/genie-technical-report'
image
contentinfo
image C, url='https://cosine.sh/assets/cosine-c-black.png'
StaticText COSINE
link Terms, url='https://cosine.sh/legal/terms'
button Contact Us
button Careers
link Careers, url='https://app.dover.io/jobs/cosine'
link Twitter, url='https://go.cosine.sh/twitter'
image Twitter, url='https://cosine.sh/_astro/x.BCpZt-mX.png'
RootWebArea [2310.06770] SWE-bench: Can Language Models Resolve Real-World GitHub Issues?, focused, url='https://arxiv.org/abs/2310.06770'
complementary
heading Change to arXiv's privacy policy
paragraph
StaticText The
Show more
[73] link arXiv Privacy Policy, center=(117,64), url='https://info.arxiv.org/help/policies/privacy_policy.html'
StaticText has changed. By continuing to use arxiv.org, you are agreeing to the privacy policy.
[76] link I UNDERSTAND, center=(1753,48), url='https://arxiv.org/abs/2310.06770#'
banner
link Skip to main content, url='https://arxiv.org/abs/2310.06770#content'
[92] link arxiv logo, center=(55,135), url='https://arxiv.org/'
image arxiv logo, url='https://arxiv.org/static/browse/0.3.4/images/arxiv-logo-one-color-white.svg'
StaticText >
[95] link cs, center=(127,135), url='https://arxiv.org/list/cs/recent'
StaticText >
StaticText arXiv:2310.06770
[101] textbox Search term or terms, center=(1656,118), contenteditable=True, type=text
paragraph
[103] link Help, center=(1615,142), url='https://info.arxiv.org/help'
StaticText |
[104] link Advanced Search, center=(1684,142), url='https://arxiv.org/search/advanced'
[107] combobox Field to search value='All fields', center=(1787,118), contenteditable=True, expanded=False, hasPopup='menu', inner_text=All fields
Title
Author
Abstract
Comments
Journal reference
ACM classification
MSC classification
Report number
arXiv identifier
DOI
ORCID
arXiv author ID
Help pages
Full text
option All fields, selected=True
option Title, selected=False
option Author, selected=False
option Abstract, selected=False
option Comments, selected=False
option Journal reference, selected=False
option ACM classification, selected=False
option MSC classification, selected=False
option Report number, selected=False
option arXiv identifier, selected=False
option DOI, selected=False
option ORCID, selected=False
option arXiv author ID, selected=False
option Help pages, selected=False
option Full text, selected=False
[124] button Search, center=(1876,118)
main
heading Computer Science > Computation and Language
StaticText [Submitted on 10 Oct 2023 (
[172] link v1, center=(172,212), url='https://arxiv.org/abs/2310.06770v1'
StaticText ), last revised 11 Nov 2024 (this version, v3)]
heading SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
[177] link Carlos E. Jimenez, center=(85,276), url='https://arxiv.org/search/cs?searchtype=author&query=Jimenez,+C+E'
StaticText ,
[178] link John Yang, center=(196,276), url='https://arxiv.org/search/cs?searchtype=author&query=Yang,+J'
StaticText ,
[179] link Alexander Wettig, center=(303,276), url='https://arxiv.org/search/cs?searchtype=author&query=Wettig,+A'
StaticText ,
[180] link Shunyu Yao, center=(415,276), url='https://arxiv.org/search/cs?searchtype=author&query=Yao,+S'
StaticText ,
[181] link Kexin Pei, center=(500,276), url='https://arxiv.org/search/cs?searchtype=author&query=Pei,+K'
StaticText ,
[182] link Ofir Press, center=(578,276), url='https://arxiv.org/search/cs?searchtype=author&query=Press,+O'
StaticText ,
[183] link Karthik Narasimhan, center=(693,276), url='https://arxiv.org/search/cs?searchtype=author&query=Narasimhan,+K'
blockquote
StaticText Language models have outpaced our ability to evaluate them effectively, but for their future development it is essential to study the frontier of their capabilities. We find real-world software engineering to be a rich, sustainable, and challenging testbed for evaluating the next generation of language models. To this end, we introduce SWE-bench, an evaluation framework consisting of
StaticText 2
StaticText ,
StaticText 294
StaticText software engineering problems drawn from real GitHub issues and corresponding pull requests across
StaticText 12
StaticText popular Python repositories. Given a codebase along with a description of an issue to be resolved, a language model is tasked with editing the codebase to address the issue. Resolving issues in SWE-bench frequently requires understanding and coordinating changes across multiple functions, classes, and even files simultaneously, calling for models to interact with execution environments, process extremely long contexts and perform complex reasoning that goes far beyond traditional code generation tasks. Our evaluations show that both state-of-the-art proprietary models and our fine-tuned model SWE-Llama can resolve only the simplest issues. The best-performing model, Claude 2, is able to solve a mere
StaticText 1.96
StaticText % of the issues. Advances on SWE-bench represent steps towards LMs that are more practical, intelligent, and autonomous.
table Additional metadata
rowgroup
row
[225] cell Comments:, center=(55,462)
cell Data, code, and leaderboard are available at this https URL ICLR 2024, this https URL
[227] link this https URL, center=(361,461), url='https://www.swebench.com/'
[228] link this https URL, center=(499,461), url='https://openreview.net/forum?id=VTF8yNQM66'
row
[230] cell Subjects:, center=(55,481)
[231] cell Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Software Engineering (cs.SE), center=(345,481)
row
[234] cell Cite as:, center=(55,501)
cell arXiv:2310.06770 [cs.CL]
[237] link arXiv:2310.06770, center=(136,500), url='https://arxiv.org/abs/2310.06770'
row
[239] cell, center=(55,520), inner_text=
cell (or arXiv:2310.06770v3 [cs.CL] for this version)
[242] link arXiv:2310.06770v3, center=(160,520), url='https://arxiv.org/abs/2310.06770v3'
row
[244] cell, center=(55,540), inner_text=
cell https://doi.org/10.48550/arXiv.2310.06770 Focus to learn more
[246] link https://doi.org/10.48550/arXiv.2310.06770, center=(199,540), url='https://doi.org/10.48550/arXiv.2310.06770'
[248] button Focus to learn more, center=(318,540)
image
[212] span, center=(1092,398), inner_text=1.96
[202] span, center=(1421,334), inner_text=12
[190] span, center=(772,334), inner_text=2,294
heading Submission history
StaticText From: Carlos E. Jimenez [
[255] link view email, center=(200,594), url='https://arxiv.org/show-email/1212163e/2310.06770'
StaticText ]
strong
[258] link [v1], center=(30,613), url='https://arxiv.org/abs/2310.06770v1'
StaticText Tue, 10 Oct 2023 16:47:29 UTC (2,003 KB)
strong
[261] link [v2], center=(30,633), url='https://arxiv.org/abs/2310.06770v2'
StaticText Fri, 5 Apr 2024 18:16:29 UTC (2,258 KB)
strong
StaticText [v3]
StaticText Mon, 11 Nov 2024 23:05:04 UTC (2,398 KB)
heading Access Paper:
list
listitem
ListMarker •
[273] link View PDF, center=(1733,220), url='https://arxiv.org/pdf/2310.06770'
listitem
ListMarker •
[275] link HTML (experimental), center=(1772,238), url='https://arxiv.org/html/2310.06770v3'
listitem
ListMarker •
[277] link TeX Source, center=(1738,256), url='https://arxiv.org/src/2310.06770'
listitem
ListMarker •
[279] link Other Formats, center=(1748,274), url='https://arxiv.org/format/2310.06770'
[281] link license icon view license, center=(1793,293), inner_text=view license, title=Rights to this article, url='http://creativecommons.org/licenses/by/4.0/'
image license icon, url='https://arxiv.org/icons/licenses/by-4.0.png'
StaticText Current browse context:
StaticText cs.CL
[288] link < prev, center=(1704,352), title=previous in cs.CL (accesskey p), url='https://arxiv.org/prevnext?id=2310.06770&function=prev&context=cs.CL'
StaticText |
[291] link next >, center=(1781,352), title=next in cs.CL (accesskey n), url='https://arxiv.org/prevnext?id=2310.06770&function=next&context=cs.CL'
[294] link new, center=(1690,369), url='https://arxiv.org/list/cs.CL/new'
StaticText |
[296] link recent, center=(1730,369), url='https://arxiv.org/list/cs.CL/recent'
StaticText |
[298] link 2023-10, center=(1782,369), url='https://arxiv.org/list/cs.CL/2023-10'
StaticText Change to browse by:
[301] link cs, center=(1684,408), url='https://arxiv.org/abs/2310.06770?context=cs'
[303] link cs.AI, center=(1702,423), url='https://arxiv.org/abs/2310.06770?context=cs.AI'
[305] link cs.SE, center=(1704,438), url='https://arxiv.org/abs/2310.06770?context=cs.SE'
heading References & Citations
list
listitem
ListMarker •
[311] link NASA ADS, center=(1730,493), url='https://ui.adsabs.harvard.edu/abs/arXiv:2310.06770'
listitem
ListMarker •
[313] link Google Scholar, center=(1742,508), url='https://scholar.google.com/scholar_lookup?arxiv_id=2310.06770'
listitem
ListMarker •
[315] link Semantic Scholar, center=(1749,523), url='https://api.semanticscholar.org/arXiv:2310.06770'
StaticText Export BibTeX Citation
heading Bookmark
[334] link BibSonomy logo, center=(1688,606), inner_text= , title=Bookmark on BibSonomy, url='http://www.bibsonomy.org/BibtexHandler?requTask=upload&url=https://arxiv.org/abs/2310.06770&description=SWE-bench:%20Can%20Language%20Models%20Resolve%20Real-World%20GitHub%20Issues?'
image BibSonomy logo, url='https://arxiv.org/static/browse/0.3.4/images/icons/social/bibsonomy.png'
[336] link Reddit logo, center=(1706,606), title=Bookmark on Reddit, url='https://reddit.com/submit?url=https://arxiv.org/abs/2310.06770&title=SWE-bench:%20Can%20Language%20Models%20Resolve%20Real-World%20GitHub%20Issues?'
image Reddit logo, url='https://arxiv.org/static/browse/0.3.4/images/icons/social/reddit.png'
[341] LabelText, center=(98,712), inner_text=Bibliographic Tools
StaticText Bibliographic Tools
heading Bibliographic and Citation Tools
LabelText
checkbox Bibliographic Explorer Toggle, checked='false'
StaticText Bibliographic Explorer Toggle
StaticText Bibliographic Explorer
emphasis
StaticText (
[354] link What is the Explorer?, center=(273,816), url='https://info.arxiv.org/labs/showcase.html#arxiv-bibliographic-explorer'
StaticText )
[349] span, center=(70,816)
LabelText
checkbox Connected Papers, checked='false'
StaticText Connected Papers Toggle
StaticText Connected Papers
emphasis
StaticText (
[364] link What is Connected Papers?, center=(272,852), url='https://www.connectedpapers.com/about'
StaticText )
[359] span, center=(70,852)
LabelText
checkbox Litmaps, checked='false'
StaticText Litmaps Toggle
StaticText Litmaps
emphasis
StaticText (
[374] link What is Litmaps?, center=(188,888), url='https://www.litmaps.co/'
StaticText )
[369] span, center=(70,888)
LabelText
checkbox scite Smart Citations, checked='false'
StaticText scite.ai Toggle
StaticText scite Smart Citations
emphasis
StaticText (
[384] link What are Smart Citations?, center=(278,924), url='https://www.scite.ai/'
StaticText )
[379] span, center=(70,924)
[390] LabelText, center=(248,712), inner_text=Code, Data, Media
StaticText Code, Data, Media
[473] LabelText, center=(361,712), inner_text=Demos
StaticText Demos
[511] LabelText, center=(465,712), inner_text=Related Papers
StaticText Related Papers
[558] LabelText, center=(598,712), inner_text=About arXivLabs
StaticText About arXivLabs
[572] link Which authors of this paper are endorsers?, center=(145,1036), url='https://arxiv.org/auth/show-endorsers/2310.06770'
StaticText |
[573] link Disable MathJax, center=(328,1036), url='javascript:setMathjaxCookie()'
StaticText (
[574] link What is MathJax?, center=(436,1036), url='https://info.arxiv.org/help/mathjax.html'
StaticText )
contentinfo
navigation Secondary
list
listitem
link About, url='https://info.arxiv.org/about'
listitem
link Help, url='https://info.arxiv.org/help'
list
listitem
image contact arXiv
link Contact, url='https://info.arxiv.org/help/contact.html'
listitem
image subscribe to arXiv mailings
link Subscribe, url='https://info.arxiv.org/help/subscribe'
list
listitem
link Copyright, url='https://info.arxiv.org/help/license/index.html'
listitem
link Privacy Policy, url='https://info.arxiv.org/help/policies/privacy_policy.html'
list
listitem
link Web Accessibility Assistance, url='https://info.arxiv.org/help/web_accessibility.html'
listitem
paragraph
link arXiv Operational Status, url='https://status.arxiv.org/'
image
StaticText Get status notifications via
link email, url='https://subscribe.sorryapp.com/24846f03/email/new'
image
StaticText or
link slack, url='https://subscribe.sorryapp.com/24846f03/slack/new'
image
Reasoning for action: I clicked on the SWE-Bench link because the user's goal is to explore details related to the Genie AI model's performance benchmarks. By performing this action, I can access the arXiv page that contains comprehensive information about the SWE-Bench paper, which is critical for understanding how the Genie model is evaluated against industry standards. This directly addresses the user story's requirement to understand Genie’s performance metrics in the context of software engineering tasks.
Action: click(66) # Click on the SWE-Bench link to explore the benchmark details.
Change Description: The browser navigated from the Cosine website to the arXiv page for the paper titled 'SWE-bench: Can Language Models Resolve Real-World GitHub Issues?', changing the content displayed to metadata and details about the benchmark study, while refreshing the URL to 'https://arxiv.org/abs/2310.06770'.
Screenshot before:Screenshot after:---