Microsoft Copilot is getting smarter by the day. The company led by Satya Nadella has just announced that its AI assistant now has “vision” capabilities that allow it to browse the internet alongside users.
While the functionality was first announced in October this year, the company is now previewing it with a select set of Pro subscribers. According to Microsoft, these users will be able to trigger Copilot Vision on web pages open in their Edge browser and interact with it about the content visible on the screen.
The feature is still in the early stages of development and is quite limited, but once fully developed it could be a game-changer for Microsoft's enterprise customers, helping them with analysis and decision-making as they interact with products across the company's ecosystem (OneDrive, Excel, SharePoint, etc.).
In the long term, it will also be interesting to see how Copilot Vision fares against more open and capable agentic offerings, such as those from Anthropic and Emergence AI, which allow developers to integrate agents that can see, reason about, and take action on applications from different vendors.
What to expect with Copilot Vision?
When a user opens a website, they may or may not have a task in mind. But when they do, such as researching for an academic paper, carrying out that task involves browsing the website, reading all of its content, and then making a call on it (e.g., whether the website's content should be used as a reference for the paper or not). The same goes for other everyday web tasks like shopping.
With the new Copilot Vision experience, Microsoft aims to simplify this entire process. Essentially, the user now has an assistant that sits at the bottom of their browser and can be called upon at any time to read the content of the website, covering all text and images, and help them make decisions.
It can immediately scan, analyze and provide all the required information, taking into account the user's intended purpose, just like a second pair of eyes.
This feature has huge benefits (it can speed up your workflows considerably) as well as major implications, since the agent reads and evaluates everything you browse. However, Microsoft has assured users that all context and information they share is deleted as soon as the Vision session is closed. It also noted that data from websites is not captured or stored to train the underlying models.
“In short, we prioritize copyrights, creators, and the privacy and security of our users – and we put them all first,” the Copilot team wrote in a blog post announcing the feature preview.
Feedback-based expansion
Currently, a select set of Copilot Pro subscribers in the United States who have signed up for the Copilot Labs early access program will be able to use the vision capabilities in their Edge browser. The feature will be opt-in, meaning they won't have to worry about AI constantly reading their screens.
Also, at this point it will only work with certain websites. Microsoft says it will take feedback from early adopters and gradually improve features while expanding support to more Pro users and other websites.
In the long term, the company could even extend these capabilities to other products in its ecosystem, such as OneDrive and Excel, making it easier for business users to work and make decisions. However, there is no official confirmation yet. Moreover, given the cautious approach being taken here, this could take a while to become a reality.
Microsoft's decision to launch the Copilot Vision preview comes at a time when competitors are raising the bar in the field of agentic AI. Salesforce has already deployed Agentforce across its Customer 360 offerings to automate workflows in areas such as sales, marketing and service.
Meanwhile, Anthropic launched “Computer use”, which allows developers to integrate Claude to interact with a computer desktop environment, performing tasks that were previously handled only by human workers, such as opening applications, interacting with interfaces, and filling in forms.