Twelve Labs teaches artificial intelligence to “see” and change video understanding

Of course, the scores of football matches are important. But sporting events can also spawn untold cultural moments, like when Travis Kelce made a heart sign to Taylor Swift in the stands. While this type of video may be social media gold, it's easy to miss with traditional content-tagging systems. This is where Twelve Labs comes in.
"Every sports team or sports league has decades of footage of players captured in games and around the stadium," Soyoung Lee, co-founder of Twelve Labs and head of GTM, told the Observer. However, these archives are often underutilized because of inconsistent and outdated content management. "To date, most processes for labeling content have been manual."
Twelve Labs, a San Francisco-based startup focused on video understanding AI, hopes to unlock the value of video content by providing models that can search massive archives, generate text summaries, and create short clips from long footage. Its work extends far beyond sports, into industries ranging from entertainment and advertising to security.
"Large language models can read and write very well," Lee said. "But we want to build a world where AI can also see."
Are Twelve Labs and ElevenLabs related?
Founded in 2021, Twelve Labs is not to be confused with ElevenLabs, an AI startup specializing in audio. "We started a year earlier," Lee joked, adding that Twelve Labs (named after the original size of its founding team) often partners with ElevenLabs to host hackathons, including one called "23Labs."
The startup's ambitious vision has piqued the interest of deep-pocketed backers. It has raised more than $100 million in funding from investors including Nvidia, Intel and Firstman Studio, the company founded by Squid Game creator Hwang Dong-hyuk. Its lineup of advisors is equally star-studded, including Fei-Fei Li, Jeffrey Katzenberg and Alexandr Wang.
Twelve Labs has thousands of developers and hundreds of enterprise customers. Demand is highest in the entertainment and media sectors, including Hollywood studios, sports leagues, social media influencers and advertising agencies, which rely on Twelve Labs tools to automatically generate clips, assist with scene selection or enable contextual ad placement.
Government agencies also use the startup's technology for video search and incident retrieval. In addition to working with the U.S. and other countries, Twelve Labs is deployed in Sejong, South Korea, where it helps CCTV operators monitor footage from thousands of cameras and locate specific incidents, Lee said. To reduce security risks, the company removed facial- and biometric-recognition features, she added.
Will video-native AI replace human jobs?
Many industries served by Twelve Labs are already debating whether artificial intelligence will threaten human jobs, a concern that Lee believes is only partially justified. "I don't know if jobs themselves will disappear, but jobs have to transform," she said, comparing the shift to how tools like Photoshop reshaped creative roles.
If anything, Lee believes systems like Twelve Labs will democratize creative work that has traditionally been limited to companies with big budgets. "You can now do things with less money, which means independent creatives who don't have the same capital can tell more stories," she said. "It actually allows for expanded content creation and personalized distribution."
Twelve Labs isn't the only AI company focused on video, but it insists it fills a different need than its much larger rivals. "We're excited that video is starting to get more attention now, but what we're seeing is a lot of innovation in large language models, a lot of innovation in video generation models and image generation models like Sora, but not in video understanding," Lee said, referring to OpenAI's text-to-video model.
Currently, Twelve Labs provides video search, video analysis and video-to-text functions. The company plans to expand into an agent platform that can not only understand videos but also build narratives from them. Such models could be useful outside the creative realm, Lee said, citing examples such as retailers identifying peak traffic times or security customers mapping the sequence of events around an accident.
