California-based Activeloop, a startup offering a dedicated database to streamline AI projects, today announced it has raised $11 million in series A funding from Streamlined Ventures, Y Combinator, Samsung Next (the startup acceleration arm of the Samsung Group) and multiple other investors.
While there are several data platforms out there, Activeloop, founded by Princeton dropout Davit Buniatyan, has carved a niche for itself with a system to tackle one of the biggest challenges enterprises face today: leveraging unstructured multimodal data for training AI models. The company claims this technology, dubbed “Deep Lake,” allows teams to create AI applications at a cost up to 75% lower than competing market offerings while increasing engineering teams’ productivity by up to fivefold.
The work is important as more and more enterprises look for ways to tap their complex datasets for AI applications targeted at different use cases. According to McKinsey research, generative AI has the potential to generate $2.6 trillion to $4.4 trillion in global corporate profits annually with significant impact across dozens of areas, including supporting interactions with customers, generating creative content for marketing and sales and drafting software code based on natural-language prompts.
What does Activeloop Deep Lake help with?
Today, training highly performant foundation AI models involves dealing with petabyte-scale unstructured data covering modalities such as text, audio and video. The task usually requires teams to identify relevant datasets from disorganized silos and put them to work on an ongoing basis with different storage and retrieval technologies — something that requires a lot of boilerplate coding and integration from engineers and can increase the cost of the project.
Activeloop aims to replace this piecemeal approach with a standardized one. Deep Lake stores complex data, such as images, videos and annotations, as machine learning (ML)-native mathematical representations (tensors). It then streams these tensors to its SQL-like Tensor Query Language, an in-browser visualization engine, or deep learning frameworks such as PyTorch and TensorFlow.
This gives developers one platform for everything, from filtering and searching multi-modal data to tracking and comparing its versions over time and streaming it for training models aimed at different use cases.
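As a rough illustration of the idea of keeping every modality in one tensor-style store that can be filtered and then streamed, here is a minimal sketch in plain Python. It is not Deep Lake's API or storage format: `TinyTensorDataset`, `shape_of` and the sample data are hypothetical names invented for this example, and nested lists stand in for real tensors.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, Iterator, List


def shape_of(value: Any) -> tuple:
    """Return the shape of a nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return tuple(shape)


@dataclass
class TinyTensorDataset:
    """Hypothetical in-memory stand-in: one list of samples per named column."""

    columns: Dict[str, List[Any]] = field(default_factory=dict)

    def append(self, sample: Dict[str, Any]) -> None:
        # Every sample must supply exactly the columns already in the dataset.
        if self.columns and set(sample) != set(self.columns):
            raise ValueError("sample keys must match existing columns")
        for name, value in sample.items():
            self.columns.setdefault(name, []).append(value)

    def __len__(self) -> int:
        return len(next(iter(self.columns.values()), []))

    def row(self, index: int) -> Dict[str, Any]:
        return {name: col[index] for name, col in self.columns.items()}

    def filter(self, predicate: Callable[[Dict[str, Any]], bool]) -> List[int]:
        # Return matching row indices rather than copying the data.
        return [i for i in range(len(self)) if predicate(self.row(i))]

    def stream(self, indices: Iterable[int]) -> Iterator[Dict[str, Any]]:
        # Yield rows lazily, one at a time, as a training loop would consume them.
        for i in indices:
            yield self.row(i)


# Made-up samples: a 2x2 "image", an integer label and a text caption per row.
ds = TinyTensorDataset()
ds.append({"image": [[0, 255], [255, 0]], "label": 1, "caption": "scan A"})
ds.append({"image": [[10, 20], [30, 40]], "label": 0, "caption": "scan B"})
ds.append({"image": [[5, 5], [5, 5]], "label": 1, "caption": "scan C"})

positives = ds.filter(lambda r: r["label"] == 1)
captions = [r["caption"] for r in ds.stream(positives)]
image_shape = shape_of(ds.columns["image"][0])
```

The filter returns row indices instead of copies, so the same stored samples can back several views. A production system would add persistence, versioning and a real query language on top of this kind of layout.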
In a conversation with VentureBeat, Buniatyan says Deep Lake offers all the benefits of a vanilla data lake (such as ingesting multimodal data from silos) but stands out by converting it all into the tensor format, which deep learning algorithms expect as inputs.
The tensors are stored in cloud-based object storage, such as AWS S3, or in local storage, and then streamed to graphics processing units (GPUs) for training, handing off just enough data at a time to keep the compute fully utilized. Previous approaches that dealt with large datasets required copying the data in batches, which left GPUs idling.
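The difference between batch copying and streaming can be sketched with a bounded prefetch buffer: a background thread fetches the next chunks while the consumer works on the current one. This is a generic pattern shown under stated assumptions, not Activeloop's streaming engine. `prefetching_stream`, `fetch_chunk` and `fake_fetch` are hypothetical names, and the fake fetch function stands in for a read from object storage.

```python
import queue
import threading
from typing import Any, Callable, Iterable, Iterator, List

_DONE = object()  # sentinel marking the end of the stream


def prefetching_stream(
    fetch_chunk: Callable[[Any], Any],
    chunk_ids: Iterable[Any],
    buffer_size: int = 2,
) -> Iterator[Any]:
    """Yield chunks in order while a background thread fetches ahead.

    The bounded queue holds at most `buffer_size` chunks, so only a small
    window of data is in memory at any time.
    """
    buffer: queue.Queue = queue.Queue(maxsize=buffer_size)

    def producer() -> None:
        try:
            for chunk_id in chunk_ids:
                buffer.put(fetch_chunk(chunk_id))  # blocks while the buffer is full
            buffer.put(_DONE)
        except Exception as exc:  # hand fetch errors over to the consumer
            buffer.put(exc)

    # Daemon thread: it will not keep the process alive if the consumer stops early.
    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = buffer.get()
        if item is _DONE:
            return
        if isinstance(item, Exception):
            raise item
        yield item


def fake_fetch(chunk_id: int) -> List[int]:
    """Hypothetical stand-in for downloading one chunk of samples."""
    return [chunk_id * 10 + i for i in range(3)]


chunks = list(prefetching_stream(fake_fetch, range(4)))
```

Because a single producer feeds the queue, chunks arrive in the order requested. In a real training loop the consumer would move each chunk to the GPU while the producer fetches the next one.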
Buniatyan said he started working on Activeloop and this technology in 2018 when he faced the challenge of storing and preprocessing thousands of high-resolution mouse brain scans at the Princeton Neuroscience Lab. Since then, the company has developed core database functionalities that fall into two main categories: open source and proprietary.
“The open-source aspect encompasses the dataset format, version control, and a wide array of APIs designed for streaming and querying, among other capabilities. On the other hand, the proprietary segment includes advanced visualization tools, knowledge retrieval, and a performant streaming engine, which together enhance the overall functionality and appeal of their product,” he told VentureBeat.
While the CEO did not share the exact number of customers Activeloop is working with, he did note that the open-source project has been downloaded more than one million times to date and has propelled the company’s presence in the enterprise segment. Currently, the enterprise-centric offering comes with a usage-based pricing model and is being leveraged by Fortune 500 companies across highly regulated industries including biopharma, life sciences, medtech, automotive and legal.
One customer, Bayer Radiology, used Deep Lake to unify different data modalities into a single storage solution, streamlining data pre-processing time and enabling a new “chat with X-rays” capability allowing data scientists to query scans in natural language.
“Activeloop’s knowledge retrieval feature is optimized to help data teams create solutions at a cost up to 75% lower than anything else on the market, while increasing the retrieval accuracy significantly, which is important in the industries that Activeloop serves,” the founder added.
Plan to grow
With this round of funding, Activeloop plans to build out its enterprise offering and bring more customers onto its database for AI, enabling them to organize complex unstructured data and retrieve knowledge with ease.
The company also plans to use the funds to scale up its engineering team.
“A key development in the pipeline is an upcoming release of Deep Lake v4, with faster concurrent IO, the fastest streaming data loader for training models, complete reproducible data lineage and external data source integrations,” Buniatyan noted, while claiming that there are many customers in this space but “no direct competitors.”
Ultimately, he hopes the technology will save enterprises from spending millions on in-house solutions for data organization and retrieval as well as keep engineers from doing lots of manual handiwork and boilerplate coding, making them more productive.