Snorkel AI, the data-centric AI platform company powered by programmatic data labeling, announced the general availability of Snorkel Flow, its data-centric AI platform that lets enterprises accelerate AI application development 10-100x with automated data labeling.
The generally available version of Snorkel Flow includes enhancements to its programmatic data labeling, integrated ML modeling, collaborative AI application development, and guided data iteration capabilities, as well as pre-built templates for document classification and information extraction.
Today, enterprise data scientists and domain experts spend over 80 percent of AI development time gathering, organizing, and manually labeling the training data used to train machine learning models (Cognilytica 2020 report). Manual labeling is notoriously expensive and slow, limiting development teams’ ability to build, iterate, adapt, or audit applications in a systematic and privacy-compliant manner. The training data bottleneck is a primary reason why 87 percent of AI projects never make it into production.
To solve the training data bottleneck, Snorkel Flow provides the world’s first data-centric AI platform for enterprise teams to label training data programmatically, use error analysis to guide training data and model iteration in tandem, and adapt to real-world changes with a few clicks rather than complete manual relabeling. With Snorkel Flow, organizations have achieved state-of-the-art machine learning model accuracy in days rather than weeks or months.
Snorkel AI recently announced that Snorkel Flow is deployed at Chubb, the world’s largest publicly traded property and casualty insurer. Snorkel AI’s customer base continues to grow rapidly and includes Memorial Sloan Kettering Cancer Center, the world’s largest and oldest cancer center; two of the three top US banks; other Fortune 500 organizations in the biotech, oil and gas, and telecom sectors; and several government agencies.
“A significant need for AI models is labeled data, often tedious and expensive to generate,” said Janet Mak, Deputy CIO and VP of Digital Solutions, Memorial Sloan Kettering Cancer Center. “We are leveraging modern data-centric AI approaches, using generative learning with weak supervision, for machine-labeling data. We have applied Snorkel Flow to two use cases using pathology reports. We accurately labeled a few thousand pathology reports (95% accuracy, 85% precision) using one SME in days versus weeks. In addition to these material time savings, Snorkel Flow allows our teams to collaborate on the data accuracy and provides time efficiencies for our highly valued physicians and medical professionals.”
“Snorkel Flow is the result of over half a decade of research and close partnership with our customers. With a focus on speed, privacy, and collaboration, the platform delivers what Fortune 500 companies need to build mission-critical AI applications that power their business, protect data, and scale the use of AI,” said Alex Ratner, co-founder and CEO, Snorkel AI.
The generally available version of Snorkel Flow delivers a data-centric development workflow for data science and machine learning practitioners to tackle document intelligence applications. This includes:
- Programmatic data labeling: No-code and Python SDK interfaces for programmatic labeling, with state-of-the-art weak supervision algorithms.
- Integrated ML modeling suite: No-code, continuous training of leading pre-configured models, with modeling tools like AutoML available in-platform.
- Collaborative AI application development: Workflows for domain experts to encode labeling insight and rationale at scale and platform tools for real-time troubleshooting.
- Guided data iteration: Actionable error analysis and active learning workflows to improve training data quality and achieve production-worthy model accuracy faster.
- Accelerated document intelligence: Built-in pipeline templates with pre- and post-processing operators, models, and business logic for document classification and extraction applications.
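To make the idea of programmatic data labeling concrete, here is a minimal sketch in plain Python. It is not the Snorkel Flow SDK and does not reproduce its weak supervision algorithms: it assumes a toy document-classification task, and every name in it (the labeling functions, `majority_vote`, `label_documents`, the sample documents) is hypothetical. Each labeling function encodes one heuristic and may abstain, and a simple majority vote stands in for the more sophisticated label aggregation a real platform would use.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical label values for a toy document-classification task.
ABSTAIN = -1
INVOICE = 0
CONTRACT = 1

LabelingFunction = Callable[[str], int]


def lf_invoice_keyword(text: str) -> int:
    """Vote INVOICE when the word 'invoice' appears; otherwise abstain."""
    return INVOICE if "invoice" in text.lower() else ABSTAIN


def lf_amount_due(text: str) -> int:
    """Vote INVOICE when the phrase 'amount due' appears; otherwise abstain."""
    return INVOICE if "amount due" in text.lower() else ABSTAIN


def lf_agreement_keyword(text: str) -> int:
    """Vote CONTRACT when the word 'agreement' appears; otherwise abstain."""
    return CONTRACT if "agreement" in text.lower() else ABSTAIN


def lf_parties_keyword(text: str) -> int:
    """Vote CONTRACT when the word 'parties' appears; otherwise abstain."""
    return CONTRACT if "parties" in text.lower() else ABSTAIN


def majority_vote(votes: List[int]) -> int:
    """Combine votes by simple majority; abstain on ties or when nobody votes.

    This is a stand-in for real weak supervision, which would also weigh
    each heuristic by its estimated accuracy.
    """
    counted = Counter(v for v in votes if v != ABSTAIN)
    if not counted:
        return ABSTAIN
    ranked = counted.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting heuristics with equal support
    return ranked[0][0]


def label_documents(docs: List[str], lfs: List[LabelingFunction]) -> List[int]:
    """Apply every labeling function to every document and aggregate the votes."""
    return [majority_vote([lf(doc) for lf in lfs]) for doc in docs]


# Made-up sample documents, for illustration only.
DOCS = [
    "Invoice #123: amount due within 30 days.",
    "This agreement is entered into by the parties hereto.",
    "Meeting notes from Tuesday.",
    "Invoice issued under the agreement.",
]
LFS = [lf_invoice_keyword, lf_amount_due, lf_agreement_keyword, lf_parties_keyword]

if __name__ == "__main__":
    for doc, label in zip(DOCS, label_documents(DOCS, LFS)):
        print(label, doc)
```

In this sketch, documents that no heuristic covers, or on which the heuristics disagree evenly, are left unlabeled. Surfacing such cases is the kind of signal that error analysis can use to suggest which heuristic to write or refine next.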
Snorkel Flow is built for the modern enterprise, featuring cloud-agnostic Kubernetes deployment options, role-based access controls, SSO integrations, encryption in transit and at rest, and more. In addition to the generally available capabilities, several enhancements were recently released in beta, including the new Studio experience, annotation workspace, PDF extraction and conversational AI pipelines, sequence tagging capabilities, and more.