Trifacta accelerates data cleaning & preparation with a modern platform for cloud data lakes & warehouses
Ensure the success of your analytics, ML & data onboarding initiatives across any cloud, hybrid and multi-cloud environment
What is Data Wrangling?
Successful analysis relies on accurate, well-structured data formatted for the specific needs of the task at hand. Yet today’s data is bigger and more complex than ever before, and wrangling it into a format suitable for analysis is time-consuming and technically challenging. Data wrangling is the process of transforming raw source data into prepared outputs that can be used for analysis and other business purposes.
An Intelligent Platform that Interoperates with Your Data Investments
Trifacta sits between the data storage and processing environments and the visualization, statistical or machine learning tools used downstream. The platform is architected to be open and adaptable, so that as upstream and downstream technologies change, the investments and logic created in Trifacta can take advantage of those innovations.
Collaborative Data Governance refers to features within Trifacta that provide extensive support for open source and vendor-specific security, metadata management and governance frameworks. This approach gives organizations visibility into, and administrative control over, the data wrangling their users perform. Trifacta supports user hierarchies in which roles determine data access and user functionality within the application. Administrators and data stewards can manage platform authentication and security at various levels of the user hierarchy.
Trifacta maintains a robust connectivity and API framework that lets users access live data without pre-loading it or creating a copy separate from the source data system. The framework connects to various Hadoop sources, cloud services, files (CSV, TXT, JSON, XML, etc.) and relational databases. All of these connectors support governance and security features – roles and permissions, SSL, Kerberos Auth (SSO) and impersonation.
Photon | Spark | …
With Trifacta’s Intelligent Execution Engine, every transformation step defined in the user interface automatically compiles down to the best-fit processing framework based on data scale. Trifacta can transform the data on-the-fly in the application or compile down to Spark, Google Dataflow, or our in-memory engine, Photon. The platform natively supports all major Hadoop on-premises and cloud platforms. With this model, Trifacta can handle any scale.
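The idea of choosing a processing framework by data scale can be pictured with a small Python sketch. It is an illustration only, not Trifacta’s actual logic: the `Job` type, the `choose_engine` function and the 1 GiB cutoff are all hypothetical, since the page does not say how the engine makes its choice.

```python
from dataclasses import dataclass

# Hypothetical cutoff, chosen only for illustration. The real selection
# criteria are not described in the text above.
IN_MEMORY_LIMIT_BYTES = 1 * 1024**3  # 1 GiB


@dataclass(frozen=True)
class Job:
    """A toy description of a wrangling job (hypothetical type)."""
    name: str
    input_bytes: int
    spark_available: bool = True


def choose_engine(job: Job) -> str:
    """Pick an engine name for a job in this toy model.

    Small inputs stay in memory; larger inputs go to a distributed
    framework, preferring Spark when the environment offers it.
    """
    if job.input_bytes <= IN_MEMORY_LIMIT_BYTES:
        return "photon"
    if job.spark_available:
        return "spark"
    return "dataflow"
```

In this sketch the same job definition is routed to a different engine purely as a function of input size and environment, which is the property the paragraph describes.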
Machine Learning | Transparent Lineage | Smart Cleaning
Trifacta learns from the data registered in the platform and from how users interact with it. Common tasks are automated, and users are prompted with suggestions that speed up their wrangling. The platform supports fuzzy matching, enabling end users to join data sets on attributes that do not match exactly. Trifacta analyzes registered data to infer formats, data elements, schemas, relationships and metadata. The platform provides visibility into the context and lineage of data – both inside and outside of Trifacta.
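As a rough illustration of what joining on non-exact matches means, the Python sketch below pairs rows whose key values are similar rather than identical. It uses the standard library’s `difflib.SequenceMatcher` as a stand-in similarity measure; the `fuzzy_join` helper and the 0.8 threshold are assumptions made for this example and say nothing about how Trifacta implements fuzzy matching.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two strings, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def fuzzy_join(left, right, key, threshold=0.8):
    """Join two lists of dicts on `key` using approximate string matching.

    Each left row is paired with its best-scoring right row, and the pair
    is kept only if the score reaches `threshold` (hypothetical default).
    """
    joined = []
    for l_row in left:
        best, best_score = None, 0.0
        for r_row in right:
            score = similarity(l_row[key], r_row[key])
            if score > best_score:
                best, best_score = r_row, score
        if best is not None and best_score >= threshold:
            # Left values win when both rows carry the same column.
            joined.append({**best, **l_row})
    return joined
```

With this sketch, a row keyed "Acme Corp." would join to a row keyed "acme corp", while unrelated names fall below the threshold and are left out.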
Recipes | User Defined Functions | Macros
Core to Trifacta’s differentiation is the platform’s domain-specific language, Wrangle, which lets users separate the data wrangling logic they create in the application from the underlying processing of that logic. Advanced users can build more complex wrangling tasks, including window functions and user-defined functions. The steps defined in Trifacta’s Wrangle language together make up a data preparation recipe: a set of steps that can be turned into a repeatable pipeline.
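The notion of a recipe as an ordered, repeatable set of steps can be sketched in a few lines of Python. This is not Wrangle syntax; the step builders `drop_missing` and `standardize_case` and the runner `run_recipe` are hypothetical names used only to show logic being defined separately from the data it runs on.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]


def drop_missing(column: str) -> Step:
    """Build a step that removes rows where `column` is None or empty."""
    def step(rows: List[Row]) -> List[Row]:
        return [r for r in rows if r.get(column) not in (None, "")]
    return step


def standardize_case(column: str) -> Step:
    """Build a step that trims `column` and converts it to title case."""
    def step(rows: List[Row]) -> List[Row]:
        return [{**r, column: str(r[column]).strip().title()} for r in rows]
    return step


def run_recipe(steps: Iterable[Step], rows: List[Row]) -> List[Row]:
    """Apply each step in order; the input rows are never modified in place."""
    for step in steps:
        rows = step(rows)
    return rows
```

Because the recipe is just a list of steps, for example `[drop_missing("city"), standardize_case("city")]`, the same list can be rerun on new data, which is what makes the pipeline repeatable.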
Profile | Structure | Clean | Enrich | Validate
Trifacta leverages the latest techniques in data visualization, machine learning and human-computer interaction to guide users through the process of exploring and preparing data. Active Profiling presents guided visualizations of the data, choosing the most compelling profile based on its content. Predictive Transformation converts every click or selection within Trifacta into a prediction. Smart Cleaning empowers users to resolve common data quality issues like mismatched formats and unstandardized values.
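To make the idea of profiling a column for mismatched formats concrete, here is a minimal Python sketch. It assumes a very simple model in which each value is classified as integer, decimal, string or missing, and the most common non-missing class is treated as the column’s type. The `classify` and `profile_column` functions are hypothetical and are not part of Trifacta.

```python
from collections import Counter


def classify(value) -> str:
    """Classify one cell as 'missing', 'integer', 'decimal' or 'string'."""
    if value is None or str(value).strip() == "":
        return "missing"
    text = str(value).strip()
    try:
        int(text)
        return "integer"
    except ValueError:
        pass
    try:
        float(text)
        return "decimal"
    except ValueError:
        return "string"


def profile_column(values):
    """Summarize a column: inferred type plus valid, mismatched and missing counts."""
    counts = Counter(classify(v) for v in values)
    typed = {kind: n for kind, n in counts.items() if kind != "missing"}
    # The most frequent non-missing class is taken as the column's type.
    inferred = max(typed, key=typed.get) if typed else None
    mismatched = sum(n for kind, n in typed.items() if kind != inferred)
    return {
        "inferred_type": inferred,
        "valid": typed.get(inferred, 0),
        "mismatched": mismatched,
        "missing": counts.get("missing", 0),
    }
```

A summary like this is the kind of information a profile could surface visually, so that the few values that do not fit the dominant format stand out for cleaning.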
Trifacta maintains a robust Publishing and Access framework. Outputs of wrangling jobs can be published to a variety of downstream file systems, databases and analytical tools, in a range of file and compression formats. Trifacta offers deep API integration and bi-directional metadata sharing with a variety of analytics, data catalog and data governance applications. These native integrations let users share context and work between Trifacta and the external applications they use.