Trifacta accelerates data cleaning & preparation with a
modern platform for cloud data lakes & warehouses

Ensure the success of your analytics, ML, and data onboarding initiatives across any cloud, hybrid, or multi-cloud environment


What is Data Wrangling?

Successful analysis relies on accurate, well-structured data that has been formatted for the specific needs of the task at hand. Yet today’s data is bigger and more complex than ever before, and it’s time-consuming and technically challenging to wrangle it into a format ready for analysis. Data wrangling is the process of transforming raw source data into prepared outputs to be used in analysis and other business purposes.
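As a minimal illustration in plain Python (Trifacta performs these steps visually rather than in hand-written code, and the records here are invented), wrangling takes raw, inconsistently formatted inputs and produces uniform, analysis-ready outputs:

```python
# Raw records with inconsistent whitespace, casing, and date separators.
raw = [
    {"name": " Ada Lovelace ", "signup": "2023/01/05"},
    {"name": "GRACE HOPPER",   "signup": "2023-02-05"},
]

def clean(record):
    # Trim extra whitespace and normalize name casing.
    name = " ".join(record["name"].split()).title()
    # Unify date separators to dashes.
    date = record["signup"].replace("/", "-")
    return {"name": name, "signup": date}

prepared = [clean(r) for r in raw]
```

The cleaned records now share one shape and formatting convention, which is the essential "raw inputs to prepared outputs" transition described above.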

An Intelligent Platform that Interoperates with Your Data Investments

Trifacta sits between the data storage and processing environments and the visualization, statistical, or machine learning tools used downstream. The platform is architected to be open and adaptable, so that as the technologies upstream and downstream change, the investments and logic created in Trifacta can take advantage of those innovations.

Collaborative Data Governance refers to features within Trifacta that provide extensive support for open-source and vendor-specific security, metadata management, and governance frameworks. This approach gives organizations visibility into, and administration over, the data wrangling their users perform. Trifacta supports user hierarchies, with roles determining data access and user functionality within the application. Administrators and data stewards can manage platform authentication and security at each level of the user hierarchy.

Trifacta provides end-to-end secure data access and clear auditability that comply with the stringent requirements of enterprise IT. The platform provides support for encryption, authentication, access control and masking. Trifacta’s differentiated approach to security focuses on providing enterprise functionality (such as SSO, impersonation, roles and permissions) while balancing extensive security framework integration with existing policies. Customers can integrate Trifacta into what’s already working for them without having to support a separate security policy.
Within Trifacta, users can share reusable data preparation logic and dataset relationships, which lets them leverage and build upon each other’s efforts. Multiple users can contribute to a single project, which parallelizes workflows, allows different degrees of participation, and speeds time to completion. Datasets and data preparation steps can also be integrated with third-party applications through Trifacta’s API, and preparation steps can be exported and shared outside Trifacta.
Trifacta’s operationalization features let data analysts schedule and monitor workflows that run jobs at scale in production, while still providing the traceability and access control IT requires. Every data preparation recipe, or set of steps, created in Trifacta can be set into a repeatable pipeline on an hourly, daily, or weekly schedule, or on an interval the user defines. Individual recipes can be combined into broader pipelines that span multiple datasets and recipes.
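The scheduling options described above can be sketched in plain Python. This helper, its `INTERVALS` table, and the schedule names are illustrative assumptions, not Trifacta’s actual API: given the time of a recipe’s last run and its schedule, it computes when the pipeline should next fire.

```python
from datetime import datetime, timedelta

# Hypothetical mapping of the schedule choices mentioned above
# (hourly, daily, weekly) to run intervals; a user-defined
# interval is passed in via the `custom` parameter.
INTERVALS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}

def next_run(last_run: datetime, schedule: str,
             custom: timedelta = None) -> datetime:
    """Return the next time a recipe's pipeline should execute."""
    interval = INTERVALS.get(schedule, custom)
    if interval is None:
        raise ValueError(f"unknown schedule: {schedule!r}")
    return last_run + interval

# A daily recipe last run at 9:00 on Jan 1 fires next at 9:00 on Jan 2.
nxt = next_run(datetime(2024, 1, 1, 9, 0), "daily")
```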

Trifacta maintains a robust connectivity and API framework that lets users access live data without pre-loading it or creating a copy separate from the source system. This framework includes connections to Hadoop sources, cloud services, files (CSV, TXT, JSON, XML, etc.), and relational databases. All of these connectors support governance and security features: roles and permissions, SSL, Kerberos authentication (SSO), and impersonation.
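The idea behind such a connectivity layer is that heterogeneous source formats are presented to the user as one uniform row representation. A minimal standard-library sketch (illustrative only, with invented sample data) for two of the file formats listed above:

```python
import csv
import io
import json

# Parse different source formats into one common row shape
# (a list of dicts), mirroring how a connectivity framework
# presents heterogeneous sources uniformly.
def read_csv(text: str) -> list:
    return list(csv.DictReader(io.StringIO(text)))

def read_json(text: str) -> list:
    return json.loads(text)

csv_rows = read_csv("id,city\n1,Berlin\n2,Lyon\n")
json_rows = read_json('[{"id": "3", "city": "Oslo"}]')

# Downstream logic sees one uniform list of records,
# regardless of the original format.
rows = csv_rows + json_rows
```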

Trifacta supports enriching data with geographic, demographic, census, and other common types of reference data. Common taxonomies and ontologies, such as geographic and time-based content, are automatically recognized, as are data format taxonomies for nested data structures like JSON and XML. The platform is also open and extensible through APIs, giving customers and partners the ability to seamlessly integrate additional data sources and targets.

Photon | Spark

With Trifacta’s Intelligent Execution Engine, every transformation step defined in the user interface automatically compiles down to the best-fit processing framework based on data scale. Trifacta can transform the data on-the-fly in the application or compile down to Spark, Google Cloud Dataflow, or our in-memory engine, Photon. The platform natively supports all major Hadoop on-premise and cloud platforms. With this model, Trifacta can handle any scale.
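The "best-fit framework" dispatch can be sketched as a simple size-based selection. The threshold and decision logic here are illustrative assumptions, not Trifacta’s actual heuristics:

```python
# Hypothetical engine dispatch: small data runs in an in-memory
# engine for interactive speed, large data compiles down to a
# distributed framework. The 1M-row cutoff is an invented example.
def pick_engine(row_count: int) -> str:
    if row_count < 1_000_000:
        return "photon"   # in-memory, low-latency
    return "spark"        # distributed batch execution

# A small preview sample stays in memory; a production-scale
# job is pushed to the cluster.
preview_engine = pick_engine(250)
batch_engine = pick_engine(50_000_000)
```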

Machine Learning | Transparent Lineage | Smart Cleaning

Trifacta learns from the data registered into the platform and from how users interact with it. Common tasks are automated, and users are prompted with suggestions that speed their wrangling. The platform supports fuzzy matching, enabling end users to join datasets on attributes that do not match exactly. Data registered in Trifacta is profiled to infer formats, data elements, schemas, relationships, and metadata. The platform provides visibility into the context and lineage of data, both inside and outside of Trifacta.
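A fuzzy join of the kind described above can be sketched with the standard library’s `difflib`. The similarity measure, the 0.8 cutoff, and the sample data are illustrative choices, not Trifacta’s actual matching algorithm:

```python
from difflib import SequenceMatcher

# Case-insensitive string similarity in [0.0, 1.0].
def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Pair up rows whose key attributes are similar but not identical.
def fuzzy_join(left, right, key, cutoff=0.8):
    matches = []
    for l in left:
        for r in right:
            if similarity(l[key], r[key]) >= cutoff:
                matches.append((l, r))
    return matches

customers = [{"name": "Acme Corp."}]
orders = [{"name": "ACME Corp"}, {"name": "Globex"}]

# "Acme Corp." and "ACME Corp" match despite differing exactly.
pairs = fuzzy_join(customers, orders, "name")
```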

Recipes | User Defined Functions | Macros

Core to Trifacta’s differentiation is Wrangle, the platform’s domain-specific language, which lets users abstract the data wrangling logic they create in the application from the underlying data processing of that logic. Advanced users can author more complex wrangling tasks, including window functions and user-defined functions. Every step defined in the Wrangle language contributes to a data preparation recipe, a set of steps that can be set into a repeatable pipeline.
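The separation of recipe from execution can be sketched as follows: a recipe is just an ordered list of declarative steps, and a separate interpreter applies them to the data. The step names and structure here are an invented illustration, not actual Wrangle syntax:

```python
# A recipe as data: an ordered list of (operation, arguments)
# steps, kept separate from the engine that executes them.
recipe = [
    ("rename", {"from": "cust", "to": "customer"}),
    ("uppercase", {"column": "customer"}),
]

# A toy interpreter: any engine implementing these operations
# could run the same recipe unchanged.
def run_recipe(rows, recipe):
    for op, args in recipe:
        if op == "rename":
            rows = [{(args["to"] if k == args["from"] else k): v
                     for k, v in row.items()} for row in rows]
        elif op == "uppercase":
            rows = [{**row, args["column"]: row[args["column"]].upper()}
                    for row in rows]
    return rows

out = run_recipe([{"cust": "acme"}], recipe)
```

Because the recipe is plain data rather than engine-specific code, the same steps could in principle be replayed on a different backend, which is the abstraction the paragraph above describes.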

Profile | Structure | Clean | Enrich | Validate

Trifacta leverages the latest techniques in data visualization, machine learning, and human-computer interaction to guide users through the process of exploring and preparing data. Active Profiling presents guided visualizations of the data, choosing the most compelling profile based on its content. Predictive Transformation converts every click or selection within Trifacta into a prediction. Smart Cleaning empowers users to resolve common data quality issues such as mismatched formats and unstandardized values.
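Standardizing unstandardized values can be sketched as collapsing variant spellings to one canonical form. The mapping table and sample values below are hand-built for illustration and are not how Smart Cleaning derives its suggestions:

```python
# Illustrative canonicalization table: several variant spellings
# all map to one standard value.
CANONICAL = {
    "ny": "New York",
    "nyc": "New York",
    "new york": "New York",
}

def standardize(value: str) -> str:
    # Fall back to the original value when no mapping exists.
    return CANONICAL.get(value.strip().lower(), value)

cleaned = [standardize(v) for v in ["NY ", "new york", "Boston"]]
```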

Trifacta maintains a robust publishing and access framework. Outputs of wrangling jobs can be published to a variety of downstream file systems, databases, analytical tools, and file and compression formats. Trifacta offers deep API integration and bi-directional metadata sharing with a variety of analytics, data catalog, and data governance applications, enabling users to share context and work between Trifacta and the external applications they leverage through native integration.

Contact us to learn more about Trifacta
