Amazon has launched AWS Glue DataBrew, a new feature that lets users extract, transform, and load data to get it ready for analysis without having to write code.
Since AWS Glue launched in 2016, data engineers have used it to create, run, and monitor extract, transform, and load (ETL) jobs. Glue has given engineers a visual way to do ETL, though some coding is still involved. AWS Glue provides both code-based and visual interfaces, and according to AWS it has dramatically simplified extracting, orchestrating, and loading data in the cloud for customers.
What sets DataBrew apart is that it is designed for data analysts and data scientists, who can perform the same data-cleansing operations simply by clicking buttons and selecting options in a visual user interface.
According to the company, DataBrew offers more than 250 pre-built functions that automate data preparation tasks which would otherwise take days or weeks to code. AWS also claims that DataBrew helps data scientists and data analysts get data ready for analytics and machine learning 80% faster than traditional data preparation approaches.
“AWS customers are using data for analytics and machine learning at an unprecedented pace. However, these customers regularly tell us that their teams spend too much time on the undifferentiated, repetitive, and mundane tasks associated with data preparation,” said Raju Gulabani, Vice President of Database and Analytics at AWS. “AWS Glue DataBrew features an easy-to-use visual interface that helps data analysts and data scientists of all technical levels understand, combine, clean, and transform data.”
DataBrew works with CSV, Parquet, JSON, and XLSX data stored in Amazon S3, Amazon Redshift, Amazon Relational Database Service (RDS), or any other AWS data store accessible through a JDBC connector. According to AWS, any data indexed by the AWS Glue Data Catalog can also be brought into DataBrew.
In a video demonstration posted on YouTube, AWS shows DataBrew removing special characters, such as an ampersand, from a database entry, because characters like these can interfere with data analysis. In another example, DataBrew uses a categorical mapping function to map text strings to numeric values so that those entries can be analyzed. DataBrew also includes a profiling function that reports useful statistics such as the number of missing entries in each dataset.
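In DataBrew these steps are done by clicking through the visual interface, with no code required. To illustrate what the three demonstrated operations actually do to the data, here is a minimal plain-Python sketch that assumes a small in-memory dataset of string columns. The function names (`remove_special_characters`, `categorical_mapping`, `missing_counts`) and the sample rows are hypothetical and are not part of DataBrew or any AWS API; DataBrew's own functions may handle edge cases differently.

```python
import re
from typing import Dict, List, Optional

# Hypothetical row type: column name -> string value, with None marking a missing entry.
Row = Dict[str, Optional[str]]


def remove_special_characters(value: Optional[str]) -> Optional[str]:
    """Strip anything that is not an ASCII letter, digit, or whitespace (e.g. '&')."""
    if value is None:
        return None
    cleaned = re.sub(r"[^A-Za-z0-9\s]", "", value)
    # Collapse the double space left behind when a standalone symbol is removed.
    return re.sub(r"\s+", " ", cleaned).strip()


def categorical_mapping(values: List[Optional[str]]) -> Dict[str, int]:
    """Assign each distinct non-missing string an integer code, in first-seen order."""
    mapping: Dict[str, int] = {}
    for v in values:
        if v is not None and v not in mapping:
            mapping[v] = len(mapping)
    return mapping


def missing_counts(rows: List[Row]) -> Dict[str, int]:
    """Profile-style summary: count missing (None or empty) entries per column."""
    columns = {col for row in rows for col in row}
    return {col: sum(1 for row in rows if not row.get(col)) for col in sorted(columns)}


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    rows: List[Row] = [
        {"company": "Smith & Sons", "segment": "retail"},
        {"company": "Acme", "segment": "wholesale"},
        {"company": "Jones & Co", "segment": None},
        {"company": None, "segment": "retail"},
    ]

    print([remove_special_characters(r["company"]) for r in rows])
    # ['Smith Sons', 'Acme', 'Jones Co', None]

    segments = [r["segment"] for r in rows]
    codes = categorical_mapping(segments)
    print(codes)  # {'retail': 0, 'wholesale': 1}
    print([codes.get(s) if s is not None else None for s in segments])  # [0, 1, None, 0]

    print(missing_counts(rows))  # {'company': 1, 'segment': 1}
```

Assigning codes in first-seen order keeps the mapping deterministic for a given input; a tool could equally sort categories first, and missing values are left as missing rather than given a code of their own.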
Amazon says it already has customers using the software, including Japanese telecom firm NTT DoCoMo and energy giant BP.