August 01, 2011

Google Refine, Google's open source ETL tool for data cleansing, with videos!

For any of you working with Entity Extraction, this might be of interest.  Google has open sourced some software from their Freebase acquisition, formerly called Gridworks.  It lets you interactively clean up and transform data.  More importantly, it saves those steps as a reusable sequence in JSON format, so they can be reapplied to other data.
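To make the "reusable sequence of steps in JSON" idea concrete, here is a minimal sketch in Python. The step format (`op`, `column`, `from`, `to`) and the function names are made up for illustration; this is not Refine's actual JSON schema or API, just the general pattern of recording cleanup steps once and replaying them on another batch of data.

```python
import json

# Hypothetical step format for illustration only.
# This is NOT Google Refine's actual JSON schema.
STEPS_JSON = """
[
  {"op": "trim", "column": "city"},
  {"op": "titlecase", "column": "city"},
  {"op": "replace", "column": "city", "from": "Nyc", "to": "New York"}
]
"""


def apply_step(row, step):
    """Return a copy of row with a single cleanup step applied."""
    column = step["column"]
    value = row[column]
    if step["op"] == "trim":
        value = value.strip()
    elif step["op"] == "titlecase":
        value = value.title()
    elif step["op"] == "replace":
        # Replace only exact matches of the whole cell value.
        if value == step["from"]:
            value = step["to"]
    else:
        raise ValueError("unknown op: " + step["op"])
    new_row = dict(row)
    new_row[column] = value
    return new_row


def apply_steps(rows, steps):
    """Replay every step, in order, on every row."""
    cleaned = []
    for row in rows:
        for step in steps:
            row = apply_step(row, step)
        cleaned.append(row)
    return cleaned


steps = json.loads(STEPS_JSON)

# The same saved steps are replayed on two different batches of data.
batch_one = [{"city": "  nyc "}, {"city": "boston"}]
batch_two = [{"city": "NYC"}, {"city": " chicago"}]

cleaned_one = apply_steps(batch_one, steps)
cleaned_two = apply_steps(batch_two, steps)
print(cleaned_one)
print(cleaned_two)
```

Because the steps live in plain JSON rather than in code, they can be stored next to the data, shared, and rerun whenever a new export arrives.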

Here's the main page and wiki (and 3 intro videos):

It IS open source; here's the source code and license:

That type of UI makes me want to dust off our XPump code and retrofit into it...
