August 01, 2011

Google Refine, Google's open source ETL tool for data cleansing, with videos!

For any of you working with Entity Extraction, this might be of interest.  Google has open sourced some software from their Freebase acquisition, formerly called Gridworks.  It lets you interactively clean up and transform data.  More importantly, it saves these steps as a reusable sequence in JSON format, so they can be reapplied to other data.
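
To make the "reusable steps in JSON" idea concrete, here is a minimal Python sketch of the general pattern: a list of cleanup steps stored as JSON and replayed against more than one batch of rows. The step format, the op names (`trim`, `uppercase`, `replace`) and the helper functions are all made up for illustration; this is not Refine's actual JSON schema or API.

```python
import json

# Hypothetical step format for illustration only; NOT Refine's real JSON schema.
STEPS_JSON = """
[
  {"op": "trim", "column": "name"},
  {"op": "uppercase", "column": "country"},
  {"op": "replace", "column": "name", "find": "Inc.", "with": "Incorporated"}
]
"""


def apply_step(row, step):
    """Return a copy of row with one transformation step applied."""
    column = step["column"]
    value = row.get(column)
    new_row = dict(row)
    if value is None:
        return new_row  # nothing to transform in this row
    if step["op"] == "trim":
        value = value.strip()
    elif step["op"] == "uppercase":
        value = value.upper()
    elif step["op"] == "replace":
        value = value.replace(step["find"], step["with"])
    else:
        raise ValueError(f"unknown op: {step['op']}")
    new_row[column] = value
    return new_row


def apply_steps(rows, steps):
    """Replay every step, in order, against every row."""
    result = []
    for row in rows:
        for step in steps:
            row = apply_step(row, step)
        result.append(row)
    return result


# The same saved steps are reapplied to two different batches of data.
steps = json.loads(STEPS_JSON)
batch_one = [{"name": "  Acme Inc. ", "country": "us"}]
batch_two = [{"name": "Globex Inc.", "country": "de"}]
cleaned_one = apply_steps(batch_one, steps)
cleaned_two = apply_steps(batch_two, steps)
```

The point of keeping the steps as data rather than code is that the same file can be saved, shared, and run again on next month's export without redoing the interactive work.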

Here's the main page and wiki (and 3 intro videos):

It IS Open Source, here's the source code and license:

That type of UI makes me want to dust off our XPump code and retrofit into it...
