Google Refine, Google's open source ETL tool for data cleansing, with videos!
For any of you working with Entity Extraction, this might be of interest. Google has open sourced some software from their Freebase acquisition, formerly called Gridworks. It lets you interactively clean up and transform data. More importantly, it saves these steps as a reusable sequence in JSON format, so they can be reapplied to other data.
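To make the "reusable sequence of steps" idea concrete, here's a minimal Python sketch of replaying a JSON list of cleanup steps against more than one batch of rows. The step format (`op`, `column`, `find`, `with`) and the function names are made up for illustration; this is not Refine's actual JSON schema or code, just the general pattern.

```python
import json

# Hypothetical step format for illustration only; NOT Refine's real JSON schema.
STEPS_JSON = """
[
  {"op": "trim", "column": "name"},
  {"op": "uppercase", "column": "country"},
  {"op": "replace", "column": "name", "find": "Inc.", "with": "Inc"}
]
"""


def apply_step(row, step):
    """Return a copy of `row` with a single cleanup step applied."""
    row = dict(row)  # copy so the caller's data is left untouched
    col = step["column"]
    value = row.get(col)
    if value is None:
        return row  # nothing to clean in this column
    if step["op"] == "trim":
        row[col] = value.strip()
    elif step["op"] == "uppercase":
        row[col] = value.upper()
    elif step["op"] == "replace":
        row[col] = value.replace(step["find"], step["with"])
    else:
        raise ValueError("unknown op: " + step["op"])
    return row


def replay(rows, steps):
    """Apply every step, in order, to every row."""
    cleaned = []
    for row in rows:
        for step in steps:
            row = apply_step(row, step)
        cleaned.append(row)
    return cleaned


steps = json.loads(STEPS_JSON)

# The same saved steps are reapplied to two different data sets.
batch_a = [{"name": "  Acme Inc. ", "country": "us"}]
batch_b = [{"name": "Globex Inc.", "country": "de"}]

print(replay(batch_a, steps))
print(replay(batch_b, steps))
```

The point is that the steps live in data rather than in code, so a cleanup worked out interactively on one file can be rerun on the next one without redoing the clicks.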
Here's the main page and wiki (and 3 intro videos):
- http://code.google.com/p/google-refine/
- http://code.google.com/p/google-refine/wiki/GettingStarted?tm=6
It IS Open Source; here's the source code and license:
- http://code.google.com/p/google-refine/source/checkout
- http://www.opensource.org/licenses/bsd-license.php
That type of UI makes me want to dust off our XPump code and retrofit into it...