2 posts categorized "Crowd Sourcing"

September 06, 2012

Got OGP? A Social Media Lesson for the Enterprise

    Anytime you re-post an article hot off the virtual press from a site like nyt.com or Engadget to your social network of choice, odds are strong that its content crosses the news-media-to-social-media gap via a metadata standard called the Open Graph Protocol, or OGP.  OGP facilitates grabbing the article's title, its content type, an image that will appear in the article's post on your profile, and the article's canonical URL.  It's a simple standard built on the usual HTML metadata tags, which themselves predate Facebook and Google+ by over a decade (OGP's metadata tags can be distinguished by the "og:" prefix on each property name, e.g. "og:title", "og:description", etc.).  And despite its Facebook origins, OGP's success should strongly inform enterprise metadata policies and practices in one basic, crucial area.

    The key to OGP's success on the public internet lies largely in its simplicity.  Implementing OGP requires the content creator to fill in just the four aforementioned metadata fields:

  • the content's URL (og:url)
  • its title (og:title)
  • its type (og:type)
  • a representative image (og:image)

     A great number of other OGP metadata fields certainly do exist, and should absolutely be taken advantage of, but only these four need to be defined for a page to be considered OGP-compliant.
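To make the four-field requirement concrete, here's a minimal sketch of checking a page for OGP compliance using Python's standard-library HTML parser.  The sample page and its URLs are made up for illustration; real OGP consumers like Facebook's scraper are of course far more involved.

```python
from html.parser import HTMLParser

class OGPParser(HTMLParser):
    """Collects Open Graph <meta property="og:..." content="..."> tags."""
    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        d = dict(attrs)
        prop = d.get("property", "")
        if prop.startswith("og:") and "content" in d:
            self.og[prop] = d["content"]

# A hypothetical article page carrying the four required OGP properties:
page = """
<html><head>
  <meta property="og:url"   content="https://example.com/articles/42" />
  <meta property="og:title" content="An Example Article" />
  <meta property="og:type"  content="article" />
  <meta property="og:image" content="https://example.com/img/42.jpg" />
</head><body>...</body></html>
"""

parser = OGPParser()
parser.feed(page)

# The page is OGP-compliant if all four required properties are present:
REQUIRED = {"og:url", "og:title", "og:type", "og:image"}
assert REQUIRED <= parser.og.keys()
```

Any extra "og:" properties (og:description, og:site_name, and so on) would simply land in the same dictionary, which mirrors how the standard layers optional fields on top of the required four.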

     What can we immediately learn here from OGP that applies to metadata in the enterprise?  The enterprise content-creation and/or content-import process should involve a clearly defined, standardized minimum set of metadata fields that must be present in every document *before* that document is added to the CMS and/or indexed for search.  NYT.com certainly doesn't push out articles without proper OGP, and enterprise knowledge workers need to be equally diligent in producing documents with the proper metadata in place to find them again later!  Even if practical complications make that last proposition difficult, many Content Management Systems can be set up to suggest a basic set of default values automagically for the author to review at submission time.  Just having a simple, minimum spec in place for the metadata fields that are considered absolutely mandatory will generally improve baseline metadata quality considerably.
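The "suggest defaults, then validate before indexing" workflow could be sketched like this.  The field names and defaults here are purely illustrative, not taken from any particular CMS product:

```python
# Hypothetical minimum metadata spec for an enterprise CMS.
REQUIRED_FIELDS = ("title", "author", "type", "created")

def apply_defaults(doc_metadata, author, today):
    """Suggest defaults for missing required fields; the author reviews
    these at submission time instead of typing them from scratch."""
    suggested = dict(doc_metadata)
    suggested.setdefault("author", author)
    suggested.setdefault("created", today)
    suggested.setdefault("type", "document")
    return suggested

def validate(doc_metadata):
    """Return the list of missing mandatory fields; an empty list
    means the document may be added to the CMS and indexed."""
    return [f for f in REQUIRED_FIELDS if not doc_metadata.get(f)]

# An author submits a draft with only a title filled in:
draft = {"title": "Q3 Network Capacity Plan"}
assert validate(draft) == ["author", "type", "created"]

# The CMS fills in suggested defaults, and the draft now passes:
draft = apply_defaults(draft, author="jsmith", today="2012-09-06")
assert validate(draft) == []
```

The point is less the code than the gate: nothing reaches the index with empty mandatory fields, just as nothing leaves NYT.com without its OGP tags.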

    What should this minimum set of metadata fields include for your specific enterprise content? It's hard to make exact recommendations, but let's consider the problem that OGP's designers were trying to solve in the case of web-content metadata: people want a simple preview of the content they're sharing from some content source, with sufficient information to identify that content's basic subject matter and provenance, and (perhaps most importantly!) a flashy image that stands out on their profile.  OGP's four basic requirements fit exactly these specs.  What information do your knowledge workers always need from their documents?  Perhaps the date of creation is a particularly relevant data point for the work they're doing, or perhaps they often need to reference a document's author.  Whatever these fields might actually be, spending some time with the people who end up using your enterprise documents' metadata is the best way to find out.  And even if their baseline needs are dead simple, like the problem OGP manages to solve so succinctly, your default policy should be to just say NO to no metadata.  Your search engine will thank you.

    A natural question might arise from this case study: should you actually just start using OGP in the enterprise?  It's not necessarily the best option, since intranet search-engine spiders and indexers might not know about OGP fields yet.  In any case, you'll definitely still want to have a regular title, description, etc. in your documents as well.  As of the time of writing, OGP is still best suited to the exact niche it was designed to operate in: the public internet.  Replicating the benefits it provides within the enterprise environment is an important goal.

February 21, 2012

10 changes Wikipedia needs to become more Human and Search Engine Friendly

There's a really nice set of examples comparing JSON to other similar formats like YAML, Python, PHP, PLists, etc.  It was in a Wikipedia article, but you won't see it now unless you know to go looking through the version history (link in previous sentence).

This content had existed for quite a while in that article, and had been contributed to by many people.  One day in March 2011, one editor decided it was irrelevant and gutted the entire section.  The information was useful - I was actually looking for it today!  I happened to think of reviewing the version history since I was almost sure that's where it had been.

The editors at Wikipedia need to be able to delete content, for any number of reasons, and I'm sure it's a thankless job.  And there are procedures for handling disputed edits - I've pinged the editor who deleted it about maybe finding a new home for the content.  Also, ironically, I found an out-of-date copy of the page here that still has it.

I'm not in favor of a ton of rules, but I believe wholesale deletes of long-existing and thriving content should get some special attention.  To be clear, I'm not talking about content that's clearly wrong or profane or whatever, or gibberish that was just added.  How about, as a first pass: "a paragraph or more that has existed for more than 6 months (long-lived) and has been contributed to (not just corrected) by at least 3 people (thriving)".

Human-Centric Policy & Tech Changes:

  • If the content's only "crime" is that it's somewhat off-topic, then the person doing the deleting ought to find another home for it.  The editor could move it to another page, or possibly even split up the current page: "fork" the page into 2 pages, cross-link them, and then remove duplicate content, so that page 1 retains the original article and links to page 2, and page 2 has the possibly-off-topic content.  Yes, this would take more effort for the "deleting" editor, BUT what about the large amount of effort the multiple contributors put into it, going the extra step to try and conform to Wikipedia's policies so that it would NOT get deleted?  I also suspect that senior editors, those more likely to consider wholesale deletes, are probably much more efficient at splitting up a page or moving content somewhere else - novice contributors might be unaware, or only vaguely aware, that such things are even possible.
  • Wikipedia should make it easier for contributors to find content of theirs that's been deleted.  This is a somewhat manual process now.  Obviously they don't want to promote "edit wars".
  • Wikipedia should generally track large deletes (maybe they do?)
  • Wikipedia should "speed up" it's diff viewer.  It should run faster so you can zip through a bunch of changes, and maybe even include "diff thumbnails".  The UI makes sense to developers used to source code control systems, but is probably confusing to most others.  I realize this is all easier said than done!
  • Wikipedia should include some visual indication of "page flux".  It would be helpful, for example, if a young person could see at a glance that abortion and gun control are highly debated subjects between adults.
  • Wikipedia should be a bit more visually proactive in educating visitors that there are other versions of a page.  I'm sure Wikipedia veterans would say "more visible!? - there's a bold link in a tab at the top of every page!"  While that's true, it just doesn't convey it to casual visitors.  On the other hand, it shouldn't go too far and be annoying about it - like car GPS systems that make you agree to the legal disclaimer page every time you get in the car!

Search Engine Related Changes:

  • Wikipedia search (I use the Google option) should have an option to expand the scope of search to include deleted content.  This shouldn't be the default, and there are presentation issues to be considered.  Some deletes are in the middle of sentences, and there are multiple deletes and edits, etc., so I realize it's not quite as easy as it may sound.
  • There needs to be a better way to convey this additional data to public search engines, as well as to represent it in Wikipedia's own built-in engine.
  • Wikipedia should consider rendering full content and all changes inline, using some type of CSS / HTML5 dynamic mode that marks suspect or deprecated content with tags, instead of removing it.  Perhaps the search engines could also request this special version of the page and assign relevance accordingly.
  • Perhaps Wikipedia could offer some alternative domain name for this somewhat messier version of the data, something like "garaga.en.wikipedia.org".
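The "mark deprecated content with tags instead of removing it" idea could be sketched with Python's standard difflib - this is purely an illustration of the rendering approach, not anything Wikipedia's actual diff machinery does:

```python
import difflib

def render_with_deletions(old_words, new_words):
    """Render the merged text, wrapping removed spans in <del> tags
    instead of dropping them.  A CSS rule (or a search-engine flag)
    could then show or hide the deprecated content on demand."""
    out = []
    sm = difflib.SequenceMatcher(a=old_words, b=new_words)
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op == "equal":
            out.extend(new_words[j1:j2])
        elif op == "delete":
            out.append("<del>" + " ".join(old_words[i1:i2]) + "</del>")
        elif op == "insert":
            out.extend(new_words[j1:j2])
        elif op == "replace":
            out.append("<del>" + " ".join(old_words[i1:i2]) + "</del>")
            out.extend(new_words[j1:j2])
    return " ".join(out)

# A made-up before/after pair standing in for an article edit:
old = "JSON compared with YAML and PLists".split()
new = "JSON compared with YAML".split()
html = render_with_deletions(old, new)
# html == "JSON compared with YAML <del>and PLists</del>"
```

A search engine requesting this fuller version of the page could then down-weight anything inside <del> rather than losing it entirely, which is essentially the relevance idea in the bullet above.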

It's Not Just A or B:

  • Whenever I hear people lament the declining content contributions on Wikipedia, I have to chuckle.  It's incredibly demoralizing to delete content that people take the time to contribute.  If a new contributor on Wikipedia discovers 1 or 2 of their first edits promptly deleted, trust me, they're very unlikely to try again.  I know a number of people who have just given up.
  • Others would say that if you put more pressure on editors to not delete, then the overall quality of Wikipedia will go down, and raving nut-jobs and spammers will engulf the site.
  • The compromise is to flag content (which is very similar to tracking diffs) and give users and search engines some choice in whether they want to see "suspect" content or not.

This is about survival and participation.  When newer contributors have their content "flagged" rather than "deleted", with more explanation and recourse, they will still learn to uphold Wikipedia's quality standards without being too discouraged.  They'll hang around longer and practice more.

An analogy: Wikipedia's current policy is like potty training your kid with a stun gun - make one mistake and ZAP! - or don't bother and just keep going in your diaper like you've always done.

I understand and appreciate all the work that Wikipedia's volunteers do, but I think there are some constructive things that could be done better.