80legs Web-Scale Apps Competition

What can you do with the power of over 50,000 computers?
Posted by 80legs

Solved by:

Chen Wang, 8 months ago
Diederik van Liere, 8 months ago
Shawn Simister, 8 months ago

As the volume of web content grows, finding, understanding and using that information becomes exponentially more difficult. Fascinating technologies and applications exist for processing text, images, video and other web media, but until now their developers have had no way to monetize them.

80legs, the web-scale crawling service, is providing a channel for developers of these applications to monetize their technology.  Later this Fall, we will be launching the 80apps Store, which will allow developers to sell web content processing applications directly to 80legs users.

We are issuing a challenge to developers from around the world to build the best 80legs application.

Examples of possible 80Apps may be:

  • Semantic analysis or natural language processing of text content
  • Fingerprinting various types of web media
  • Converting any sort of unstructured content into structured data 

The potential applications are limitless, though. We are interested in your ideas for programs that can best harness 80legs’ ability to crawl and process up to 2 billion web pages per day.

Additionally, all submitted apps will be offered through the 80legs App Store.  When the store launches, 80legs users will be able to buy and use apps developed by contestants.  80App developers can set their own prices and keep 100% of the revenue earned.

Three levels of prizes will be awarded for the “Best App.”

1st prize will be the winner’s choice of the following:

2nd prize will be the winner’s choice of the following:

3rd prize will be the winner’s choice of the following:

Winners will be decided by a five-person panel of judges that includes:

  • Nova Spivack, Founder and CEO, Twine and Radar Networks
  • Carla Thompson, Senior Analyst, Guidewire Group
  • Shion Deysarkar, CEO, 80legs
  • Michael Cote, Industry Analyst, RedMonk
  • Eric Ries, Blogger, StartupLessonsLearned.com

All applications must be submitted by December 11th.  Winners will be announced December 15th. (Dates are subject to change.)

SOLUTION REQUIREMENTS

Contestants should submit a text description of their 80App to ChallengePost.

In addition to the ChallengePost submission, contestants should create accounts at http://portal.80legs.com and upload their submissions under the Code section.

Please name your 80App using the format "Contest_Submision_xxxx".

For information on how to create an 80App, see http://80legs.pbworks.com/80Apps. This page outlines specific requirements for 80Apps.

SOLUTION DEADLINE

December 11, 2009

Accepted solution
A new solution.

(Well, not really sure if this is the place to submit a competition entry.) Contest_Submision_dabsused is a crawler made for one of my pet projects, dabsused.com, which tracks price changes of used stock on several websites run by a popular UK e-commerce company, Dabs. I wrote the existing crawler in Java a few months ago and it has been in production since. This 80App replicates what that crawler does for checking prices. It's not a general-purpose crawler; rather, it's designed specifically for the current layout of those websites, which share a common backend system. I managed to get very similar, definitely usable results from creating an 80App crawler, but it wasn't plain sailing either. I've put together a few slides to express my opinions (see attachment), and hopefully that's going to help someone or 80legs :) The crawler project's source is available for download too.

view uploaded document
A new solution.

Companies are increasingly aware that they need to monitor the blogosphere to see what is being said about their brands and products. Since there are too many blogs to follow individually, which ones are important to monitor? The number of incoming links is a known metric for impact, but we need additional measures to determine who is influential in the blogosphere and who is not.

Blogcrawler is an 80legs.com hosted application to measure the community strength of a blog. Community strength is defined as the number of returning commenters divided by the total number of commenters, scaled by the number of posts on the blog and the total number of comments. This measure is most useful when comparing multiple blogs whose traffic stats are unknown. Returning commenters are an indication that a blogger has the capability to attract and retain an audience.

Blogcrawler is a generic tool: it supports generic parsers that work for most blogs, and it can be extended with custom parsers if needed to crawl a specific blog.

After the crawler has crawled all required blogs, a simple Python script parses the results and calculates community strength for each blog. Community strength can be calculated for different time windows: last week, last month or even last year.

Detailed output is also possible: how many articles were posted in a given time window, how many comments were made and how many unique commenters participated.
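
To make the measure concrete, here is a minimal sketch of the core ratio. It is not Blogcrawler's actual script. It assumes each comment is available as a hypothetical (commenter, post_id, timestamp) tuple, and it assumes a "returning" commenter is one who commented on more than one post within the window. The scaling by number of posts and total comments is left out because the description above does not give the exact formula.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def community_strength(comments, now, window_days=None):
    """Return returning commenters / total commenters (unscaled).

    comments: iterable of (commenter, post_id, timestamp) tuples.
    This tuple layout is an assumption made for illustration.
    window_days: if given, only comments from the last N days count.
    """
    comments = list(comments)
    if window_days is not None:
        cutoff = now - timedelta(days=window_days)
        comments = [c for c in comments if c[2] >= cutoff]

    # Map each commenter to the set of posts they commented on.
    posts_by_commenter = defaultdict(set)
    for commenter, post_id, _timestamp in comments:
        posts_by_commenter[commenter].add(post_id)

    total = len(posts_by_commenter)
    if total == 0:
        return 0.0

    # Assumption: "returning" means commenting on more than one post.
    returning = sum(1 for posts in posts_by_commenter.values() if len(posts) > 1)
    return returning / total


# Made-up sample data, not taken from any real blog.
SAMPLE = [
    ("alice", "p1", datetime(2009, 11, 1)),
    ("alice", "p2", datetime(2009, 12, 10)),
    ("bob", "p1", datetime(2009, 11, 1)),
    ("carol", "p2", datetime(2009, 12, 9)),
    ("carol", "p3", datetime(2009, 12, 10)),
    ("dave", "p3", datetime(2009, 12, 10)),
]

if __name__ == "__main__":
    now = datetime(2009, 12, 11)
    print(community_strength(SAMPLE, now))                 # all time
    print(community_strength(SAMPLE, now, window_days=7))  # last week
```

In the sample, two of the four commenters return over the whole period, but only one of three does within the last week, which shows how the choice of time window changes the result.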

The default parsers are configured for a number of Canadian political blogs including: eaves.ca, terahertzatheist.ca, djkelly.ca and stageleft.info.

You can reach me at @drdee_is_wired

view uploaded document
A new solution.

AdverSpider is a next-generation web crawler that specializes in detecting and analyzing online advertising. AdverSpider takes a list of URLs, visits each page and figures out how many ads are being displayed, which ad networks are being used and which sizes of ads are available. This data is returned to you as XML which makes it easy to process however you'd like.

For more details see: http://shawn.simister.ca/AdverSpider/

0 other solution(s)

  • Was an accepted solution announced? I don't see this as solved but the deadline has passed.

    Question from Daniel Goodwin 8 months, 2 weeks ago

1 organization was helped. $6,500 in great prizes was awarded.