woomi Incorporates blinkx Video Content in OTT Offering

15 February 2011
Miniweb's Video Distribution Platform Will Offer blinkx Content on Millions of Connected Devices Worldwide

SAN FRANCISCO, CALIF.--February 15, 2011--blinkx, the world's largest and most advanced video search engine, today announced a partnership with woomi, the new connected TV destination from Miniweb Technologies, the leading cloud-based video distribution platform. Under the terms of the agreement, audiences worldwide will have access to blinkx's index of over 35 million hours of online video through the woomi service.

woomi enables device manufacturers to enhance their products with a feature-rich Connected TV service that offers a wide variety of online video, including content from blinkx's extensive index. The woomi system is enhanced with user-friendly functions such as search, discovery, single-click payments and personalization. Now live across all Samsung smart TVs in the UK, woomi will roll out to multiple territories over the course of 2011.

"We're thrilled to extend our partnership with Miniweb through our collaboration with woomi," said Suranga Chandratillake, founder and CEO, blinkx. "woomi gives users access to a diverse array of online video from multiple content partners within a single application, and optimizes the Connected TV viewing experience for users with advanced functionality like discovery, recommendation and related content - truly fusing the best of broadband and broadcast entertainment."

"We're delighted to be able to incorporate blinkx's unparalleled video index into the woomi offering," said Jerome de Vitry, CEO, Miniweb. "With compelling content partners like blinkx, woomi helps device manufacturers offer consumers a complete Connected TV solution."

blinkx also recently launched its TV API (Application Programming Interface) designed to provide partners in the fast-growing Connected TV ecosystem - from box makers and TV manufacturers, to app developers and game consoles - access to blinkx's index of over 35 million hours of online video.

About blinkx

blinkx plc (LSE AIM: BLNX) is the world's largest and most advanced video search engine. Today, blinkx has indexed more than 35 million hours of audio, video, viral and TV content, and made it fully searchable and available on demand. blinkx's founders set out to solve a significant challenge - as TV and user-generated content on the Web explode, keyword-based search technologies only scratch the surface. blinkx's patented search technologies listen to - and even see - the Web, helping users enjoy a breadth and accuracy of search results not available elsewhere. In addition, blinkx powers the video search for many of the world's most frequented sites. blinkx is based in San Francisco and London. More information is available at www.blinkx.com.

About Miniweb

Miniweb's leading cloud-based video distribution platform for connected TVs (www.miniweb.tv) enables a revolutionary broadband entertainment experience on the TV. With easy navigation and consumption of Internet video, Miniweb's ground-breaking platform provides a seamless experience across hundreds of video publishers to empower the viewer with search, recommendations, micro-payments, community and personalization functions.

Miniweb's consumer brand, woomi (www.woomi.tv), is the first combined micro-payment and advertising-supported content marketplace for connected TV devices. woomi enables video publishers to maximize their revenue opportunities by dramatically extending their audience reach globally across any kind of connected TV device. woomi gives TV device manufacturers and operators a wide array of content from hundreds of video publishers within a single app, allowing them a broader and richer video proposition and additional revenue streams.

Miniweb's Connected TV Services Platform has garnered substantial industry acclaim, including the TVB Europe Editors' Award Best of IBC 2009, Best Interactive Service/Application at IPTV World Series Award 2009, the Pick Hit Award for Innovation at IBC 2008 and the Red Herring Europe 100 Winner 2008.

Press Contacts for blinkx

Tim Turpin
Sparkpr
+1 (415) 321 1894
tim.turpin@sparkpr.com

Nicole Love
Marlin PR
+44 207 869 8328
nicole.love@marlinpr.com

Charles Lytle
Christopher Wren
Citigroup Global Markets Ltd
NOMAD and Broker for blinkx plc
+44 207 986 4000

(c) 2011 blinkx