FAQ

Solaris PowerPC Project - FAQs

  1. What is the OpenSolaris on PowerPC project all about?
  • This is the first of what we expect will be many efforts to port OpenSolaris to the PowerPC and Power architectures and their associated hardware platforms. This project, however, is specifically targeted at the 32-bit PowerPC architecture.
  1. Why is there a specific project page? I thought there was already a PowerPC community.
  • Projects in OpenSolaris were created to give specific developments a place to be linked with an endorsing community, and perhaps multiple communities, within OpenSolaris.org. This provides continuity with OpenSolaris in general. For the PowerPC and Power architectures we envision multiple projects, for example 32-bit, 64-bit, enterprise, embedded, you name it.
  1. Why is this project only targeted at 32-bit?
  • You pretty much have to walk before you can run. This project focuses on 32-bit in part because the original Solaris port to PowerPC was also 32-bit. Unlike SPARC, PowerPC still has quite a number of 32-bit versions alive and well in the industry, with a rich roadmap. However, 64-bit has been a consideration from the start, and this source base does not prohibit it.
  1. What is the current status of this contribution?
  • This release is best characterized as "work in progress", but enough functionality is there to allow hands-on target development. There is a cross-build environment in which you can compile, download, and run.
  • What the project is...
    • 32-bit - G4 class processor
    • big endian ordering
    • single processor
    • statically linked unix/genunix
    • kernel boot up to vfs_mountroot(), fork and exec
    • cross-compile environment from an x86 host to a PPC-based target
    • debug with printf(), PMDB or Metrowerks PowerTAP or ABI2000 with GDB
    • will run on Genesi ODW with 7457, EFIKA, PowerMac G4, MacMini
    • GCC dependent
    • fully open source, no closed binaries in PPC platform, arch and psm
  • What it is not (at the moment)
    • 64-bit
    • MP
    • legacy support for the 2.5.1 apps or targets
    • Sun Studio / Tools compatible
    • finished or complete
  1. So what does Sun Labs plan to do going forward after this contribution?
  • Sun Labs will continue to contribute to the code base, with the goal of reaching a single-user prompt on the target.
  1. Why can't I find any of the 2.5.1 source in OpenSolaris?
  • Much has changed in Solaris since then, and that source is extremely dated. Above all, the effort required to legally review the source before releasing it was prohibitive, so Sun chose not to include it in OpenSolaris. Additionally, the relevant parts of that source have already been incorporated into the contribution.
  1. When is the next release of source?
  • The Sun Labs contribution to the source base is not on a periodic schedule; it generally depends on the progress being made. That said, the project's svn repository, ppc-dev/ppc-dev, is being constantly updated. The next tarball release will see a big step forward in functionality. See the target output listed at the end of the Kickstart page for the latest on the kernel boot progress.
  1. How do I get an ODW target machine?
  • Unfortunately, the Genesi Pegasos ODW has been discontinued. You can find the EFIKA board, a more embedded type of PPC platform, here. Power Macs can also serve as a low-cost target, since many are available second-hand.

Created by admin on 2009/10/26 12:17
Last modified by admin on 2009/10/26 12:17