Tuesday, January 4, 2011

Nutch 0104

Today I will try to summarize the Nutch wiki.

Visit the site: http://wiki.apache.org/nutch/

The site has the following contents.

General Information
  • Features
  • Public Servers running Nutch
  • Lucene
Nutch Administration
  • Downloading Nutch
  • Hardware Requirements
  • Nutch Tutorial
  • Java Search Engine (builds on the basic tutorials)
  • Nutch Hadoop tutorial
  • Automating fetches with Python
  • Upgrading the Hadoop version in Nutch
  • Command-line options for 0.7.x
  • Current command-line options
  • Overview of deployment configs
  • Nutch configuration files
  • Getting Nutch running with UTF-8 (Korean, Chinese, Japanese)
  • Getting Nutch running with Resin - Resin is a JSP/Servlet/EJB application server (an alternative to Tomcat)
  • Getting Nutch running with JBoss
  • Getting Nutch running with Ubuntu
  • Setting up a proxy for Nutch
  • Creating a new filter
  • Running Nutch in Eclipse
  • Crawl - script to crawl
  • Intranet recrawl
  • Merge crawl
  • Search over multiple indexes
  • Cross-platform Nutch scripts
  • Monitoring Nutch crawls
  • Nutch 0.9 crawl script tutorial
  • HTTP authentication schemes
  • Optimizing Nutch

Nutch Development

  • Becoming a Nutch developer
  • Plugin central
  • Internal documentation
  • Multi-language support
  • How to contribute
  • Development
  • Committer's rules
  • Release howto
  • Website update howto
  • Image search design
  • Nutch OSGi
  • Strategic goals
  • Getting started
  • Java demo application
  • Installing web2

Nutch 2.0

  • Nutch 2 roadmap
  • Nutch 2 architecture
  • New scoring

Other resources

  • Doug's weblog
  • Search theory