Visit the Nutch wiki: http://wiki.apache.org/nutch/
The site has the following contents.
General Information
- Features
- Public Servers running Nutch
- Lucene
- Downloading Nutch
- Hardware Requirements
- Nutch Tutorial
- Java Search Engine (builds on the basic tutorials)
- Nutch Hadoop tutorial
- Automating Fetches with Python
- Upgrading the Hadoop version in Nutch
- Command-line options for 0.7.x
- Current command-line options
- Overview of deployment configs
- Nutch Configuration files
- Getting Nutch running with UTF-8 (Korean, Chinese, Japanese)
- Getting Nutch running with Resin (a JSP/Servlet/EJB application server, an alternative to Tomcat)
- Getting Nutch running with JBoss
- Getting Nutch running with Ubuntu
- Setting up a proxy for Nutch
- Create a new filter
- Run Nutch in Eclipse
- Crawl - script to crawl
- Intranet recrawl
- Merge crawl
- Search over multiple indexes
- Cross-platform Nutch scripts
- Monitoring Nutch crawls
- Nutch 0.9 crawl script tutorial
- HTTP authentication schemes
- Optimizing Nutch
Nutch Development
- Becoming a Nutch developer
- Plugin central
- Internal documentation
- Multi-language support
- How to contribute
- Development
- Committer's rules
- Release howto
- Website update howto
- Image search design
- Nutch OSGi
- Strategic goals
- Getting started
- Java demo application
- Installing web2
Nutch 2.0
- Nutch 2 roadmap
- Nutch 2 architecture
- New scoring
Other resources
- Doug's weblog
- Search theory