HouseSpider
Developed by Keith L. Jackson (up to and including version 4.0) and later by Hans Fr. Nordhaug. This version was released on 14 January 2003.
The HouseSpider applet is packaged in a jar file (HouseSpider.jar) that must be copied to the same host as your web site. In addition, you need to copy one (or all) of the button jars to the same location. Make sure that you transfer the files in binary mode. A sample HTML tag is as follows:
<applet code="HouseSpider.class" archive="HouseSpider.jar,buttons.bevel.jar" width="90%" height="200">
  <param name="URLStart" value="http://housespider.sourceforge.net/index.html">
  <param name="URLExclude" value="http://housespider.sourceforge.net/doc/ver40/">
  <param name="URLHelp" value="http://housespider.sourceforge.net/doc/ver44/">
  <param name="bgcolour" value="FFFFFF">
  <param name="fgcolour" value="666666">
  <param name="bgtextcolour" value="FFFFFF">
  <param name="textcolour" value="666666">
</applet>
The code="HouseSpider.class" attribute is required to identify the applet to run. The applet must be named "HouseSpider.class" (case matters); it will not work if its name is altered. The archive attribute, containing "HouseSpider.jar", is required and may include a path if you are calling the applet from a different directory. The width and height should be at least 450 by 200 pixels; I prefer width="90%" and height="200" or more. The applet tag may be followed by these optional parameters:
Now test your site and see how it works!
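The tag requirements above can be checked mechanically before testing in a browser. The Java sketch below is a hypothetical helper, not part of HouseSpider; the class name AppletTagCheck and its method are illustrative assumptions. It flags a wrong class name, a missing HouseSpider.jar in the archive list, and pixel sizes below 450 by 200. Percentage sizes are skipped because their pixel value depends on the browser window.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of HouseSpider): checks <applet> attributes
// against the documented requirements and returns a list of problems found.
final class AppletTagCheck {
    static final int MIN_WIDTH = 450;
    static final int MIN_HEIGHT = 200;

    static List<String> problems(String code, String archive, String width, String height) {
        List<String> problems = new ArrayList<>();
        // The class name is case-sensitive and must not be renamed.
        if (!"HouseSpider.class".equals(code)) {
            problems.add("code must be exactly HouseSpider.class");
        }
        // archive is a comma-separated list; each jar may carry a path prefix.
        boolean hasMainJar = false;
        if (archive != null) {
            for (String jar : archive.split(",")) {
                String name = jar.trim();
                int slash = name.lastIndexOf('/');
                if (name.substring(slash + 1).equals("HouseSpider.jar")) {
                    hasMainJar = true;
                }
            }
        }
        if (!hasMainJar) {
            problems.add("archive must include HouseSpider.jar");
        }
        checkSize(problems, "width", width, MIN_WIDTH);
        checkSize(problems, "height", height, MIN_HEIGHT);
        return problems;
    }

    // Flags pixel values below the minimum; percentages and missing values are skipped.
    private static void checkSize(List<String> problems, String attr, String value, int min) {
        if (value == null || value.trim().endsWith("%")) {
            return;
        }
        try {
            if (Integer.parseInt(value.trim()) < min) {
                problems.add(attr + " should be at least " + min);
            }
        } catch (NumberFormatException e) {
            problems.add(attr + " is not a number: " + value);
        }
    }
}
```

For the sample tag, problems("HouseSpider.class", "HouseSpider.jar,buttons.bevel.jar", "90%", "200") returns an empty list.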
By default, HouseSpider uses cache searching, i.e., it searches the index file. If it cannot find the index file (compressed or uncompressed), it falls back to a spider search.
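The lookup order can be summarised in a few lines of Java. This is a hypothetical sketch, not HouseSpider's own code: the enum and method names are assumptions, and the preference for the compressed file reflects that HouseSpider automatically uses a compressed index file (HouseSpider.index.zip) when one exists.

```java
// Hypothetical sketch (not HouseSpider's code): the order in which the
// applet is assumed to pick its search strategy.
enum SearchMode { COMPRESSED_INDEX, INDEX, SPIDER }

final class SearchModeChooser {
    // zipIndexFound: HouseSpider.index.zip was found.
    // indexFound:    HouseSpider.index was found.
    static SearchMode choose(boolean zipIndexFound, boolean indexFound) {
        if (zipIndexFound) {
            return SearchMode.COMPRESSED_INDEX; // a compressed index takes precedence
        }
        if (indexFound) {
            return SearchMode.INDEX;            // plain cache search
        }
        return SearchMode.SPIDER;               // no index at all: crawl the site
    }
}
```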
I will illustrate the usage of URLExclude and FileExclude with some examples. URLStart points to www.yourserver.com/somedir/start.html, which contains the following links:
www.yourserver.com/somedir/dir1/page1.shtml
www.yourserver.com/somedir/dir1/page2.html
www.yourserver.com/somedir/dir2/page2.html
www.yourserver.com/somedir/dir2/page3.shtml
<param name="URLExclude" value="http://www.yourserver.com/somedir/dir1/page1.shtml, http://www.yourserver.com/somedir/dir2/page3.shtml">
<param name="URLExclude" value="http://www.yourserver.com/somedir/dir1/">
<param name="FileExclude" value="page2.html">
<param name="FileExclude" value=".shtml">
<param name="FileExclude" value="page2.html,.shtml">
or
<param name="URLExclude" value="http://www.yourserver.com/somedir/dir1/, http://www.yourserver.com/somedir/dir2/">
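To make the examples concrete, the following Java sketch models one plausible reading of the two parameters. The matching rules are assumptions, not taken from the HouseSpider source: URLExclude entries are treated as URL prefixes and FileExclude entries as endings of the page's file name, both given as comma-separated lists with optional spaces. ExcludeFilter is a hypothetical class name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of URLExclude/FileExclude (assumed semantics, not
// HouseSpider's code): URLExclude entries are URL prefixes, FileExclude
// entries are matched against the end of the URL's file name.
final class ExcludeFilter {
    private final List<String> urlPrefixes;
    private final List<String> fileSuffixes;

    ExcludeFilter(String urlExclude, String fileExclude) {
        this.urlPrefixes = splitList(urlExclude);
        this.fileSuffixes = splitList(fileExclude);
    }

    // Splits a comma-separated parameter value, trimming blanks around commas.
    private static List<String> splitList(String value) {
        List<String> items = new ArrayList<>();
        if (value == null) {
            return items;
        }
        for (String item : value.split(",")) {
            String trimmed = item.trim();
            if (!trimmed.isEmpty()) {
                items.add(trimmed);
            }
        }
        return items;
    }

    boolean isExcluded(String url) {
        for (String prefix : urlPrefixes) {
            if (url.startsWith(prefix)) {
                return true; // the page itself or anything below an excluded directory
            }
        }
        String fileName = url.substring(url.lastIndexOf('/') + 1);
        for (String suffix : fileSuffixes) {
            if (fileName.endsWith(suffix)) {
                return true; // e.g. "page2.html" or ".shtml"
            }
        }
        return false;
    }
}
```

Under these assumptions, both of the last two examples exclude all four links on start.html.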
To build the index, run the HouseSpider applet from the command line using appletviewer, for example "appletviewer indexsetup.html". (The appletviewer is part of the Java Development Kit, not the Java Runtime Environment.) In theory it should be possible to index from within a web browser, but because of the security restrictions on Java applets in browsers it may not work. The file "indexsetup.html" should contain the following tags:
<html>
<body>
<applet code="HouseSpider.class" codebase="/local/www/dir/jars/" archive="HouseSpider.jar,buttons.bevel.jar" width="400" height="200">
  <param name="URLStart" value="http://your.web.server/dir/startfile">
  <param name="URLExclude" value="http://your.web.server/dir/somedir/">
  <param name="SaveDir" value="/local/www/dir/">
</applet>
</body>
</html>
Here "SaveDir" is the directory where the index file "HouseSpider.index" is saved, and "codebase" is the directory where the jar files are stored. "URLStart" may point to a local or a remote copy of your site, but you will probably want to index a local copy before you upload it to the server. Note that "URLExclude" is optional.
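If you rebuild the index often, the setup page can be generated rather than written by hand. The Java sketch below is a hypothetical helper (IndexSetupPage is not part of HouseSpider). It reproduces the tag shown above, emits URLExclude only when a value is given, and does not HTML-escape the values, so they must not contain double quotes.

```java
// Hypothetical helper (not part of HouseSpider): builds the text of
// indexsetup.html. Values are inserted verbatim, without HTML escaping.
final class IndexSetupPage {
    static String build(String codebase, String urlStart, String urlExclude, String saveDir) {
        StringBuilder html = new StringBuilder();
        html.append("<html>\n<body>\n");
        html.append("<applet code=\"HouseSpider.class\" codebase=\"").append(codebase)
            .append("\" archive=\"HouseSpider.jar,buttons.bevel.jar\" width=\"400\" height=\"200\">\n");
        html.append(param("URLStart", urlStart));
        // URLExclude is optional, so it is only written when supplied.
        if (urlExclude != null && !urlExclude.isEmpty()) {
            html.append(param("URLExclude", urlExclude));
        }
        html.append(param("SaveDir", saveDir));
        html.append("</applet>\n</body>\n</html>\n");
        return html.toString();
    }

    private static String param(String name, String value) {
        return "  <param name=\"" + name + "\" value=\"" + value + "\">\n";
    }
}
```

The returned string can then be written to indexsetup.html with any file API, for example java.nio.file.Files.writeString.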
Type "houseindex" in the text input box to index your site. You may need to set the security mode for the appletviewer to unrestricted. Try the following settings:
acl.read=/local/www/dir/
acl.write=/local/www/dir/
appletviewer.security.mode=unrestricted
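or, with Java 2 security, a grant entry in a policy file such as ".java.policy" in your home directory:
grant {
  permission java.net.SocketPermission "*", "accept, connect, listen, resolve";
  permission java.io.FilePermission "*", "read, write, delete, execute";
};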
If all goes well, two files will be generated: "HouseSpider.index" and "HouseSpider.log". Put these files in the "URLStart" directory, or in the same directory as the applet page if "URLStart" is undefined. You may edit the log file, but keep all the text on one line. The contents of the log file are displayed in the status field of the applet at startup.
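Since the log text must stay on one line, a small helper can clean it up after editing. This Java sketch is hypothetical (LogText is not part of HouseSpider). It joins the lines with single spaces, and fixFile rewrites the file in place assuming UTF-8 encoding, which is an assumption about how the log file was saved.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not part of HouseSpider): keeps HouseSpider.log on a
// single line after hand editing.
final class LogText {
    // Replaces every line break, together with surrounding whitespace,
    // by one space; leading and trailing whitespace is removed.
    static String toSingleLine(String text) {
        return text.trim().replaceAll("\\s*[\\r\\n]+\\s*", " ");
    }

    // Rewrites the file in place. Assumes the log is UTF-8 (or plain ASCII).
    static void fixFile(Path log) throws IOException {
        Files.writeString(log, toSingleLine(Files.readString(log)));
    }
}
```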
After indexing, a statistics report is generated that lists how many pages were indexed and which pages are missing title tags.
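The statistics step amounts to a count over the indexed pages. The Java sketch below is illustrative only: IndexStats and the map from page URL to title are assumptions standing in for whatever the indexer actually collects, and a page counts as missing its title when the title is absent or blank.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not HouseSpider's code): builds a short report of
// indexed pages and pages without a <title>.
final class IndexStats {
    static String report(Map<String, String> titlesByUrl) {
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, String> page : titlesByUrl.entrySet()) {
            String title = page.getValue();
            if (title == null || title.trim().isEmpty()) {
                missing.add(page.getKey()); // no usable title tag
            }
        }
        StringBuilder out = new StringBuilder();
        out.append("Pages indexed: ").append(titlesByUrl.size()).append('\n');
        out.append("Pages without a title: ").append(missing.size()).append('\n');
        for (String url : missing) {
            out.append("  ").append(url).append('\n');
        }
        return out.toString();
    }
}
```

Passing an insertion-ordered map such as LinkedHashMap keeps the missing-title list in crawl order.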
HouseSpider can index 10,000 web pages and 10,000 keywords. The limiting factor is not memory but the size of the index file: I figure an index file of about 100 KB is still acceptable, but anything larger will probably load too slowly.
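The concern about index size is essentially download time, which is the size in bits divided by the connection speed. The Java sketch below makes that arithmetic explicit; the speeds used in the example are assumptions for illustration (a 56 kbit/s modem was a common connection in 2003), and real throughput is usually below the nominal rate.

```java
// Rough download-time estimate for an index file: bytes * 8 / (bits per second).
// Connection speeds are caller-supplied assumptions; protocol overhead is ignored.
final class LoadTime {
    static double seconds(long bytes, long bitsPerSecond) {
        return bytes * 8.0 / bitsPerSecond;
    }
}
```

For a 100 KB (102,400-byte) index this gives about 14.6 seconds at 56 kbit/s and 1.6 seconds at 512 kbit/s.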
To generate a compressed index file, type "houseindex-zip" in the text input box (instead of "houseindex"). If all goes well, one file will be generated: "HouseSpider.index.zip". You may of course also zip an existing index file with your favourite compression tool.
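The two indexing commands share the text input box with ordinary searches. A hypothetical Java sketch of that dispatch follows; the enum and class names are assumptions, and so is the exact, case-sensitive match on the trimmed input.

```java
// Hypothetical sketch (not HouseSpider's code): maps text-box input to an action.
enum InputAction { BUILD_INDEX, BUILD_ZIPPED_INDEX, SEARCH }

final class InputDispatcher {
    static InputAction actionFor(String input) {
        String text = input.trim();
        if (text.equals("houseindex")) {
            return InputAction.BUILD_INDEX;        // writes HouseSpider.index
        }
        if (text.equals("houseindex-zip")) {
            return InputAction.BUILD_ZIPPED_INDEX; // writes HouseSpider.index.zip
        }
        return InputAction.SEARCH;                 // anything else is a query
    }

    // Index file produced by each indexing action; null for a search.
    static String outputFileFor(InputAction action) {
        switch (action) {
            case BUILD_INDEX:
                return "HouseSpider.index";
            case BUILD_ZIPPED_INDEX:
                return "HouseSpider.index.zip";
            default:
                return null;
        }
    }
}
```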
HouseSpider will automatically use a compressed index file if it exists.
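Reading and writing a zipped index needs nothing beyond java.util.zip. This sketch rests on an assumption: that HouseSpider.index.zip holds the index as its first (or only) entry, named HouseSpider.index. ZippedIndex is a hypothetical class, and the real applet may expect a different layout.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch (not HouseSpider's code). Assumes the archive stores
// the index as its first entry, named HouseSpider.index.
final class ZippedIndex {
    // Packs raw index bytes into a one-entry zip archive.
    static byte[] zip(byte[] index) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(buffer)) {
            out.putNextEntry(new ZipEntry("HouseSpider.index"));
            out.write(index);
            out.closeEntry();
        }
        return buffer.toByteArray(); // complete archive once the stream is closed
    }

    // Returns the bytes of the first entry, or null if the archive is empty.
    static byte[] unzipFirstEntry(InputStream zipped) throws IOException {
        try (ZipInputStream in = new ZipInputStream(zipped)) {
            if (in.getNextEntry() == null) {
                return null;
            }
            return in.readAllBytes(); // reads to the end of the current entry
        }
    }
}
```

ZipInputStream reads the archive sequentially, so it can work directly on a stream such as one obtained from URL.openStream(), without a local copy of the file.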
HouseSpider is open source and is distributed under the GNU General Public License. If you modify HouseSpider and your modifications are improvements (as opposed to simple GUI changes), please submit a patch on the SourceForge project page. Source code for this and later versions is available on the SourceForge project page.
I thank Tim Tyler for the ImageLoader class.