webalizer-2.01-10-src

Category: Web server
Development tool: Unix_Linux
File size: 418KB
Downloads: 14
Upload date: 2009-05-06 23:06:04
Uploader: ruijing
Description: A log statistics tool. Studying the Webalizer source teaches many programming techniques; reading good source code does more to raise your programming level than writing your own.

File list:
webalizer-2.01-10-src\webalizer-2.01-10\aclocal.m4 (546, 2000-09-29)
webalizer-2.01-10-src\webalizer-2.01-10\CHANGES (18939, 2002-04-17)
webalizer-2.01-10-src\webalizer-2.01-10\configure (68730, 2000-10-06)
webalizer-2.01-10-src\webalizer-2.01-10\configure.in (6688, 2000-10-06)
webalizer-2.01-10-src\webalizer-2.01-10\COPYING (17990, 2000-09-29)
webalizer-2.01-10-src\webalizer-2.01-10\Copyright (1323, 2001-06-15)
webalizer-2.01-10-src\webalizer-2.01-10\country-codes.txt (4111, 2000-09-29)
webalizer-2.01-10-src\webalizer-2.01-10\dns_resolv.c (25025, 2002-04-17)
webalizer-2.01-10-src\webalizer-2.01-10\dns_resolv.h (1295, 2000-09-29)
webalizer-2.01-10-src\webalizer-2.01-10\graphs.c (25669, 2001-06-15)
webalizer-2.01-10-src\webalizer-2.01-10\graphs.h (444, 2000-09-29)
webalizer-2.01-10-src\webalizer-2.01-10\hashtab.c (29558, 2001-06-15)
webalizer-2.01-10-src\webalizer-2.01-10\hashtab.h (4715, 2000-09-29)
webalizer-2.01-10-src\webalizer-2.01-10\INSTALL (9926, 2000-09-29)
webalizer-2.01-10-src\webalizer-2.01-10\install-sh (5585, 2000-09-29)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.catalan (34177, 2000-10-19)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.chinese (32321, 2000-10-06)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.croatian (32616, 2001-06-15)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.czech (33290, 2002-04-17)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.danish (32888, 2000-10-06)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.dutch (35411, 2000-10-06)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.english (32670, 2000-10-06)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.estonian (32574, 2001-07-05)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.finnish (33273, 2001-02-10)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.french (34883, 2000-10-06)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.galician (33252, 2001-06-15)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.german (35538, 2000-10-06)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.greek (32700, 2000-10-06)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.hungarian (33648, 2000-10-06)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.icelandic (33050, 2000-12-17)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.indonesian (35043, 2000-10-06)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.italian (34694, 2000-10-06)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.japanese (32321, 2001-10-23)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.korean (32076, 2000-10-20)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.latvian (33024, 2000-10-19)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.malay (34109, 2000-10-31)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.norwegian (33808, 2000-11-19)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.polish (33972, 2000-10-06)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.portuguese (33554, 2002-04-17)
... ...

The Webalizer - A web server log file analysis tool

Copyright 1997-2000 by Bradford L. Barrett (brad@mrunix.net)

Distributed under the GNU GPL. See the files "COPYING" and "Copyright" supplied with the distribution for additional info.

What is The Webalizer?
----------------------

The Webalizer is a web server log file analysis program which produces usage statistics in HTML format for viewing with a browser. The results are presented in both columnar and graphical format, which facilitates interpretation. Yearly, monthly, daily and hourly usage statistics are presented, along with the ability to display usage by site, URL, referrer, user agent (browser), search string, entry/exit page, username and country (some information is only available if supported and present in the log files being processed). Processed data may also be exported into most database and spreadsheet programs that support tab delimited data formats.

The Webalizer supports CLF (common log format) log files, as well as Combined log formats as defined by NCSA and others, and variations of these which it attempts to handle intelligently. In addition, wu-ftpd xferlog formatted logs and squid proxy logs are supported. Gzip compressed logs may now be used as input directly. Any log filename that ends with a '.gz' extension will be assumed to be in gzip format and uncompressed on the fly as it is being read. In addition, the Webalizer also supports DNS lookup capabilities if enabled at compile time. See the file DNS.README for additional information.

This documentation applies to The Webalizer Version 2.01

Running the Webalizer
---------------------

The Webalizer was designed to be run from a Unix command line prompt or as a cron job. There are several command line options which will modify the results it produces, and configuration files can be used as well. The format of the command line is:

    webalizer [options ...] [log-file]

Where 'options' can be one or more of the supported command line switches described below. 'log-file' is the name of the log file to process (see below for more detailed information). If a dash ("-") is specified for the log-file name, STDIN will be used.

Once executed, the general flow of the program follows:

 o A default configuration file is scanned for. A file named 'webalizer.conf' is searched for in the current directory, and if found, its configuration data is parsed. If the file is not present in the current directory, the file '/etc/webalizer.conf' is searched for and, if found, is used instead.

 o Any command line arguments given to the program are parsed. This may include the specification of a configuration file, which is processed at the time it is encountered.

 o If a log file was specified, it is opened and made ready for processing. If no log file was given, or the filename '-' is specified on the command line, STDIN is used for input.

 o If an output directory was specified, the program does a 'chdir' to that directory in preparation for generating output. If no output directory was given, the current directory is used.

 o If a non-zero number of DNS Children processes were specified, they will be started, and the specified log file will be processed, either creating or updating the specified DNS cache file.

 o If no hostname was given, the program attempts to get the hostname using a uname system call. If that fails, 'localhost' is used.

 o A history file is searched for. This file keeps previous month totals used on the main index.html page. The default file is named 'webalizer.hist' and is kept in the specified output directory; the name may be changed using the "HistoryName" configuration file keyword.

 o If incremental processing was specified, a data file is searched for and loaded if found, containing the 'internal state' data of the program at the end of a previous run.
   The default file is named 'webalizer.current' and is kept in the specified output directory; the name may be changed using the "IncrementalName" configuration file keyword.

 o Main processing begins on the log file. If the log spans multiple months, a separate HTML document is created for each month.

 o After main processing, the main 'index.html' page is created, which has totals by month and links to each month's HTML document.

 o A new history file is saved to disk, which includes totals generated by The Webalizer during the current run.

 o If incremental processing was specified, a data file is written that contains the 'internal state' data at the end of this run.

Incremental Processing
----------------------

Version 1.2x of The Webalizer adds incremental run capability. Simply put, this allows processing large log files by breaking them up into smaller pieces, and processing these pieces instead. What this means in real terms is that you can now rotate your log files as often as you want, and still be able to produce monthly usage statistics without the loss of any detail. This is accomplished by saving and restoring all relevant internal data to a disk file between runs. Doing so allows the program to 'start where it left off' so to speak, and allows the preservation of detail from one run to the next.

Some special precautions need to be taken when using the incremental run capability of The Webalizer. Configuration options should not be changed between runs, as that could cause corruption of the internal stored data. For example, changing the MangleAgents level will cause different representations of user agents to be stored, producing invalid results in the user agents section of the report. If you need to change configuration options, do it at the end of the month after normal processing of the previous month and before processing the current month.
You may also want to delete the 'webalizer.current' file (or whatever name was specified using the "IncrementalName" configuration option).

The Webalizer also attempts to prevent data duplication by keeping track of the timestamp of the last record processed. This timestamp is then compared to current records being processed, and any records that were logged previous to that timestamp are ignored. This, in theory, should allow you to re-process logs that have already been processed, or process logs that contain a mix of processed/not yet processed records, and not produce duplication of statistics. The only time this may break is if you have duplicate timestamps in two separate log files... any records in the second log file that have the same timestamp as the last record in the previous log file processed will be discarded as if they had already been processed. There are many ways to prevent this, however; for example, stopping the web server before rotating logs avoids the situation. This setup also necessitates that you always process logs in chronological order, otherwise data loss will occur as a result of the timestamp compare.

Output Produced
---------------

The Webalizer produces several reports (html) and graphics for each month processed. In addition, a summary page is generated for the current and previous months (up to 12), a history file is created and, if incremental mode is used, the current month's processed data is saved. The exact location and names of these files can be changed using configuration files and command line options.
The files produced (default names) are:

   index.html              - Main summary page (extension may be changed)
   usage.png               - Yearly graph displayed on the main index page
   usage_YYYYMM.html       - Monthly summary page (extension may be changed)
   usage_YYYYMM.png        - Monthly usage graph for specified month/year
   daily_usage_YYYYMM.png  - Daily usage graph for specified month/year
   hourly_usage_YYYYMM.png - Hourly usage graph for specified month/year
   site_YYYYMM.html        - All sites listing (if enabled)
   url_YYYYMM.html         - All urls listing (if enabled)
   ref_YYYYMM.html         - All referrers listing (if enabled)
   agent_YYYYMM.html       - All user agents listing (if enabled)
   search_YYYYMM.html      - All search strings listing (if enabled)
   webalizer.hist          - Previous month history (may be changed)
   webalizer.current       - Incremental Data (may be changed)
   site_YYYYMM.tab         - tab delimited sites file
   url_YYYYMM.tab          - tab delimited urls file
   ref_YYYYMM.tab          - tab delimited referrers file
   agent_YYYYMM.tab        - tab delimited user agents file
   user_YYYYMM.tab         - tab delimited usernames file
   search_YYYYMM.tab       - tab delimited search string file

The yearly (index) report shows statistics for a 12 month period, and links to each month. The monthly report has detailed statistics for that month with additional links to any URL's and referrers found. The various totals shown are explained below.

Hits

   Any request made to the server which is logged, is considered a 'hit'. The requests can be for anything... html pages, graphic images, audio files, CGI scripts, etc... Each valid line in the server log is counted as a hit. This number represents the total number of requests that were made to the server during the specified report period.

Files

   Some requests made to the server, require that the server then send something back to the requesting client, such as a html page or graphic image. When this happens, it is considered a 'file' and the files total is incremented.
   The relationship between 'hits' and 'files' can be thought of as 'incoming requests' and 'outgoing responses'.

Pages

   Pages are, well, pages! Generally, any HTML document, or anything that generates an HTML document, would be considered a page. This does not include the other stuff that goes into a document, such as graphic images, audio clips, etc... This number represents the number of 'pages' requested only, and does not include the other 'stuff' that is in the page. What actually constitutes a 'page' can vary from server to server. The default action is to treat anything with the extension '.htm', '.html' or '.cgi' as a page. A lot of sites will probably define other extensions, such as '.phtml', '.php3' and '.pl' as pages as well. Some people consider this number as the number of 'pure' hits... I'm not sure if I totally agree with that viewpoint. Some other programs (and people :) refer to this as 'Pageviews'.

Sites

   Each request made to the server comes from a unique 'site', which can be referenced by a name or ultimately, an IP address. The 'sites' number shows how many unique IP addresses made requests to the server during the reporting time period. This DOES NOT mean the number of unique individual users (real people) that visited, which is impossible to determine using just logs and the HTTP protocol (however, this number might be about as close as you will get).

Visits

   Whenever a request is made to the server from a given IP address (site), the amount of time since a previous request by the address is calculated (if any). If the time difference is greater than a pre-configured 'visit timeout' value (or has never made a request before), it is considered a 'new visit', and this total is incremented (both for the site, and the IP address). The default timeout value is 30 minutes (can be changed), so if a user visits your site at 1:00 in the afternoon, and then returns at 3:00, two visits would be registered.
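
   The visit rule above can be illustrated with a short sketch. This is not The Webalizer's own code (which is written in C); it is a minimal Python illustration under stated assumptions: the input is a chronologically ordered list of (IP address, timestamp) pairs that has already been reduced to page requests, and the names count_visits and VISIT_TIMEOUT are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed default taken from the text above (30 minutes); hypothetical name.
VISIT_TIMEOUT = timedelta(minutes=30)


def count_visits(requests, timeout=VISIT_TIMEOUT):
    """Count visits from (ip, timestamp) pairs given in chronological order."""
    last_seen = {}  # ip -> timestamp of that address's most recent request
    visits = 0
    for ip, when in requests:
        previous = last_seen.get(ip)
        # A new visit starts on the first request from an address, or when
        # the gap since its previous request exceeds the timeout.
        if previous is None or when - previous > timeout:
            visits += 1
        last_seen[ip] = when
    return visits


# Same address at 1:00 pm, 1:10 pm and 3:00 pm (illustrative data only).
sample = [
    ("192.0.2.1", datetime(2002, 4, 17, 13, 0)),
    ("192.0.2.1", datetime(2002, 4, 17, 13, 10)),
    ("192.0.2.1", datetime(2002, 4, 17, 15, 0)),
]
print(count_visits(sample))  # prints 2
```

   The 1:10 request falls inside the timeout and extends the first visit; the 3:00 request arrives more than 30 minutes later and starts a second one, matching the example in the text.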
   Note: in the 'Top Sites' table, the visits total should be discounted on 'Grouped' records, and thought of as the "Minimum number of visits" that came from that grouping instead.

   Note: Visits only occur on PageType requests, that is, for any request whose URL is one of the 'page' types defined with the PageType option. Due to the limitations of the HTTP protocol, log rotations and other factors, this number should not be taken as absolutely accurate; rather, it should be considered a pretty close "guess".

KBytes

   The KBytes (kilobytes) value shows the amount of data, in KB, that was sent out by the server during the specified reporting period. This value is generated directly from the log file, so it is up to the web server to produce accurate numbers in the logs (some web servers do stupid things when it comes to reporting the number of bytes). In general, this should be a fairly accurate representation of the amount of outgoing traffic the server had, regardless of the web server's reporting quirks.

   Note: A kilobyte is 1024 bytes, not 1000 :)

Top Entry and Exit Pages

   The Top Entry and Exit tables give a rough estimate of what URL's are used to enter your site, and what the last pages viewed are. Because of limitations in the HTTP protocol, log rotations, etc... this number should be considered a good "rough guess" of the actual numbers; it will, however, give a good indication of the overall trend in where users come into, and exit, your site.

Command Line Options
--------------------

The Webalizer supports many different configuration options that will alter the way the program behaves and generates output. Most of these can be specified on the command line, while some can only be specified in a configuration file. The command line options are listed below, with references to the corresponding configuration file keywords.
--------------------------------------------------------------------------

General Options
---------------

-h          Display all available command line options and exit program.

-v          Display program version and exit program.

-d          Display additional 'debugging' information for errors and warnings produced during processing. This normally would not be used except to determine why you are getting all those errors and want to see the actual data. Normally The Webalizer will just tell you it found an error, not the actual data. This option will display the data as well.
            Config file keyword: Debug

-F          Specify that the log being used is a ftp log. Normally, The Webalizer expects to find a valid CLF or Combined format web server log file. This option allows you to process wu-ftpd xferlogs as well.
            Config file keyword: LogType

-f          Fold out of sequence log records back into analysis, by treating them as if they were the same date/time as the last good record. Normally, out of sequence log records are ignored. If you run apache, don't worry about this.
            Config file keyword: FoldSeqErr

-i          Ignore history file. USE WITH CAUTION. This causes The Webalizer to ignore any existing history file produced from previous runs and generate its output from scratch. The effect will be as if The Webalizer is being run for the first time and any previous statistics will be lost (although the HTML documents, if any, will not be deleted) on the main index.html (yearly) web page.
            Config file keyword: IgnoreHist

-p          Preserve state (incremental processing). This allows the processing of partial logs in increments. At the end of the program, all relevant internal data is saved, so that it may be restored the next time the program is run. This allows sites that must rotate their logs more than once a month to still be able to use The Webalizer, and not worry about having to gather and feed an entire month's logs to the program at the end of the month.
            See the section on "Incremental Processing" above for additional information. The default is to not perform incremental processing. Use this command line option to enable the feature.
            Config file keyword: Incremental

-q          Quiet mode. Normally, The Webalizer will produce various messages while it runs letting you know what it's doing. This option will suppress those messages. It should be noted that this WILL NOT suppress errors and warnings, which are output to STDERR.
            Config file keyword: Quiet

-Q          ReallyQuiet mode. This allows suppression of _all_ messages generated by The Webalizer, including warnings and errors. Useful when The Webalizer is run as a cron job.
            Config file keyword: ReallyQuiet

-T          Display timing information. The Webalizer keeps track of the time it begins and ends processing, and normally displays the total processing time at the end of each run. If quiet mode (-q or 'Quiet yes' in configuration file) is specified, this information is not displayed. This option forces the display of timing totals if quiet mode has been specified, otherwise it is redundant and will have no effect.
            Config file keyword: TimeMe

-c file     This option specifies a configuration file to use. Configuration files allow greater control over how The Webalizer behaves, and there are several ways to use them. As of version 0.***, The Webalizer searches for a default configuration file in the current directory named "webalizer.conf", and if not found, will search in the /etc/ directory for a file of the same name. In addition, you may specify a configuration file to use with this command line option.

-n name     This option specifies the hostname for the reports generated. The hostname is used in the title of all reports, and is also prepended to URL's in the reports. This allows The Webalizer to be run on log files for 'virtual' web servers or web servers that are different than the machine the reports are located on, and still allows clicking on the URL's to go to the proper location.
            If a hostname is not specified, either on the command line or in a configuration file, The Webalizer attempts to determine the hostname using a 'uname' system call. If this fails, "localhost" will be used as the hostname.
            Config file keyword: HostName

-o dir      This option specifies the output directory for the reports. If not specified here or in a configuration file, the current default directory will be used for output.
            Config file keyword: OutputDir

-x name     This option allows the generated pages to have an extension other than '.html', which is the default. Do not include the leading period ('.') when you specify the extension.
            Config file keyword: HTMLExtension

-P name     Specify the file extensions for 'pages'. Pages (sometimes called 'PageViews') are normally html documents and CGI scripts that display the whole page, not just parts of it. Some systems will need to define a few more, such as 'phtml', 'php3' or 'pl' in order to have them counted as well. The default is 'htm*' and 'cgi' for web logs and 'txt' for ftp.
            Config file keyword: PageType

-t name     This option specifies the title string for all reports. This string is used, in conjunction with the hostname (if not blank) to produce the actual title. If not specified, the default of "Usage Statistics for" will be used.
            Config file keyword: ReportTitle

-Y          Suppress Country graph. Normally, The Webalizer produces countr ... ...
