webalizer-2.01-10-src
Category: Web server
Development platform: Unix_Linux
File size: 418KB
Downloads: 14
Upload date: 2009-05-06 23:06:04
Uploader: ruijing
Description: A log statistics tool. Studying the Webalizer source teaches many programming techniques; reading good source code does more to improve your programming skills than writing your own.
File list:
webalizer-2.01-10-src\webalizer-2.01-10\aclocal.m4 (546, 2000-09-29)
webalizer-2.01-10-src\webalizer-2.01-10\CHANGES (18939, 2002-04-17)
webalizer-2.01-10-src\webalizer-2.01-10\configure (68730, 2000-10-06)
webalizer-2.01-10-src\webalizer-2.01-10\configure.in (6688, 2000-10-06)
webalizer-2.01-10-src\webalizer-2.01-10\COPYING (17990, 2000-09-29)
webalizer-2.01-10-src\webalizer-2.01-10\Copyright (1323, 2001-06-15)
webalizer-2.01-10-src\webalizer-2.01-10\country-codes.txt (4111, 2000-09-29)
webalizer-2.01-10-src\webalizer-2.01-10\dns_resolv.c (25025, 2002-04-17)
webalizer-2.01-10-src\webalizer-2.01-10\dns_resolv.h (1295, 2000-09-29)
webalizer-2.01-10-src\webalizer-2.01-10\graphs.c (25669, 2001-06-15)
webalizer-2.01-10-src\webalizer-2.01-10\graphs.h (444, 2000-09-29)
webalizer-2.01-10-src\webalizer-2.01-10\hashtab.c (29558, 2001-06-15)
webalizer-2.01-10-src\webalizer-2.01-10\hashtab.h (4715, 2000-09-29)
webalizer-2.01-10-src\webalizer-2.01-10\INSTALL (9926, 2000-09-29)
webalizer-2.01-10-src\webalizer-2.01-10\install-sh (5585, 2000-09-29)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.catalan (34177, 2000-10-19)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.chinese (32321, 2000-10-06)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.croatian (32616, 2001-06-15)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.czech (33290, 2002-04-17)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.danish (32888, 2000-10-06)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.dutch (35411, 2000-10-06)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.english (32670, 2000-10-06)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.estonian (32574, 2001-07-05)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.finnish (33273, 2001-02-10)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.french (34883, 2000-10-06)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.galician (33252, 2001-06-15)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.german (35538, 2000-10-06)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.greek (32700, 2000-10-06)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.hungarian (33648, 2000-10-06)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.icelandic (33050, 2000-12-17)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.indonesian (35043, 2000-10-06)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.italian (34694, 2000-10-06)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.japanese (32321, 2001-10-23)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.korean (32076, 2000-10-20)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.latvian (33024, 2000-10-19)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.malay (34109, 2000-10-31)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.norwegian (33808, 2000-11-19)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.polish (33972, 2000-10-06)
webalizer-2.01-10-src\webalizer-2.01-10\lang\webalizer_lang.portuguese (33554, 2002-04-17)
... ...
The Webalizer - A web server log file analysis tool
Copyright 1997-2000 by Bradford L. Barrett (brad@mrunix.net)
Distributed under the GNU GPL. See the files "COPYING" and
"Copyright" supplied with the distribution for additional info.
What is The Webalizer?
----------------------
The Webalizer is a web server log file analysis program which produces
usage statistics in HTML format for viewing with a browser. The results
are presented in both columnar and graphical format, which facilitates
interpretation. Yearly, monthly, daily and hourly usage statistics are
presented, along with the ability to display usage by site, URL, referrer,
user agent (browser), search string, entry/exit page, username and country
(some information is only available if supported and present in the log
files being processed). Processed data may also be exported into most
database and spreadsheet programs that support tab delimited data formats.
The Webalizer supports CLF (common log format) log files, as well as
Combined log formats as defined by NCSA and others, and variations
of these which it attempts to handle intelligently. In addition, wu-ftpd
xferlog formatted logs and squid proxy logs are supported.
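As an illustration of the CLF and Combined formats named above, here is a minimal Python sketch that splits one log line into fields. The regular expression, the `parse_line` name and the sample record are assumptions made for this example; they are not taken from The Webalizer's own C parser, which also handles format variations this pattern does not.

```python
import re

# CLF fields, followed by the two optional fields the Combined format adds.
CLF_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?\s*$'
)

def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = CLF_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    # A "-" size means no body was sent; treat it as zero bytes.
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    return rec

# Made-up Combined format record used only for demonstration.
sample = ('192.0.2.7 - alice [10/Oct/2000:13:55:36 -0700] '
          '"GET /index.html HTTP/1.0" 200 2326 '
          '"http://example.com/start.html" "Mozilla/4.08"')
rec = parse_line(sample)
print(rec["host"], rec["status"], rec["size"], rec["agent"])
```

For a plain CLF line the `referrer` and `agent` entries come back as `None`, which mirrors the README's remark that some information is only available if present in the log.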
Gzip compressed logs may now be used as input directly. Any log filename
that ends with a '.gz' extension will be assumed to be in gzip format and
uncompressed on the fly as it is being read. In addition, the Webalizer
also supports DNS lookup capabilities if enabled at compile time. See
the file DNS.README for additional information.
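The '.gz' rule described above can be sketched in a few lines of Python. The `open_log` helper and the scratch file names are hypothetical; the sketch only shows the idea of choosing a decompressing reader from the filename extension, not how The Webalizer does it internally.

```python
import gzip
import os
import tempfile

def open_log(path):
    """Open a log for text reading, decompressing on the fly if it ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="latin-1")
    return open(path, "r", encoding="latin-1")

# Demo: write the same two records plain and gzipped, then read both back.
records = "first record\nsecond record\n"
with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "access_log")
    packed = os.path.join(tmp, "access_log.gz")
    with open(plain, "w", encoding="latin-1") as fh:
        fh.write(records)
    with gzip.open(packed, "wt", encoding="latin-1") as fh:
        fh.write(records)
    with open_log(plain) as fh:
        plain_lines = fh.read().splitlines()
    with open_log(packed) as fh:
        packed_lines = fh.read().splitlines()

print(plain_lines == packed_lines)
```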
This documentation applies to The Webalizer Version 2.01
Running the Webalizer
---------------------
The Webalizer was designed to be run from a Unix command line prompt or
as a cron job. There are several command line options which will modify
the results it produces, and configuration files can be used as well.
The format of the command line is:
webalizer [options ...] [log-file]
Where 'options' can be one or more of the supported command line
switches described below. 'log-file' is the name of the log file
to process (see below for more detailed information). If a dash
("-") is specified for the log-file name, STDIN will be used.
Once executed, the general flow of the program follows:
o A default configuration file is scanned for. A file named
'webalizer.conf' is searched for in the current directory, and if
found, its configuration data is parsed. If the file is not
present in the current directory, the file '/etc/webalizer.conf'
is searched for and, if found, is used instead.
o Any command line arguments given to the program are parsed. This
may include the specification of a configuration file, which is
processed at the time it is encountered.
o If a log file was specified, it is opened and made ready for
processing. If no log file was given, or the filename '-' is
specified on the command line, STDIN is used for input.
o If an output directory was specified, the program does a 'chdir' to
that directory in preparation for generating output. If no output
directory was given, the current directory is used.
o If a non-zero number of DNS child processes was specified, they
will be started, and the specified log file will be processed,
either creating or updating the specified DNS cache file.
o If no hostname was given, the program attempts to get the hostname
using a uname system call. If that fails, 'localhost' is used.
o A history file is searched for. This file keeps previous month
totals used on the main index.html page. The default file is
named 'webalizer.hist' and is kept in the specified output directory;
the name may be changed using the "HistoryName" configuration file
keyword.
o If incremental processing was specified, a data file is searched for
and loaded if found, containing the 'internal state' data of the
program at the end of a previous run. The default file is named
'webalizer.current' and is kept in the specified output directory; the
name may be changed using the "IncrementalName" configuration file keyword.
o Main processing begins on the log file. If the log spans multiple
months, a separate HTML document is created for each month.
o After main processing, the main 'index.html' page is created, which
has totals by month and links to each month's HTML document.
o A new history file is saved to disk, which includes totals generated
by The Webalizer during the current run.
o If incremental processing was specified, a data file is written that
contains the 'internal state' data at the end of this run.
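The first step of the flow above, the default configuration file lookup, can be sketched as follows. The `find_default_config` function and the scratch directories are assumptions for illustration; only the search order (current directory first, then /etc) comes from the text.

```python
import os
import tempfile

def find_default_config(search_dirs=(".", "/etc")):
    """Return the first existing webalizer.conf in search order, else None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, "webalizer.conf")
        if os.path.isfile(candidate):
            return candidate
    return None

# Demo with two scratch directories standing in for "." and "/etc".
with tempfile.TemporaryDirectory() as cwd, tempfile.TemporaryDirectory() as etc:
    etc_conf = os.path.join(etc, "webalizer.conf")
    open(etc_conf, "w").close()
    # Only the fallback location has a file, so it is the one found.
    found_fallback = find_default_config((cwd, etc)) == etc_conf
    cwd_conf = os.path.join(cwd, "webalizer.conf")
    open(cwd_conf, "w").close()
    # Once the first directory has a file too, it takes precedence.
    found_local = find_default_config((cwd, etc)) == cwd_conf

print(found_fallback, found_local)
```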
Incremental Processing
----------------------
Version 1.2x of The Webalizer adds incremental run capability. Simply
put, this allows processing large log files by breaking them up into
smaller pieces, and processing these pieces instead. What this means
in real terms is that you can now rotate your log files as often as you
want, and still be able to produce monthly usage statistics without the
loss of any detail. This is accomplished by saving and restoring all
relevant internal data to a disk file between runs. Doing so allows the
program to 'start where it left off' so to speak, and allows the
preservation of detail from one run to the next.
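The save-and-restore idea can be illustrated with a small Python sketch. Everything here is an assumption made for the example: the JSON layout, the `state.json` file name and the single `hits` counter bear no relation to the real 'webalizer.current' format, which stores far more data.

```python
import json
import os
import tempfile

def load_state(path):
    """Restore totals from a previous run, or start from zero."""
    if os.path.exists(path):
        with open(path) as fh:
            return json.load(fh)
    return {"hits": 0}

def save_state(path, state):
    """Persist totals so the next run can continue from them."""
    with open(path, "w") as fh:
        json.dump(state, fh)

def run(log_piece, state_path):
    """Process one piece of a rotated log and return the running total."""
    state = load_state(state_path)
    state["hits"] += len(log_piece)
    save_state(state_path, state)
    return state["hits"]

with tempfile.TemporaryDirectory() as out_dir:
    state_file = os.path.join(out_dir, "state.json")  # hypothetical name
    first = run(["rec1", "rec2", "rec3"], state_file)
    second = run(["rec4", "rec5"], state_file)

print(first, second)
```

The second run reports the combined total of both pieces, which is the 'start where it left off' behaviour described above.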
Some special precautions need to be taken when using the incremental
run capability of The Webalizer. Configuration options should not be
changed between runs, as that could cause corruption of the internal
stored data. For example, changing the MangleAgents level will cause
different representations of user agents to be stored, producing invalid
results in the user agents section of the report. If you need to change
configuration options, do it at the end of the month after normal
processing of the previous month and before processing the current month.
You may also want to delete the 'webalizer.current' file (or
whatever name was specified using the "IncrementalName" configuration
option).
The Webalizer also attempts to prevent data duplication by keeping
track of the timestamp of the last record processed. This timestamp
is then compared to current records being processed, and any records
that were logged previous to that timestamp are ignored. This, in
theory, should allow you to re-process logs that have already been
processed, or process logs that contain a mix of processed/not yet
processed records, and not produce duplication of statistics. The
only time this may break is if you have duplicate timestamps in two
separate log files... any records in the second log file that do have
the same timestamp as the last record in the previous log file processed,
will be discarded as if they had already been processed. There are
several ways to prevent this; for example, stopping the web server
before rotating logs avoids the situation. This setup
also necessitates that you always process logs in chronological order,
otherwise data loss will occur as a result of the timestamp compare.
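The timestamp comparison described above, including the edge case with duplicate timestamps, can be sketched like this. The `unprocessed` function and the sample records are hypothetical; the sketch assumes records are compared against the single timestamp saved at the end of the previous run.

```python
from datetime import datetime

def unprocessed(records, last_processed):
    """Yield (timestamp, line) pairs logged after the last processed record."""
    for stamp, line in records:
        if last_processed is not None and stamp <= last_processed:
            continue  # treated as already processed, so it is skipped
        yield stamp, line

last_run_end = datetime(2002, 4, 17, 23, 59, 59)
rotated_log = [
    (datetime(2002, 4, 17, 23, 59, 58), "already counted"),
    (datetime(2002, 4, 17, 23, 59, 59), "same second as last processed record"),
    (datetime(2002, 4, 18, 0, 0, 0), "new"),
]
kept = [line for _, line in unprocessed(rotated_log, last_run_end)]
print(kept)
```

Only the last record survives. The middle one shows the caveat from the text: a genuinely new record that shares its timestamp with the last record of the previous log is discarded as well.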
Output Produced
---------------
The Webalizer produces several reports (HTML) and graphics for each
month processed. In addition, a summary page is generated for the
current and previous months (up to 12), a history file is created
and, if incremental mode is used, the current month's processed data
is saved. The exact location and names of these files can be changed
using configuration files and command line options. The files
produced (default names) are:
index.html              - Main summary page (extension may be changed)
usage.png               - Yearly graph displayed on the main index page
usage_YYYYMM.html       - Monthly summary page (extension may be changed)
usage_YYYYMM.png        - Monthly usage graph for specified month/year
daily_usage_YYYYMM.png  - Daily usage graph for specified month/year
hourly_usage_YYYYMM.png - Hourly usage graph for specified month/year
site_YYYYMM.html        - All sites listing (if enabled)
url_YYYYMM.html         - All urls listing (if enabled)
ref_YYYYMM.html         - All referrers listing (if enabled)
agent_YYYYMM.html       - All user agents listing (if enabled)
search_YYYYMM.html      - All search strings listing (if enabled)
webalizer.hist          - Previous month history (may be changed)
webalizer.current       - Incremental Data (may be changed)
site_YYYYMM.tab         - tab delimited sites file
url_YYYYMM.tab          - tab delimited urls file
ref_YYYYMM.tab          - tab delimited referrers file
agent_YYYYMM.tab        - tab delimited user agents file
user_YYYYMM.tab         - tab delimited usernames file
search_YYYYMM.tab       - tab delimited search string file
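To make the YYYYMM naming pattern concrete, the following Python sketch builds the default names of the monthly summary page and its graphs for a given month. The `monthly_output_names` function and its dictionary keys are made up for this example, and the configurable extension is applied only to the summary page because that is the only monthly page the list marks as changeable.

```python
def monthly_output_names(year, month, html_ext="html"):
    """Build default output names for one month using the YYYYMM pattern."""
    stamp = f"{year:04d}{month:02d}"
    return {
        "summary": f"usage_{stamp}.{html_ext}",
        "monthly_graph": f"usage_{stamp}.png",
        "daily_graph": f"daily_usage_{stamp}.png",
        "hourly_graph": f"hourly_usage_{stamp}.png",
    }

names = monthly_output_names(2002, 4)
print(names["summary"], names["hourly_graph"])
```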
The yearly (index) report shows statistics for a 12 month period, and
links to each month. The monthly report has detailed statistics for
that month with additional links to any URLs and referrers found.
The various totals shown are explained below.
Hits
Any request made to the server which is logged is considered a 'hit'.
The requests can be for anything... HTML pages, graphic images, audio
files, CGI scripts, etc... Each valid line in the server log is
counted as a hit. This number represents the total number of requests
that were made to the server during the specified report period.
Files
Some requests made to the server require that the server then send
something back to the requesting client, such as an HTML page or graphic
image. When this happens, it is considered a 'file' and the files
total is incremented. The relationship between 'hits' and 'files' can
be thought of as 'incoming requests' and 'outgoing responses'.
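The difference between the two totals can be sketched in Python. The text does not say how a response that 'sends something back' is recognised in the log, so this sketch assumes, purely for illustration, that a status code of 200 marks such a response. The `count_hits_and_files` function is hypothetical.

```python
def count_hits_and_files(status_codes):
    """Count every logged request as a hit; count 200 responses as files."""
    hits = files = 0
    for status in status_codes:
        hits += 1              # every valid log line is a hit
        if status == 200:      # assumption: 200 means content was sent back
            files += 1
    return hits, files

# Five requests: three served in full, one not modified, one not found.
hits, files = count_hits_and_files([200, 200, 304, 404, 200])
print(hits, files)
```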
Pages
Pages are, well, pages! Generally, any HTML document, or anything
that generates an HTML document, would be considered a page. This
does not include the other stuff that goes into a document, such as
graphic images, audio clips, etc... This number represents the number
of 'pages' requested only, and does not include the other 'stuff' that
is in the page. What actually constitutes a 'page' can vary from
server to server. The default action is to treat anything with the
extension '.htm', '.html' or '.cgi' as a page. A lot of sites will
probably define other extensions, such as '.phtml', '.php3' and '.pl'
as pages as well. Some people consider this number as the number of
'pure' hits... I'm not sure if I totally agree with that viewpoint.
Some other programs (and people :) refer to this as 'Pageviews'.
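A page test based on file extensions might look like the following Python sketch. The `is_page` function is hypothetical, and the wildcard pattern 'htm*' is used here as a compact way of covering both '.htm' and '.html'; how The Webalizer treats URLs without an extension is not addressed.

```python
from fnmatch import fnmatchcase
import posixpath

DEFAULT_PAGE_TYPES = ("htm*", "cgi")

def is_page(url, page_types=DEFAULT_PAGE_TYPES):
    """Return True if the URL's extension matches one of the page types."""
    path = url.split("?", 1)[0]                       # drop any query string
    ext = posixpath.splitext(path)[1].lstrip(".").lower()
    return any(fnmatchcase(ext, pattern) for pattern in page_types)

print(is_page("/index.html"), is_page("/logo.png"), is_page("/search.cgi?q=x"))

# A site that also serves pages through PHP could extend the list.
site_types = DEFAULT_PAGE_TYPES + ("phtml", "php3", "pl")
print(is_page("/shop.php3"), is_page("/shop.php3", site_types))
```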
Sites
Each request made to the server comes from a unique 'site', which can
be referenced by a name or ultimately, an IP address. The 'sites'
number shows how many unique IP addresses made requests to the server
during the reporting time period. This DOES NOT mean the number of
unique individual users (real people) that visited, which is impossible
to determine using just logs and the HTTP protocol (however, this
number might be about as close as you will get).
Visits
Whenever a request is made to the server from a given IP address
(site), the amount of time since a previous request by the address
is calculated (if any). If the time difference is greater than a
pre-configured 'visit timeout' value (or the address has never made a
request before),
it is considered a 'new visit', and this total is incremented (both
for the site, and the IP address). The default timeout value is 30
minutes (can be changed), so if a user visits your site at 1:00 in
the afternoon, and then returns at 3:00, two visits would be registered.
Note: in the 'Top Sites' table, the visits total should be discounted
on 'Grouped' records, and thought of as the "Minimum number of visits"
that came from that grouping instead. Note: Visits only occur on
PageType requests, that is, for any request whose URL is one of the
'page' types defined with the PageType option. Due to the limitations
of the HTTP protocol, log rotations and other factors, this number
should not be taken as absolutely accurate; rather, it should be
considered a pretty close "guess".
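The visit rule can be sketched in Python as below. The `count_visits` function is hypothetical, times are given as seconds since midnight for brevity, and the input is assumed to contain only page requests in chronological order.

```python
VISIT_TIMEOUT = 30 * 60  # default of 30 minutes, in seconds

def count_visits(page_requests, timeout=VISIT_TIMEOUT):
    """Return (visits, sites) for chronological (address, seconds) pairs."""
    last_seen = {}
    visits = 0
    for address, when in page_requests:
        previous = last_seen.get(address)
        # A new visit starts on the first request from an address, or when
        # the gap since its previous request exceeds the timeout.
        if previous is None or when - previous > timeout:
            visits += 1
        last_seen[address] = when
    return visits, len(last_seen)

requests = [
    ("192.0.2.1", 13 * 3600),        # 1:00 in the afternoon
    ("192.0.2.2", 13 * 3600 + 60),
    ("192.0.2.2", 13 * 3600 + 660),  # ten minutes later, same visit
    ("192.0.2.1", 15 * 3600),        # returns at 3:00, a second visit
]
visits, sites = count_visits(requests)
print(visits, sites)
```

The first address produces two visits, matching the 1:00 and 3:00 example in the text, while the second address produces one.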
KBytes
The KBytes (kilobytes) value shows the amount of data, in KB, that
was sent out by the server during the specified reporting period. This
value is generated directly from the log file, so it is up to the
web server to produce accurate numbers in the logs (some web servers
do stupid things when it comes to reporting the number of bytes). In
general, this should be a fairly accurate representation of the amount
of outgoing traffic the server had, regardless of the web server's
reporting quirks.
Note: A kilobyte is 1024 bytes, not 1000 :)
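As a worked example of the 1024-byte kilobyte, this short Python sketch sums the byte counts of a few made-up log records and converts the total. The `kbytes` function name is an assumption.

```python
def kbytes(byte_counts):
    """Convert a sequence of per-request byte counts to kilobytes (1024 bytes)."""
    return sum(byte_counts) / 1024

# 2326 + 1024 + 0 + 5842 = 9192 bytes in total.
total = kbytes([2326, 1024, 0, 5842])
print(f"{total:.2f} KB")
```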
Top Entry and Exit Pages
The Top Entry and Exit tables give a rough estimate of what URLs
are used to enter your site, and what the last pages viewed are.
Because of limitations in the HTTP protocol, log rotations, etc...
these numbers should be considered a good "rough guess" of the actual
values; they will, however, give a good indication of the overall
trend in where users come into, and exit, your site.
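One way to estimate entry and exit pages is to treat the first page of each visit as the entry page and the last page seen before the visit ends as the exit page. The Python sketch below follows that idea; the `entry_exit_pages` function, the 30 minute timeout and the sample data are assumptions for illustration, and the input is assumed to be page requests in chronological order.

```python
from collections import Counter

def entry_exit_pages(page_requests, timeout=30 * 60):
    """Tally entry and exit pages from (address, seconds, url) tuples."""
    entries, exits = Counter(), Counter()
    last = {}  # address -> (time, url) of its most recent page request
    for address, when, url in page_requests:
        prev = last.get(address)
        if prev is None or when - prev[0] > timeout:
            if prev is not None:
                exits[prev[1]] += 1  # the previous visit ended on that page
            entries[url] += 1        # this request starts a new visit
        last[address] = (when, url)
    # Visits still open at the end of the log exit on their last page.
    for _, url in last.values():
        exits[url] += 1
    return entries, exits

requests = [
    ("192.0.2.1", 0, "/index.html"),
    ("192.0.2.2", 30, "/index.html"),
    ("192.0.2.1", 60, "/about.html"),
    ("192.0.2.1", 7200, "/news.html"),
]
entries, exits = entry_exit_pages(requests)
print(entries.most_common())
print(sorted(exits.items()))
```

Because a visit that straddles a log rotation is cut in two, the tallies are only approximate, which is the caveat the text makes.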
Command Line Options
--------------------
The Webalizer supports many different configuration options that will
alter the way the program behaves and generates output. Most of these
can be specified on the command line, while some can only be specified
in a configuration file. The command line options are listed below,
with references to the corresponding configuration file keywords.
--------------------------------------------------------------------------
General Options
---------------
-h Display all available command line options and exit program.
-v Display program version and exit program.
-d Display additional 'debugging' information for errors and
warnings produced during processing. This normally would
not be used except to determine why you are getting errors
and want to see the actual data. Normally The
Webalizer will just tell you it found an error, not the
actual data. This option will display the data as well.
Config file keyword: Debug
-F Specify that the log being used is an FTP log. Normally, the
Webalizer expects to find a valid CLF or Combined format
web server log file. This option allows you to process wu-ftpd
xferlogs as well.
Config file keyword: LogType
-f Fold out of sequence log records back into analysis, by
treating them as if they were the same date/time as the
last good record. Normally, out of sequence log records
are ignored. If you run apache, don't worry about this.
Config file keyword: FoldSeqErr
-i Ignore history file. USE WITH CAUTION. This causes The
Webalizer to ignore any existing history file produced from
previous runs and generate its output from scratch. The
effect will be as if The Webalizer is being run for the
first time, and any previous statistics on the main
index.html (yearly) web page will be lost (although the
HTML documents, if any, will not be deleted).
Config file keyword: IgnoreHist
-p Preserve state (incremental processing). This allows the
processing of partial logs in increments. At the end of
the program, all relevant internal data is saved, so that
it may be restored the next time the program is run. This
allows sites that must rotate their logs more than once a
month to still be able to use The Webalizer, and not worry
about having to gather and feed an entire month's logs to
the program at the end of the month. See the section on
"Incremental Processing" above for additional information.
The default is to not perform incremental processing. Use
this command line option to enable the feature.
Config file keyword: Incremental
-q Quiet mode. Normally, The Webalizer will produce various
messages while it runs letting you know what it's doing.
This option will suppress those messages. It should be
noted that this WILL NOT suppress errors and warnings, which
are output to STDERR.
Config file keyword: Quiet
-Q ReallyQuiet mode. This allows suppression of _all_ messages
generated by The Webalizer, including warnings and errors.
Useful when The Webalizer is run as a cron job.
Config file keyword: ReallyQuiet
-T Display timing information. The Webalizer keeps track of the
time it begins and ends processing, and normally displays the
total processing time at the end of each run. If quiet mode
(-q or 'Quiet yes' in configuration file) is specified, this
information is not displayed. This option forces the display
of timing totals if quiet mode has been specified, otherwise
it is redundant and will have no effect.
Config file keyword: TimeMe
-c file This option specifies a configuration file to use. Configuration
files allow greater control over how The Webalizer behaves, and
there are several ways to use them. As of version 0.***, The
Webalizer searches for a default configuration file in the
current directory named "webalizer.conf", and if not found,
will search in the /etc/ directory for a file of the same name.
In addition, you may specify a configuration file to use with
this command line option.
-n name This option specifies the hostname for the reports generated.
The hostname is used in the title of all reports, and is also
prepended to URLs in the reports. This allows The Webalizer
to be run on log files for 'virtual' web servers or web servers
that are different than the machine the reports are located on,
and still allows clicking on the URLs to go to the proper
location. If a hostname is not specified, either on the
command line or in a configuration file, The Webalizer attempts
to determine the hostname using a 'uname' system call. If this
fails, "localhost" will be used as the hostname.
Config file keyword: HostName
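The hostname fallback chain can be sketched in Python as follows. The `report_hostname` function is hypothetical; it uses Python's `os.uname()` as a stand-in for the 'uname' system call and falls back to "localhost" when no name can be obtained.

```python
import os

def report_hostname(configured=None):
    """Pick the report hostname: explicit value, then uname, then localhost."""
    if configured:
        return configured
    try:
        name = os.uname().nodename   # not available on every platform
    except (AttributeError, OSError):
        name = ""
    return name or "localhost"

print(report_hostname("www.example.com"))
print(report_hostname())
```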
-o dir This option specifies the output directory for the reports.
If not specified here or in a configuration file, the current
default directory will be used for output.
Config file keyword: OutputDir
-x name This option allows the generated pages to have an extension
other than '.html', which is the default. Do not include the
leading period ('.') when you specify the extension.
Config file keyword: HTMLExtension
-P name Specify the file extensions for 'pages'. Pages (sometimes
called 'PageViews') are normally html documents and CGI
scripts that display the whole page, not just parts of it.
Some systems will need to define a few more, such as 'phtml',
'php3' or 'pl' in order to have them counted as well. The
default is 'htm*' and 'cgi' for web logs and 'txt' for ftp.
Config file keyword: PageType
-t name This option specifies the title string for all reports. This
string is used, in conjunction with the hostname (if not blank)
to produce the actual title. If not specified, the default of
"Usage Statistics for" will be used.
Config file keyword: ReportTitle
-Y Suppress Country graph. Normally, The Webalizer produces
countr ... ...