sitecheck: a modular website spider for web developers (open source)

  • Uploader: G3_845459
  • File size: 43KB
  • File format: gz
  • Upload date: 2022-06-09 03:50
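sitecheck is more than a link checker. It is a website spider (also called a crawler) that helps with SEO by testing an entire site, along with inbound links from search engines and outbound links to other sites, for the following problems: circular redirects (HTTP 301/302), broken links (HTTP 404), server errors (HTTP 500), spelling mistakes, low readability scores (using the Flesch Reading Ease test), missing/empty/duplicate meta tags, duplicate content, slow page speed, W3C validation errors and accessibility errors. Sitecheck can also find some common causes of PCI compliance failure, such as insecure content on secure pages, SQL injection/cross-site scripting (XSS) vulnerabilities, insecure encryption ciphers and open mail relays. Sources of information leakage, such as email addresses and IP addresses in headers or pages, are logged. A separate module called domaincheck is included, which checks domain expiry dates, SSL certificate expiry dates and SPF records.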
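Of the checks listed above, the Flesch Reading Ease test is the most formula-driven. The standard score is 206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words), where higher scores mean easier text. The following is a minimal Python sketch of that formula, not sitecheck's own code: the sentence splitting and the vowel-group syllable counter are simple heuristics assumed here for illustration, and the function names are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count runs of consecutive vowels (y included)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent 'e' ("make") usually adds no syllable; "-le" ("table") does.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease score using the heuristics above."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


if __name__ == "__main__":
    print(flesch_reading_ease("The cat sat on the mat."))
```

The Readability module logs pages whose score falls below a specified threshold; with this sketch, that check reduces to a comparison such as flesch_reading_ease(text) < threshold.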
sitecheck-1.7.tar.gz
  • sitecheck-1.7
  • PKG-INFO (1.7KB)
  • LICENSE.txt (35KB)
  • CHANGELOG.txt (4.6KB)
  • sitecheck
  • logging.py (1.1KB)
  • config.py (3.7KB)
  • dict.txt (26B)
  • __init__.py (794B)
  • tidylib
  • __init__.py (7.8KB)
  • sink.py (3.6KB)
  • modules.py (28.3KB)
  • reporting.py (8.7KB)
  • core.py (27KB)
  • README.txt (5.6KB)
  • setup.py (2.6KB)
  • runsitecheck.py (6.6KB)
  • domaincheck.py (9.7KB)
Description
Copyright 2009-2013 Andrew Kershaw
Licensed under the GNU Affero General Public License v3 (see "LICENSE.txt" file).

WARNING
This program can generate a large number of requests. Only run sitecheck against sites you have permission to scan. Running it against production sites is done at your own risk and is not recommended without a good understanding of the configuration options.

Do not give the authenticate module access to a CMS or site administration area. Doing so will result in unpredictable and probably catastrophic results.

The security module only tries to generate errors using simple attacks and does not attempt any exploits. However, it will significantly increase the number of requests, and it will also submit any forms it finds multiple times.

Dependencies:
  • Python 3
  • HTML Tidy (validation, accessibility)
  • Enchant, pyenchant (spelling)
  • OpenSSL, pyOpenSSL, dnspython3 (domain check)

WARNING:
  • pyenchant filters do not currently work with Python 3.3 (Python 3.2 seems fine). A patched version of pyenchant is available at: https://github.com/arkershaw/pyenchant/
  • dnspython3 on Windows (version 1.10.0) is currently affected by this bug, which prevents some functionality of domain check from working: https://github.com/rthalley/dnspython/pull/20

Using VirtualEnv is recommended due to the development status of some dependencies.
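Every dependency except Python 3 is optional and tied to specific modules, so it can help to see which ones are present before enabling those modules. The script below is a hypothetical helper, not part of sitecheck. It assumes the usual import names for the Python bindings (enchant for pyenchant, OpenSSL for pyOpenSSL, dns for dnspython3). For HTML Tidy, it assumes, as the installation notes suggest, that what matters is the libtidy shared library being on the system path.

```python
import ctypes.util
import importlib.util

# Assumed import names for the optional Python bindings, mapped to the
# sitecheck features that need them.
PYTHON_DEPENDENCIES = {
    "enchant": "spelling",
    "OpenSSL": "domain check",
    "dns": "domain check",
}


def missing_python_modules(deps=PYTHON_DEPENDENCIES):
    """Return {module: feature} for top-level modules that cannot be found.

    find_spec() locates a module without importing it, so a binding whose
    underlying C library is missing can still pass this check.
    """
    return {name: feature for name, feature in deps.items()
            if importlib.util.find_spec(name) is None}


def tidy_library_found():
    """True if the HTML Tidy shared library (libtidy) can be located."""
    # "tidy" matches libtidy.so on Linux; "libtidy" matches libtidy.dll on Windows.
    return any(ctypes.util.find_library(name) for name in ("tidy", "libtidy"))


if __name__ == "__main__":
    for name, feature in missing_python_modules().items():
        print("{0} not found: {1} will not work".format(name, feature))
    if not tidy_library_found():
        print("libtidy not found: validation and accessibility will not work")
```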
Installation:

Windows:
Download and install the following:
  • Python 3: http://www.python.org/download/
  • pyenchant (if spellcheck is required): http://www.rfk.id.au/software/pyenchant/download.html (the Windows installer includes the Enchant library)
  • pyopenssl (if domain check is required): http://pypi.python.org/pypi/pyOpenSSL/
  • libtidy.dll (if validation or accessibility are required): http://tidy.sourceforge.net/#binaries (place libtidy.dll on your system path; also see: http://countergram.com/open-source/pytidylib/docs/index.html)
  • dnspython3 (if domain check is required): http://www.dnspython.org/

To install dnspython3 and sitecheck, extract each archive, then open a command window in the same directory as the extracted files and type:
    python setup.py install

Linux:
Packages for dependencies should be available from your distribution's package manager or installable via pip or the links above. Install all dependencies, then extract the archive and run:
    python setup.py install

Usage:

Windows:
    C:\Python32\Scripts\runsitecheck.py -d http://www.domain-goes-here C:\path\to\output

Linux:
    runsitecheck.py -d http://www.domain-goes-here /path/to/output

To specify the default page, use the -p switch:
    runsitecheck.py -d http://www.domain-goes-here -p home.html /path/to/output

See "Configuration" below for running repeated tests against the same domain.

While running, Ctrl+C will prompt you to abort or suspend. To resume a suspended job or use an existing configuration file, run the script with the path to an existing output directory:
    runsitecheck.py /path/to/output

Modules:
  • Persister -> Downloads site files to disk for further analysis. Disabled by default.
  • InboundLinks -> Checks URLs in the search result listings from the Google and Bing search engines. Disabled by default.
  • RegexMatch -> Checks for regular expression matches in headers and content.
    To search for headers that don't match a regular expression, prefix the name with ^; to search for content that doesn't match, prefix it with _.
  • Validator -> Lists validation errors. Disabled by default.
  • Accessibility -> Outputs selected accessibility warnings (those that can be tested automatically). Disabled by default.
  • MetaData -> Checks for missing/empty/duplicate meta title, description and keywords.
  • StatusLog -> Logs any 4xx and 5xx responses. Also checks outbound links.
  • Security -> Attempts basic SQL injection and XSS attacks on GET and POST parameters. Disabled by default.
  • Comments -> Logs the content of any HTML comments found.
  • Spelling -> Checks spelling using Enchant. Custom dictionary words are in the file "dict.txt". Disabled by default.
  • Spider -> Scans all files under the domain/path and tests the targets of external links. If this module is disabled, only a single page will be analysed.
  • Readability -> Calculates the Flesch Reading Ease score and logs it if it is below the specified threshold.
  • DuplicateContent -> Checks for the same response at different URLs.
  • DomainCheck -> Gets important domain information, including expiry date, SSL certificate expiry date, reverse DNS, etc. Uses a whois proxy with a limit of 50 hits per day. Disabled by default.
  • Authenticate -> Issues requests to authenticate with the target site before spidering begins. If specified, logout requests will be executed after spidering ends. Disabled by default.
  • RequestList -> Defines a list of requests manually, which are executed in sequence. Disabled by default.
  • RequiredPages -> Defines a list of required URLs, which are logged if they are not found on the site. Disabled by default.
  • InsecureContent -> Logs insecure content referenced from secure pages.

Configuration:
Configuration for the spider and individual modules can be found in "config.py". For site-specific configuration, copy config.py to the output directory specified on the command line.
The domain and path properties can be specified in the config file and then omitted from the command line (as when resuming a suspended job, described above). This config file will be used instead of the default. The custom dictionary file for the spelling module (dict.txt) can also be overridden by copying it to the same location.
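Setting up a site-specific configuration amounts to copying the default files into the output directory. The following is a minimal Python sketch of that step. It assumes the defaults are the config.py and dict.txt files inside the installed sitecheck package directory, as in the archive listing above; prepare_output_dir is a hypothetical helper, not part of sitecheck.

```python
import os
import shutil


def prepare_output_dir(output_dir, package_dir, files=("config.py", "dict.txt")):
    """Copy sitecheck's default config and custom dictionary into output_dir.

    Files that already exist in output_dir are left untouched, so earlier
    site-specific edits survive. Returns the list of paths that were copied.
    """
    os.makedirs(output_dir, exist_ok=True)
    copied = []
    for name in files:
        target = os.path.join(output_dir, name)
        if not os.path.exists(target):
            shutil.copy(os.path.join(package_dir, name), target)
            copied.append(target)
    return copied
```

If the package imports cleanly, package_dir could be taken from os.path.dirname(sitecheck.__file__). After editing the copied config.py (for example, to set the domain and path), the job can be started with only the output directory: runsitecheck.py /path/to/output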