htmlcxx-0.82
Category: Web development
Development tool: Visual C++
File size: 663KB
Downloads: 198
Upload date: 2007-09-12 15:07:15
Uploader: zshshy
Description: A well-known HTML parser written in standard C++. It makes it easy to extract data from HTML files; the uploader also states that it can check HTML against the W3C standards.
File list:
htmlcxx-0.82\aclocal.m4 (265114, 2007-08-12)
htmlcxx-0.82\ASF-2.0 (11358, 2006-06-16)
htmlcxx-0.82\AUTHORS (101, 2003-12-14)
htmlcxx-0.82\ChangeLog (4238, 2004-06-18)
htmlcxx-0.82\config.guess (45126, 2007-08-12)
htmlcxx-0.82\config.h.in (2224, 2007-08-12)
htmlcxx-0.82\config.sub (32931, 2007-08-12)
htmlcxx-0.82\configure (705266, 2007-08-12)
htmlcxx-0.82\configure.ac (828, 2007-08-12)
htmlcxx-0.82\COPYING (17992, 2006-06-16)
htmlcxx-0.82\css\css_lex.c (73040, 2006-06-16)
htmlcxx-0.82\css\css_lex.h (119, 2004-03-27)
htmlcxx-0.82\css\css_lex.l (3859, 2005-02-22)
htmlcxx-0.82\css\css_syntax.c (53645, 2006-06-16)
htmlcxx-0.82\css\css_syntax.h (1065, 2006-06-16)
htmlcxx-0.82\css\css_syntax.y (12889, 2005-02-22)
htmlcxx-0.82\css\default.css (4846, 2005-02-22)
htmlcxx-0.82\css\Makefile.am (499, 2005-02-22)
htmlcxx-0.82\css\Makefile.in (19824, 2007-08-12)
htmlcxx-0.82\css\parser.c (694, 2004-03-27)
htmlcxx-0.82\css\parser.h (803, 2004-03-27)
htmlcxx-0.82\css\parser_pp.cc (6972, 2005-02-22)
htmlcxx-0.82\css\parser_pp.h (2190, 2005-02-22)
htmlcxx-0.82\depcomp (17574, 2007-08-12)
htmlcxx-0.82\html\CharsetConverter.cc (1326, 2004-06-17)
htmlcxx-0.82\html\CharsetConverter.h (550, 2005-02-22)
htmlcxx-0.82\html\ci_string.h (1171, 2005-02-22)
htmlcxx-0.82\html\debug.h (945, 2005-02-22)
htmlcxx-0.82\html\Extensions.cc (721, 2004-06-17)
htmlcxx-0.82\html\Extensions.h (396, 2004-06-17)
htmlcxx-0.82\html\gen_tld.pl (873, 2006-06-16)
htmlcxx-0.82\html\Makefile.am (851, 2005-03-24)
htmlcxx-0.82\html\Makefile.in (27665, 2007-08-12)
htmlcxx-0.82\html\Node.cc (2434, 2005-02-22)
htmlcxx-0.82\html\Node.h (2064, 2005-02-22)
htmlcxx-0.82\html\ParserDom.cc (3551, 2005-02-22)
htmlcxx-0.82\html\ParserDom.h (767, 2005-02-22)
htmlcxx-0.82\html\ParserSax.cc (189, 2005-02-22)
htmlcxx-0.82\html\ParserSax.h (1433, 2005-03-24)
htmlcxx-0.82\html\ParserSax.tcc (8353, 2007-08-12)
... ...
htmlcxx - html and css APIs for C++
---------------------------------------------
Description
===========
htmlcxx is a simple non-validating css1 and html parser for C++.
Although there are several other html parsers available, htmlcxx has some
characteristics that make it unique:
- STL-like navigation of the DOM tree, using the excellent tree.hh library from
Kasper Peeters
- It is possible to reproduce exactly, character by character, the
original document from the parse tree
- Bundled css parser
- Optional parsing of attributes
- C++ code that looks like C++ (not so true anymore)
- Offsets of tags/elements in the original document are stored in the
nodes of the DOM tree
The parsing policies of htmlcxx were designed to mimic the behavior of Mozilla
Firefox (http://www.mozilla.org), so you should expect parse trees similar to
those created by Firefox. Unlike Firefox, however, htmlcxx does not insert
elements that are absent from your html. Therefore, serializing the DOM tree
gives exactly the same bytes contained in the original HTML document.
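The byte-exact round trip described above follows from storing offsets and lengths into the original buffer rather than normalized copies of each token. The sketch below illustrates that idea with the standard library only. It is not htmlcxx's parser: `Span`, `tokenize` and `serialize` are hypothetical names introduced for this illustration, and the tokenizer is deliberately naive (it only separates `<...>` runs from text).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a parse-tree node: it records only where the
// token sits in the original buffer, never a rewritten copy of it.
struct Span {
    std::size_t offset;  // index of the first byte in the source
    std::size_t length;  // number of bytes covered
    bool isTag;          // true for "<...>" runs, false for text
};

// Toy tokenizer (not htmlcxx's algorithm): split the input into tag and
// text spans without dropping or rewriting any byte.
std::vector<Span> tokenize(const std::string& src) {
    std::vector<Span> spans;
    std::size_t pos = 0;
    while (pos < src.size()) {
        std::size_t end = src.size();
        bool isTag = false;
        if (src[pos] == '<') {
            std::size_t close = src.find('>', pos);
            if (close != std::string::npos) {
                end = close + 1;  // include the closing '>'
                isTag = true;
            }
            // An unterminated '<' falls through: the rest is kept as text.
        } else {
            std::size_t open = src.find('<', pos);
            if (open != std::string::npos) end = open;
        }
        spans.push_back(Span{pos, end - pos, isTag});
        pos = end;
    }
    return spans;
}

// Rebuild the document by copying each span's bytes from the source.
// Because the spans cover the input contiguously, the result is identical.
std::string serialize(const std::string& src, const std::vector<Span>& spans) {
    std::string out;
    for (const Span& s : spans) {
        assert(s.offset + s.length <= src.size());
        out += src.substr(s.offset, s.length);
    }
    return out;
}
```

For the input `<P>Hi <b>there</B>  </p>`, `tokenize` yields seven spans and `serialize` returns the input unchanged, including the mismatched tag case and the run of spaces. A parser that normalized tag names or inserted missing elements could not make that guarantee.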
News for version 0.7.3
======================
Added utility code to escape/decode urls as defined by RFC 2396.
Added new SAX interface. API compatibility was slightly broken to support the
new SAX interface :-(.
Added Visual Studio 2003 projects for the WIN32 port.
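For readers unfamiliar with the URL escaping mentioned in the news above, here is a minimal standard-library sketch of percent-encoding and decoding based on the "unreserved" character set of RFC 2396 (alphanumerics plus `-_.!~*'()`). The function names `escapeUri` and `unescapeUri` are hypothetical and are not the names used by htmlcxx's utility code. The sketch escapes every byte outside the unreserved set, so it suits a single URI component rather than a whole URL.

```cpp
#include <string>

// RFC 2396 section 2.3: unreserved = alphanum | mark.
bool isUnreserved(unsigned char c) {
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
        (c >= '0' && c <= '9')) {
        return true;
    }
    static const std::string marks = "-_.!~*'()";
    return marks.find(static_cast<char>(c)) != std::string::npos;
}

// Value of a hex digit, or -1 if the character is not one.
int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Replace every byte outside the unreserved set with %XX (uppercase hex).
std::string escapeUri(const std::string& in) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : in) {
        if (isUnreserved(c)) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += digits[c >> 4];
            out += digits[c & 0x0F];
        }
    }
    return out;
}

// Decode %XX sequences; a '%' not followed by two hex digits is copied as is.
std::string unescapeUri(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size() + 0 &&
            hexValue(in[i + 1]) >= 0 && hexValue(in[i + 2]) >= 0) {
            out += static_cast<char>(hexValue(in[i + 1]) * 16 +
                                     hexValue(in[i + 2]));
            i += 2;
        } else {
            out += in[i];
        }
    }
    return out;
}
```

Leaving malformed escapes untouched instead of rejecting them is a design choice of this sketch; a stricter decoder might report an error there.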
Examples
========
Using htmlcxx is quite simple. Take a look
at this example.
-----------------------------------------------------------------------
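#include <htmlcxx/html/ParserDom.h>
...
//Parse some html code
string html = "<html><body>hey</body></html>";
HTML::ParserDom parser;
tree<HTML::Node> dom = parser.parseTree(html);
//Print whole DOM tree
cout << dom << endl;
//Dump all links in the tree
tree<HTML::Node>::iterator it = dom.begin();
tree<HTML::Node>::iterator end = dom.end();
for (; it != end; ++it)
{
    //This comparison is case-sensitive
    if (it->tagName() == "A")
    {
        //Attributes are only parsed on request
        it->parseAttributes();
        //attribute() returns a (found, value) pair
        cout << it->attribute("href").second << endl;
    }
}
//Dump all text of the document
it = dom.begin();
end = dom.end();
for (; it != end; ++it)
{
    //Skip tags and comments, leaving only text nodes
    if ((!it->isTag()) && (!it->isComment()))
    {
        cout << it->text();
    }
}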
-------------------------------------------------
The htmlcxx application
=======================
htmlcxx is the name of both the library and the utility
application that comes with this package. Although
htmlcxx (the application) is mostly useless for programming, you can use it
to easily see how htmlcxx (the library) would parse your html code.
Just install it and try htmlcxx -h.
Downloads
=========
Use the project page at sourceforge: http://sf.net/projects/htmlcxx
License Stuff
=============
Code is now under the LGPL. This was our initial intention, and is
now possible thanks to the author of tree.hh, who allowed us to use it
under LGPL only for HTML::Node template instances. Check
http://www.fsf.org or the COPYING file in the distribution for details
about the LGPL license. The uri parsing code is a derivative work of
Apache web server uri parsing routines. Check
www.apache.org/licenses/LICENSE-2.0 or the ASF-2.0 file in the
distribution for details.
----------------------------------------
Enjoy!
Davi de Castro Reis
Robson Braga Araújo
Last Updated: Thu Mar 24 00:56:05 2005