high5

Category: Feature extraction
Development tool: JavaScript
File size: 10KB
Downloads: 0
Upload date: 2019-06-23 20:30:39
Uploader: sh-1993
Description: HTML5 tokenizer

File list:
index.js (47989, 2019-06-24)
package.json (1071, 2019-06-24)
test (0, 2019-06-24)
test\debugging_tokenizer.js (747, 2019-06-24)
test\html5lib-tests (0, 2019-06-24)
test\html5lib-tokenizer.js (2993, 2019-06-24)

# high5

(Eventually) spec-compliant HTML5 parser.

### Goals

My previous HTML parser, [`htmlparser2`](https://github.com/fb55/htmlparser2), reached a point where a clean cut was needed. _high5_ is this cut, even though it is based on _htmlparser2_ and will try to stay backwards compatible (I even tried to preserve the git history, so all previous committers are still credited).

Some of the things that will be supported:

- [x] Parse `doctype`s (previously treated as processing instructions and not parsed at all).
- [x] Handle several token types that were previously treated as processing instructions as (bogus) comments, as the HTML5 spec requires.
- [ ] The `xmlMode` option will still be available and will conditionally switch these features on.
- [ ] Add a _document mode_. (`htmlparser2` is always in _fragment mode_, meaning that, e.g., the empty document (`""`) results in an empty DOM.)
- [ ] Implicit opening and closing tags. (`htmlparser2` only checks the top element of the stack for the latter.)
- [ ] Foster parenting (e.g. `foo...` should be handled as `foo ...`).
- [ ] (Potentially) handle character encodings (?).

### State

- Spec-compliant\* tokenizer
- Rudimentary tag handling (still a long way to go; only marginally better than htmlparser2).

\* The tokenizer takes several shortcuts that greatly increase the speed of a JavaScript implementation but disobey the spec implementation-wise. The output should still be spec-compliant, though.
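The WHATWG HTML spec defines tokenization as an explicit state machine, and the Goals list notes that constructs such as `<?...>` become (bogus) comments rather than processing instructions. The block below is a hypothetical, heavily reduced JavaScript sketch of a few of those states (data, tag open, end tag open, tag name, bogus comment), written only to illustrate that routing. It is not high5's code or API: `tokenize` and the token object shapes are assumptions, and attributes, real `<!-- -->` comments, DOCTYPEs, CDATA, character references, NULL handling and raw-text elements are deliberately left out.

```javascript
// Hypothetical sketch: a tiny subset of the WHATWG HTML tokenizer state
// machine. NOT high5's implementation or API. Attributes, real comments
// (<!-- -->), DOCTYPE, CDATA, character references, NULL handling and
// raw-text states are omitted.
function tokenize(input) {
  const tokens = [];
  let text = "";      // pending characters, merged into one text token
  let tag = null;     // start/end tag token under construction
  let comment = null; // bogus comment token under construction
  let state = "data";

  const isAsciiAlpha = (c) => c !== undefined && /^[A-Za-z]$/.test(c);
  const isSpace = (c) => c === "\t" || c === "\n" || c === "\f" || c === " ";
  const emit = (token) => {
    if (text) { tokens.push({ type: "text", data: text }); text = ""; }
    tokens.push(token);
  };

  // Index input.length (c === undefined) plays the role of the spec's EOF.
  for (let i = 0; i <= input.length; i++) {
    const c = input[i];
    switch (state) {
      case "data":
        if (c === "<") state = "tagOpen";
        else if (c !== undefined) text += c;
        break;
      case "tagOpen":
        if (c === "!") {
          // Spec: markup declaration open. Anything other than "--", "DOCTYPE"
          // or "[CDATA[" becomes a bogus comment; this sketch always does that.
          comment = { type: "comment", data: "" };
          state = "bogusComment";
        } else if (c === "/") {
          state = "endTagOpen";
        } else if (isAsciiAlpha(c)) {
          tag = { type: "startTag", name: "" };
          state = "tagName"; i--; // reconsume in the tag name state
        } else if (c === "?") {
          // "<?xml ...?>" is a bogus comment in HTML5, not a processing instruction.
          comment = { type: "comment", data: "" };
          state = "bogusComment"; i--; // reconsume the "?"
        } else {
          text += "<"; // a "<" that starts no tag is plain text
          state = "data";
          if (c !== undefined) i--;
        }
        break;
      case "endTagOpen":
        if (isAsciiAlpha(c)) {
          tag = { type: "endTag", name: "" };
          state = "tagName"; i--;
        } else if (c === ">") {
          state = "data"; // "</>" is dropped entirely
        } else if (c === undefined) {
          text += "</";
        } else {
          // e.g. "</ 1>" becomes a bogus comment with data " 1"
          comment = { type: "comment", data: "" };
          state = "bogusComment"; i--;
        }
        break;
      case "tagName":
        if (c === ">") { emit(tag); state = "data"; }
        else if (isSpace(c) || c === "/") state = "skipToTagEnd";
        else if (c !== undefined) tag.name += /[A-Z]/.test(c) ? c.toLowerCase() : c;
        // At EOF an unfinished tag is dropped, as in the spec.
        break;
      case "skipToTagEnd":
        // Simplification: attributes and "/>" are skipped, not parsed.
        if (c === ">") { emit(tag); state = "data"; }
        break;
      case "bogusComment":
        if (c === ">") { emit(comment); state = "data"; }
        else if (c === undefined) emit(comment); // EOF: emit what we have
        else comment.data += c;
        break;
    }
  }
  if (text) tokens.push({ type: "text", data: text });
  return tokens;
}
```

For example, `tokenize('<?xml version="1.0"?><P>hi</p>')` returns a comment token with data `?xml version="1.0"?`, a `p` start tag, the text `hi` and a `p` end tag. Merging consecutive characters into one text token, instead of emitting one token per character as the spec describes, is one example of a shortcut that changes the implementation without changing the resulting output.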