@hackage fast-tagsoup 1.0.14

Fast parsing and extraction of information from (possibly malformed) HTML/XML documents

Categories
- XML
License
BSD-3-Clause
Maintainer
Vladimir Shabanov <vshabanoff@gmail.com>
Versions
- 1.0.14 Tue, 4 Jul 2017
- 1.0.13 Thu, 30 Mar 2017
- 1.0.12 Fri, 6 May 2016
- 1.0.11 Thu, 5 May 2016
- 1.0.10 Wed, 4 May 2016
- 1.0.9 Sat, 30 Apr 2016

Installation
In your cabal file, add fast-tagsoup to the build-depends field.
Dependencies (6)
- base >=4 && <5
- bytestring
- containers
- tagsoup >=0.13.10
- text
- text-icu
Dependents (3)
@hackage/acme-everything, @hackage/rospkg, @hackage/hsforce

Fast TagSoup parser. Speeds of 20-200 MB/sec were observed.

Works only with strict bytestrings.

This library is intended to be used in conjunction with the original tagsoup package:

-- Hide tagsoup's own parseTags/renderTags so the fast versions are used instead
import Text.HTML.TagSoup hiding (parseTags, renderTags)
import Text.HTML.TagSoup.Fast
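
As a rough illustration of how the two packages might be combined, the sketch below parses a strict ByteString with the fast parseTags and pattern-matches on the Tag type from tagsoup. The helper name extractLinks and the sample markup are hypothetical and not part of either package.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import qualified Data.ByteString.Char8 as B
import Text.HTML.TagSoup hiding (parseTags, renderTags)
import Text.HTML.TagSoup.Fast (parseTags)

-- | Hypothetical helper: collect the href attribute of every <a> tag.
-- The input must be a strict ByteString, as the fast parser requires.
extractLinks :: B.ByteString -> [B.ByteString]
extractLinks html =
  [ href
  | TagOpen "a" attrs <- parseTags html   -- keep only opening <a> tags
  , Just href <- [lookup "href" attrs]    -- skip anchors without an href
  ]

main :: IO ()
main = mapM_ B.putStrLn (extractLinks sample)
  where
    -- Illustrative input only.
    sample = "<p>See <a href=\"https://example.com\">example</a> and <a href=\"/about\">about</a>.</p>"
```

Everything other than parsing (the Tag constructors, and helpers such as sections or innerText) still comes from the original tagsoup package.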

Besides speed, fast-tagsoup correctly handles HTML <script> and <style> tags, converts tag names to lower case, and can decode non-UTF-8 XML for you.
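
A minimal sketch of the decoding step, assuming the module's ensureUtf8Xml and parseTagsT functions have the types ByteString -> ByteString and ByteString -> [Tag Text]. The helper name textNodes and the sample document are hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Text.HTML.TagSoup hiding (parseTags, renderTags)
import Text.HTML.TagSoup.Fast (ensureUtf8Xml, parseTagsT)

-- | Hypothetical helper: re-encode the document to UTF-8 if needed,
-- parse it into Text-based tags, and keep the non-blank text nodes.
textNodes :: B.ByteString -> [T.Text]
textNodes raw =
  [ T.strip t
  | TagText t <- parseTagsT (ensureUtf8Xml raw)
  , not (T.null (T.strip t))              -- drop whitespace-only nodes
  ]

main :: IO ()
main = mapM_ TIO.putStrLn (textNodes sample)
  where
    -- Illustrative input only; a real feed would be read as a strict ByteString.
    sample = "<?xml version=\"1.0\" encoding=\"utf-8\"?><rss><title> Hello </title></rss>"
```

Running ensureUtf8Xml before parsing is one plausible ordering for feeds that declare a legacy encoding in their <?xml ...?> header. Whether it suits a given crawler is an assumption to verify against real input.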

This parser is used in production in the BazQux Reader feed and comment crawler.