readability 0.0.1.0

Extracts the text of the main article from an HTML document

readability

Give readability an HTML document and it will detect and extract the text of the article while removing everything unnecessary, such as menus, advertisements, or sidebars. It is more or less a reimplementation of python-readability.

The package contains both a library and a simple executable.

Example of using the readability executable

Given an article that looks like the following image:

Original HTML

we can extract text by calling:

$> readability https://mises.org/wire/why-central-banks-are-threat-our-savings

and we get the following HTML:

Extracted text

If we are interested in plain text, we can pipe the result through pandoc:

$> readability https://mises.org/wire/why-central-banks-are-threat-our-savings | pandoc -f html -t plain

The US personal savings rate jumped to 33 percent in April from 12.7
percent in March and 8 percent in April last year. An increase in
savings is regarded by popular economics as less expenditure on
consumption. Since consumption expenditure is considered as the main
driving force of the economy, obviously a rebound in savings, which
implies less consumption, cannot be good for economic activity, so it is
held. Saving and wealth—what is the relation?

To maintain their life and well-being, individuals require access to
consumer goods. An increase in various consumer goods permits an
increase in individuals’ living standards. What allows an increase in
the production of consumer goods is the maintenance and the enhancement
of the infrastructure of an economy. With better infrastructure, a
greater quantity and better quality of consumer goods could be generated
and more real wealth can be produced.

The enhancement and the maintenance of the infrastructure becomes
possible because of the availability of final consumer goods that
sustain the various individuals who are busy expanding and maintaining
the infrastructure. It is the producers of final consumer goods who pay
the various individuals engaged in maintenaning and enhancing the
infrastructure. The producers of final consumer goods pay these
individuals (i.e., the intermediary producers) out of the saved or
unconsumed production of final consumer goods.

Note that when a producer of final consumer goods decides to save more,
i.e., to consume less, the fall in his consumption is offset by the
increase in the consumption of individuals who are engaged in the
intermediary stages of production. This means that overall consumption
is not declining because of an increase in saving—as popular thinking
has it.

Had we not processed the article through readability, we would have gotten:

$> curl https://mises.org/wire/why-central-banks-are-threat-our-savings | pandoc -f html -t plain

Skip to main content

[Home]

Toggle navigation

-   Blog
-   Mises Wire
-   Books
-   Podcast
-   Video
-   Events
-   Store
-   Graduate Program

-   Ver en Español

Stay Connected

GO

SUPPORT MISES

JOIN OR RENEW TODAY

SUPPORT MISES

JOIN OR RENEW TODAY

Mises Wire

GET NEWS AND ARTICLES IN YOUR INBOXPrint

A

A

Home | Wire | Why Central Banks Are a Threat to Our Savings

Why Central Banks Are a Threat to Our Savings

-   [dollars]

0 Views

Tags

Money and Banking

06/25/2020Frank Shostak

The US personal savings rate jumped to 33 percent in April from 12.7
percent in March and 8 percent in April last year. An increase in
savings is regarded by popular economics as less expenditure on
consumption. Since consumption expenditure is considered as the main
driving force of the economy, obviously a rebound in savings, which
implies less consumption, cannot be good for economic activity, so it is
held. Saving and wealth—what is the relation?
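The two-step pipeline shown earlier (readability piped into pandoc) can be wrapped in a small shell function for repeated use. This is only a minimal sketch: the name `article_to_text` is hypothetical and not part of the package, and it assumes both `readability` and `pandoc` are on the `PATH` and behave as in the examples above, with readability taking a URL argument and printing the extracted HTML on standard output.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with the readability package:
# extract the main article from a URL and render it as plain text.
article_to_text() {
    url="$1"
    if [ -z "$url" ]; then
        echo "usage: article_to_text URL" >&2
        return 2
    fi
    # readability writes the extracted article as HTML to stdout;
    # pandoc reads that HTML on stdin and converts it to plain text.
    readability "$url" | pandoc -f html -t plain
}
```

After sourcing the function, `article_to_text https://mises.org/wire/why-central-banks-are-threat-our-savings > article.txt` would save the plain-text article to a file.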

Contribute

The project is hosted at https://sr.ht/~geyaeb/haskell-readability/ and its homepage provides links to the Mercurial repository, mailing list, and ticket tracker.

Patches, suggestions, questions, and general discussion can be sent to the mailing list. Detailed information about sending patches by email can be found at [https://man.sr.ht/hg.sr.ht/email.md](https://man.sr.ht/hg.sr.ht/email.md).