@hackage follow0.1.0.0

Haskell library to follow content published on any subject.

Follow

Follow is a Haskell library to build recipes which allow you to follow the content published about any subject you are interested.

Here bellow you have a quick tutorial you can follow. Just run the snippets of code in the repl.

:set -XOverloadedStrings
import Follow
import Data.Time (LocalTime)
import Control.Monad (join)
import qualified Data.Text.IO as T (writeFile)
import Data.Yaml (decodeFileThrough)

Subject

A subject is just a bunch of information about what is being followed. It consists of a title, a description and a list of tags,

haskell =
  Subject
    { sTitle = "Haskell"
    , sDescription = "Some resources about Haskell"
    , sTags = ["haskell", "programming"]
    }

Directory

A directory is just a subject and a list of entries.

An entry is an item meant to contain an URI with content relative to the associated subject along with associated information.

manualDirectory =
  Directory
    { dSubject = haskell
    , dEntries =
        [ Entry
            { eURI =
                Just "https://bartoszmilewski.com/2013/06/19/basics-of-haskell/"
            , eGUID = Just "basics-of-haskell"
            , eTitle = Just "Basics of Haskell"
            , eDescription = Just "Introductory material for Haskell"
            , eAuthor = Just "Bartosz Milewski"
            , ePublishDate = Just (read "2013-06-19 14:14:00" :: LocalTime)
            }
        ]
    }

Fetchers

Of course, building list of entries by hand is not very useful. Fetchers are functions which usually reach the outside world to return a list of entries and which can throw an error.

Any fetcher can be used, but Follow tries to ship with common ones. Right now there are two fetchers available:

  • Feed: Take entries from a RSS or Atom feed.
  • Web Scraping: Take entries scraping the HTML of a web page.

The function directoryFromFetched can be used to glue a subject with some fetched content:

import qualified Follow.Fetchers.Feed as Feed

directory =
  directoryFromFetched (Feed.fetch "https://bartoszmilewski.com/feed/") haskell

Middlewares

Fetched content may need some further processing in order to fit what is actually desired. A middleware is a function which transforms a directory into another directory, allowing us to do any kind of transformation.

The aim of Follow is to provide some common middlewares. For now, there are these ones:

  • Filter: Filter entries according some predicate.
  • Sort: Sort entries.
  • Decode: Decodes entries from UTF8 or other encodings.
import qualified Follow.Middlewares.Sort as Sort

sortedDirectory =
  Sort.apply (Sort.byGetter eTitle) <$> directory

Digesters

Once you have your distillate content, you need some way to consume it. A Digester is a function which transforms a directory into anything that can be consumed by an end user.

As before, Follow aims to provide useful ones out of the box. Right now the following are available:

import qualified Follow.Digesters.SimpleText as SimpleText

content = SimpleText.digest <$> sortedDirectory

Now, for example, you are ready to save the content to a file:

join $ T.writeFile "/your/path/haskell.txt" <$> content

Recipes: Combining sources and middlewares

Content is not limited to be fetched from a single source. Instead, a directory can be built merging the entries fetched from different sources. Also, the stack of middlewares to be applied to each source can be given in a single shot.

This whole process specification is called a Recipe, and it contains all the information needed to follow a subject.

To build the recipe you need to provide three fields:

  • The subject being followed.
  • A list of two field tuples where:
    • First field is some fetched content.
    • Second field is a list of middlewares to apply to the fetched content in the first field.
  • A list of middlewares to apply to the directory resulted after applying the list of fetched/middlewares.
haskellRecipe =
  Recipe
    { rSubject = haskell
    , rSteps =
        [ ( Feed.fetch "https://bartoszmilewski.com/feed/"
          , [Sort.apply (Sort.byGetter eTitle)])
        , (Feed.fetch "https://planet.haskell.org/rss20.xml", [])
        ]
    , rMiddlewares = []
    }

You can combine the function directoryFromRecipe and some digester to quickly consume a recipe:

SimpleText.digest <$> directoryFromRecipe haskellRecipe

Collecting recipes

One nice thing in Follow is that you don't need to create the recipes programmatically each time you need them. Instead, you can store them in a YAML file and just parse them when you need.

For example, the previous recipe can be represented in a file recipe.yml as the following:

subject:
  title: Haskell
  description: Some resources about Haskell
  tags: [haskell, programming]
steps:
  -
    - type: feed
      options:
        url: "https://bartoszmilewski.com/feed/"
    -
      - type: sort
        options:
          function:
            type: by_field
            options:
              field: title
middlewares: []

You can use now decode functions in Data.Yaml to get the recipe back:

recipe' <- decodeFileThrow "/your/path/recipe.yml" :: IO (Recipe IO)
directory' = directoryFromRecipe recipe'

Look at src/Follow/Parser.hs for details about encoding each kind of fetcher and middleware.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/waiting-for-dev/follow. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

License

The package is available as open source under the terms of the GNU LGPLv3 License.