@hackage melf1.0.2

An Elf parser

melf

A Haskell library to parse/serialize Executable and Linkable Format (ELF) files.

Parsing the header and table entries

Module Data.Elf.Headers implements parsing and serialization of the ELF file header and the entries of section and segment tables.

ELF files come in two flavors: 64-bit and 32-bit. To differentiate between them type ElfClass is defined:

data ElfClass
    = ELFCLASS32 -- ^ 32-bit ELF format
    | ELFCLASS64 -- ^ 64-bit ELF format
    deriving (Eq, Show)

Some fields of the header and table entries have different bitwidth for 64-bit and 32-bit files. So the type WordXX a was borrowed from the data-elf package:

-- | @IsElfClass a@ is defined for each constructor of `ElfClass`.
--   It defines @WordXX a@, which is `Word32` for `ELFCLASS32`
--   and `Word64` for `ELFCLASS64`.
class ( SingI c
      , Typeable c
      , Typeable (WordXX c)
      , Data (WordXX c)
      , Show (WordXX c)
      , Read (WordXX c)
      , Eq (WordXX c)
      , Ord (WordXX c)
      , Bounded (WordXX c)
      , Enum (WordXX c)
      , Num (WordXX c)
      , Integral (WordXX c)
      , Real (WordXX c)
      , Bits (WordXX c)
      , FiniteBits (WordXX c)
      , Binary (Be (WordXX c))
      , Binary (Le (WordXX c))
      ) => IsElfClass c where
    type WordXX c = r | r -> c

instance IsElfClass 'ELFCLASS32 where
    type WordXX 'ELFCLASS32 = Word32

instance IsElfClass 'ELFCLASS64 where
    type WordXX 'ELFCLASS64 = Word64

The header of the ELF file is represented with the type HeaderXX a:

-- | Parsed ELF header
data HeaderXX c =
    HeaderXX
        { hData       :: ElfData    -- ^ Data encoding (big- or little-endian)
        , hOSABI      :: ElfOSABI   -- ^ OS/ABI identification
        , hABIVersion :: Word8      -- ^ ABI version
        , hType       :: ElfType    -- ^ Object file type
        , hMachine    :: ElfMachine -- ^ Machine type
        , hEntry      :: WordXX c   -- ^ Entry point address
        , hPhOff      :: WordXX c   -- ^ Program header offset
        , hShOff      :: WordXX c   -- ^ Section header offset
        , hFlags      :: Word32     -- ^ Processor-specific flags
        , hPhEntSize  :: Word16     -- ^ Size of program header entry
        , hPhNum      :: Word16     -- ^ Number of program header entries
        , hShEntSize  :: Word16     -- ^ Size of section header entry
        , hShNum      :: Word16     -- ^ Number of section header entries
        , hShStrNdx   :: ElfSectionIndex -- ^ Section name string table index
        }

So we have two types HeaderXX 'ELFCLASS64 and HeaderXX 'ELFCLASS32. To be able to work with headers uniformly the type Header was introduced:

-- | Sigma type where `ElfClass` defines the type of `HeaderXX`
type Header = Sigma ElfClass (TyCon1 HeaderXX)

Header is a pair. The first element is an object of the type ElfClass defining the width of the word. The second element is HeaderXX parametrized with the first element (i. e. Σ-type from the languages with dependent types). To simulate Σ-types the library singletons (Hackage, "Introduction to singletons") was used.

Header is an instance of the Binary class.

So given a lazy bytestring containing large enough initial part of ELF file one can get the header of that file with a function like this:

withHeader ::                     BSL.ByteString ->
    (forall a . IsElfClass a => HeaderXX a -> b) -> Either String b
withHeader bs f =
    case decodeOrFail bs of
        Left (_, _, err) -> Left err
        Right (_, _, (classS :&: hxx) :: Header) ->
            Right $ withElfClass classS f hxx

The function decodeOrFail is defined in the package binary. The function withElfClass creates a context with an implicit word width available and looks like withSingI:

-- | Convenience function for creating a
-- context with an implicit ElfClass available.
withElfClass :: Sing c -> (IsElfClass c => a) -> a
withElfClass SELFCLASS64 x = x
withElfClass SELFCLASS32 x = x

The module Data.Elf.Headers also defines the types SectionXX, SegmentXX and SymbolXX for the elements of section, segment and symbol tables.

Parsing the whole ELF file

The module Data.Elf implements parsing and serialization of the whole ELF files. To parse ELF file it reads ELF header, section table and segment table and uses that data to create a list of elements of the type ElfXX representing the recursive structure of the ELF file. It also restores section names from the the string table indexes. That results in creating an object of type Elf:

-- | `Elf` is a forrest of trees of type `ElfXX`.
-- Trees are composed of `ElfXX` nodes, `ElfSegment` can contain subtrees
newtype ElfList c = ElfList [ElfXX c]

-- | Elf is a sigma type where `ElfClass` defines the type of `ElfList`
type Elf = Sigma ElfClass (TyCon1 ElfList)

-- | Section data may contain a string table.
-- If a section contains a string table with section names, the data
-- for such a section is generated and `esData` should contain `ElfSectionDataStringTable`
data ElfSectionData
    = ElfSectionData BSL.ByteString -- ^ Regular section data
    | ElfSectionDataStringTable     -- ^ Section data will be generated from section names

-- | The type of node that defines Elf structure.
data ElfXX (c :: ElfClass)
    = ElfHeader
        { ehData       :: ElfData    -- ^ Data encoding (big- or little-endian)
        , ehOSABI      :: ElfOSABI   -- ^ OS/ABI identification
        , ehABIVersion :: Word8      -- ^ ABI version
        , ehType       :: ElfType    -- ^ Object file type
        , ehMachine    :: ElfMachine -- ^ Machine type
        , ehEntry      :: WordXX c   -- ^ Entry point address
        , ehFlags      :: Word32     -- ^ Processor-specific flags
        }
    | ElfSectionTable
    | ElfSegmentTable
    | ElfSection
        { esName      :: String         -- ^ Section name (NB: string, not offset in the string table)
        , esType      :: ElfSectionType -- ^ Section type
        , esFlags     :: ElfSectionFlag -- ^ Section attributes
        , esAddr      :: WordXX c       -- ^ Virtual address in memory
        , esAddrAlign :: WordXX c       -- ^ Address alignment boundary
        , esEntSize   :: WordXX c       -- ^ Size of entries, if section has table
        , esN         :: ElfSectionIndex -- ^ Section number
        , esInfo      :: Word32         -- ^ Miscellaneous information
        , esLink      :: Word32         -- ^ Link to other section
        , esData      :: ElfSectionData -- ^ The content of the section
        }
    | ElfSegment
        { epType       :: ElfSegmentType -- ^ Type of segment
        , epFlags      :: ElfSegmentFlag -- ^ Segment attributes
        , epVirtAddr   :: WordXX c       -- ^ Virtual address in memory
        , epPhysAddr   :: WordXX c       -- ^ Physical address
        , epAddMemSize :: WordXX c       -- ^ Add this amount of memory after the section when the section is loaded to memory by execution system.
                                         --   Or, in other words this is how much `pMemSize` is bigger than `pFileSize`
        , epAlign      :: WordXX c       -- ^ Alignment of segment
        , epData       :: [ElfXX c]      -- ^ Content of the segment
        }
    | ElfRawData -- ^ Some ELF files (some executables) don't bother to define
                 -- sections for linking and have just raw data in segments.
        { edData :: BSL.ByteString -- ^ Raw data in ELF file
        }
    | ElfRawAlign -- ^ Align the next data in the ELF file.
                  -- The offset of the next data in the ELF file
                  -- will be the minimal @x@ such that
                  -- @x mod eaAlign == eaOffset mod eaAlign @
        { eaOffset :: WordXX c -- ^ Align value
        , eaAlign  :: WordXX c -- ^ Align module
        }

Not each object of that type can be serialized.

  • Constructor ElfSection still has a section number. It is required as the symbol table and some other structures refer to the sections by theirs indexes. So the section indexes should be consecutive integers starting from 1. Section with index 0 is always empty and is created by the library.

  • There should be a single ElfHeader. It should be the first nonempty node of the tree.

  • If there exists at least one node ElfSection then there should exist exactly one node ElfSectionTable and exactly one section that has ElfSectionDataStringTable as the value of its esData field (the string table for the names of sections).

  • If there exists at least one node ElfSegment then there should exist exactly one node ElfSegmentTable.

Correctly composed ELF object can be serialized with the function serializeElf and parsed with the function parseElf:

serializeElf :: MonadThrow m => Elf -> m ByteString
parseElf :: MonadCatch m => ByteString -> m Elf

ELF is not an instance of the class Binary because PutM is not an instance of the class MonadFail.

Generation of object files

To create machine code that is used in the examples a pair of modules were created. The module AsmAArch64 provides a DSL embedded in Haskell. This DSL is a kind of assembler language for the AArch64 platform. It exports some primitives to generate machine instructions and organize machine code. It also exports function assemble that consumes the monad composed of those primitives and produces an object of the type Elf:

assemble :: MonadCatch m => StateT CodeState m () -> m Elf

The idea was inspired by the article (Stephen Diehl "Monads to Machine Code"). Detailed description of this module is available in russian: README_ru.md.

The module HelloWorld uses primitives from AsmAArch64 to compose relocatable executable code that uses system calls to output a "Hello World!" message into standard output and exit:

helloWorld :: MonadCatch m => StateT CodeState m ()

Function assemble uses the melf library to generate an object file:

    return $ SELFCLASS64 :&: ElfList
        [ ElfHeader
            { ehData       = ELFDATA2LSB
            , ehOSABI      = ELFOSABI_SYSV
            , ehABIVersion = 0
            , ehType       = ET_REL
            , ehMachine    = EM_AARCH64
            , ehEntry      = 0
            , ehFlags      = 0
            }
        , ElfSection
            { esName      = ".text"
            , esType      = SHT_PROGBITS
            , esFlags     = SHF_EXECINSTR .|. SHF_ALLOC
            , esAddr      = 0
            , esAddrAlign = 8
            , esEntSize   = 0
            , esN         = textSecN
            , esLink      = 0
            , esInfo      = 0
            , esData      = ElfSectionData txt
            }
        , ElfSection
            { esName      = ".shstrtab"
            , esType      = SHT_STRTAB
            , esFlags     = 0
            , esAddr      = 0
            , esAddrAlign = 1
            , esEntSize   = 0
            , esN         = shstrtabSecN
            , esLink      = 0
            , esInfo      = 0
            , esData      = ElfSectionDataStringTable
            }
        , ElfSection
            { esName      = ".symtab"
            , esType      = SHT_SYMTAB
            , esFlags     = 0
            , esAddr      = 0
            , esAddrAlign = 8
            , esEntSize   = symbolTableEntrySize ELFCLASS64
            , esN         = symtabSecN
            , esLink      = fromIntegral strtabSecN
            , esInfo      = 1
            , esData      = ElfSectionData symbolTableData
            }
        , ElfSection
            { esName      = ".strtab"
            , esType      = SHT_STRTAB
            , esFlags     = 0
            , esAddr      = 0
            , esAddrAlign = 1
            , esEntSize   = 0
            , esN         = strtabSecN
            , esLink      = 0
            , esInfo      = 0
            , esData      = ElfSectionData stringTableData
            }
        , ElfSectionTable
        ]

It runs the State monad that was passed as an argument. As a result the final state of CodeState includes all the data neсessary to produce ELF file, in particular:

  • txt refers to the content of the .text section,
  • symbolTableData refers to the content of the symbol table section,
  • stringTableData refers to the content of the string table section linked to the symbol table.

Names with SecN suffixes (textSecN, shstrtabSecN, symtabSecN, strtabSecN) are predefined section numbers that conform to the conditions stated above.

For the sake of simplicity external symbol resolution and data section allocation were not implemented. It requires implementation of relocation tables. On the other hand, the resulting code is position-independent.

Use this module to produce object file and try to link it:

[nix-shell:examples]$ ghci 
GHCi, version 8.10.7: https://www.haskell.org/ghc/  :? for help
Prelude> :l AsmAArch64.hs HelloWorld.hs 
[1 of 2] Compiling AsmAArch64       ( AsmAArch64.hs, interpreted )
[2 of 2] Compiling HelloWorld       ( HelloWorld.hs, interpreted )
Ok, two modules loaded.
*AsmAArch64> import HelloWorld
*AsmAArch64 HelloWorld> elf <- assemble helloWorld
*AsmAArch64 HelloWorld> bs <- serializeElf elf
*AsmAArch64 HelloWorld> BSL.writeFile "helloWorld.o" bs
*AsmAArch64 HelloWorld> 
Leaving GHCi.

[nix-shell:examples]$ aarch64-unknown-linux-gnu-gcc -nostdlib helloWorld.o -o helloWorld

[nix-shell:examples]$ 

The linker accepted the object file. Try to run the result:

[nix-shell:examples]$ qemu-aarch64 helloWorld
Hello World!

[nix-shell:examples]$ 

It works.

Generation of executable files

The module DummyLd uses the section .text of object file to create an executable file. Code relocation and symbol resolution is not implemented so that procedure works only for position-independent code that does not refer to external translation units, for example, it works with the code described above.

Function dummyLd consumes an object of the type Elf and finds a section .text (using elfFindSectionByName) and header (using elfFindHeader) in it. Then the header type is changed to ET_EXEC, the address of the first executable instruction is specified and a loadable segment containing the header and the content of .text is formed:

data MachineConfig (a :: ElfClass)
    = MachineConfig
        { mcAddress :: WordXX a -- ^ Virtual address of the executable segment
        , mcAlign   :: WordXX a -- ^ Required alignment of the executable segment
                                --   in physical memory (depends on max page size)
        }

getMachineConfig :: (IsElfClass a, MonadThrow m) => ElfMachine -> m (MachineConfig a)
getMachineConfig EM_AARCH64 = return $ MachineConfig 0x400000 0x10000
getMachineConfig EM_X86_64  = return $ MachineConfig 0x400000 0x1000
getMachineConfig _          = $chainedError "could not find machine config for this arch"

dummyLd' :: forall a m . (MonadThrow m, IsElfClass a) => ElfList a -> m (ElfList a)
dummyLd' (ElfList es) = do

    txtSection <- elfFindSectionByName es ".text"
    txtSectionData <- case txtSection of
        ElfSection { esData = ElfSectionData textData } -> return textData
        _ -> $chainedError "could not find correct \".text\" section"

    header <- elfFindHeader es
    case header of
        ElfHeader { .. } -> do
            MachineConfig { .. } <- getMachineConfig ehMachine
            return $ ElfList
                [ ElfSegment
                    { epType       = PT_LOAD
                    , epFlags      = PF_X .|. PF_R
                    , epVirtAddr   = mcAddress
                    , epPhysAddr   = mcAddress
                    , epAddMemSize = 0
                    , epAlign      = mcAlign
                    , epData       =
                        [ ElfHeader
                            { ehType  = ET_EXEC
                            , ehEntry = mcAddress + headerSize (fromSing $ sing @a)
                            , ..
                            }
                        , ElfRawData
                            { edData = txtSectionData
                            }
                        ]
                    }
                , ElfSegmentTable
                ]
        _ -> $chainedError "could not find ELF header"

-- | @dummyLd@ places the content of ".text" section of the input ELF
-- into the loadable segment of the resulting ELF.
-- This could work if there are no relocations or references to external symbols.
dummyLd :: MonadThrow m => Elf -> m Elf
dummyLd (c :&: l) = (c :&:) <$> withElfClass c dummyLd' l

Try to use this code to produce executable file without GNU linker:

[nix-shell:examples]$ ghci
GHCi, version 8.10.7: https://www.haskell.org/ghc/  :? for help
Prelude> :l DummyLd.hs
[1 of 1] Compiling DummyLd          ( DummyLd.hs, interpreted )
Ok, one module loaded.
*DummyLd> import Data.ByteString.Lazy as BSL
*DummyLd BSL> i <- BSL.readFile "helloWorld.o"
*DummyLd BSL> elf <- parseElf i
*DummyLd BSL> elf' <- dummyLd elf
*DummyLd BSL> o <- serializeElf elf'
*DummyLd BSL> BSL.writeFile "helloWorld2" o
*DummyLd BSL> 
Leaving GHCi.

[nix-shell:examples]$ chmod +x helloWorld2

[nix-shell:examples]$ qemu-aarch64 helloWorld2
Hello World!

[nix-shell:examples]$ 

It works.

These just parse/serialize ELF header and table entries but not the whole ELF files.

History

For the early history look at the branch "amakarov" of the my copy of the elf repo.

Tests

Test data is committed with git-lfs. Only testdata/orig/* tests are included to hackage distributive to keep the tarball size small.

License

BSD 3-Clause License (c) Aleksey Makarov