#haskell asked whether you could recover a function's name from its value, i.e.
GHCi> name id
"id"
GHCi> name map
"map"
This is easy in some languages. But Haskell is not designed to provide this kind of run-time information. We'll need some non-portable hacks. I tested this code with GHC 6.12.1 on amd64 Linux; see below for portability notes.
How it works
Closures exist to store data. But the run-time system also needs operational information about each closure: how to garbage-collect it, how to force its evaluation, etc. This information is known at compile time and is shared between many closures. All algebraic values with the same constructor will share this information, as will all function values created from the same lambda in the program's source.
So each closure stores a pointer to an info table, holding this operational information. Info tables are generated at compile time, and stored as part of an executable's read-only data section. This means that they have statically-known addresses, with associated names in the executable's symbol table. We'll use these symbol names to name our functions.
We can dump an executable's symbol table with
$ nm -f posix foo
...
ghczmprim_GHCziBool_Bool_closure_tbl D 0000000000749978
ghczmprim_GHCziBool_False_closure D 0000000000749970
ghczmprim_GHCziBool_False_static_info T 00000000004e4dd8
ghczmprim_GHCziBool_True_closure D 0000000000749990
ghczmprim_GHCziBool_True_static_info T 00000000004e4d80
ghczmprim_GHCziDebug_debugErrLn1_closure D 00000000007499a0
ghczmprim_GHCziDebug_debugErrLn1_info T 00000000004e4ec0
...
Haskell identifiers can contain characters not allowed in symbol names, so GHC uses a name-mangling scheme to build symbol names. For example, the first symbol above decodes to
ghc-prim_GHC.Bool_Bool_closure_tbl
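To make the mangling concrete, here is a minimal sketch of a decoder for a small subset of the scheme. The function name zDecodeSubset and the escape table are my own illustration, not GHC's code; the real scheme has more escapes (tuples, Unicode code points) that this sketch passes through untouched.

```haskell
-- Hypothetical partial decoder for GHC's name mangling ("z-encoding").
-- Only the escapes listed in `escapes` are handled; anything else is
-- copied through unchanged.
escapes :: [(String, Char)]
escapes =
    [ ("zm", '-'), ("zi", '.'), ("zh", '#'), ("zu", '_')
    , ("zp", '+'), ("zz", 'z'), ("ZZ", 'Z'), ("ZC", ':')
    ]

zDecodeSubset :: String -> String
zDecodeSubset (e:c:rest)
    | e == 'z' || e == 'Z'
    , Just plain <- lookup [e, c] escapes = plain : zDecodeSubset rest
zDecodeSubset (c:rest) = c : zDecodeSubset rest
zDecodeSubset []       = []

-- Example use:
--   zDecodeSubset "ghczmprim_GHCziBool_Bool_closure_tbl"
```

The underscores separating package, module and identifier are literal; an underscore inside an identifier is itself escaped, which is why the table includes an entry for it.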
Reading the symbols
We'll also use the GHC API to un-mangle symbol names. GHC is a 20-year effort that has evolved alongside the Haskell language. It follows some legacy conventions like a mostly-flat module hierarchy. So the module we need is named simply Encoding.
import Control.Parallel ( pseq )
import Data.Maybe ( catMaybes, fromMaybe )
import Data.Word ( Word )
import Text.Printf ( printf )
import qualified Data.Map as Map
import qualified System.Posix.Files as Posix
import qualified System.Process as Proc
import qualified Foreign.Ptr as Ptr
import qualified GHC.Vacuum as Vac
import qualified Encoding as GHC
We run nm as a subprocess and parse its output:
type Symbols = Map.Map Word String

getSymbols :: IO Symbols
getSymbols = do
    exe <- Posix.readSymbolicLink "/proc/self/exe"
    out <- Proc.readProcess "nm" ["-f", "posix", exe] ""
    let offset = 0x10
    let f (sym:_:addr:_) = Just (read ("0x"++addr) - offset, GHC.zDecodeString sym)
        f _ = Nothing
    return . Map.fromList . catMaybes . map (f . words) . lines $ out
We're using the Linux proc filesystem to get a symbolic link to our application's executable.
The symbols in memory appear at an address 0x10 = 16 bytes, or 2 machine words, lower than in the executable's symbol table. I'm not sure why; perhaps it's because of GHC's "tables next to code" optimization.
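The per-line parsing and the offset arithmetic can be tried in isolation. This is a sketch, not the code above: parseNmLine is a hypothetical helper, and it uses readHex from Numeric instead of read so that a malformed address yields Nothing rather than an exception.

```haskell
import Data.Word (Word)
import Numeric (readHex)

-- Hypothetical helper: parse one line of `nm -f posix` output
-- ("name type value ...") into (address minus offset, raw symbol name).
parseNmLine :: Word -> String -> Maybe (Word, String)
parseNmLine offset line =
    case words line of
        (sym : _type : addr : _) ->
            case readHex addr of
                [(value, "")] -> Just (value - offset, sym)
                _             -> Nothing  -- address field was not pure hex
        _ -> Nothing                       -- fewer than three fields: no address

-- Example use, with a line taken from the nm output shown earlier:
--   parseNmLine 0x10 "ghczmprim_GHCziBool_True_static_info T 00000000004e4d80"
-- The address 0x4e4d80 becomes 0x4e4d70 after subtracting the offset.
```

Passing the offset as a parameter makes it easy to experiment on a platform where the magic number differs.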
Resolving a symbol
Once we have the symbol table, looking up a value is relatively easy:
name :: Symbols -> a -> String
name syms x = fromMaybe unk $ Map.lookup ptr syms where
    ptr = x `pseq` (fromIntegral . Ptr.ptrToWordPtr . Vac.getInfoPtr $ x)
    unk = printf "<unknown info table at 0x%016x>" ptr
We use vacuum to get the value's info table pointer, convert this to a Word, then look it up in the symbol table.
We explicitly evaluate x with pseq, to avoid seeing a thunk.
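The lookup-with-fallback part does not depend on vacuum, so it can be exercised on its own. The sketch below assumes the info pointer has already been converted to a Word; lookupInfoPtr and toySymbols are hypothetical names, and the single table entry is made up for illustration.

```haskell
import qualified Data.Map as Map
import Data.Maybe (fromMaybe)
import Data.Word (Word)
import Text.Printf (printf)

type Symbols = Map.Map Word String

-- Hypothetical pure core of `name`: takes the info pointer as a plain Word.
lookupInfoPtr :: Symbols -> Word -> String
lookupInfoPtr syms ptr = fromMaybe unk (Map.lookup ptr syms)
  where
    -- Fallback text for addresses missing from the symbol table.
    unk = printf "<unknown info table at 0x%016x>" ptr

-- Toy table with one made-up entry.
toySymbols :: Symbols
toySymbols = Map.fromList [(0x4e4d70, "ghc-prim_GHC.Bool_True_static_info")]

-- Example use:
--   lookupInfoPtr toySymbols 0x4e4d70  -- the stored name
--   lookupInfoPtr toySymbols 1         -- the "<unknown ...>" fallback
```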
We'll test with
$ ghc --make name.hs -package ghc
$ ./name
Each test below is commented with the expected output. First, let's try a few non-function values:
main :: IO ()
main = do
    syms <- getSymbols
    let test = putStrLn . name syms
    test 3           -- integer-gmp_GHC.Integer.Type_S#_con_info
    test (3 :: Int)  -- ghc-prim_GHC.Types_I#_static_info
    test "xyz"       -- ghc-prim_GHC.Types_:_con_info
GHC defaults to 3 :: Integer, as -Wall will tell you. As we see, Integer and Int are both implemented as algebraic data:
data Integer
    = S# Int#
    | J# Int# ByteArray#
data Int = I# Int#
"xyz" is a list built out of (:) constructors, hence the info table for (:).
Next let's try a few functions:
    test map      -- base_GHC.Base_map_info
    test getChar  -- base_System.IO_getChar_info
    test (+)      -- integer-gmp_GHC.Integer_plusInteger_info
(+) defaults to operating on Integer, and GHC inlines the type class dictionary, giving us the underlying plusInteger function.
Now let's see the limits of this technique:
    test (\_ -> 'x')  -- s1jD_info
    test (const 'x')  -- stg_PAP_info
    test test         -- stg_PAP_info
Our lambda expression gets a useless compiler-generated name. The application of const is worse; it uses an info table common to all partial applications. However, we could use vacuum to follow the fields of the PAP closure, which I'll leave as an exercise to the reader. ;)
test itself is also a partial application. It's defined by applying the function (.) to two arguments, where (.) is defined as
(.) f g x = f (g x)
If we eta-expand test to
    let test x = putStrLn $ name syms x
then we'll get another compiler-generated name, like the one we saw for the lambda.
This is a hack, and probably not suitable for any serious purpose. Shelling out to nm to get a symbol table is particularly ugly. I tried to use bindings to BFD, but ran into some segfaults.
The above code will work only on 64-bit machines, but could be adapted for 32-bit. I bet the magic offset would change. It works on GHC 6.12.1, and should work on other recent versions, if you can get vacuum to build.
It definitely requires a Unix system, and specifically Linux or something emulating Linux's proc filesystem. You'll need nm from GNU Binutils, which is standard on a system configured for C development.
It won't work if you run strip on your binaries...