
Commit a7da708

Add note on the relation to the Okasaki-Gill paper
1 parent 761fbda

1 file changed

+47 -0 lines changed

containers/src/Data/IntMap/Internal.hs

+47
@@ -368,6 +368,9 @@ data IntMap a = Bin {-# UNPACK #-} !Prefix
 -- All keys in the left child of a Bin have the mask bit unset, and all keys
 -- in the right child have the mask bit set.
 
+-- See Note [Okasaki-Gill] for how the implementation here relates to the one in
+-- Okasaki and Gill's paper.
+
 -- | A @Prefix@ is some prefix of high-order bits of an @Int@.
 --
 -- This is represented by an @Int@ which starts with the prefix bits,
@@ -3725,3 +3728,47 @@ node = "+--"
 withBar, withEmpty :: [String] -> [String]
 withBar bars = "| ":bars
 withEmpty bars = " ":bars
+
+{--------------------------------------------------------------------
+Notes
+--------------------------------------------------------------------}
+
+-- Note [Okasaki-Gill]
+-- ~~~~~~~~~~~~~~~~~~~
+--
+-- The IntMap structure is based on the map described in the paper "Fast
+-- Mergeable Integer Maps" by Chris Okasaki and Andy Gill, with some
+-- differences.
+--
+-- The paper spends most of its time describing a little-endian tree, where the
+-- branching is done first on low bits then high bits. It then briefly describes
+-- a big-endian tree. The implementation here is big-endian.
+--
+-- The definition of Okasaki and Gill's map would be written in Haskell as
+--
+--     data Dict a
+--       = Empty
+--       | Lf !Int a
+--       | Br !Int !Int !(Dict a) !(Dict a)
+--
+-- Empty is the same as IntMap's Nil, and Lf is the same as Tip.
+--
+-- In Br, the first Int is the shared prefix and the second is the mask bit by
+-- itself. For the big-endian map, the paper suggests that the prefix be the
+-- common prefix, followed by a 0-bit, followed by all 1-bits, so that the
+-- prefix value can serve as the split point for a binary search.
+--
+-- IntMap's Bin corresponds to Br, but has only one Int (newtyped as Prefix),
+-- which describes both the prefix and the mask, so they need not be stored
+-- separately. This value is, in fact, one plus the paper's suggested prefix:
+-- the common prefix, a 1-bit, then all 0-bits, so the mask bit is its lowest
+-- set bit. This representation is chosen because it saves one word per Bin
+-- without detriment to the efficiency of operations.
+--
+-- Operations such as lookup, insert, and union follow the implementations
+-- described for Dict and split into the same cases. For instance, for
+-- insert, the three cases on a Br are that the key belongs outside the Br
+-- (its prefix does not match), in the left child, or in the right child. We
+-- have the same three cases for a Bin. However, the bitwise operations we use
+-- to determine the case are naturally different due to the difference in
+-- representation.
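The note says that IntMap's single Prefix value is one plus the paper's prefix and still carries the mask. This can be checked with a little bit arithmetic. The following is a minimal sketch, not code from containers. It assumes unsigned Word keys, which sidesteps how IntMap orders negative Ints. The helper names aboveMask, paperPrefix, singlePrefix, maskOf, and goesLeft are hypothetical.

```haskell
import Data.Bits ((.&.), (.|.), complement)

-- Hypothetical helpers; keys are assumed to be unsigned Words, not Ints.

-- Bits strictly above the mask bit m (m is assumed to be a single set bit).
aboveMask :: Word -> Word
aboveMask m = complement (m .|. (m - 1))

-- The paper's big-endian prefix for the branch containing key k with mask m:
-- the common prefix, then a 0-bit at the mask position, then all 1-bits.
paperPrefix :: Word -> Word -> Word
paperPrefix k m = (k .&. aboveMask m) .|. (m - 1)

-- One plus the paper's value: the common prefix, a 1-bit, then all 0-bits.
-- Adding one cannot carry past the mask position, because that bit is 0.
singlePrefix :: Word -> Word -> Word
singlePrefix k m = paperPrefix k m + 1

-- The mask is recoverable from the single value as its lowest set bit.
maskOf :: Word -> Word
maskOf q = q .&. negate q

-- For a key that shares the prefix, "mask bit is 0" (left child) is
-- equivalent to k <= paperPrefix, and therefore to k < singlePrefix.
goesLeft :: Word -> Word -> Bool
goesLeft k q = k < q
```

For example, with key 182 (binary 10110110) and mask 8, `paperPrefix 182 8` is 183 (10110111), `singlePrefix 182 8` is 184 (10111000), and `maskOf 184` is 8. Key 182 has its mask bit clear, so `goesLeft 182 184` is True. Key 190 (10111110) has it set, so `goesLeft 190 184` is False. The left/right test costs a single comparison in either encoding.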

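The three cases for insert, and the use of the paper's prefix as a binary-search split point, can also be sketched on the Dict type from the note. This is an illustrative sketch under stated assumptions. It is not the paper's code or IntMap's. Keys are unsigned Words instead of Ints, and lookupD, insertD, linkD, highestBit, and aboveMask are hypothetical names. Lookup descends by the single comparison k <= p and checks for equality only at a leaf. Insert distinguishes a key outside the Br (prefix mismatch, so a new Br is created above it) from keys that belong in the left or right child.

```haskell
import Data.Bits ((.&.), (.|.), complement, countLeadingZeros, finiteBitSize, shiftL, xor)

-- Paper-style big-endian map, with hypothetical Word keys instead of Int.
-- In Br, the first field is the prefix (common prefix, then a 0-bit, then
-- all 1-bits) and the second is the mask bit by itself.
data Dict a
  = Empty
  | Lf !Word a
  | Br !Word !Word !(Dict a) !(Dict a)

-- Bits strictly above the mask bit m.
aboveMask :: Word -> Word
aboveMask m = complement (m .|. (m - 1))

-- Binary search on the prefix: keys <= p have the mask bit clear.
-- A key that does not share the prefix simply fails the leaf test.
lookupD :: Word -> Dict a -> Maybe a
lookupD _ Empty = Nothing
lookupD k (Lf k' x) = if k == k' then Just x else Nothing
lookupD k (Br p _ l r) = if k <= p then lookupD k l else lookupD k r

-- The highest set bit of a nonzero word.
highestBit :: Word -> Word
highestBit w = 1 `shiftL` (finiteBitSize w - 1 - countLeadingZeros w)

-- Join two non-empty trees with disjoint prefixes. k1 and k2 are a key (or
-- prefix) of t1 and t2 respectively, and must differ.
linkD :: Word -> Dict a -> Word -> Dict a -> Dict a
linkD k1 t1 k2 t2
  | k1 <= p   = Br p m t1 t2
  | otherwise = Br p m t2 t1
  where
    m = highestBit (k1 `xor` k2)
    p = (k1 .&. aboveMask m) .|. (m - 1)

-- The three cases on a Br: the key lies outside it, in the left child,
-- or in the right child.
insertD :: Word -> a -> Dict a -> Dict a
insertD k x Empty = Lf k x
insertD k x t@(Lf k' _)
  | k == k'   = Lf k x
  | otherwise = linkD k (Lf k x) k' t
insertD k x t@(Br p m l r)
  | k .&. aboveMask m /= p .&. aboveMask m = linkD k (Lf k x) p t
  | k <= p    = Br p m (insertD k x l) r
  | otherwise = Br p m l (insertD k x r)
```

For instance, inserting 8 and then 0 into Empty produces `Br 7 8 (Lf 0 _) (Lf 8 _)`. The highest bit in which the keys differ has value 8, and the paper's prefix is the empty common prefix, then a 0-bit, then 111, which is 7. Passing the existing prefix p as the representative key in the outside case works because p agrees with every key of that Br above its mask bit, which is where the new mask will be found.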