@@ -368,6 +368,9 @@ data IntMap a = Bin {-# UNPACK #-} !Prefix
-- All keys in the left child of a Bin have the mask bit unset, and all keys
-- in the right child have the mask bit set.

+ -- See Note [Okasaki-Gill] for how the implementation here relates to the one in
+ -- Okasaki and Gill's paper.
+
-- | A @Prefix@ is some prefix of high-order bits of an @Int@.
--
-- This is represented by an @Int@ which starts with the prefix bits,
@@ -3725,3 +3728,47 @@ node = "+--"
withBar, withEmpty :: [String] -> [String]
withBar bars = " | " : bars
withEmpty bars = " " : bars
+
+ {- -------------------------------------------------------------------
+ Notes
+ --------------------------------------------------------------------}
+
+ -- Note [Okasaki-Gill]
+ -- ~~~~~~~~~~~~~~~~~~~
+ --
+ -- The IntMap structure is based on the map described in the paper "Fast
+ -- Mergeable Integer Maps" by Chris Okasaki and Andy Gill, with some
+ -- differences.
+ --
+ -- The paper spends most of its time describing a little-endian tree, where the
+ -- branching is done first on low bits then high bits. It then briefly describes
+ -- a big-endian tree. The implementation here is big-endian.
+ --
+ -- The definition of Okasaki and Gill's map would be written in Haskell as
+ --
+ --     data Dict a
+ --       = Empty
+ --       | Lf !Int a
+ --       | Br !Int !Int !(Dict a) !(Dict a)
+ --
+ -- Empty is the same as IntMap's Nil, and Lf is the same as Tip.
+ --
+ -- In Br, the first Int is the shared prefix and the second is the mask bit by
+ -- itself. For the big-endian map, the paper suggests that the prefix be the
+ -- common prefix, followed by a 0-bit, followed by all 1-bits. This is so that
+ -- the prefix value can be used as a point of split for binary search.
+ --
+ -- IntMap's Bin corresponds to Br, but is different because it has only one
+ -- Int (newtyped as Prefix). This describes both prefix and mask, so it is not
+ -- necessary to store them separately. This value is, in fact, one plus the
+ -- value suggested for the prefix in the paper. This representation is chosen
+ -- because it saves one word per Bin without detriment to the efficiency of
+ -- operations.
+ --
+ -- The implementations of operations such as lookup, insert, and union follow
+ -- the described implementations on Dict and split into the same cases. For
+ -- instance, for insert, the three cases on a Br are whether the key belongs
+ -- outside the map, or it belongs in the left child, or it belongs in the
+ -- right child. We have the same three cases for a Bin. However, the bitwise
+ -- operations we use to determine the case are naturally different due to the
+ -- difference in representation.
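The relationship the Note describes between the paper's two-word `Br` and the single-word `Bin` can be checked with a small standalone sketch. Everything below is illustrative and is not taken from the containers source: the names `branchMask`, `paperPrefix`, `combined`, `maskOf`, `classifyPaper`, and `classifyCombined` are hypothetical, keys are modelled as `Word` so that `<` compares raw bit patterns, and the bit tricks are one plausible way to implement the three-way case split, not necessarily the ones the library uses.

```haskell
import Data.Bits (bit, complement, countLeadingZeros, finiteBitSize, xor, (.&.), (.|.))

-- Keys are modelled as unsigned words (an assumption of this sketch).
type Key = Word

-- | Mask containing only the highest bit in which two keys differ.
-- Precondition: the keys are distinct.
branchMask :: Key -> Key -> Key
branchMask k1 k2 = bit (finiteBitSize d - 1 - countLeadingZeros d)
  where
    d = k1 `xor` k2

-- | Paper-style big-endian prefix: the common prefix, a 0-bit at the mask
-- position, then all 1-bits.
paperPrefix :: Key -> Key -> Key
paperPrefix k m = (k .|. (m - 1)) .&. complement m

-- | Single-word form: one plus the paper-style prefix, i.e. the common
-- prefix, a 1-bit at the mask position, then all 0-bits.
combined :: Key -> Key -> Key
combined k m = paperPrefix k m + 1

-- | The mask is the lowest set bit of the combined word.
maskOf :: Key -> Key
maskOf c = c .&. negate c

-- | The three cases an operation such as insert distinguishes at a branch.
data Case = Outside | GoLeft | GoRight
  deriving (Eq, Show)

-- | Case split using a separate prefix and mask, as in Br.
classifyPaper :: Key -> Key -> Key -> Case
classifyPaper p m k
  | paperPrefix k m /= p = Outside
  | k .&. m == 0         = GoLeft
  | otherwise            = GoRight

-- | Case split using only the combined word, as in Bin.
classifyCombined :: Key -> Key -> Case
classifyCombined c k
  | (k `xor` c) .&. (c `xor` negate c) /= 0 = Outside  -- bits above the mask differ
  | k < c                                   = GoLeft   -- mask bit of k is 0
  | otherwise                               = GoRight  -- mask bit of k is 1
```

For the keys 10 (`1010`) and 12 (`1100`), `branchMask` gives 4, `paperPrefix 10 4` gives 11 (`1011`), and `combined 10 4` gives 12 (`1100`). Both classifiers then agree: 10 goes left, 12 and 14 go right, and 3 and 16 fall outside the branch. The second classifier needs only one stored word, which is the saving the Note refers to.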