0 0 0 0


This question: How to de-interleave bits (UnMortonizing?) has a good answer for extracting one of the two halves of a Morton number (just the odd bits), but I need a solution which extracts both parts (the odd bits and the even bits) in as few operations as possible.

For my use I would need to take a 32 bit int and extract two 16 bit ints, where one is the even bits and the other is the odd bits shifted right by 1 bit, e.g.

input,  z: 11101101 01010111 11011011 01101110
output, x: 11100001 10110111 // odd bits shifted right by 1
        y: 10111111 11011010 // even bits

There seem to be plenty of solutions using shifts and masks with magic numbers for generating Morton numbers (i.e. interleaving bits), e.g. Interleave bits by Binary Magic Numbers, but I haven't yet found anything for doing the reverse (i.e. de-interleaving).


After re-reading the section from Hacker's Delight on perfect shuffles/unshuffles I found some useful examples which I adapted as follows:

// morton1 - extract even bits
uint32_t morton1(uint32_t x)
    x = x & 0x55555555;
    x = (x | (x >> 1)) & 0x33333333;
    x = (x | (x >> 2)) & 0x0F0F0F0F;
    x = (x | (x >> 4)) & 0x00FF00FF;
    x = (x | (x >> 8)) & 0x0000FFFF;
    return x;
// morton2 - extract odd and even bits
void morton2(uint32_t *x, uint32_t *y, uint32_t z)
    *x = morton1(z);
    *y = morton1(z >> 1);

I think this can still be improved on, both in its current scalar form and also by taking advantage of SIMD, so I'm still interested in better solutions (either scalar or SIMD).

Best Answer:

I didn't want to be limited to a fixed size integer and making lists of similar commands with hardcoded constants, so I developed a C++11 solution which makes use of template metaprogramming to generate the functions and the constants. The assembly code generated with -O3 seems as tight as it can get without using BMI:

andl    $0x55555555, %eax
movl    %eax, %ecx
shrl    %ecx
orl     %eax, %ecx
andl    $0x33333333, %ecx
movl    %ecx, %eax
shrl    $2, %eax
orl     %ecx, %eax
andl    $0xF0F0F0F, %eax
movl    %eax, %ecx
shrl    $4, %ecx
orl     %eax, %ecx
movzbl  %cl, %esi
shrl    $8, %ecx
andl    $0xFF00, %ecx
orl     %ecx, %esi

TL;DRsource repo and live demo.


Basically every step in the morton1 function works by shifting and adding to a sequence of constants which look like this:

  • 0b0101010101010101 (alternate 1 and 0)
  • 0b0011001100110011 (alternate 2x 1 and 0)
  • 0b0000111100001111 (alternate 4x 1 and 0)
  • 0b0000000011111111 (alternate 8x 1 and 0)
  • If we were to use D dimensions, we would have a pattern with D-1 zeros and 1 one. So to generate these it's enough to generate consecutive ones and apply some bitwise or:

    /// @brief Generates 0b1...1 with @tparam n ones
    template <class T, unsigned n>
    using n_ones = std::integral_constant<T, (~static_cast<T>(0) >> (sizeof(T) * 8 - n))>;
    /// @brief Performs `@tparam input | (@tparam input << @tparam width` @tparam repeat times.
    template <class T, T input, unsigned width, unsigned repeat>
    struct lshift_add :
        public lshift_add<T, lshift_add<T, input, width, 1>::value, width, repeat - 1> {
    /// @brief Specialization for 1 repetition, just does the shift-and-add operation.
    template <class T, T input, unsigned width>
    struct lshift_add<T, input, width, 1> : public std::integral_constant<T,
        (input & n_ones<T, width>::value) | (input << (width < sizeof(T) * 8 ? width : 0))> {

    Now that we can generate the constants at compile time for arbitrary dimensions with the following:

    template <class T, unsigned step, unsigned dimensions = 2u>
    using mask = lshift_add<T, n_ones<T, 1 << step>::value, dimensions * (1 << step), sizeof(T) * 8 / (2 << step)>;

    With the same type of recursion, we can generate functions for each of the steps of the algorithm x = (x | (x >> K)) & M:

    template <class T, unsigned step, unsigned dimensions>
    struct deinterleave {
        static T work(T input) {
            input = deinterleave<T, step - 1, dimensions>::work(input);
            return (input | (input >> ((dimensions - 1) * (1 << (step - 1))))) & mask<T, step, dimensions>::value;
    // Omitted specialization for step 0, where there is just a bitwise and

    It remains to answer the question "how many steps do we need?". This depends also on the number of dimensions. In general, k steps compute 2^k - 1 output bits; the maximum number of meaningful bits for each dimension is given by z = sizeof(T) * 8 / dimensions, therefore it is enough to take 1 + log_2 z steps. The problem is now that we need this as constexpr in order to use it as a template parameter. The best way I found to work around this is to define log2 via metaprogramming:

    template <unsigned arg>
    struct log2 : public std::integral_constant<unsigned, log2<(arg >> 1)>::value + 1> {};
    template <>
    struct log2<1u> : public std::integral_constant<unsigned, 0u> {};
    /// @brief Helper constexpr which returns the number of steps needed to fully interleave a type @tparam T.
    template <class T, unsigned dimensions>
    using num_steps = std::integral_constant<unsigned, log2<sizeof(T) * 8 / dimensions>::value + 1>;

    And finally, we can perform one single call:

    /// @brief Helper function which combines @see deinterleave and @see num_steps into a single call.
    template <class T, unsigned dimensions>
    T deinterleave_first(T n) {
        return deinterleave<T, num_steps<T, dimensions>::value - 1, dimensions>::work(n);

    Copyright © 2011 Dowemo All rights reserved.    Creative Commons   AboutUs