1.8x faster Integer printing algorithm

Tigran Hayrapetyan
Cantor’s Paradise
35 min readMay 3, 2024

--

A new algorithm, which does no division or remainder calculation

1. Introduction

It is a well known fact that everything, including numbers, is represented in computer memory as various sequences of 0s and 1s. For example,

number 34 is represented as “100010”,
and number 14 is stored as “1110”.

So when any integer is being printed to the screen or into a text file, a need arises to convert it into familiar representation of decimal digits — from ‘0’ to ‘9’.

Such operation is called “integer to string conversion”, and should be done every time before printing an integer. Currently few algorithms exist for doing such conversion.

In this story I am going to:

  • Describe the standard algorithm for converting integers to strings (chapter 2),
  • Walk over its existing optimizations (chapter 3),
  • Present my new algorithm, which addresses integers printing in a completely novel way (chapters 4 and 5),
  • Describe how my new algorithm can print integers in other bases too (chapter 6),
  • Describe 4 optimizations for the new algorithm (chapter 7),
  • Present experimental comparison of all mentioned algorithms (chapter 8).

Experimental comparisons will show that my new algorithm converts 32-bit integers to strings faster by around 43–45%, compared to the currently standard conversion algorithm, and for 64-bit integers it works faster by around 57–62%.

By the way, this work addresses the same problem as my previous writing “34% faster Integer to String Conversion algorithm” [3]. But here the obtained performance gain is higher.

I should mention that printing integers is not a bottleneck for most of modern computer programs, as generally, only a tiny percent of calculated numbers is being sent to print. Most of numbers and values participate only in intermediate calculations, thus no need arises for converting them into strings.

However, there are some types of programs which automatically generate large text files (like *.csv, *.xml, *.json), which usually contain lots of numbers, written in the familiar decimal notation. So there arises need to convert tons of integers into corresponding strings, and that is when performance of integer to string conversion algorithm becomes important. For example, that can be the case when working in Data Science or Machine Learning fields, and exporting large datasets into text files.

2. The standard printing algorithm

As printing integers is one of the most common operations, which basically is a conversion of integer to a string, corresponding functions are implemented in standard libraries of all modern programming languages.

And the algorithm used in those functions to print input number ’N’ is nearly the same, based on repeatedly obtaining value of the last digit of ’N’, and dropping that digit out, thus making it possible to continue with the remaining part of ‘N’.

In order to obtain the last digit of ’N’ we calculate remainder of its division upon 10:

digit := N mod 10

and in order to drop the last digit, we do the actual division itself:

N := ⌊ N / 10 ⌋

Given an integer N, how its last digit
and the remaining part are being computed.

Note, here and later in this story, when dividing 2 integers, we will consider only the whole part of the result.

Those 2 operations: remainder calculation and integer division, can produce the complete sequence of digits of ’N’, when put in a loop:

Operations for printing number “5173”:
Step 1: 5173 % 10 = 3 (storing digit “3”) , 5173 / 10 = 517 (continuing with 517),
Step 2: 517 % 10 = 7 (storing digit “7”) , 517 / 10 = 51 (continuing with 51),
Step 3: 51 % 10 = 1 (storing digit “1”) , 51 / 10 = 5 (continuing with 5),
Step 4: As “5 < 10”, just storing the last digit “5”.
Step 5: (not illustrated) reversing order of stored digits and printing them.

Note, the algorithm obtains digits of ’N’ in reverse order — from right to left. So if we are going to print them on screen or into a text file, we should also reverse the obtained sequence of digits.

If writing pseudo-code for the standard integer printing algorithm, it might result in something like this:

var result[0 .. 25] : Array of Characters  // Assume 25 characters at most

// The procedure takes integer 'N' to be printed, and fills its
// decimal characters into 'result' array.
procedure print( N: Integer )
i := 0 // Index over 'result' array
while N > 0
result[ i ] := '0' + (N mod 10) // Take the last digit
N := ⌊ N / 10 ⌋ // Pick out the last digit
i := i+1
result[ i ] := '\0' // Append the terminating 'null' character
reverse array result[0 .. i-1]

By the way, in this and all subsequent pseudo-codes, we will not handle the case of printing number “0”. According to the presented pseudo-code, for input N=0 an empty sequence will be produced. That’s why in all printing algorithms, the case of N=0 is handled in a separate branch. We will just skip that branch for compactness of pseudo-codes.

The presented algorithm is simple and can be easily implemented in 3–4 lines of code. The only issue with it is that it uses 2 relatively time-consuming operations when obtaining every digit of ’N’: those are integer division and remainder calculation. The benchmarking presented below illustrates which of the basic arithmetical operations are more time-consuming on modern CPUs:

Experimental comparison of time (in nanoseconds) spent to perform the
8 types of operations, when each is run 200 times on a random data.
The last 2 operations are comparing integers as “a < b” and as “a == b”.
We can see that the integer division and remainder calculation operations
are taking significantly more time, both for 32-bit and for 64-bit case.
Also, we see that integer multiplication is performed almost as fast as addition or subtraction.
Assignment is performed faster than all other, which is pretty expected.

The experiments were made with Google Benchmark, under the following system:

CPU: Intel Core i7–11800H @ 2.30GHz
RAM: 16.0 GB
OS: Windows 11 Home, 64-bit
Compiler: MSVC 2022 ( /O2 /Ob2 /MD /GR /Gd )

So can we do it better? Maybe we can figure out an algorithm which for printing given integer ’N’ will use neither integer division, nor remainder calculation?

3. Existing optimizations

Before describing the new algorithm, I want to briefly walk over the existing optimizations, which can speed-up presented standard algorithm.

Optimization 1

The first optimization is about eliminating the need of the last “reverse” operation. It is well described in [1]. For that, we should write obtained digits of input number ’N’ in the buffer, right in the order that they should have on the screen. And as the standard algorithm produces digits of ’N’ from right to left, we should also write them in the buffer from right to left.

Filling produced digits into result array from right to left,
directly in the order they should have at the end.

To enable this optimization, pseudo-code should be changed insignificantly:

var result[0 .. 25] : Array of Characters  // Assume 25 characters at most

// The function takes integer 'N' to be printed, and returns position
// of its converted first character in the 'result' array.
function print( N: Integer ) : Integer
result[ 25 ] := '\0' // Place the terminating 'null' character at the end
i := 25 // Index over 'result' array
while N > 0
i := i-1 // Here we go to left, for placing the next digit
result[ i ] := '0' + (N mod 10) // Take the last digit
N := ⌊ N / 10 ⌋ // Pick out the last digit
return i // Position from where the converted integer starts

A small drawback of this optimization is that if 2 input numbers have different count of digits, then after writing them in the buffer, their first digits will appear in different positions. So in some sense, it is hard for us to control the starting position of digits of input numbers.

Drawback of optimization 1: numbers with different
count of digits will start in the output array from different positions.

However, practically that is almost never an issue, as obtained digits are being promptly sent to screen or to a text file, and thus do not remain in memory for long time.

Optimization 2

The second optimization is about obtaining 2 digits of input number ’N’, during a single step. It is well described in [1] and [2]. Here, instead of repeatedly calculating remainder of division of ’N’ over 10, we will calculate remainder of its division over 100.

So in order to obtain the last 2 digits of ’N’, we will calculate:

digits := N mod 100

And to drop them both out of ’N’ we will do:

N := ⌊ N / 100 ⌋

Operations for printing number “3249376” with second optimization enabled:
Step 1: 3249376 % 100 = 76 (storing digits “76”) , 3249376 / 100 = 32493 (continuing with 32493),
Step 2: 32493 % 100 = 93 (storing digits “93”) , 32493 / 100 = 324 (continuing with 324),
Step 3: 324 % 100 = 24 (storing digits “24”) , 324 / 100 = 3 (continuing with 3),
Step 4: As “3 < 100”, just storing the last digit “3”

However, in order to be able to efficiently print the obtained 2 digits after having result of “N mod 100”, we should have prepared an array of length 100, with values being the pairs of digits:

[“00”, “01”, “02”, “03”, … , “97”, “98”, “99”].

So indexes of the array will have range [0..99] (which corresponds to all possible remainders “N mod 100”), and values of the array will be pairs of digits, each pair representing printed value of that remainder.

By using this optimization, number of integer division and remainder calculation operations is reduced almost by half.

However, I want to point out here that even with both these optimizations enabled, we still do a number of integer division and remainder calculation operations — proportional to the count of digits of ‘N’.

My previous algorithm

Before moving to the next chapter, I also want to mention my previous story “34% faster Integer to String Conversion algorithm” [3], where I proposed another algorithm called “LR printer”, which doesn’t use remainder calculation at all. Instead, in order to obtain every digit of input number ’N’ it does:

1 integer division,
1 multiplication, and
1 subtraction.

And as the last 2 operations are performed around 4–5 times faster on modern CPUs (compared to remainder calculation or integer division), the result performance was around 25–38% faster for printing 32-bit integers, and around 40–58% faster for 64-bit integers.

In contrast to existing approaches, “LR printer” obtains digits of ’N’ from left to right, directly in the order they should be printed to screen or to a text file.

For obtaining the first digit of ’N’ (assuming ’N’ has ‘L’ digits), it calculates:

digit := N / 10^(L-1),

And for dropping the first digit out of ’N’ it does:

N := N — digit * 10^(L-1).

Examples of obtaining left-most digits of given integers,
and picking them out.

There, all necessary powers of 10 are precalculated and used during program’s entire execution.

The second optimization about getting 2 digits of ’N’ during a single step, is also invokable on “LR printer”.

However, my previous algorithm still relied on integer division, which is a relatively slow operation. So can we eliminate it too, and use only the fastest arithmetics…?

4. The new algorithm for case of binary numeral system (base=2)

In this story, I am going to propose a completely different approach for converting integers from binary to decimal representation. The algorithm described here will exclude both integer division and remainder calculation operations, which are the ones that take more computational time, than all other arithmetical operations. Instead, in order to obtain 1 decimal digit of the input number my algorithm will do 1 multiplication, up to 5 comparisons and up to 5 subtractions.

About binary numeral system

We already mentioned at the beginning of this story, that in computer memory numbers are stored in binary format — as sequences of 0s and 1s. So if we are going to print them right in that way, no transformation is required.

However, as the algorithm that I am going to describe came to me when I was working with binary numbers, and as its behavior in binary numeral system is simpler compared to the decimal system’s case, let me start its description from the binary case. So, I suggest us to temporary forget how numbers are stored in computers internally, and just try to figure out binary digits (0s and 1s) of given integer ’N’, as if doing that on a piece of paper, without computers.

In binary numeral system, each digit has its weight. Weight of the last digit is 1, and weight of every previous digit is twice larger.

Weights of digits for binary positional numeral system.

If a certain digit of the binary number is 1, it means that we, in a sense, “use” that weight, in order to obtain necessary value of ’N’. And if it is 0, then we, in a sense, “skip” that weight.

Example of how a number ‘N=25’ is being constructed in binary positional system.
We just mention from left to right the weights that should sum up to ‘N’.
Those weights are 16, 8 and 1, so we have ‘1’ at corresponding positions.
The other weights are not used so we have ‘0’ in the remaining positions.

Having ‘m’ binary digits, why is it always possible to obtain any number ’n’ from range [0,2^m)? Well, we can illustrate that. In case of ‘m’ binary digits, the first digit will have weight 2^(m-1). So using it (making it 1) will result in 2^(m-1) increase of ’n’, which is actually half of the overall range [0, 2^m).

Considering that we have only 6 binary digits,
so the representable numbers range is [0, 2⁶) = [0, 64), how the value
of the first digit affects further range of numbers that we can construct.
Skipping the first digit (making it 0) leaves us in the first half of total range [0, 2⁶),
while using it (making it 1) takes us to the second half of that range.

So the remaining m-1 digits will be used to find proper point either on the first half of illustrated range, or on the second half, both of which are now 2^(m-1) long.

The second digit has weight 2^(m-2). So regardless in which half we currently are, using it (making it 1) will result in 2^(m-2) increase of ’n’, which is now one quarter of the overall range [0, 2^m).

Considering that we have only m=6 binary digits, how values
of the first 2 digits affects further range of numbers that we can construct.

Continuing with same logic, third digit will always have weight 2^(m-3), and regardless of which quarter we are in, using it (making it 1) will result in 2^(m-3) increase of the number being written, which is 1/8-th of the overall range [0, 2^m). While not using it (making it 0) will keep the written number unchanged, in a sense.

Considering that we have only m=6 binary digits, how values
of the first 3 digits partition the total range [0, 2⁶) into 8 sub-ranges,
where the number ’N’ being constructed can reside.
The light yellow sub-ranges are reachable when not using 3rd digit (making it 0),
while the orange sub ranges are reachable when using it (making it 1).

This process of splitting remaining ranges into halves continues until the last digit. At that point of time, with help of the first ‘m-1’ digits we will be able to produce all even numbers of the overall range [0, 2^m). Weight of the last digit is always 1. So using it (making it 1) will result in increase of ’n’ by 1, thus changing it from even to the next odd value on the numbers line. And not using it (making it 0) will keep the produced even number ’n’ unchanged.

That is how ‘m’ binary digits are able to produce any number from range [0,2^m).

We can observe a complete example of such diagram. Let’s take now 4 binary digits (m=4), which will enable us to write numbers in range [0, 2⁴) = [0, 16).

Diagram for 4 binary digits, allowing to represent numbers [0, 2^m) = [0, 16).
Path for number n=11 is marked with dark green.
Path for number n=6 is marked with light green.

Weights of those 4 digits are [8, 4, 2, 1]. We can see that for example in order to write number 11 we should use 1st digit (+8 to the value), not use 2nd digit, use 3rd one (+2 to the value) and also use the last one (+1 to the value), which will result in binary sequence “1011”. That “path” is highlighted on the diagram with dark green.

Another example, to write number 6, we should not use 1st digit, then use the 2nd one (+4 to the value), use also the 3rd one (+2 to the value), and not use the last one, resulting in “0110” binary sequence, highlighted on the diagram with light green.

One important note for binary numeral system (as well as generally for any positional system) is that sum of weights of the last ‘k’ digits is always one-less than weight of ‘k+1’-th digit from right. This means that putting ‘1’ on all the last ‘k’ digits and ‘0’ on ‘k+1’-th digit (from right) will result in such a value, which will be 1-less than if putting the opposite digits there. For example, taking ‘k=3’:

23 in binary is “10111”, while
24 in binary is “11000”.

By the way, this is totally similar to our regular decimal numeral system. We know that putting ‘0’ on some ‘k+1’-th position from right and putting ‘9’ (the maximal digit) on remaining ‘k’ positions will result in a number that is one less than if putting ‘1’ as ‘k+1’-th digit and ‘0’ (the minimal digit) on the remaining ‘k’ positions. For example, the values:

150'999, and
151'000.

Note, from later on, when describing positions of digits we will always count them from right to left.

The algorithm for case “base=2”

Now, how exactly this interpretation of binary numeral system will help us to derive a faster integer printing algorithm? Quite straightforward: we will print binary digits of given number ’n’ directly from left to right (in contrast to current approaches of printing digits in reverse order), on each step comparing ’n’ with the weight of current digit.

If ’n’ is less than weight of current digit, then we should definitely not use that digit, and write 0 there. Because otherwise, writing 1 will increase the number being written in such a way that it will become larger than ’n’, and later, writing 0 even on the all consequent positions, will not bring us to the necessary value of ‘n’.

Using weight 8 (red arrow), when it was necessary to obtain
number 5 (green arrows). Regardless of what next
weights will be used, number 5 is no more reachable.

In the other case, if ’n’ is greater or equal to weight of current digit, we should definitely use it, by writing 1 at the current position. Because otherwise, if we write 0 there, the number being written will always remain less than ’n’, even if ‘1’ is put later on all the consequent positions. Putting 1 on all positions to right from ‘k’ will give us additional sum ‘2^k-1’, which is one less than of what we could have obtained, in case of putting ‘1’ at the current position.

Skipping weight 8, when it was necessary to obtain number 9 (green arrows).
Using even all subsequent weights 4, 2 and 1 (red arrows)
will still result in less than the necessary value n=9.

After properly printing the current digit, in order to be able to continue the algorithm iteratively, we should decrease ’n’ by the weight of current digit, only if that digit was used (if we wrote 1), and proceed to the next digit. This decrease will reflect the fact that we already “presented” corresponding “fragment” ‘2^k’ of ’n’ which is equal to weight of current digit, so it remains for us to “present” only the remaining “fragments”.

Let’s see an example of complete printing of number ‘n=22’. In binary notation, 22 is represented as “10110”, so that is what we should get at the end. Weights of the last 5 digits in binary numeral system are [16, 8, 4, 2, 1].

Iteration 1:  n = 22
n ≥ 16
printing "1" ; n := n – 16
Iteration 2: n = 6
n < 8
printing "0" ; <no changes to 'n'>
Iteration 3: n = 6
n ≥ 4
printing "1" ; n := n – 4
Iteration 4: n = 2
n ≥ 2
printing "1" ; n := n – 2
Iteration 5: n = 0
n < 1
printing "0" ; <no changes to 'n'>

We can now write the pseudo-code of printing routine for given number ’n’ in binary positional system:

var powers_of_2[0 .. 33] : Array of Integers 
= { 1, 2, 4, 8, 16, 32,
64, 128, 256, 512, ...
... , 1'073'741'824, 2'147'483'648 }
// Precalculated powers of 2, which will be used during print

var result[0 .. 35] : Array of Characters // Assume at most 35 characters

// The procedure takes integer 'N' to be printed, and fills its
// binary digits into the 'result' array.
procedure print_in_binary( N: Integer )
L := calculate_binary_digits_count( N )
i := 0 // Index over the 'result' array
while L > 0
if N >= powers_of_2[ L-1 ]
result[ i ] := '1' // Write digit '1' in the result
N := N – powers_of_2[ L-1 ]
else
result[ i ] := '0' // Write digit '0' in the result
i := i+1 // Prepare to next iteration, and
L := L-1 // ... adjust remaining count of digits of 'N'
result[ i ] := '\0' // Append the terminating 'null' character

You probably noted that both in the example of printing number 22 as “10110”, and in this pseudo-code, we acted like the count of binary digits of the input number ’n’ is already known to us:

  • in the example of printing 22 we started from weight 16, which is 5-th power of 2,
  • in the pseudo-code we called some function ‘calculate_binary_digits_count()’, and remembered obtained count of digits of ’n’ in variable ‘L’.

So here I want to mention that obtaining the count of binary digits of ’n’ is not an issue, if we have precalculated all the powers of 2. Let’s take a look on how the first 5 powers of 2 are written in binary:

1, which is 2⁰ is written as “1”,
2, which is 2¹ is written as “10”,
4, which is 2² is written as “100”,
8, which is 2³ is written as “1000”,
16, which is 2⁴ is written as “10000”,
… and so on.

So we see that the value ‘2^(k-1)’ is the smallest value, which uses ‘k’ binary digits. From there, the value ‘(2^k)-1’ is the largest one which uses ‘k’ binary digits, as its next value ‘2^k’ uses ‘k+1’ digits. This means that just iterating over precalculated powers of 2, until reaching such power which is greater than ’n’, will show us the count of binary digits that ’n’ has.

How the count of binary digits L is being calculated for number N = 12.
We sequentially compare N to powers of 2, until N becomes less.
That happens on power 16 which is 2⁴, so we conclude that L=4.

Hence, turning this into pseudo-code, and writing the function ‘calculate_binary_digits_count()’ will result in the following:

// This function takes integer 'N' and returns count of its binary digits.
function calculate_binary_digits_count( N: Integer ) : Integer;
// Check case of numbers with maximal count of digits
if N >= powers_of_2[ 31 ] // Compare with maximal power of 2
return 32 // The count of digits for such numbers
// Regular case
L := 0
while N >= powers_of_2[ L ]
L := L+1
return L

The initial “if” check here is required for such input numbers ’n’, which are greater or equal to the largest precalculated power of 2 (2³¹ = 2'147'483'648). If skipping that check, an overflow will happen when trying to address the next power of 2 (2³², which is absent from the array). And if ’n’ is not that large, we find the count of its binary digits with the “while” loop, that comes next.

Concluding this chapter, the important fact is that in order to print ’n’, our algorithm does only comparisons and subtractions. When comparing to standard printing algorithms, ours doesn’t use any relatively time-consuming operations such as integer division or remainder calculation. In order to obtain the value of every digit we do only 1 comparison, and possibly 1 subtraction.

5. The new algorithm for case of decimal numeral system (base=10)

Can we apply a similar approach for printing number ’n’ in our familiar, decimal numeral system? If we are going to do so, it will not be enough to iteratively compare ’n’ with some weights, as for every digit we should choose not just between yes or no (1 or 0), but between 10 different options (digit values from ‘0’ to ‘9’).

We know the weights of digits in decimal numeral system. Weight of the last digit is still 1, and weight of every previous digit is 10 times larger:

Weights of digits in decimal positional system.

Can we preserve the logic described in previous chapter (for case base=2) here too, so values of digits will be obtained again only by comparisons and subtractions?

The first idea which might come to us is to do something like linear search among digits [0..9]. So in order to figure out the value of every digit we will repeat the comparison and subtraction operations several times. For example if n=735, and we are figuring out its first digit from left, weight of that digit is 10²=100, so we will repeatedly compare ’n’ with 100 and subtract that weight from ’n’, as long as the latter one remains positive or 0.

Step 1:  n = 735  ;  digit = 0
n ≥ 100
n := n – 100 ; digit := digit + 1
Step 2: n = 635 ; digit = 1
n ≥ 100
n := n – 100 ; digit := digit + 1
Step 3: n = 535 ; digit = 2
n ≥ 100
n := n – 100 ; digit := digit + 1
Step 4: n = 435 ; digit = 3
n ≥ 100
n := n – 100 ; digit := digit + 1
Step 5: n = 335 ; digit = 4
n ≥ 100
n := n – 100 ; digit := digit + 1
Step 6: n = 235 ; digit = 5
n ≥ 100
n := n – 100 ; digit := digit + 1
Step 7: n = 135 ; digit = 6
n ≥ 100
n := n – 100 ; digit := digit + 1
Step 8: n = 35 ; digit = 7
n < 100
<current digit is 7, proceeding to next digit>

The condition ‘n ≥ 100’ was satisfied 7 times, so the value of the first digit from left is ‘7’, and we will continue figuring out value of next digit, having ’n’ decreased to: ‘735–7*100 = 35’.

Then, weight of the next digit is 10, so we will repeatedly compare ’n’ (currently 35) with 10, and subtract 10 from it, as long as it remains positive or 0. Obviously that will be repeated 3 times, altering ’n’ to: ’35–3*10 = 5'.

And value of the last digit is already equal to the current value of ’n’ (as at this point of time ’n’ becomes a 1-digit number).

With this linear search we will do up to 10 comparisons and 9 subtractions when figuring out every digit of ’n’. If we will try to illustrate that logic of digit search, it might graphically result in something like this:

Illustration of how the proper decimal digit is being searched linearly.

Can we do it better? Yes, if we’ll replace linear search with binary search!

Let’s see: when we were figuring out the first digit of n=735, we were iteratively comparing it with weight 100, and were subtracting that weight from ’n’. What if we will cut the digit search range into half? Let’s compare it right with 5*100=500 instead:

Step 1:  n = 735  ;  digit = 0
n ≥ 500
n := n – 500 ; digit := digit + 5

So we know now that the first digit should be in range [5..9]. Let’s cut this range into half too, now comparing ’n’ either with 2*100=200 or with 3*100=300:

Step 2:  n = 235  ;  digit = 5
n < 300
<we don't make any alterations then>

As current ’n’ turns out to be less than 3*100, the range search cuts down from [5..9] to [5..7]. Now lets compare ’n’ with 100:

Step 3:  n = 235  ;  digit = 5
n ≥ 100
n := n – 100 ; digit := digit + 1

Now ’n’ appeared to be greater than 100, which cuts search range of the first digit from [5..7] to [6..7]. And so there is need to compare ’n’ with 100 one more time:

Step 4:  n = 135  ;  digit = 6
n ≥ 100
n := n – 100 ; digit := digit + 1

Which results in:

    n = 35  ;  digit = 7

So we have properly calculated the first digit of n=735 as ‘7’, and reduced ’n’ from 3-digit value 735 into 2-digit value 35, which allows us to perform similar operations for finding the next digit too.

While searching the next digit, we will again start with possible range of digits [0..9] and will repeatedly cut it into 2 (almost) equal parts, and continuing in the necessary half. The only difference is that comparisons and subtractions will be made now with the factor of 10¹=10, and not 10²=100.

Step 1:  n = 35  ;  digit = 0
n < 50
<we don't make any alterations then>

Search range [0..9] is now cut into [0..4].

Step 2:  n = 35  ;  digit = 0
n ≥ 30
n := n – 30 ; digit := digit + 3

Search range [0..4] is cut into [3..4].

Step 3:  n = 5  ;  digit = 3
n < 10
<we don't make any alterations then>

So the proper value of the second digit is found to be ‘3’, and ’n’ is reduced from 2-digit number 35 into 1-digit number 5. Which is itself the last digit of original n=735, and can be printed just as it is.

If we try to graphically illustrate the diagram of searching proper digit in decimal positional system, we might come up with something like this:

Diagram of searching proper decimal digit, acting as in Binary search.
The digit portions [5, 3, 2, 1] are represented from top to bottom, as jumps of different length.
Combining various digit portions, we are able to sum up any digit.

So we see that for every digit, we can iterate by amounts [5, 3, 2, and 1], in order to obtain value of that digit. I prefer to call those amounts “digit portions”, as their different subsets are capable of summing-up the values of all decimal digits. And while considering every digit portion, we do 1 multiplication, 1 comparison, and possibly 1 subtraction.

Having this said, we can write the pseudo-code for described algorithm:

var powers_of_10[0 .. 10] : Array of Integers 
= { 1, 10, 100, 1'000, ..., 100'000'000, 1'000'000'000 }
// Precalculated powers of 10, which will be used during print

var result[0 .. 25] : Array of Characters // Assume at most 25 characters

// The procedure takes integer 'N' to be printed, and fills its
// decimal characters into the 'result' array.
procedure print( N: Integer )
L := calculate_digits_count( N )
i := 0 // Index over the 'result' array
while L > 0
digit := 0 // Start summing up current digit from 0
// Iterate over digit portions with 'p' variable
for p in [5, 3, 2, 1]
term := p * powers_of_10[ L-1 ]
if N >= term // we should use digit portion 'p'
digit := digit + p // Increase current digit
N := N – term // ... and reduce 'N' accordingly
result[ i ] := '0' + digit // Write the digit into 'result' array
i := i+1 // Prepare to next iteration, and
L := L-1 // ... adjust remaining count of digits of 'N'
result[ i ] := '\0' // Append the terminating 'null' character

As my algorithm is based on building up digits values with help of digit portions, I prefer to call it “Digit Portions printer”, or shortly “DP printer”.

Note, here we also rely on some function “calculate_digits_count()”, which returns the count of decimal digits of given number ’n’, so we can instantly start comparing ’n’ with proper power of 10. And similarly to the case of printing in binary notation, here also we can use the precalculated array of powers of 10, to quickly find out count of decimal digits of ‘n’.

How the count of digits L is being calculated for number N = 47'024.
We sequentially compare N to powers of 10, until N becomes less.
That happens on power 100'000 which is 10⁵, so we conclude that L=5.

In the pseudo-code, only the precalculated array’s name and its largest index should be altered:

// This function takes integer 'N' and returns count of its decimal digits.
function calculate_digits_count( N: Integer ) : Integer;
// Check case of numbers with maximal count of digits
if N >= powers_of_10[ 9 ] // Compare with maximal power of 10
return 10 // The count of digits for such numbers
// Regular case
L := 0
while N >= powers_of_10[ L ]
L := L+1
return L

The presented 2 fragments of code together give us the complete algorithm for printing integers in decimal numeral system. The algorithm doesn’t make use of integer division or remainder calculation operations.

Let me note, that in contrast to printing in binary numeral system, here, if the number ’n’ being printed has the maximal count of decimal digits, some overflow issues might arise. For example, if working with 32-bit unsigned integers, which can represent range of values [0, 4'294'967'296), and if the number being printed is “1'500'000'000” (also having 10 decimal digits), then we must be careful not to compare it with “5 * 1'000'000'000 = 5'000'000'000”, as the last one just doesn’t fit in 32 bits. So there, the digit portion “5” must be skipped, only digit portions [3, 2, 1] should be considered for getting the first digit of ‘n’. After that, all the next digits should be obtained in the described way.

So adding one more condition to the pseudo-code, like “if n >= 1'000'000'000”, will be enough. I just decided to skip it, for compactness of the presented pseudo-codes.

6. The algorithm for other positional numeral systems

The presented algorithm “DP printer” can be easily adjusted for printing numbers in any other positional numeral system. The only difference will be that the sequence of digit portions [5, 3, 2, 1] is replaced by some other sequence of digit portions, and that when performing comparisons and subtractions, we should multiply every digit portion not by some power of 10, but some power of the new base.

For example, if we would like to print numbers in base=14, then possible sequence of digits portions might be like [7, 4, 2, 1].

Generally, when constructing digits portions for given numbers base, we should always try to cut the digits range into 2 equal (or almost equal) parts. That’s why for base=14 the preferred way is to first cut range [0..14) by digit portion ‘7’ into two sub-ranges [0..7) and [7..14), then cut each sub-range by digit portion ‘4’ into sub-ranges [0..4), [4..7), [7..11) and [11..14), and so on, until reaching sub-ranges of unit length.

Diagram of searching proper decimal digit, when we are
in base=14, acting similarly to Binary search algorithm.
The digit portions [7, 4, 2, 1] are represented from top to bottom,
as jumps of different length. Combining various digit portions,
we are able to sum-up any value (digit) in range [0..13].

If writing pseudo-code for printing routine in the new base 14, the only differences would be:

  • replacing digit portions sequence [5, 3, 2, 1] by sequence [7, 4, 2, 1], corresponding to the base 14, and
  • replacing precalculated powers of 10 by precalculated powers of 14.

7. Necessary optimizations

We have presented the basic version of the new integer printing algorithm “DP printer”. However, if we run it along with the standard conversion algorithm and compare their performances, we will see that the standard one works faster. That’s why in this chapter I will present several sequential optimizations, which can (and should) be applied to the basic version of “DP printer”, in order to make it to outperform all existing solutions.

Optimization A: loop unrolling

We can easily note that if we are in a certain numbers base (right now, in base=10), digit portions become predefined (like [5, 3, 2, 1]), so we have the option of unrolling the inner loop, thus reaching some speed-up:

var powers_of_10[0 .. 10] : Array of Integers 
= { 1, 10, 100, 1'000, ..., 100'000'000, 1'000'000'000 }
// Precalculated powers of 10, which will be used during print

var result[0 .. 25] : Array of Characters // Assume at most 25 characters

// The procedure takes integer 'N' to be printed, and fills its
// decimal characters into the 'result' array.
procedure print( N: Integer )
L := calculate_digits_count( N )
i := 0 // Index over the 'result' array
while L > 0
digit := 0 // Start summing-up current digit from 0
// Go over digit portions:
// digit portion '5'
term := 5 * powers_of_10[ L-1 ]
if N >= term // we should use digit portion '5'
digit := digit + 5 // Increase current digit
N := N – term // ... and reduce 'N' accordingly
// digit portion '3'
term := 3 * powers_of_10[ L-1 ]
if N >= term // we should use digit portion '3'
digit := digit + 3 // Increase current digit
N := N – term // ... and reduce 'N' accordingly
// digit portion '2'
term := 2 * powers_of_10[ L-1 ]
if N >= term
digit := digit + 2
N := N – term
// digit portion '1'
term := powers_of_10[ L-1 ]
if N >= term
digit := digit + 1
N := N – term
// Write produced digit into 'result' array
result[ i ] := '0' + digit
i := i+1 // Prepare to next iteration, and
L := L-1 // ... adjust remaining count of digits of 'N'
result[ i ] := '\0' // Append the terminating 'null' character

Unrolling the inner loop in this way will take out 4 checks and 4 index increments of the “for” loop.

Experimental comparisons presented later in chapter 8, will show that such unrolling has a significant positive impact on the performance.

Optimization B: reducing number of multiplications

According to “DP printer”, in order to obtain value of every digit we multiply digit portions [5, 3, 2, 1] by the current power of 10. So for example, for obtaining the digit of “thousands” (4'th from right), we are doing the following multiplications:

5*1'000
3*1'000
2*1'000
1*1'000 (this is actually skipped, as is multiplied by 1)

after which obtained values are being compared and possibly subtracted from ‘n’.

We were doing that because our purpose was to cover range of possible digits [0..9] with some decreasing sequence of digit portions. But what if we violate the logic of binary search a bit, and come up with a slightly different sequence of digit portions? Such a sequence, which will reduce overall number of multiplications that we should do.

For example, if instead of sequence of digit portions [5, 3, 2, 1] we take another sequence [3,3,1,1,1], we will still cover the required range of digits [0..9]:

The altered sequence of digit portions [3,3,1,1,1].
The first 2 portions are illustrated as large jumps of length 3,
while the remaining 3 portions are represented with small jumps of length 1.
Here also, by combining various portions, we are able to sum up any digit
in necessary range [0..9].

For computing corresponding terms with this new sequence of digits portions, we will do less multiplications:

3*1'000
3*1'000 (this is repeating, so is skipped)
1*1'000 (this is actually skipped, as multiplied by 1)
1*1'000 (as well as the remaining 2 repeating terms)
1*1'000 (…)

So we see that sequence of digits portions [3,3,1,1,1] allows us to make only one multiplication per digit, compared to three multiplications per digit of the previous digit portions [5,3,2,1].

After this optimization, pseudo-code becomes like this:

var powers_of_10[0 .. 10] : Array of Integers 
= { 1, 10, 100, 1'000, ..., 100'000'000, 1'000'000'000 }
// Precalculated powers of 10, which will be used during print

var result[0 .. 25] : Array of Characters // Assume at most 25 characters

// The procedure takes integer 'N' to be printed, and fills its
// decimal characters into the 'result' array.
procedure print( N: Integer )
L := calculate_digits_count( N )
i := 0 // Index over the 'result' array
while L > 0
digit := 0 // Start summing-up current digit from 0
// Go over digit portions:
// digit portion '3'
term := 3 * powers_of_10[ L-1 ]
if N >= term // we should use digit portion '3'
digit := digit + 3 // Increase current digit
N := N – term // ... and reduce 'N' accordingly
// one more digit portion '3', ('term' remains same)
if N >= term
digit := digit + 3
N := N – term
// digit portion '1'
term := powers_of_10[ L-1 ]
if N >= term // we should use digit portion '1'
digit := digit + 1 // Increase current digit
N := N – term // ... and reduce 'N' accordingly
// one more digit portion '1' ('term' remains same)
if N >= term
digit := digit + 1
N := N – term
// one more digit portion '1' ('term' remains same)
if N >= term
digit := digit + 1
N := N – term
// Write produced digit into 'result' array
result[ i ] := '0' + digit
i := i+1 // Prepare to next iteration, and
L := L-1 // ... adjust remaining count of digits of 'N'
result[ i ] := '\0' // Append the terminating 'null' character

You might argue that the count of digit portions has increased now from 4 to 5, and so it might slow down the algorithm. Yes, it could be the case, but this change is required for the next optimization, which will justify the current one.

Optimization C: skipping repeated portions, which didn’t work before

Let’s assume that we decided to print input number ’n’ by digit portions [3,3,1,1,1]. Also let’s assume that we are currently on “thousand”-s digit (4'th from right). Following that logic we will first try to reduce ’n’ by 3'000. If that works we will try to reduce ’n’ by 3'000 once again. After that, we will 3 times compare the remaining ’n’ with 1'000, and similarly try to reduce it by 1'000.

But if at the first step ’n’ was less than 3'000, then no subtraction will be made, and ’n’ will remain unchanged. And despite that, in the next step we will compare it with 3'000 again, which is pointless. If digit portion is repeated, it makes sense to analyze it only if the previous comparison worked, in other words, if ’n’ was reduced by it.

Having this noted, we can nest our conditions, which will result in the following pseudo-code:

var powers_of_10[0 .. 10] : Array of Integers 
= { 1, 10, 100, 1'000, ..., 100'000'000, 1'000'000'000 }
// Precalculated powers of 10, which will be used during print

var result[0 .. 25] : Array of Characters // Assume at most 25 characters

// The procedure takes integer 'N' to be printed, and fills its
// decimal characters into the 'result' array.
procedure print( N: Integer )
L := calculate_digits_count( N )
i := 0 // Index over the 'result' array
while L > 0
digit := 0 // Start summing-up current digit from 0
// Go over digit portions:
// digit portion '3'
term := 3 * powers_of_10[ L-1 ]
if N >= term // we should use digit portion '3'
digit := digit + 3 // Increase current digit
N := N – term // ... and reduce 'N' accordingly
// one more digit portion '3' ('term' remains same)
if N >= term
digit := digit + 3
N := N – term
// digit portion '1'
term := powers_of_10[ L-1 ]
if N >= term // we should use digit portion '1'
digit := digit + 1 // Increase current digit
N := N – term // ... and reduce 'N' accordingly
// one more digit portion '1' ('term' remains same)
if N >= term
digit := digit + 1
N := N – term
// one more digit portion '1'
if N >= term
digit := digit + 1
N := N – term
// Write produced digit into 'result' array
result[ i ] := '0' + digit
i := i+1 // Prepare to next iteration, and
L := L-1 // ... adjust remaining count of digits of 'N'
result[ i ] := '\0'

Obviously, this will have its speed up when calculating the value of current digit. For example, if current digit is actually ‘1’, only the 1st, 3rd, and 4th digit portions will be analyzed ([3,_,1,1,_]). Or if current digit is ‘6’, only 1st, 2nd and 3rd digits portions will be analyzed ([3,3,1,_,_]). While surely, for some other digits there will be no performance gain. For example if current digit is ‘9’, the algorithm will still do all the comparisons ([3,3,1,1,1]), as the value “9” should be “decremented” more times, than values “1” or “6”.

Optimization D: moving from base=10 to base=100

Within the previous 3 optimizations A, B, and C, we cover range of possible digits [0..9] by some sequence of digit portions, which are later being used to obtain value of every digit of ’n’. But what if instead we will try to cover range of all possible 2 adjacent digits [00..99] by another sequence of digit portions (this time, more proper will be to say digits-pair portions), and later use them to obtain values of 2 adjacent digits of ’n’ during a single step? Surely we have no guarantee that it will work faster, but nothing prevents us from trying.

Within my experiments I tried the following two sequences of digits-pair portions:

a) [33,33,11,11,4,4,1,1,1]
b) [20,20,20,20,4,4,4,4,1,1,1]

Note, they both cover the range [00..99], which corresponds to two adjacent digits.

Sequence of digits-pair portions [33,33,11,11,4,4,1,1,1].
The first 2 portions are illustrated as large jumps of length 33.
The next 2 portions are illustrated as smaller jumps of length 11.
Next 2 portions are represented with smallest arrows of length 4.
The last portions of “1” are not illustrated.

Within this optimization we will still apply similar comparisons and subtractions on the input number ’n’, but now, after every iteration we will obtain values of 2 adjacent digits, having summed up in another variable “digits_pair”, which will result in something between [0..99]. Then, in order to be able to efficiently print those 2 digits, we should also have prepared an array of length 100, which will map indexes from 0 to 99, to corresponding pairs of characters (from “00”, “01”, “02”, … to … “98”, “99”).

Important notes: here we still have repeated portions, which allows us to use also optimizations B and C. And as it was for the base version of “DP printer”, as the count of digits of ’n’ might be the maximal, here also we must be careful to choose only such digits-pair portions which will not cause overflows, when multiplied by necessary powers of 100.

Experimental results, presented in the next chapter, show that this optimization actually brings its own speed-up.

8. Experimental results

Of course, a suggestion of any optimization should be backed up by experimental comparison. In this chapter, I want to present comparisons of runtimes of the following integer printing algorithms:

  • Standard printer with first optimization [Std],
  • Standard printer with first & second optimizations as [Std 2-dig],
  • LR printer [LR],
  • LR printer with second optimization as [LR 2-dig],
  • DP printer [DP] (chapter 5),
  • DP printer with optimization A as [DP A] (loop unrolling, chapter 7),
  • DP printer with optimizations A and B as [DP AB] (repeated digit portions),
  • DP printer with optimizations A, B and C as [DP ABC] (nesting conditions),
  • DP printers with all 4 described optimizations (chapter 7):
    … with digits-pair portions [33,33,11,11,4,4,1,1,1] as [DP A-D 33],
    … with digits-pair portions [20,20,20,20,4,4,4,4,1,1,1] as [DP A-D 20].

All the listed algorithms were printing numbers in base=10, however we can easily change that for any other base, if required.

The experiments were performed separately, on the following types of integers:

  • 32-bit integers, 5 decimal digits,
  • 32-bit integers, 8 decimal digits,
  • 64-bit integers, 8 decimal digits,
  • 64-bit integers, 18 decimal digits.

As the printing algorithm currently used in standard libraries of most programming languages is the “Standard algorithm” with the first optimization enabled (chapter 3), here I will present performance gain of every other algorithm in relation to the former.

Comparison results for 32-bit integers

Comparing running times when converting 32-bit integers to strings gives the following results:

Average time (in nanoseconds) required to print one 32-bit integer,
having either 5 or 8 decimal digits, by every of the listed algorithms.

We can see that the basic version of DP printer [DP] (without optimizations) performs by around 23% slower than current standard algorithm [Std].

However, once we enable optimization A (loop unrolling) [DP A] it starts to behave around 29–32% faster.

Turning on the optimization B (introducing repeated digit portions) [DP AB] drops down the performance gain, making it only about 18–22% faster. Probably that is because we are taking now 5 digit portions, instead of 4.

But enabling the optimization C too (eliminating checks for repeated portions) [DP ABC] brings even more performance gain: making it around 43–45% faster than the standard routine.

When enabling all 4 optimizations we have 2 cases: [DP A-D 33], and [DP A-D 20]. Exact digits-pair portions are specified at the beginning of this chapter. For printer [DP A-D 33] we have 34–49% performance gain compared to [Std], and for [DP A-D 20] is makes around 47–59% faster than currently standard printing algorithm.

The title of this story was chosen based on results of comparing printer [DP ABC] with the currently standard printing algorithm [Std], for 32-bit integers case. This has 2 reasons:

  • [DP ABC] is the most optimized variant which still doesn’t use the 2-digits table. I think, as the currently standard algorithm also doesn’t use any tables, it will be fair to compare it with the one which doesn’t require them either.
  • Among integers, 32-bit integer is the most commonly used data type.

So taking the “32-bit / 8 digits” case, and dividing the runtime of [Std] over the one of [DP ABC] (from the table above), we get:

19.77 / 10.76 ≈ 1.84

Comparison results for 64-bit integers

Comparison results of printing 64-bit integers are even more impressive:

Average time (in nanoseconds) required to print one 64-bit integer,
having either 8 or 18 decimal digits, by every of the listed algorithms.

Here we can see that the basic version of DP printer [DP] now doesn’t drop performance, compared to the standard algorithm [Std], as it was for the 32-bit case. Instead it brings around 26% performance gain.

Enabling optimization A (unrolling the inner loop) [DP A] makes it 50–52% faster.

Similarly to the 32-bit integers case, here also turning on optimization B (introducing repeated digit portions) [DP AB] drops down performance gain, making it only 40–44% faster from [Std].

But enabling optimization C too (eliminating checks for repeated portions) [DP ABC] makes the algorithm 57–62% faster.

Enabling all 4 optimizations for the 64-bit integers case [DP A-D 33] and [DP A-D 20] makes the algorithm behave around 68–72% faster, compared to the currently standard printing algorithm [Std].

Note, if for the 64-bit integers case we would perform the same comparison of efficiencies between [DP ABC] and [Std] (the printers which don’t use internal tables), as we did for the 32-bit integers a bit above, we would get:

72 / 27.7 ≈ 2.6

which is significantly higher than the 1.8 ratio of 32-bit case.

9. Conclusion

We have presented a new algorithm “DP printer” for converting integers to strings. It does not rely on the relatively time-consuming arithmetical operations, such as integer division or remainder calculation. Instead, “DP printer” uses only multiplication, comparison, and subtraction.

The algorithm can be significantly optimized if the base in which numbers should be printed is known at compile time. That will allow to unroll the inner loop right in the code, and to make the subsequent optimizations B, C and D, all as described in chapter 7. Note, this requirement is almost always satisfied, as numbers are generally printed in base=10. And if not, the base is still often known at compile-time, like when printing numbers in octal (base=8) or hexadecimal (base=16) numeral systems.

If number base is not known at the compile time, the algorithm still works without doing any time-consuming operation, but its performance gain is either small (as for 64-bit case) or absent (for 32-bit case).

Experiments show that the optimized version of my algorithm performs around 43–45% faster for printing 32-bit integers, and around 57–62% faster for 64-bit integers. Performance gain of “DP printer” compared to the standard printing algorithm is expected to be significantly higher when working with big integers (objects capable of storing unlimited count of digits, thus representing values greater than 64-bit integers).

Converting integers to their string representation is never a bottleneck in most computer programs, as generally, only tiny percent of calculated integers is being sent for printing. However, if the program generates large text files (like *.json or *.csv), then the count of printed integers can be quite large, and the performance of integer to string conversion algorithm starts playing role.

That’s all for now! Thank you for reading my story till the end. I will be happy to read any of your thoughts and comments below!

Great thanks to David Ayrapetyan for precise and careful review of the draft of this story! ( https://www.linkedin.com/in/davidayrapetyan/ )

All illustrations are made by Asya Papyan ( https://www.behance.net/asyapapyan ).

If you enjoyed reading, you can find and connect to me on LinkedIn ( https://www.linkedin.com/in/tigran-hayrapetyan-cs/ )

References:

[1] : “Integer to string conversion” — https://tia.mat.br/posts/2014/06/23/integer_to_string_conversion.html

[2] : “Three optimization tips for C++” — https://www.facebook.com/notes/10158791579037200/

[3] : “34% Faster Integer to String Conversion Algorithm” — https://medium.com/towards-data-science/34-faster-integer-to-string-conversion-algorithm-c72453d25352

--

--