# Alphanum sorting for humans in Lua

We have been discussing options for my Serpent serializer on the Lua maillist and David Manura suggested natural sort order as one of possible choices for sorting table keys. I liked the idea and came up with my own version; you may find other implementation in various languages on Dave Koelle's page.

Most of the implementations of this alphanumeric sort first split the values being sorted into chunks of numbers and non-numbers and then sort them going over those chunks and comparing them pairwise. After reviewing options for split in Lua I decided to look for alternatives and came up with an algorithm that doesn't require splitting strings or storing intermediate chunks. I have update the post to offer you four versions so that you can pick the one that matches your case (see the updates at the bottom of the post).

1. The simplest one. Doesn't correctly sort numbers with decimal points ("0.11" comes after "0.2") and doesn't ignore leading zeroes ("60" comes before "050"), but should still cover the majority of cases:

``````function alphanumsort(o)
local function padnum(d) return ("%03d%s"):format(#d, d) end
table.sort(o, function(a,b)
return o
end``````

2. Ignore leading zeros. This one sorts "060" and "60" together (so "050" will come before "60"), but the order in which "060" and "60" are sorted is undetermined:

``````function alphanumsort(o)
local function padnum(d) local r = string.match(d, "0*(.+)")
return ("%03d%s"):format(#r, r) end
table.sort(o, function(a,b)
return o
end``````

3. Sort numbers with decimal points. This one puts "0.2" after "0.11":

``````function alphanumsort(o)
local function padnum(d) local dec, n = string.match(d, "(%.?)0*(.+)")
return #dec > 0 and ("%.12f"):format(d) or ("%s%03d%s"):format(dec, #n, n) end
table.sort(o, function(a,b)
return o
end``````

4. This one partially fixes leading zeros, such that "0050" is put before "050", which is in turn before "50"; if you want "050" to be after "50" then reverse #a and #b in the last comparison:

``````function alphanumsort(o)
local function padnum(d) local dec, n = string.match(d, "(%.?)0*(.+)")
return #dec > 0 and ("%.12f"):format(d) or ("%s%03d%s"):format(dec, #n, n) end
table.sort(o, function(a,b)
return o
end``````

This is how you would use it:

``````local o = {"1000X Rad Max","10X Rad","200X Rad","20X Rad","20X Rad Prime",
"Abc 50.02 Ult","Abc 50.01 Ult","Abc 50.1 Ult","Abc 060 Ult",
"Abc 0060 Ult", "Abc 050 Ult","Abc 0050 Ult","Abc 51 Ult","Abc 50 Ult",
"Abc 500 Ult","Abc 51 Ult", "Abc 51B Ult","Abc 52 Ult","Abc 60 Ult",
"Alpha 100","Alpha 2","Alpha 200","Alpha 2A","Alpha 2A-8000","Alpha 2A-900"}
for k,v in ipairs(alphanumsort(o)) do print(v) end``````

And these are the results that would be generated for each of the options:

``````#0. Default      #1. Simplest     #2. Leading 0    #3. Decimals     #4. Best
-------------    -------------    -------------    -------------    -------------
Abc 0.11         Abc 0.2          Abc 0.2          Abc 0.11         Abc 0.11
Abc 0.2          Abc 0.11         Abc 0.11         Abc 0.2          Abc 0.2
Abc 0050 Ult     Abc 12a          Abc 12a          Abc 12a          Abc 12a
Abc 0060 Ult     Abc 50 Ult       Abc 012b         Abc 012b         Abc 012b
Abc 012b         Abc 50.1 Ult     Abc 050 Ult      Abc 050 Ult      Abc 0050 Ult
Abc 050 Ult      Abc 50.01 Ult    Abc 0050 Ult     Abc 0050 Ult     Abc 050 Ult
Abc 060 Ult      Abc 50.02 Ult    Abc 50 Ult       Abc 50 Ult       Abc 50 Ult
Abc 12a          Abc 51 Ult       Abc 50.1 Ult     Abc 50.01 Ult    Abc 50.01 Ult
Abc 50 Ult       Abc 51 Ult       Abc 50.01 Ult    Abc 50.02 Ult    Abc 50.02 Ult
Abc 50.01 Ult    Abc 51B Ult      Abc 50.02 Ult    Abc 50.1 Ult     Abc 50.1 Ult
Abc 50.02 Ult    Abc 52 Ult       Abc 51 Ult       Abc 51 Ult       Abc 51 Ult
Abc 50.1 Ult     Abc 60 Ult       Abc 51 Ult       Abc 51 Ult       Abc 51 Ult
Abc 500 Ult      Abc 012b         Abc 51B Ult      Abc 51B Ult      Abc 51B Ult
Abc 51 Ult       Abc 050 Ult      Abc 52 Ult       Abc 52 Ult       Abc 52 Ult
Abc 51 Ult       Abc 060 Ult      Abc 60 Ult       Abc 60 Ult       Abc 0060 Ult
Abc 51B Ult      Abc 500 Ult      Abc 0060 Ult     Abc 0060 Ult     Abc 060 Ult
Abc 52 Ult       Abc 0050 Ult     Abc 060 Ult      Abc 060 Ult      Abc 60 Ult
Abc 60 Ult       Abc 0060 Ult     Abc 500 Ult      Abc 500 Ult      Abc 500 Ult
Alpha 100        Alpha 2          Alpha 2          Alpha 2          Alpha 2
Alpha 2          Alpha 2A         Alpha 2A         Alpha 2A         Alpha 2A
Alpha 200        Alpha 2A-900     Alpha 2A-900     Alpha 2A-900     Alpha 2A-900
Alpha 2A         Alpha 2A-8000    Alpha 2A-8000    Alpha 2A-8000    Alpha 2A-8000
Alpha 2A-8000    Alpha 100        Alpha 100        Alpha 100        Alpha 100
Alpha 2A-900     Alpha 200        Alpha 200        Alpha 200        Alpha 200``````

[Update 8/25/2016] 5. While the previous function already does what's needed in most of the cases, there is one special case where it fails to sort in a way that humans would sort it: when a number in one string is in a position where a number is absent in some other string. For example: when sorting {"dir\\file.lua", "dir2\\file.lua", "dir3\\file.lua", "dir3\\file2.lua"} using the algorithm above, you'd get {"dir2\\file.lua", "dir3\\file.lua", "dir3\\file2.lua", "dir\\file.lua"}, which is probably not what you'd expect.

Egor Skriptunoff came up with a clever fix for this problem:

``````function alphanumsort(o)
local function conv(s)
local res, dot = "", ""
for n, m, c in tostring(s):gmatch"(0*(%d*))(.?)" do
if n == "" then
dot, c = "", dot..c
else
res = res..(dot == "" and ("%03d%s"):format(#m, m)
or "."..n)
dot, c = c:match"(%.?)(.*)"
end
res = res..c:gsub(".", "\0%0")
end
return res
end
table.sort(o,
function (a, b)
local ca, cb = conv(a), conv(b)
return ca < cb or ca == cb and a < b
end)
return o
end``````

If you sort the same table using this function, you'll get the sorted table you'd likely expect: {"dir\\file.lua", "dir2\\file.lua", "dir3\\file.lua", "dir3\\file2.lua"}.

One caveat for those using UTF-8 strings: the converted text may not be a proper UTF-8 text after the conversion. The fix is to only prepend zeros to the starting bytes of UTF-8 symbols:

``````res = res..c:gsub("[\0-\127\192-\255]", "\0%0")
...
return utf8.lt(ca, cb) or ca == cb and utf8.lt(a, b)``````

[Update 8/25/2012] I rewrote the solution based on the feedback and discussion on the maillist. These notes no longer apply, as they were for an earlier version that suffered from not handling numbers larger than maxint on your platform. Here is the original code:

``````function alphanumsort(o)
local function padnum(d) return ("%012d"):format(d) end
table.sort(o, function(a,b)
return o
end``````

If you are concerned that the hard-coded length (12) may be too small for your numeric fragments, you can replace it with something a bit more elaborate:

``````  local maxl = 0
for n,v in ipairs(o) do tostring(v)
:gsub("%d+", function(d) if #d > maxl then maxl = #d end; return d end) end
local function padnum(d) return ("%0"..maxl.."d"):format(d) end``````

Keep in mind that this may fail badly if your text happened to have a long sequence of numbers as some libraries may not be able to take an arbitrarily large length in number formatting. For example, on Windows `("%099d"):format(1)` works, but `("%100d"):format(1)` fails with `invalid format (width or precision too long)` message.

You should get a copy of my slick ZeroBrane Studio IDE and follow me on twitter here.

what will you say?
(required)
(required)