Lua beautifier in 55 lines of Perl

I have been working on a Lua IDE (ZeroBrane Studio) and wanted to re-format a large number of Lua files to match my style preferences. I did find a nice Lua beautifier (also written in Lua), but it didn't handle many of the cases present in the code I was working with, and I didn't want to make any manual changes. I ended up using the same approach, but completely rewrote it in Perl, as its regular expressions are a tad more powerful.

I'm sure all of this can be re-implemented in Lua, but I didn't have time. Lua patterns are simple and quite flexible, but one feature I miss is the ability to apply quantifiers to groups. For example, I can write %d+ to match one or more digits (same as \d+ in Perl), but I can't write (local )? to make a whole group optional.
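To make the difference concrete, here is a small sketch of the optional group in Perl. The sample lines are made up for illustration; only the pattern itself comes from the script below.

```perl
use strict;
use warnings;

# Hypothetical sample lines; only the first two start a function definition.
my @lines = (
  'local function foo()',
  'function bar()',
  'localfunction baz()',
);

# (local )? makes the whole "local " prefix optional in a single pattern.
my @matched = grep { /^(local )?function\b/ } @lines;

print "$_\n" for @matched;
```

With Lua patterns the same check would typically need two separate matches, one with the prefix and one without.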

Here is the code I ended up with:

use strict;
use warnings;

use constant INDENT => '  ';
my($currIndent, $nextIndent, $prevLength) = (0, 0, 0);

while (<>) {
  s/^\s+|\s+$//g; # remove all spaces on both ends
  s/\s+/ /g; # replace all whitespaces inside the string with one space

  my $orig = $_;

  s/(['"]).*?\1//g; # remove all quoted fragments for proper bracket processing
  s/\s*--.+//; # remove all comments; this ignores long bracket style comments

  # open a level; increase next indentation; don't change current one
  if (/^((local )?function|repeat|while)\b/ && !/\bend\s*[\),;]*$/
   || /\b(then|do)$/ && !/^elseif\b/     # only open on 'then' if there is no 'elseif'
   || /^if\b/ && /\bthen\b/ && !/\bend$/ # only open on 'if' if there is no 'end' at the end
   || /\bfunction\s*\([^\)]*\)$/) {
    $nextIndent = $currIndent + 1;
  }
  # close the level; change both current and next indentation
  elsif (/^until\b/
      || /^end\s*[\),;]*$/
      || /^end\s*\)\s*\.\./ # this is a special case of 'end).."some string"'
      || /^else(if)?\b/ && /\bend$/) {
    $nextIndent = $currIndent = $currIndent - 1;
  }
  # keep the level; decrease the current indentation; keep the next one
  elsif (/^else\b/
      || /^elseif\b/) {
    ($nextIndent, $currIndent) = ($currIndent, $currIndent-1);
  }

  my $brackets = y/(// - y/)//; # capture unbalanced brackets
  my $curly = y/{// - y/}//; # capture unbalanced curly brackets

  # close (curly) brackets if needed
  $currIndent += $curly if $curly < 0 && /^\}/; 
  $currIndent += $brackets if $brackets < 0 && /^\)/; 

  warn "WARNING: negative indentation at line $.: $orig\n" if $currIndent < 0;

  print((length($orig) ? (INDENT x $currIndent) : ''), $orig, "\n")
    if $prevLength > 0 || length($orig) > 0; # this is to collapse empty lines

  $nextIndent += $brackets + $curly;

  $currIndent = $nextIndent;
  $prevLength = length($orig);
}

warn "WARNING: positive indentation at the end\n" if $nextIndent > 0;

You can update INDENT to match your style; change it to "\t" if you prefer tabs rather than spaces.

The code is not complex, but it handles indentation for opened brackets and curly brackets as well as for anonymous function definitions. For example, it generates this fragment that correctly handles quoted brackets:

  local match = { [string.byte("<")] = true,
    [string.byte(">")] = true,
    [string.byte("(")] = true,
    [string.byte(")")] = true,
    [string.byte("{")] = true,
    [string.byte("}")] = true,
    [string.byte("[")] = true,
    [string.byte("]")] = true,

and this fragment that includes an anonymous function:

    function (event)
      local pos = editor:GetCurrentPos()
      local start_pos = editor:WordStartPosition(pos, true)
      editor:SetSelection(start_pos, pos)
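The first fragment works because quoted text is removed before brackets are counted. Here is a minimal sketch of that step in isolation, following the same strip-then-count idea as the script. The helper name bracket_balance is made up for this illustration and is not part of the script.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the (round, curly) bracket balance of one line
# after quoted fragments have been stripped.
sub bracket_balance {
  my ($line) = @_;
  $line =~ s/(['"]).*?\1//g;                        # drop '...' and "..." fragments
  my $round = ($line =~ y/(//) - ($line =~ y/)//);  # y/// with an empty replacement only counts
  my $curly = ($line =~ y/{//) - ($line =~ y/}//);
  return ($round, $curly);
}

# One curly bracket stays open; the quoted "<" plays no role.
my ($round, $curly) = bracket_balance('local match = { [string.byte("<")] = true,');
print "round=$round curly=$curly\n";

# The quoted "(" is stripped first, so the round brackets balance.
($round, $curly) = bracket_balance('[string.byte("(")] = true,');
print "round=$round curly=$curly\n";
```

Escaped quotes inside a string are not handled by this simple substitution, so a line containing them could still be miscounted.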

The script also detects and reports when something goes wrong: it warns if the indentation becomes negative at some point or is still greater than zero at the end. Note that the script doesn't re-arrange the lines in any way and only re-formats them. Apart from collapsing consecutive empty lines, it leaves the line structure alone, so you can easily check that the number of non-empty lines is the same before and after processing.
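One way to make that check stricter than a line count is to compare the two versions with whitespace normalized. This is a rough sketch, assuming the original and the reformatted text are available as strings. The helper names normalize and same_content are hypothetical and not part of the script.

```perl
use strict;
use warnings;

# Hypothetical helper: trim each line, squeeze inner whitespace, drop empty lines.
sub normalize {
  my ($text) = @_;
  my @lines;
  for my $line (split /\n/, $text) {
    $line =~ s/^\s+|\s+$//g;
    $line =~ s/\s+/ /g;
    push @lines, $line if length $line;
  }
  return @lines;
}

# Hypothetical helper: true if both texts hold the same non-empty lines in order.
sub same_content {
  my ($before, $after) = @_;
  my @a = normalize($before);
  my @b = normalize($after);
  return 0 unless @a == @b;
  for my $i (0 .. $#a) {
    return 0 unless $a[$i] eq $b[$i];
  }
  return 1;
}

my $before = "if x then\nreturn 1\n\n\nend\n";
my $after  = "if x then\n  return 1\n\nend\n";
print same_content($before, $after) ? "same\n" : "different\n";
```

Because it squeezes whitespace the same way the script does, this check would not notice spaces collapsed inside string literals.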

It does not handle multi-line comments and strings (of the --[[ ]] and [[ ]] style), and any text inside them that looks like an unmatched bracket or keyword can easily throw it off track. It also doesn't recognize expressions continued on the next line with the logical operators and/or, which is a common idiom in Lua. For example,

foo = bar and 1
  or 2

will be formatted as:

foo = bar and 1
or 2

This can be fixed by adding brackets:

foo = (bar and 1
  or 2)
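
The brackets help because an unbalanced opening bracket raises the indentation of the following line. Here is a short sketch of that bookkeeping for the two lines above, reduced to round brackets only. The variable names mirror the script, but this is an illustration and not the full logic.

```perl
use strict;
use warnings;

my ($currIndent, $nextIndent) = (0, 0);
my @lines = ('foo = (bar and 1', 'or 2)');
my @out;

for my $line (@lines) {
  # Balance of round brackets on this line: +1 for the first line, -1 for the second.
  my $brackets = ($line =~ y/(//) - ($line =~ y/)//);

  # Dedent the current line only when it starts with a closing bracket.
  $currIndent += $brackets if $brackets < 0 && $line =~ /^\)/;

  push @out, ('  ' x $currIndent) . $line;

  # Carry the balance over to the next line.
  $nextIndent += $brackets;
  $currIndent = $nextIndent;
}

print "$_\n" for @out;
```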


How do you make it read from a file and not STDIN?

You can simply redirect STDIN (and STDOUT) when you run the script. Instead of running it as "perl script.pl", you can run it as "perl script.pl <from >to", where script.pl stands for whatever name you saved the script under.

You can also open the file in the script itself. Just add open(F, "<from") or die "Can't open file 'from'\n"; before the loop and replace while (<>) with while (<F>), and it will read from the file instead of STDIN.
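
Here is a sketch of that variant using a lexical filehandle and three-argument open, which is the more common style in current Perl. The sub name read_trimmed is a placeholder, and the loop body is reduced to the trimming step to keep the example short.

```perl
use strict;
use warnings;

# Placeholder sub: reads $path and returns its lines with whitespace trimmed.
sub read_trimmed {
  my ($path) = @_;
  open(my $fh, '<', $path) or die "Can't open file '$path': $!\n";
  my @lines;
  while (my $line = <$fh>) {
    $line =~ s/^\s+|\s+$//g;   # same trimming as the first step of the script
    push @lines, $line;
  }
  close($fh);
  return @lines;
}

# Only runs when a file name is given on the command line.
if (@ARGV) {
  print "$_\n" for read_trimmed($ARGV[0]);
}
```

Since <> also reads any files named on the command line, passing the file name as an argument to the unchanged script works as well.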

Sorry about the alias. It's not meant to irk you even though there's some truth to it. I'm just venting my frustrations with those unhelpful error messages. Works now.
