Friday, August 25, 2006

 

Final Notes

I've been asked to make a final post summarizing my progress. I already covered some of this in my last post, but here I will run down everything in the Pugs porting HowTo that I've completed. Items quoted from the HowTo will be italicized.

- $array[idx] -> @array[idx]
Done. This was one of the first translations I did. It was a simple matter of finding any references to a single item of an array and changing the appropriate sigil. It was slightly harder to handle this inside text, since the text parser can only decide if a variable is a scalar, array, or hash at the end, but with some work I put together a parser that can handle single array or hash elements within text.

- $hash{$key} -> %hash{$key}
Done. This was actually a lot like the above translation, just on hashes.

- $hash{word} -> %hash<word>
Done. This was simple, just a matter of finding any constant keys to hashes and changing the opener and closer.
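
As a rough illustration of the three sigil rewrites above, here is a minimal regex-based sketch in Python. It is not the translator's code (which is written in Haskell and works on a parsed AST), and the helper names are made up for the example. It also ignores harder cases such as array and hash references.

```python
import re

# Hypothetical helpers for illustration only; the real translator rewrites an AST.
ELEMENT = re.compile(r'\$(\w+)([\[{])')
CONSTANT_KEY = re.compile(r'%(\w+)\{([A-Za-z_]\w*)\}')

def fix_element_sigils(code):
    """$name[...] -> @name[...] and $name{...} -> %name{...}."""
    def swap(match):
        name, opener = match.groups()
        sigil = '@' if opener == '[' else '%'
        return sigil + name + opener
    return ELEMENT.sub(swap, code)

def fix_constant_keys(code):
    """%name{word} -> %name<word>, for bareword (constant) keys only."""
    return CONSTANT_KEY.sub(r'%\1<\2>', code)

def translate_elements(code):
    # Sigils first, so that constant keys are found on the %-form.
    return fix_constant_keys(fix_element_sigils(code))

print(translate_elements('my $x = $a[1] + $h{$k} + $h{word};'))
```

Plain scalars such as `$x` or `$k` are left alone because they are not followed by an opening bracket or brace.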

- "@array" -> "@array[]"
- "%hash" -> "%hash{}"
Done. These required a slightly smart parser for text. At the same time that single array or hash elements are processed, entire hash or array references are handled by just sticking on the appropriate {} or [].

- scalar @foo -> +@foo # or @foo.elems
Done. Currently this only translates to +@array; it would be good to add a .elems option for when the translator is doing heavy object orientation.

- a ? b : c -> a ?? b !! c
Done. Changed operators are easy to handle: I just look for any operators, and if one is a ? or : I translate it.

- $x =~ s/.../.../g -> $x ~~ s:P5:g/.../.../
Done. The translator converts all =~ to ~~, and can optionally use the :Perl5 modifier or try to translate the internals. Modifiers (i, g, etc.) are moved to the front (:i, :g) as appropriate. In any case, the trailing regex modifiers are dropped after an attempt to translate them, since they would only cause problems in Perl 6.
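
To make the modifier shuffle concrete, here is a small Python sketch of the :P5 variant of this rewrite. It is an illustration under assumptions, not the translator's Haskell code: the names are hypothetical, it only understands the s/.../.../ form with / as the delimiter, and it only carries over the g and i modifiers (any other trailing modifier is simply dropped).

```python
import re

# Hypothetical sketch: matches "=~ s/pattern/replacement/flags" with "/" delimiters.
SUBST = re.compile(r'=~\s*s/((?:\\.|[^/\\])*)/((?:\\.|[^/\\])*)/([a-z]*)')

# Assumption for this example: only these two modifiers are moved to the front.
KNOWN_MODIFIERS = {'g': ':g', 'i': ':i'}

def translate_subst(code):
    def rewrite(match):
        pattern, replacement, flags = match.groups()
        mods = ''.join(KNOWN_MODIFIERS.get(flag, '') for flag in flags)
        # The pattern itself is left untouched, which is what :P5 asks for.
        return '~~ s:P5' + mods + '/' + pattern + '/' + replacement + '/'
    return SUBST.sub(rewrite, code)

print(translate_subst('$x =~ s/a+/b/gi;'))
```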

- foreach my $foo (@baz) {...} -> for @baz -> $foo {...}
Done. This is one translation that I am worried about breaking, since from the AST point of view foreach loops can look very different from one another. However, in my testing the current code does translate every foreach.
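
The simplest shape of this rewrite can be sketched with a single regular expression. This Python snippet is only an illustration with a made-up function name; it handles the "my $var (list)" form with no nested parentheses, whereas the real translator works from the AST and has to cope with many more shapes.

```python
import re

# Hypothetical sketch: "foreach my $x (LIST) {" or "for my $x (LIST) {".
FOREACH = re.compile(r'\bfor(?:each)?\s+my\s+(\$\w+)\s*\(([^()]*)\)\s*\{')

def translate_foreach(code):
    def rewrite(match):
        variable, items = match.groups()
        return 'for ' + items.strip() + ' -> ' + variable + ' {'
    return FOREACH.sub(rewrite, code)

print(translate_foreach('foreach my $foo (@baz) { say $foo }'))
```

C-style loops such as `for (my $i = 0; ...)` do not match the pattern and pass through unchanged.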

- length("foo") -> "foo".chars
Done. Fairly easy to handle, and fairly widely tested. At first I wasn't sure about my translation of this, but after a lot of testing and some revisions, I think it's fairly robust and accurate. It also works with variables instead of constants (length($foo) -> $foo.chars).
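
For the two cases mentioned here (a string constant or a plain scalar as the argument), the rewrite can be sketched in a few lines of Python. This is a hedged illustration with a hypothetical function name, not the translator's implementation, and it deliberately ignores more complex argument expressions.

```python
import re

# Hypothetical sketch: length("...") or length($var) only.
LENGTH_CALL = re.compile(r'\blength\(\s*("(?:\\.|[^"\\])*"|\$\w+)\s*\)')

def translate_length(code):
    # Move the argument in front and call .chars on it.
    return LENGTH_CALL.sub(r'\1.chars', code)

print(translate_length('my $n = length($s) + length("foo");'))
```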

- Regular expressions:
m/ a * b /x -> m/ a * b / # /x now default
(space) -> <sp> # or \c[SPACE], \x20, \o40,
# <' '>
[abc] -> <[abc]>
[^abc] -> <-[abc]>
(?:...) -> [...]
< > -> \< \>

\p{prop} -> <prop>
\P{prop} -> <-prop>

\Qstring\E -> <{ quotemeta 'string' }> # or <'literal string'>

\A -> ^
\z -> $
\Z -> \n?$

\n -> \c[LF] # specifically a linefeed
\r?\n -> \n # logical newline
[^\n] -> \N # not a logical newline

\a -> \c[BEL]
\N{CENT SIGN} -> \c[CENT SIGN]
[^\N{CENT SIGN}] -> \C[CENT SIGN]
\c[ -> \e
\cX -> \c[^X]
[^\t] -> \T
[^\r] -> \R
[^\f] -> \F
[^\e] -> \E
[^\x1B] -> \X1B
[^\x{263a}] -> \X[263a]

\1 -> $1
// -> /<prior>/
/a|/ -> /a|<null>/

x{2} -> x**{2}
x{2,} -> x**{2...}
x{1,2} -> x**{1..2}
x{1,2}? -> x**{1..2}?

(?=foo) -> <before foo>
(?!foo) -> <!before foo>
(?<=foo) -> <after foo>
(?<!foo) -> <!after foo>

(?{...}) -> {...}
(??{...}) -> <{...}>

(?>\d*) -> \d*: # single atom no backtrack
(?>(\d*)) -> (\d*): # () is still single atom
(?>...) -> [...]: # multiple atoms need []

s/foo/bar()/e -> s/foo/{ bar() }/
m?foo? -> m:once/foo/
Done. Since from the point of view of the AST a regex ends up being just text, I wrote a text parser that runs over regexes and changes all of these, as well as captures. Some of the more external considerations (?foo? -> m:once/foo/) are not handled by the parser but by separate functions, since from the inside both cases look the same.
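
To give a feel for what "a text parser that runs over regexes" involves, here is a small character-walking sketch in Python. It is an illustration only: the real parser is part of the Haskell translator, the names below are hypothetical, and the sketch covers just a handful of the rules from the list (character classes, non-capturing groups, counted quantifiers, \A and \z, angle brackets, and literal spaces). Any other (?...) construct raises an error rather than being silently mistranslated.

```python
import re

QUANTIFIER = re.compile(r'\{(\d+)(,)?(\d*)\}')
ANCHORS = {'\\A': '^', '\\z': '$'}

def class_end(p5, start):
    """Index of the ']' that closes the character class opening at start."""
    j = start + 1
    if p5[j] == '^':
        j += 1
    while p5[j] != ']':
        j += 2 if p5[j] == '\\' else 1   # skip escaped characters
    return j

def translate_regex(p5):
    out, closers, i = [], [], 0          # closers: stack of pending group closers
    while i < len(p5):
        ch = p5[i]
        if ch == '\\':                   # escapes pass through, except \A and \z
            esc = p5[i:i + 2]
            out.append(ANCHORS.get(esc, esc))
            i += 2
        elif ch == '[':                  # [abc] -> <[abc]>, [^abc] -> <-[abc]>
            end = class_end(p5, i)
            body = p5[i + 1:end]
            if body.startswith('^'):
                out.append('<-[' + body[1:] + ']>')
            else:
                out.append('<[' + body + ']>')
            i = end + 1
        elif p5.startswith('(?:', i):    # (?:...) -> [...]
            out.append('[')
            closers.append(']')
            i += 3
        elif p5.startswith('(?', i):
            raise NotImplementedError('only (?:...) is handled in this sketch')
        elif ch == '(':
            out.append('(')
            closers.append(')')
            i += 1
        elif ch == ')':
            out.append(closers.pop())
            i += 1
        elif ch == '{' and QUANTIFIER.match(p5, i):
            m = QUANTIFIER.match(p5, i)
            low, comma, high = m.groups()
            if comma is None:
                counts = low             # x{2}   -> x**{2}
            elif high == '':
                counts = low + '...'     # x{2,}  -> x**{2...}
            else:
                counts = low + '..' + high   # x{1,2} -> x**{1..2}
            out.append('**{' + counts + '}')
            i = m.end()
        elif ch in '<>':                 # angle brackets are now metacharacters
            out.append('\\' + ch)
            i += 1
        elif ch == ' ':                  # whitespace is no longer literal
            out.append('\\x20')
            i += 1
        else:
            out.append(ch)
            i += 1
    return ''.join(out)

print(translate_regex(r'\A(?:ab|[^xyz]){1,2}\z'))
```

The stack of closers is what lets a ")" be turned into "]" only when it closes a group that was opened with "(?:".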

- split(m/;/, $foo) -> split(/;/, $foo)
Done. Easy enough to catch, though at first I got carried away and got //m too.

- split(' ', $foo) -> words($foo) or $foo.words or «$foo»
Done. However, at present it always translates to .words; options will be added for the other cases.

- Heredocs:
<<END -> qq:to/END/
<<"END" -> qq:to/END/
<<'END' -> q:to/END/
Done. Once I finally had heredocs parsing correctly from the YAML file, this became easy, essentially just text substitution in the right places.
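
The "text substitution in the right places" can be sketched as follows. This Python snippet is an illustration with a hypothetical function name, not the translator's code. It rewrites only the heredoc opener on a line; the body and the terminator line are assumed to stay as they are.

```python
import re

# Hypothetical sketch: <<"TAG", <<'TAG', or bare <<TAG (no space after <<).
HEREDOC = re.compile(
    r'''<<(?:"([A-Za-z_]\w*)"|'([A-Za-z_]\w*)'|([A-Za-z_]\w*))'''
)

def translate_heredoc_opener(line):
    def rewrite(match):
        double, single, bare = match.groups()
        if single is not None:
            return 'q:to/' + single + '/'          # non-interpolating
        return 'qq:to/' + (double or bare) + '/'   # interpolating
    return HEREDOC.sub(rewrite, line)

print(translate_heredoc_opener('print <<"END";'))
```

Requiring the tag to start with a letter or underscore, directly after the <<, keeps a shift such as `1 << 2` from being mistaken for a heredoc.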

- # Perl 5
require Exporter;
our @ISA = qw<>;
our @EXPORT = qw<>;
sub foo { ... }

# -> Perl 6
sub foo(...) is export { ... }
Mostly done. At present, references to @ISA and @EXPORT (and @EXPORT_OK) are not removed, but then again I don't think this is a syntactic problem (they just become normal arrays that are never really used). For clarity they should probably be removed, and I'll try to add this to my next update.

- open()
open my $fh, "<", $filename or die $!; -> my $fh = open($filename, :r) err die $!;

"<" -> :r
">" -> :w
">>" -> :a
"+<" -> :rw

mixing them does not work yet

open(FH, $filename) -> our $fh = open($filename)
my $fh = open($filename)
and possibly:
sub FH () { $fh }
macro FH () { q:code { $fh } }
but $fh generally preferred
From Google's point of view (with my deadline) this one is untouched; however, I have since added some of the basics. It's not done by anyone's standard, but for people who don't care about the deadline it's at least started. This is another place where TMTOWTDI makes it harder to translate, since the flexibility of Perl 5's open makes for many cases to translate.
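
As an illustration of just the first case in the HowTo entry (the three-argument open with a lexical filehandle and an "or" clause), here is a small Python sketch built around the mode table above. The names are hypothetical and this is not what the translator does internally; every other form of open is left untouched.

```python
import re

# Mode table taken from the HowTo entry above.
MODES = {'<': ':r', '>': ':w', '>>': ':a', '+<': ':rw'}

# Hypothetical sketch: open my $fh, "MODE", $path or ...
OPEN3 = re.compile(
    r'open\s+my\s+(\$\w+)\s*,\s*"([^"]*)"\s*,\s*(\$\w+)\s+or\b'
)

def translate_open(stmt):
    def rewrite(match):
        handle, mode, path = match.groups()
        if mode not in MODES:
            return match.group(0)   # unknown mode: leave the statement alone
        return 'my ' + handle + ' = open(' + path + ', ' + MODES[mode] + ') err'
    return OPEN3.sub(rewrite, stmt)

print(translate_open('open my $fh, "<", $filename or die $!;'))
```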

- print to file
print $fh "Hello\n"; -> print $fh: "Hello\n"; # adding the colon
or
-> $fh.print("Hello\n");
-> $fh.print: "Hello\n";
Done. Currently only the add-a-"," option is supported (which is roughly the same as the ":" from the HowTo, but more recent according to lwall). A .print option should be added for heavy object orientation.
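
The two forms listed in the HowTo entry (the colon form and the method-call form) can be sketched like this in Python. This is an illustration under assumptions, not the translator's behaviour: the function name and the object_oriented switch are made up for the example, and only a double-quoted string literal is accepted as the thing being printed.

```python
import re

# Hypothetical sketch: print $handle "string literal";
PRINT_TO_HANDLE = re.compile(r'\bprint\s+(\$\w+)\s+("(?:\\.|[^"\\])*")\s*;')

def translate_print(stmt, object_oriented=False):
    # Choose between "$fh.print(...)" and "print $fh: ..." output.
    template = r'\1.print(\2);' if object_oriented else r'print \1: \2;'
    return PRINT_TO_HANDLE.sub(template, stmt)

print(translate_print(r'print $fh "Hello\n";'))
print(translate_print(r'print $fh "Hello\n";', object_oriented=True))
```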

- close the file
close($fh); -> the same or $fh.close;
Done. The optional .close is added when the translator is told to do all object oriented options (the -Oo option).

- reading from a file
$fh.readline
=$fh
Done. At present only .readline is done, but the other option should be supported with time.

In addition, some things not explicitly in the HowTo are covered. Depending on user preference, 'no strict' can be used or undeclared variables can be declared.

Overall, many of the most pressing issues for Perl 5 -> Perl 6 translation are covered.
The translator produces usable code in most cases, which can be fed to Pugs. While my work is not over, I think this code will be very usable for the Perl community, especially once it has been tested even more extensively (I have yet to test on *nix at this point, but Mac OS X (close to *nix) and Windows are fine, given some complications with getting ghc to compile code for OS X).

The thing that worries me the most at present is that the translator still has some problems parsing or running. These errors occur only rarely (~2% or less of test files fail), but this is still not good enough for production software.

I'm proud of my work, but I also think I could have done better. In particular, my Haskell skills at the beginning were a little weak (but I've gotten much better), mostly since my school uses LISP to teach functional programming and I had yet to code at this scale in a functional language. I'm also of the opinion that Haskell is not the best language to use for this kind of task, but that's just my personal taste. However, it was the best language to use in this case, since Perl 6 isn't quite to the point of running large, complex pieces of code quickly (though it will surely get there), and writing a Perl 5 to Perl 6 translator in Perl 5 is a little backwards (the idea being that somebody who only has Perl 6 can get their Perl 5 code translated).

I hope I've done a service for the Perl community, and I'll continue to be as active as a wonky school schedule will allow. Thanks to everybody who helped, especially #perl6, which is like a second brain for those times I make a stupid mistake. It's been a great summer, and I look forward to continuing this work. In fact, I think that's one of the greatest parts of this kind of setup: just because I'm not getting paid for it anymore doesn't mean I can't keep working.




