Mini languages

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

Mini languages

KenSeehart
I want to implement a simple language translator (a superset of C++), but I haven't done anything of this kind since college (a couple decades ago).  I would like to use python tools as much as possible.

NICL code -> [C++ preprocessor (gcc)] -> [NICL translator] -> [C++ compiler (gcc)] -> object code

The translator needs to be able to parse special non-c++ syntax to generate c++, and leave other code intact.

More documentation on my project here:
  http://www.seehart.com/neuralintegrator.com/documentation/ni_book/defining_cell_types.st

I could almost get by with just a python script using regular expressions (my grammar is simple enough), but I need to know a certain amount of contextual information. E.g., if I am parsing "cellclass mycell { ... }", the contents between the braces must be processed accordingly.  This means I have to know when I reach the closing brace (which I can't do with regular expressions).  However, I'm sure I could do a prototype this way, using the assumption that the a closing brace on a class matches "^};", but that would be just plain sloppy :-)

So I think one way or another I'm stuck with implementing an LALR parser.

I'm wondering if there is anyone in this community with experience doing this kind of thing.

- Ken Seehart


_______________________________________________
Baypiggies mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/baypiggies
Reply | Threaded
Open this post in threaded view
|

Re: Mini languages

Dennis Allison

Ken --

"a simple language translator" is not comensurate with "a superset of C++"

Some might say it's an oxymoron.

It looks like what you are wanting to do is to build a little lanaague
that you parse and output C++ object code.  I do that sort of thing all
the time; and I am sure I am not alone.   There are a bunch of different
parsing tools available in python, but if the language really is simple,
I'd go for a recursive descent parser written in python; if it is
expression heavy, you might want to do expressions by some bottom-up
scheme.  Trees make a nice intermediate form.  The big question is what
you want to do in terms of optimization.  With a C++ target, I suppose
that you leave that to gcc.

You might want to reconsider your approach and use python as the language
and C++ extensions to do the heaavy lifting, if necessary.  You could, of
course, make custom extensions for specific problems as an optimization.




On Sun, 7 May 2006, Ken Seehart wrote:

> I want to implement a simple language translator (a superset of C++),
> but I haven't done anything of this kind since college (a couple decades
> ago).  I would like to use python tools as much as possible.
>
> NICL code -> [C++ preprocessor (gcc)] -> [NICL translator] -> [C++
> compiler (gcc)] -> object code
>
> The translator needs to be able to parse special non-c++ syntax to
> generate c++, and leave other code intact.
>
> More documentation on my project here:
>  
> http://www.seehart.com/neuralintegrator.com/documentation/ni_book/defining_cell_types.st
>
> I could almost get by with just a python script using regular
> expressions (my grammar is simple enough), but I need to know a certain
> amount of contextual information. E.g., if I am parsing "cellclass
> mycell { ... }", the contents between the braces must be processed
> accordingly.  This means I have to know when I reach the closing brace
> (which I can't do with regular expressions).  However, I'm sure I could
> do a prototype this way, using the assumption that the a closing brace
> on a class matches "^};", but that would be just plain sloppy :-)
>
> So I think one way or another I'm stuck with implementing an LALR parser.
>
> I'm wondering if there is anyone in this community with experience doing
> this kind of thing.
>
> - Ken Seehart
>
>

--

_______________________________________________
Baypiggies mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/baypiggies
Reply | Threaded
Open this post in threaded view
|

Re: Mini languages

Dennis Reinhardt
In reply to this post by KenSeehart
At 11:38 AM 5/7/2006, Ken Seehart wrote:
>This means I have to know when I reach the closing brace (which I can't do
>with regular expressions).  However, I'm sure I could do a prototype this
>way, using the assumption that the a closing brace on a class matches
>"^};", but that would be just plain sloppy :-)

I don't know your syntax but it sounds like you (1) know when to expect
braces.  I am further guessing that you have (2) a single level of
braces.  The routine below would work under these assumptions.  I have
implemented a self-compiler prior to working with Python which met those
assumptions.  If the assumptions could not be met, I would be inclined to
use LALR but I have not direct experience.  Rather, I designed the syntax
to not require LALR.  I have had some success in parsing under Python using
RE with the following code:

import sre, string
#separate html into 5 components based on regex  case insensitive, dotall.
#  Input regex should use (?:...) grouping, if any
#  regex may not be compiled since we compile it here
def regex_sep(str, regex1, regex2):
     left = lm = mid = rm = right = ""  # return matched regex
     flags = "(?is)"
     re1 = sre.compile("(%s)%s" % (regex1, flags))
     match1 = re1.search(str)
     if match1:
         lm = match1.group(1)
         left, rest = split2(str, lm)
         re2 = sre.compile("(%s)%s" % (regex2, flags))
         match2 = re2.search(rest)
         if match2:
             rm = match2.group(1)
             mid, right = split2(rest, rm)
         else:
             mid = rest
     else:
         left = str
     return left, lm, mid, rm, right

def split2(str, pattern):
     left = str
     right = ""
     try:
         splitlen = len(string.split(str, pattern, 1))
         if splitlen == 2:
             left, right = string.split(str, pattern, 1)
     except:
         pass
     return (left, right)


The call

         x1,x2,x3,x4,x5 = regex_sep(input_str, "{", "}")

would separate input_str into 5 components

         x1 = text prior to first regex match
            = input_str if no match and others ""
         x2 = text which matched the first regex (trivially "{" here)
         x3 = text between matched regex
         x4 = text which matched second regex (trivially "}" here)
         x5 = text following second regex match

Regards, Dennis


----------------------------------
| Dennis    | [hidden email]   |
| Reinhardt | Powerful Anti-Spam |
----------------------------------

_______________________________________________
Baypiggies mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/baypiggies
Reply | Threaded
Open this post in threaded view
|

Re: Mini languages

Dennis Allison


IMHO using regular expressions to parse anything complicated is a bad
idea.  Splitting off the lexical processing (always ugly) from the
semantics (can be clean) is always a win.


On Sun, 7 May 2006, Dennis Reinhardt wrote:

> At 11:38 AM 5/7/2006, Ken Seehart wrote:
> >This means I have to know when I reach the closing brace (which I can't do
> >with regular expressions).  However, I'm sure I could do a prototype this
> >way, using the assumption that the a closing brace on a class matches
> >"^};", but that would be just plain sloppy :-)
>
> I don't know your syntax but it sounds like you (1) know when to expect
> braces.  I am further guessing that you have (2) a single level of
> braces.  The routine below would work under these assumptions.  I have
> implemented a self-compiler prior to working with Python which met those
> assumptions.  If the assumptions could not be met, I would be inclined to
> use LALR but I have not direct experience.  Rather, I designed the syntax
> to not require LALR.  I have had some success in parsing under Python using
> RE with the following code:
>
> import sre, string
> #separate html into 5 components based on regex  case insensitive, dotall.
> #  Input regex should use (?:...) grouping, if any
> #  regex may not be compiled since we compile it here
> def regex_sep(str, regex1, regex2):
>      left = lm = mid = rm = right = ""  # return matched regex
>      flags = "(?is)"
>      re1 = sre.compile("(%s)%s" % (regex1, flags))
>      match1 = re1.search(str)
>      if match1:
>          lm = match1.group(1)
>          left, rest = split2(str, lm)
>          re2 = sre.compile("(%s)%s" % (regex2, flags))
>          match2 = re2.search(rest)
>          if match2:
>              rm = match2.group(1)
>              mid, right = split2(rest, rm)
>          else:
>              mid = rest
>      else:
>          left = str
>      return left, lm, mid, rm, right
>
> def split2(str, pattern):
>      left = str
>      right = ""
>      try:
>          splitlen = len(string.split(str, pattern, 1))
>          if splitlen == 2:
>              left, right = string.split(str, pattern, 1)
>      except:
>          pass
>      return (left, right)
>
>
> The call
>
>          x1,x2,x3,x4,x5 = regex_sep(input_str, "{", "}")
>
> would separate input_str into 5 components
>
>          x1 = text prior to first regex match
>             = input_str if no match and others ""
>          x2 = text which matched the first regex (trivially "{" here)
>          x3 = text between matched regex
>          x4 = text which matched second regex (trivially "}" here)
>          x5 = text following second regex match
>
> Regards, Dennis
>
>
> ----------------------------------
> | Dennis    | [hidden email]   |
> | Reinhardt | Powerful Anti-Spam |
> ----------------------------------
>
> _______________________________________________
> Baypiggies mailing list
> [hidden email]
> http://mail.python.org/mailman/listinfo/baypiggies
>

--

_______________________________________________
Baypiggies mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/baypiggies
Reply | Threaded
Open this post in threaded view
|

Re: Mini languages

KenSeehart
In reply to this post by Dennis Allison
Dennis Allison wrote:
Ken -- 

"a simple language translator" is not comensurate with "a superset of C++"
  
Some might say it's an oxymoron.
  
What I mean is that my language adds just a few elements to C++, and these elements are so simple that
they can be easily parsed with regular expressions.

... c++ code ...
cellclass mycell {
    (stuff that I need to parse)
   cfn (more stuff that I need to parse)   {
      ... c++ code ...
   }
 
...
}
... more c++ code ...
...

The "stuff that I need to parse" can be done with trivially easy regular expressions (no recursion).
The C++ code will be left unmodified (and I do not need to extract any information from the C++ code).

So, while my language is technically a superset of C++, I do not intend to implement a C++ parser :-)
For details, see the link...
  http://www.seehart.com/neuralintegrator.com/documentation/ni_book/defining_cell_types.st

It looks like what you are wanting to do is to build a little lanaague 
that you parse and output C++ object code.  I do that sort of thing all 
the time; and I am sure I am not alone.   There are a bunch of different 
parsing tools available in python, but if the language really is simple, 
I'd go for a recursive descent parser written in python; if it is 
expression heavy, you might want to do expressions by some bottom-up
scheme.  Trees make a nice intermediate form.  The big question is what 
you want to do in terms of optimization.  With a C++ target, I suppose 
that you leave that to gcc.
  
Well, not exactly...  The input is C++ with a little extra non-c++ syntax added, and the output is pure
C++.
You might want to reconsider your approach and use python as the language 
and C++ extensions to do the heaavy lifting, if necessary.  You could, of 
course, make custom extensions for specific problems as an optimization.
  
Thanks, already doing that at a higher level of the program.  All user interface and high-level coding
is in python.

I've decided that a speciallized language is worth having to facilitate rapid development of a large
number special functions that plug into my application in a specific way.  These are not python
extention functions, but are called directly by a core engine.  The function call protocol must optimized
for speed, since millions of these special functions will be executed consecutively.
On Sun, 7 May 2006, Ken Seehart wrote:
  
I want to implement a simple language translator (a superset of C++), 
but I haven't done anything of this kind since college (a couple decades 
ago).  I would like to use python tools as much as possible.

NICL code -> [C++ preprocessor (gcc)] -> [NICL translator] -> [C++ 
compiler (gcc)] -> object code

The translator needs to be able to parse special non-c++ syntax to 
generate c++, and leave other code intact.

More documentation on my project here:
  
http://www.seehart.com/neuralintegrator.com/documentation/ni_book/defining_cell_types.st

I could almost get by with just a python script using regular 
expressions (my grammar is simple enough), but I need to know a certain 
amount of contextual information. E.g., if I am parsing "cellclass 
mycell { ... }", the contents between the braces must be processed 
accordingly.  This means I have to know when I reach the closing brace 
(which I can't do with regular expressions).  However, I'm sure I could 
do a prototype this way, using the assumption that the a closing brace 
on a class matches "^};", but that would be just plain sloppy :-)

So I think one way or another I'm stuck with implementing an LALR parser.

I'm wondering if there is anyone in this community with experience doing 
this kind of thing.

- Ken Seehart


    

  


_______________________________________________
Baypiggies mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/baypiggies
Reply | Threaded
Open this post in threaded view
|

Re: Mini languages

KenSeehart
In reply to this post by KenSeehart
Never mind, this is really trivial.  Any C++ code that is outside my "cellclass" block is passed on unmodified. The only recursion I have to deal with in my grammar is a C++ function body.  And since I don't actually need to process the function body, I just need to recursively scan for a pair of matching braces (not rocket science).  Everything else is regular expressions.  I definitely don't need an LALR parser for this :-)
 
So I don't have any further questions.

Thanks Dennis and Dennis for your comments.

- Ken

Ken Seehart wrote:
I want to implement a simple language translator (a superset of C++), but I haven't done anything of this kind since college (a couple decades ago).  I would like to use python tools as much as possible.

NICL code -> [C++ preprocessor (gcc)] -> [NICL translator] -> [C++ compiler (gcc)] -> object code

The translator needs to be able to parse special non-c++ syntax to generate c++, and leave other code intact.

More documentation on my project here:
  http://www.seehart.com/neuralintegrator.com/documentation/ni_book/defining_cell_types.st

I could almost get by with just a python script using regular expressions (my grammar is simple enough), but I need to know a certain amount of contextual information. E.g., if I am parsing "cellclass mycell { ... }", the contents between the braces must be processed accordingly.  This means I have to know when I reach the closing brace (which I can't do with regular expressions).  However, I'm sure I could do a prototype this way, using the assumption that the a closing brace on a class matches "^};", but that would be just plain sloppy :-)

So I think one way or another I'm stuck with implementing an LALR parser.

I'm wondering if there is anyone in this community with experience doing this kind of thing.

- Ken Seehart


_______________________________________________ Baypiggies mailing list [hidden email] http://mail.python.org/mailman/listinfo/baypiggies


_______________________________________________
Baypiggies mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/baypiggies
Reply | Threaded
Open this post in threaded view
|

Re: Mini languages

Shannon -jj Behrens
On 5/7/06, Ken Seehart <[hidden email]> wrote:

>  Never mind, this is really trivial.  Any C++ code that is outside my
> "cellclass" block is passed on unmodified. The only recursion I have to deal
> with in my grammar is a C++ function body.  And since I don't actually need
> to process the function body, I just need to recursively scan for a pair of
> matching braces (not rocket science).  Everything else is regular
> expressions.  I definitely don't need an LALR parser for this :-)
>
>  So I don't have any further questions.
>
>  Thanks Dennis and Dennis for your comments.
>
>  - Ken
>
>  Ken Seehart wrote:
>
>  I want to implement a simple language translator (a superset of C++), but I
> haven't done anything of this kind since college (a couple decades ago).  I
> would like to use python tools as much as possible.
>
>  NICL code -> [C++ preprocessor (gcc)] -> [NICL translator] -> [C++ compiler
> (gcc)] -> object code
>
>  The translator needs to be able to parse special non-c++ syntax to generate
> c++, and leave other code intact.
>
>  More documentation on my project here:
>
> http://www.seehart.com/neuralintegrator.com/documentation/ni_book/defining_cell_types.st
>
>  I could almost get by with just a python script using regular expressions
> (my grammar is simple enough), but I need to know a certain amount of
> contextual information. E.g., if I am parsing "cellclass mycell { ... }",
> the contents between the braces must be processed accordingly.  This means I
> have to know when I reach the closing brace (which I can't do with regular
> expressions).  However, I'm sure I could do a prototype this way, using the
> assumption that the a closing brace on a class matches "^};", but that would
> be just plain sloppy :-)
>
>  So I think one way or another I'm stuck with implementing an LALR parser.
>
>  I'm wondering if there is anyone in this community with experience doing
> this kind of thing.
>
>  - Ken Seehart

I once interviewed at a company that had implemented a mini-language
on top of C++ using templates and operator overloading.  Apparently,
they had even overloaded the comma operator.

/me shudders

-jj
_______________________________________________
Baypiggies mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/baypiggies
Reply | Threaded
Open this post in threaded view
|

Re: Mini languages - possible short talk in the future?

KenSeehart
Shannon -jj Behrens wrote:
On 5/7/06, Ken Seehart [hidden email] wrote:
 Never mind, this is really trivial.  Any C++ code that is outside my
"cellclass" block is passed on unmodified. The only recursion I have to deal
with in my grammar is a C++ function body.  And since I don't actually need
to process the function body, I just need to recursively scan for a pair of
matching braces (not rocket science).  Everything else is regular
expressions.  I definitely don't need an LALR parser for this :-)

 So I don't have any further questions.

 Thanks Dennis and Dennis for your comments.

 - Ken

 Ken Seehart wrote:

 I want to implement a simple language translator (a superset of C++), but I
haven't done anything of this kind since college (a couple decades ago).  I
would like to use python tools as much as possible.

 NICL code -> [C++ preprocessor (gcc)] -> [NICL translator] -> [C++ compiler
(gcc)] -> object code

 The translator needs to be able to parse special non-c++ syntax to generate
c++, and leave other code intact.

 More documentation on my project here:

http://www.seehart.com/neuralintegrator.com/documentation/ni_book/defining_cell_types.st

 I could almost get by with just a python script using regular expressions
(my grammar is simple enough), but I need to know a certain amount of
contextual information. E.g., if I am parsing "cellclass mycell { ... }",
the contents between the braces must be processed accordingly.  This means I
have to know when I reach the closing brace (which I can't do with regular
expressions).  However, I'm sure I could do a prototype this way, using the
assumption that the a closing brace on a class matches "^};", but that would
be just plain sloppy :-)

 So I think one way or another I'm stuck with implementing an LALR parser.

 I'm wondering if there is anyone in this community with experience doing
this kind of thing.

 - Ken Seehart

I once interviewed at a company that had implemented a mini-language
on top of C++ using templates and operator overloading.  Apparently,
they had even overloaded the comma operator.

/me shudders

-jj

Yikes.  Template abuse is pretty scary stuff.  Here's a scary book on the subject,
for those who are interested. : http://boost-consulting.com/mplbook/

Kind of reminds me of people who get way into John Conways "Life", and build
computers and things out of glider guns.

Perhaps a more elegant way to do metaprogramming would be to kind of squish
python into the preprocessor phase.

I recommend making sure you actually have a good reason to do this kind of thing.
Usually more straightforward techniques suffice (e.g. write your program in python
and use extending and embedding).  But good reasons do exist.

Anyway, I have successfully completed my NICL compiler.  It really works!

NICL code -> [preprocessor (cpp)] -> [NICL translator] -> [compiler (gcc)] -> object code

If there is interest, I'd be willing to give a short talk on the topic some time:
  "How to extend the C++ grammar using Python"

The documentation on my language is here:

    http://www.neuralintegrator.com/documentation/ni_book/nicl.st

- Ken Seehart


_______________________________________________
Baypiggies mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/baypiggies
Reply | Threaded
Open this post in threaded view
|

Re: Mini languages - possible short talk in the future?

Shannon -jj Behrens
On 5/10/06, Ken Seehart <[hidden email]> wrote:

>  Shannon -jj Behrens wrote:
> On 5/7/06, Ken Seehart <[hidden email]> wrote:
>
>  Never mind, this is really trivial.  Any C++ code that is outside my
>  "cellclass" block is passed on unmodified. The only recursion I have to
> deal
>  with in my grammar is a C++ function body.  And since I don't actually need
>  to process the function body, I just need to recursively scan for a pair of
>  matching braces (not rocket science).  Everything else is regular
>  expressions.  I definitely don't need an LALR parser for this :-)
>
>   So I don't have any further questions.
>
>   Thanks Dennis and Dennis for your comments.
>
>   - Ken
>
>   Ken Seehart wrote:
>
>   I want to implement a simple language translator (a superset of C++), but
> I
>  haven't done anything of this kind since college (a couple decades ago).  I
>  would like to use python tools as much as possible.
>
>   NICL code -> [C++ preprocessor (gcc)] -> [NICL translator] -> [C++
> compiler
>  (gcc)] -> object code
>
>   The translator needs to be able to parse special non-c++ syntax to
> generate
>  c++, and leave other code intact.
>
>   More documentation on my project here:
>
> http://www.seehart.com/neuralintegrator.com/documentation/ni_book/defining_cell_types.st
>
>   I could almost get by with just a python script using regular expressions
>  (my grammar is simple enough), but I need to know a certain amount of
>  contextual information. E.g., if I am parsing "cellclass mycell { ... }",
>  the contents between the braces must be processed accordingly.  This means
> I
>  have to know when I reach the closing brace (which I can't do with regular
>  expressions).  However, I'm sure I could do a prototype this way, using the
>  assumption that the a closing brace on a class matches "^};", but that
> would
>  be just plain sloppy :-)
>
>   So I think one way or another I'm stuck with implementing an LALR parser.
>
>   I'm wondering if there is anyone in this community with experience doing
>  this kind of thing.
>
>   - Ken Seehart
>
>  I once interviewed at a company that had implemented a mini-language
>  on top of C++ using templates and operator overloading.  Apparently,
>  they had even overloaded the comma operator.
>
>  /me shudders
>
>  -jj
>
>  Yikes.  Template abuse is pretty scary stuff.  Here's a scary book on the
> subject,
>  for those who are interested. :
> http://boost-consulting.com/mplbook/
>
>  Kind of reminds me of people who get way into John Conways "Life", and
> build
>  computers and things out of glider guns.

Heh, I did my senior thesis on Conway's game of "Life", but now we're
*way* off topic ;)

>  Perhaps a more elegant way to do metaprogramming would be to kind of squish
>  python into the preprocessor phase.

Agreed.  Both the QT folks and the GTK folks have special
preprocessors.  The QT one is mandatory, the GTK one is not.

>  I recommend making sure you actually have a good reason to do this kind of
> thing.
>  Usually more straightforward techniques suffice (e.g. write your program in
> python
>  and use extending and embedding).  But good reasons do exist.
>
>  Anyway, I have successfully completed my NICL compiler.  It really works!
>
>  NICL code -> [preprocessor (cpp)] -> [NICL translator] -> [compiler (gcc)]
> -> object code
>
>  If there is interest, I'd be willing to give a short talk on the topic some
> time:
>    "How to extend the C++ grammar using Python"
>
>  The documentation on my language is here:
>
>
> http://www.neuralintegrator.com/documentation/ni_book/nicl.st

Happy Hacking,
-jj
_______________________________________________
Baypiggies mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/baypiggies