
ArtOfWarfare

As some of you know, I'm in the process of designing/implementing my own language right now (and unless I'm mistaken, I believe there are two others on these forums also doing similar things). I'm curious, what kinds of features have you seen that you would want to see in a new language, and what kinds of tasks do you find tedious with existing languages that you feel should be easier? What kinds of features should definitely not be present?

Syntax is not a feature. My language is stored as nothing more than abstract syntax trees (it looks like some especially cryptic lisp if you open a file in a text editor which doesn't recognize my language). Users define what the syntax should be, and the AST is displayed with that syntax by their editor. When they save, it's converted back from their preferred syntax to the AST. This should put an end to pointless style and syntax fights and force people to think about the actual features, algorithms, and data structures.

Some ideas I have:
- Only for-in loops, no for loops. For loops have the potential for logic mistakes to be made and no actual benefit over for-in loops.

- Default values for parameters.

- Parameters can be passed by position or keyword.

- Functions can also take flags, in addition to parameters. This is something frequently seen in command line programs but I've never seen it in any of the languages I'm familiar with, so I'll offer a quick example of what I have in mind (with Python style syntax):

Code:
# Defining a function which takes a parameter (defaults to 0) and a flag.
def getCoordinates(offset: 0, :2D|3D):
    if 2D:
        return x, y
    else:
        return x, y, z

# Calling the function defined above:
x, y: getCoordinates(offset: 10, :2D)

- All code must go in functions - you can't have code outside of functions.

- All functions must have some documentation specifying the intended input, the output, exceptions that could occur, etc.

- No mandatory functions (e.g., "main" in C and Java). Instead, when you run your code from the command line, you specify the function you're running, and you pass your parameters in exactly as if you were calling the function from within the language. This does away with argc/argv and the nonsense of having to parse them and write special usage messages and help messages and whatnot. If the user passes bad parameters, they're automatically shown the documentation for the function.

- Identifiers are constants by default - must explicitly be made variables.

- The language is interpreted, with checks of your code done after it's changed to AST but before it's written to disk. If you don't have valid code, it doesn't save. (This is in contrast to Python, where you don't find out about mistakes you've made until after you're already running the code. An IDE could point the issues out to you, but I'm under the impression that most Python code is written in a text editor. Similarly, I'd like my language written in your editor of choice, so the plugin which does the text -> AST conversion will do the checking.)

- Named return values. Python allows for multiple return values, but it's confusing what each value is. An example of what I have in mind:

Code:
x: x, y: y: getCoordinates(:2D)

Sets the local constants x and y to the returned values named x and y. For extra clarity about which is which, assignment always goes from right to left, so:

Code:
localX: returnedX, localY: returnedY: getCoordinates(:2D)

Of course you could wonder, why isn't a datastructure being used? That would be the proper solution in most cases - indeed, in the above example, you would probably have some type Coordinate or Point which stores X and Y, but I just shared that for simplicity. I guess a better example might be something like:

Code:
files: files, folders: folders: listContents(path)

Hopefully you can imagine some scenarios where this is a good idea.
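
For comparison, present-day Python can approximate named return values with `typing.NamedTuple`. This is only a minimal sketch: `Contents` and `list_contents` are hypothetical names chosen to mirror the `listContents(path)` example above, not part of the proposed language.

```python
from pathlib import Path
from typing import NamedTuple


class Contents(NamedTuple):
    """Hypothetical result type: each returned value carries a name."""
    files: list
    folders: list


def list_contents(path):
    """Return the names of the files and folders directly inside path."""
    entries = sorted(Path(path).iterdir())
    return Contents(
        files=[entry.name for entry in entries if entry.is_file()],
        folders=[entry.name for entry in entries if entry.is_dir()],
    )


# The caller can pick values by name, so there is no guessing which is which:
#     contents = list_contents(".")
#     contents.files, contents.folders
```

Unlike the proposal, nothing here forces the caller to use the names; plain positional unpacking (`files, folders = list_contents(".")`) still works.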

Anyways... enough from me. I'm interested in hearing what kinds of features you've come across (or have never seen) that you'd want a new language to have. I'm also interested in hearing what you think about my ideas.
 
- Functions can also take flags, in addition to parameters. This is something frequently seen in command line programs but I've never seen it in any of the languages I'm familiar with, so I'll offer a quick example of what I have in mind (with Python style syntax):

This is quite common in C APIs, for example the open() system call. As an example, to open a file as write only, and append on each write:

Code:
open(path, O_WRONLY | O_APPEND);
 
That's actually the flaw that I'm looking to do away with. It's difficult to find out what valid bit flags are, or valid combinations of flags. It's not uncommon for people to abuse the fact that it's nothing more than an integer, and so misuse it and just pass an integer instead (to the confusion of future maintainers, or people learning by example.)

In my language, you would instead have:

Code:
def open(path, :Read|Append|Write)

There are three valid things you can pass in to that parameter. Read, Append, or Write. If you try to run it from the command line with anything else, it'll complain and give you documentation on what valid flags are. If you try to save a file where you've used it incorrectly, it won't save (or maybe it will save, but as a text file which can't be executed.)

No magic number as in C. No magic string as in Python.
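
The closest existing Python idiom is probably an `Enum` parameter. The sketch below is an approximation under stated assumptions: `Mode` and `open_file` are hypothetical names, and the check happens when the function is called, not when the file is saved as described above.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical closed set of flags; nothing else is accepted."""
    READ = "r"
    APPEND = "a"
    WRITE = "w"


def open_file(path, mode):
    """Open path, accepting only a Mode member as the flag."""
    if not isinstance(mode, Mode):
        valid = ", ".join(member.name for member in Mode)
        raise TypeError(f"mode must be one of: {valid}")
    return open(path, mode.value)


# Usage:
#     with open_file("log.txt", Mode.APPEND) as f:
#         f.write("entry\n")
# Passing a bare string such as "a" raises TypeError and lists the valid flags.
```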
 
I'm a bit confused by your list of features. Are those features of the underlying AST (or its interpreter), or of the "facade" language that the ASTs are translated to/from?

I'm asking because if those are AST features, then some of them seem redundant. That is, you could translate one form into another form, so only a single form is needed in the underlying AST.

For example:
- Parameters can be passed by position or keyword.
If the AST only passes parameters by keyword, then "pass by position" can be emulated perfectly, simply by generating keywords based on position index.

For example, if the Facade language only has "pass by position", then the AST can generate keywords for the parameters in a form like "__1", "__2", etc. and assign them according to position. This would be completely unambiguous, so no errors or ambiguity would be introduced by doing so.

Similarly, if the original Facade language only has "pass by keyword", then it can use the original order of keywords entered by the author as the "by position" index values.

So it seems to me that if you have the "by keyword" feature, you don't actually need a "by position" feature in the AST. You can simply map one to the other and it will always work. In other words, an overt "by position" feature would be superfluous.
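
A rough illustration of that mapping in Python, using the standard `inspect` module. Note one difference from the description above: this sketch maps positions to the declared parameter names, not to generated names like "__1". `to_keywords` and `get_coordinates` are hypothetical names.

```python
import inspect


def to_keywords(func, *args, **kwargs):
    """Rewrite a mixed positional/keyword call as a keyword-only call."""
    bound = inspect.signature(func).bind(*args, **kwargs)
    return dict(bound.arguments)


def get_coordinates(offset, scale):
    return offset * scale


# A positional call and its keyword-only equivalent give the same result.
call = to_keywords(get_coordinates, 10, scale=2)
result = get_coordinates(**call)
```

Here `call` holds every argument by name, so only the keyword form would need to be stored.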
 
That's actually the flaw that I'm looking to do away with. It's difficult to find out what valid bit flags are, or valid combinations of flags. It's not uncommon for people to abuse the fact that it's nothing more than an integer, and so misuse it and just pass an integer instead (to the confusion of future maintainers, or people learning by example.)

Yes, but that's not really related to flags per se but to the type system (in this case, though, you would need to add the check inside the function). In Ada, for example, you can define new types, and after that you can only assign values of the same type; everything else is an error. The same goes for valid ranges for a type.
 
The feature set should be identical between the AST and user facing languages. I'm not aiming for a JBC kind of thing, where decompilation kind-of-sort-of-sometimes works. There should be a one-to-one relationship between the AST language and the user facing languages. When you open your code, you should see the exact same thing you saved (which means your syntax has to be followed exactly. But you can't complain about it. Because it's your syntax. If you don't like your syntax, you should change your syntax. This is in contrast to a C derived language. Where does whitespace go? Wherever you freaking want - it's valid everywhere and you'll probably piss at least a few people off no matter where you put it. Braces around single line ifs? Semicolons in JavaScript? Unlike in those examples, the syntax isn't flexible. You follow your syntax exactly, thus ensuring the one-to-one relationship between the code formatted for you and the actual saved AST.)
 
Being forced to install a special plugin, or use a certain editor/IDE, in order to use the language :)

Ha, touché. Unfortunately there aren't many languages you can use out of the box in OS X anymore. Ruby, Python, AppleScript, and Bash... I think anything else requires at least a single install beyond OS X. It's better than Windows, at least, where the only language that works on a fresh install is Batch/cmd.

I vow to make the install process as quick and painless as possible.
 
Ha, touché. Unfortunately there aren't many languages you can use out of the box in OS X anymore. Ruby, Python, AppleScript, and Bash... I think anything else requires at least a single install beyond OS X. ...

Off the top of my head, you left out: awk, perl, ksh, csh, tcsh.

I've probably missed a few.**



** (I intentionally omitted php, because... well, it's php.)
 
Windows has PowerShell and VBScript.
 
Ha, touché. Unfortunately there aren't many languages you can use out of the box in OS X anymore. Ruby, Python, AppleScript, and Bash...

Besides Perl, awk, etc., doesn't the included Emacs include a Lisp interpreter? Someone may have demonstrated a Turing-complete vi macro language.

Also out of the box, Safari includes a JavaScript interpreter that will run just fine off-line. And once online, one can find a bunch of remote programming editors and environments that don't require any installation.
 
Here are a few cherry-picked features from other languages I like:

  • I'd like to have the option of enforcing pass-by-value, especially for arrays and other structures. Specifically, if the function was marked as such, the compiler would make a deep copy of all parameters before executing the function body. This way, with a simple keyword, one could do functional style programming as desired. In support of this, your arrays, hashes, etc. should support map, reduce, and so on.
  • I don't know how Python does it, but I like the near-effortless reflection of objects built into Ruby. I also like that Ruby treats methods as messages (like Smalltalk).
  • Related to above, you should make the syntax/overhead for closures and passing methods and parameters easy.
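
The pass-by-value item can be approximated in Python today with a decorator. This is a sketch under stated assumptions: `pass_by_value` and `append_total` are hypothetical names, and the copying happens at call time in a wrapper, not in a compiler.

```python
import copy
import functools


def pass_by_value(func):
    """Deep-copy every argument so the body cannot mutate the caller's data."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*copy.deepcopy(args), **copy.deepcopy(kwargs))
    return wrapper


@pass_by_value
def append_total(values):
    # Mutates its parameter, but only the private copy.
    values.append(sum(values))
    return values


original = [1, 2, 3]
result = append_total(original)
# original is untouched; result is the modified copy.
```

Deep copies cost time and memory on large structures, which is one reason to make this opt-in per function, as suggested above.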

I'm dubious about your requirement that all functions have documentation. Documentation is good, but I don't think making it a requirement is going to solve anything. Those who want to document will do so, and those who don't will add the bare minimum garbage text necessary to make the compiler stop complaining. I typically document, but if I'm doing a one-off script to just get something done, I don't want to be bothered by it. What I think you should do instead is enable those who want to document, by baking in support for documentation (like 'man' or ant, or C#-style comments) and having a good formatting/display/linking system.
 
Graphical programming

(A little cheeky)

Writing imperative code with words is sooo last century. Tweaking the details of syntax is not really a big step. What I would like to have is a functional programming language done with pictures.

There are already partial examples of this. Say, if you design a GUI, it is possible to design it with pictures.

// Gunnar
 
One concept that could address the issue of functions being able to return multiple values is read/write parameters. Sounded like heresy the first time I heard it, but a definition like
Code:
Def get_coords (objectref readonly int, x readwrite int, y readwrite int) return Boolean
    if objectref > 0 -- or any other check here
        x = objectref.xCord
        y = objectref.yCord
        return True
    else
        return False

You could dress it up a bit, decide what to do with the readwrite variables if the function returns a false, but you get the picture
 
That is part of the idea behind storing an AST rather than storing it with user facing syntax. If users want to mix in some kind of flowchart, like Scratch, or GUI like Interface Builder, or Relational Diagram, like Xcode's Core Data editor, all of that is possible. Editors made specifically for my language should be capable of accepting/displaying graphics rather than only text (as existing text editors with plugins will have to do.)
 
You mean like

Code:
func swapTwoInts(inout a: Int, inout b: Int) {
    let temporaryA = a
    a = b
    b = temporaryA
}

or even going back to good old fashioned and honest C

Code:
void swap(int *a, int *b)
{
   int temp;
 
   temp = *b;
   *b   = *a;
   *a   = temp;   
}

or, even crazier, fast forward to Swift

Code:
func minMax(array: [Int]) -> (min: Int, max: Int) {
    var currentMin = array[0]
    var currentMax = array[0]
    for value in array[1..<array.count] {
        if value < currentMin {
            currentMin = value
        } else if value > currentMax {
            currentMax = value
        }
    }
    return (currentMin, currentMax)
}

You can even mark min and max as inout and manipulate their values in the function as well.

Well, that would certainly be novel. Maybe I didn't understand your post, but your idea can be done in C, if we assume that "objectref" is a struct. It sounds to me like you heard about passing in stuff by reference and you thought it was heresy?
 
Personally, I wouldn't have interest in a new language that offered only syntactic improvements over existing ones.

But I really like the idea of each programmer using the syntax she or he prefers.

Do you have any interest in applying this same idea to existing languages?

For example, let's say I'm sharing source files for a JavaScript with another developer but we are both adamant about our own, personal code styles and they conflict.

When I read a JavaScript file, it is parsed to AST then rendered back to source code, but using my own personal coding style, syntax extensions, etc. -- whatever you support.

Then I code according to my personal variant of JavaScript.

When I commit changes, my personal variant of JavaScript is probably parsed back to AST (In the process my changes are validated to ensure I really followed my syntax.)

This opens up the possibility of custom syntax extensions to other languages, which is pretty cool and could be a powerful abstraction tool.


Also: I think it's good to have the transformations between personal syntax and AST occur when accessing code from source control. (This is true whether you like my idea of supporting existing languages or still do your own language.)

If you're using source control (and you should especially since we're talking about sharing source code), this ensures the code is in your personal syntax as soon as possible, so all the source editing/viewing tools you might want to use will see it that way. E.g., I typically use a couple source editors for different reasons and often use various diff utilities as well. If the code comes right out of source control already in my personal syntax then all of these will work.

Likewise, committing changes to source control seems like the right time to insist that the source code be valid.
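
Python's own `ast` module gives a small taste of this round trip (it is not the tool being discussed, and `ast.unparse` needs Python 3.9 or later). In this sketch, two differently styled sources parse to the same tree, and the tree can be rendered back in one canonical style. `source_a` and `source_b` are made-up inputs.

```python
import ast

# Two personal styles for the same function.
source_a = "def area(w,h):\n        return w*h\n"
source_b = "def area( w, h ):\n  return (w * h)\n"

tree_a = ast.parse(source_a)
tree_b = ast.parse(source_b)

# ast.dump() leaves out line and column numbers by default, so only
# the structure is compared.
same_structure = ast.dump(tree_a) == ast.dump(tree_b)

# Render the shared tree back to text in a single style.
canonical = ast.unparse(tree_a)
```

One caveat: Python's AST drops comments, so a real round-trip tool would need a tree that keeps them.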
 
Call me old and a sceptic, but this just sounds like a recipe for disaster.
 
My comment was in regards to editing the source code. I don't need to use a special plugin or editor to edit Ruby, Python, Bash, etc. You're requiring people to use a supported editor in order to do something as simple as edit a text file, which I find to be a big drawback to the language.

Honestly, I don't really see the need for a "facade" language layer. I've never worked for a company or worked on a project where there was so much fighting over coding style that would warrant such a feature. The person/people in charge decide on the coding style and lay down the law. Sure, there's always some new person who thinks they know better but it was always easy to handle... "this is our coding style, your opinion doesn't matter".

I've always been somewhat fascinated by the Ruby community in that there are rarely arguments over coding style, or at least I've never been involved in any. Everyone knows that, for example, you use 2-space indents when coding in Ruby. Plus, the language forces some styles on you, which prevents even more arguments. You just accept that's the way it is and don't think about it.
 
Call me old and a sceptic, but this just sounds like a recipe for disaster.

It certainly has the potential for abuse.

I haven't puzzled out yet whether I think it *encourages* abuse vs. just allowing it. Any powerful form of expression allows abuse but that doesn't mean it's not worthwhile.

Source code transformations have the potential to corrupt the program -- they must be 100%, one-to-one or it's useless for any practical purpose. That's hard but not necessarily too hard.

With that in place, at least the differences in the code between different syntaxes can't break anything.

I don't know how far it can be taken though. If you were to only use this for something like indentation, style of braces, and wrapping in comments, then no problem. There's no actual change in the structure of the code.

Small changes in structure, like infix vs. prefix, or differences in operator precedence or how logical expressions are factored also seem OK.

I wonder if this could somehow work for bigger structural changes... like shallower class hierarchies vs deeper ones or even somehow across common paradigms. Probably not, but it would be kind of interesting to see how far automatic refactoring tools could take it.
 
I considered the possibility of taking this idea and applying it to an existing language, but I decided against that because I have so many other ideas for a new language. I figure if I'm going through the effort of making this AST <-> Language tool, I may as well make it a new language with all the features I want. Consider the fact that most of the languages you're using are probably 20+ years old - there have been a lot of great ideas since that haven't found their way into mainstream languages yet.

You're exactly right regarding how great this could be for source control. It's always obnoxious when syntax changes clutter a diff.
 
Bug free code - guaranteed

Now, a really needed improvement. Code that is guaranteed bug free once it is passed by the compiler. I guess you would need to rethink quite a lot in order to go there.

But, simply as an example of the idea. Let us assume we write a function to return the mean value.

function mean ( array of values )
return sum (value) / number of (values)

The compiler would then complain: an empty array would cause a division by zero.

So we add something like a pragma: never call this function with an empty array.

// gunnar
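
In Python, the nearest everyday equivalent of that pragma is an explicit precondition. The sketch below only checks at run time; it does not give the compile-time guarantee asked for above. The function name `mean` follows the pseudocode.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    # Precondition: never call this function with an empty sequence.
    if len(values) == 0:
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)
```

A static checker would have to prove at every call site that the argument is non-empty, which is where the real difficulty lies.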
 
Yes, I intend to allow for static analysis after the transformation to AST but before the save to disk.

An idea I just had:
When you modify an existing function without meaning to change its behavior, the tooling could do some random testing on it: pass random values to both the prior implementation and the new one, and let you know if the return values differ at all. That could catch regressions that would otherwise go unnoticed.
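
A minimal sketch of that idea in Python, assuming both versions take a list of integers. `find_regressions`, `mean_old`, and `mean_new` are hypothetical names; a real tool would have to generate inputs from each function's declared parameter types.

```python
import random


def find_regressions(old_func, new_func, trials=1000, seed=0):
    """Feed the same random inputs to both versions; collect the mismatches."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    mismatches = []
    for _ in range(trials):
        values = [rng.randint(-100, 100) for _ in range(rng.randint(1, 10))]
        # Each version gets its own copy in case it mutates its input.
        if old_func(list(values)) != new_func(list(values)):
            mismatches.append(values)
    return mismatches


def mean_old(values):
    return sum(values) / len(values)


def mean_new(values):
    # Deliberately wrong "refactoring": floor division changes some results.
    return sum(values) // len(values)


regressions = find_regressions(mean_old, mean_new)
```

Any non-empty `regressions` list gives concrete inputs on which the two versions disagree. Random testing can only show that a difference exists; an empty list does not prove the versions are equivalent.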
 
Now, a really needed improvement. Code that is guaranteed bug free once it is passed by the compiler.

A lot of academic research has gone into mathematically provable formal verification of bug free code. It's rarely ever used in practice (outside of a few nuclear, medical and military projects) because the cost of writing a specification for a real application that is complete and tight enough for the code to be formally verified against is orders of magnitude greater than all the coding and debugging of any typical consumer or business app. No one can afford to do it (cost or timewise).
 