There is a lot of focus on APIs, often REST APIs, but one aspect of API design, and of code design, that is sometimes missed is scripting. One of the REST design principles is in fact to use mobile code in the right places, but we usually only see that when JavaScript is sent to browser clients.
General purpose scripting languages have been with us for a long time: Rexx was started in 1979 and Tcl in 1988, followed by Lua in 1993. They go back further still depending on what you define as a scripting language, as you might include IBM’s Job Control Language, or shells such as ksh. Indeed there is some confusion in the Wikipedia article about what they really are, and the term has been used for all sorts of languages.
A scripting language is a language that is designed for one of two things. The first is to control an external program or set of programs, providing an easy way to construct compositions of the components provided by a library or toolkit. The second is the opposite construction, to be embedded inside another program to control parts of the application in a dynamic way, like JavaScript controls rendering in a browser. These are normally referred to as extending and embedding respectively.
The reasons for the two different ways vary, with extending being for use cases like shell scripting, where there is a large library of programs with a uniform interface which can be composed to perform a multitude of tasks. The performance of the shell script language is not particularly important, as it is mostly just building system level compositions, such as pipes, or making simple conditions and loops. Other examples are more complex, and the successful general purpose scripting languages have a full set of programming language types and constructs, including first class functions, inheritance and so on; languages like shell scripts and Tcl, which only have a single string type, have stayed limited to smaller areas. Essentially these become domain specific languages (DSLs) for working in a particular domain. Structuring your code so that it acts as a set of libraries for the domain, while using a scripting language to do the plumbing between them, is a good way of creating a flexible design. Indeed, if you cannot refactor your application as independent libraries with a scripting glue then it is probably not very well designed.
The other way round, embedding, is sometimes unfairly seen as a bad idea, but in the right situations it makes a lot of sense, for example for embedding a query language, or for avoiding server round trips by coordinating a set of commands, such as the way databases embed stored procedure languages, or the recent embedding of Lua in Redis.
Why not do everything in one language? The original reasons were that “real” programming languages were statically typed, compiled, and had terrible string handling (yes C, we are looking at you, a language which once had gets), while scripting languages had garbage collection, dynamic everything, interpreted environments with friendly errors, simple string libraries, and were extremely slow, maybe 100 times slower than C. They also used to have fairly poor module structuring, and other facilities for programming in the large. This has not stopped people building large projects using largely scripting languages (Vignette in Tcl being an early example), particularly with the LAMP stack which started off as a simple glue between a web server and a database, but has grown to much larger applications.
What has changed is a gradual convergence. Some of the more friendly features of scripting languages, such as garbage collection and better string libraries, started to move into mainstream languages with Java. In the other direction, the JIT compiler, which really started gaining popularity with the JVM, has recently been seriously applied to scripting languages, in particular JavaScript, Lua and Python, which are making a bid for serious performance. LuaJIT now performs similarly to or better than Java in many benchmarks, while PyPy and JavaScript are rapidly getting within a small factor of Java. This does not mean that static memory allocation and the low level guarantees of C are no longer useful; there are still many places where they are, such as in database design, and of course there are large libraries of existing, well tested software.
Foreign functions
Another gradual change is the development of FFI (foreign function interface) libraries. The original open source libffi has been around since 1996, and Python has had ctypes for a long time too, but there have been issues with these bindings. While they are relatively easy to construct compared to writing a full C binding, they are messy in some languages, although the Ruby ones are fairly readable, a binding to puts in libc being defined with:
require 'ffi'

module Foo
  extend FFI::Library
  ffi_lib FFI::Library::LIBC
  # int puts(const char *s);
  attach_function :puts, [ :string ], :int
end
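For comparison, here is roughly what the same puts binding looks like with Python's ctypes, mentioned above. This is only a minimal sketch, assuming a POSIX system where the C library can be loaded; the wrapper name c_puts is made up for illustration.

```python
import ctypes
import ctypes.util

# Locate and load the C library. find_library may return None on some
# systems; on POSIX, CDLL(None) then falls back to the symbols already
# loaded into the process, which include libc.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature by hand: int puts(const char *s);
libc.puts.argtypes = [ctypes.c_char_p]
libc.puts.restype = ctypes.c_int


def c_puts(text: str) -> int:
    """Hypothetical helper: write text plus a newline via libc's puts."""
    return libc.puts(text.encode("utf-8"))


if __name__ == "__main__":
    c_puts("hello from libc")
```

As in the Ruby version, the argument and return types have to be restated by hand, since nothing reads the C header for you.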
The second problem, after syntax, was performance. Most FFI bindings were very slow compared to native C bindings, by a factor that was significant for most use cases. However this has started to change too. First came the Rubinius bindings from a few years back, which have a very small overhead, of just a function call which performs the necessary type conversion. More recently there is the LuaJIT FFI library, which has a usable syntax, as it has most of a C header file parser built in. It is also natively understood by the JIT compiler, so it can generate code that has no overhead at all, actually less than the standard C Lua bindings.
So the following simple program that mainly executes a fast (virtual) system call:
local ffi = require "ffi"
ffi.cdef[[
  struct timeval {
    long tv_sec;
    long tv_usec;
  };
  int gettimeofday(struct timeval *tv, void *tz);
]]
local tv = ffi.new("struct timeval")
for i = 1,100000000 do
  ffi.C.gettimeofday(tv, nil)
  if tv.tv_usec == 0 then print "." end
end
generates the following assembly for the inner loop, where the call r12 is a direct call to the libc syscall wrapper:
->LOOP:
394cffd0 mov rdi, [rsp+0x8]
394cffd5 xor esi, esi
394cffd7 call r12
394cffda cmp qword [rbx+0x10], +0x00
394cffdf jz 0x394c0024 ->5
394cffe5 add ebp, +0x01
394cffe8 cmp ebp, 0x05f5e100
394cffee jle 0x394cffd0 ->LOOP
394cfff0 jmp 0x394c0028 ->6
And that runs marginally faster than the C equivalent, showing the advantages of an FFI library that is natively understood by the JIT compiler, as well as of course the analysis that goes behind making sure this is a valid optimisation, such as being able to register allocate the loop variable. It is also nice to see a scripting language generating nice assembler! LuaJIT is currently the fastest dynamic language available by a large margin.
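For contrast, here is a sketch of the same gettimeofday call made through Python's ctypes, an FFI that the interpreter does not compile through: every call converts its arguments at run time. It assumes the Linux 64-bit struct layout used in the cdef above, and the function name count_zero_usec is invented for the example. No timings are claimed for it.

```python
import ctypes
import ctypes.util


class Timeval(ctypes.Structure):
    # Mirrors the cdef above: two longs. This matches 64-bit Linux; other
    # platforms may use different field types (an assumption of this sketch).
    _fields_ = [("tv_sec", ctypes.c_long), ("tv_usec", ctypes.c_long)]


libc = ctypes.CDLL(ctypes.util.find_library("c"))
# int gettimeofday(struct timeval *tv, void *tz);
libc.gettimeofday.argtypes = [ctypes.POINTER(Timeval), ctypes.c_void_p]
libc.gettimeofday.restype = ctypes.c_int


def count_zero_usec(iterations: int) -> int:
    """Call gettimeofday repeatedly and count readings with tv_usec == 0."""
    tv = Timeval()
    zeros = 0
    for _ in range(iterations):
        if libc.gettimeofday(ctypes.byref(tv), None) != 0:
            raise OSError("gettimeofday failed")
        if tv.tv_usec == 0:
            zeros += 1
    return zeros


if __name__ == "__main__":
    print(count_zero_usec(100000))
```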
So are there any disadvantages to a well designed FFI interface? Well, having been using the LuaJIT one for a while, the issues are mostly with how some C code is written. A lot of C code is not well written, well encapsulated code. The ABI may depend on all sorts of conditions, not just the architecture, but also the build options of the program, all wrapped in #ifdef conditionals. Macros are used a lot, sometimes generating code for runtime, which of course then has to be reimplemented in the scripting language. The preprocessor is overused for constants, with people rarely using enum instead, and C enums have odd semantics anyway, as they are effectively just ints. Scripting languages rarely if ever use stack allocation, while C libraries often assume that it is the norm, so libraries do not hide their internal structures behind void * pointers that they heap allocate, which would often be much easier to bind. Also C libraries have historically been written to support old versions of C, so without variable length arrays, and without the sized integer types such as uint32_t. So it can get messy to interface with, requiring a lot of testing and extra code wrappers to make things work well. It is best to stick to well designed code if at all possible.
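As an illustration of the macro problem, the POSIX wait-status helpers are macros in C, so an FFI cannot call them and a binding has to copy their logic. The sketch below restates the usual Linux definitions in Python; the lowercase function names are invented here, and the bit layout is an assumption that a real binding would have to verify for each platform.

```python
import os


def wifexited(status: int) -> bool:
    # Hand-written copy of the usual Linux definition of the C macro:
    #   #define WIFEXITED(status) (((status) & 0x7f) == 0)
    return (status & 0x7F) == 0


def wexitstatus(status: int) -> int:
    # Hand-written copy of the usual Linux definition of the C macro:
    #   #define WEXITSTATUS(status) (((status) & 0xff00) >> 8)
    return (status & 0xFF00) >> 8


if __name__ == "__main__":
    status = 3 << 8  # wait status of a process that called exit(3)
    # Python happens to expose the real macros in os, which gives a
    # convenient cross-check for the hand-written copies.
    print(wexitstatus(status), os.WEXITSTATUS(status))
```

The copies only stay correct as long as the library's definition does not change, which is why this kind of code needs the extra testing described above.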
Structuring scriptable programs
The easiest way to write code that is friendly to being scripted is to structure most of it as a set of libraries, exposing clean operations with clear semantics for allocation and deallocation; a good way to write code anyway. For maximum portability, C is easier to interface with than C++, due to name mangling and C++ exceptions, as well as the fact that you may be interfacing to a language that does not really do object orientation in the way you might use it in C++. C++ exceptions have marginal support in FFI interfaces, and explicit error handling at the external interfaces is more easily usable. Don’t use thread local return values like errno either. Callbacks in the same scripting VM may also be a problem in some environments, that is, calls from the scripting language to C and then back to a script callback. Generally a library that owns the event loop is annoying, and this is a case of that to some extent; it is easier if the library just calls into user code, as in Node.js. These are generally sensible organisational principles for code anyway, so it should be possible to script any well written code.
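To show why errno-style reporting is awkward across an FFI, here is a sketch using Python's ctypes, which needs a special use_errno mode to capture the value before the interpreter itself makes further C calls. It assumes a POSIX libc; the helper name check_exists and the example path are made up.

```python
import ctypes
import ctypes.util
import errno
import os

# use_errno=True asks ctypes to keep a private copy of errno that is
# swapped in around each foreign call, so later C calls made by the
# interpreter cannot overwrite the value we care about.
libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)

# int access(const char *path, int mode);
libc.access.argtypes = [ctypes.c_char_p, ctypes.c_int]
libc.access.restype = ctypes.c_int


def check_exists(path: str):
    """Return (result, errno_value) for access(path, F_OK)."""
    ctypes.set_errno(0)
    result = libc.access(path.encode(), os.F_OK)
    return result, ctypes.get_errno()


if __name__ == "__main__":
    result, err = check_exists("/no/such/path/for/this/example")
    print(result, errno.errorcode.get(err))
```

A function that returned its error code directly would need none of this machinery, which is the argument for explicit error handling at the interface.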
Note I didn't really mention Java and .Net here. Most people largely only interface within their own runtime. While there are JVM and .Net versions of most programming languages, they are less well supported in general, and especially in the Java case very slow, often similar to the native interpreter, or a little faster but not as fast as native JIT compilers; although there have been recent improvements, the JVM is not currently friendly to dynamic languages. It is of course possible to expose a C API to Java code through JNI, as it works both ways, to allow Java to be scripted from a non JVM scripting language, although it seems to be less common, and it also has heavier performance costs than the usual C to non-managed scripting language boundary. Of course the big advantage of staying within these frameworks is that calling other languages within, say, the JVM is very easy, as the runtime understands the calling conventions, so interoperability is very simple. Many Java programs offer Rhino scripting, say, but it will be slow. Instead there are other JVM languages that are more appropriate for DSLs, such as the statically typed Scala, or Clojure, which is dynamically typed but compiles to JVM bytecode.
For calling into user code, there is the Redis scripting API, or the older but similar example of database stored procedure languages, another case where more standard scripting languages are now available, such as Lua in Postgres. Here the aim is to make the exposed operations easy to work with, and to allow use of native language features, such as how iterators work, and to use coroutines if that is appropriate, as for example in the Lua embedding in Nginx, which uses coroutines so that apparently synchronous code can be run asynchronously. The use cases are many. One is to improve APIs so that you minimise round trips by moving what would be client side operations to the server, sometimes to implement atomic operations that could not be specified over an API in a performant way, as with stored procedures in a database. Another use is the Node.js case, to embed a very well known and usable language in some low level code (the asynchronous native library), an environment that would otherwise only be available in C. Many large programs are actually structured largely as scripting language layers, for example over 40% of Adobe Lightroom code is written in Lua, and much of Firefox is written in JavaScript.
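The round trip argument can be sketched with a toy in-memory store that accepts a script from the client and runs it next to the data. Everything here (TinyStore, run_script, the ARGS and result names, the TRANSFER script) is hypothetical and only loosely modelled on the idea behind Redis scripting; Python's exec is also not a security sandbox, so a real server would need a properly isolated interpreter.

```python
class TinyStore:
    """Toy in-memory key-value server that can run client-supplied scripts."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value

    def run_script(self, source, args=()):
        # Expose only the store operations and the call arguments.
        # NOTE: exec() is NOT a sandbox; this is purely illustrative.
        env = {
            "__builtins__": {},
            "get": self.get,
            "set": self.set,
            "ARGS": list(args),
            "result": None,
        }
        exec(source, env)
        return env["result"]


# One script replaces two reads and two writes from the client, and the
# check-then-update logic runs as a unit inside the (single-threaded) toy.
TRANSFER = """
src, dst, amount = ARGS
balance = get(src, 0)
if balance >= amount:
    set(src, balance - amount)
    set(dst, get(dst, 0) + amount)
    result = True
else:
    result = False
"""

if __name__ == "__main__":
    store = TinyStore()
    store.set("alice", 10)
    print(store.run_script(TRANSFER, ("alice", "bob", 4)))
    print(store.get("alice"), store.get("bob"))
```

Without the script, the client would have to read the balance, decide, and write twice, with other clients free to interleave between those calls.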
Summarising
There has been a huge effort in making scripting languages perform well. Performing well while interfacing in a really easy way to external code is another big step towards making scripting a default part of the design of the majority of large scale code, as well as helping to integrate the huge installed and tested codebases already out there. Scripting languages generally do not have an imperative to avoid externally programmed core code, unlike say Java, for compatibility reasons, or Go, for simplicity reasons. This makes them excellent glue languages, combined with dynamic typing that tends to allow easy modification. Even if scripting languages were as fast as, say, C, there would still be people who prefer to use statically typed languages with more deterministic compilation and potentially runtime guarantees, and some sorts of libraries are more likely to be available in some languages than others, skewing the choices. But if you are writing complex memory management code in C++, it could be time to switch parts of the code with unclear lifetimes to a scripting language. Mixing languages has never been easier, or more compelling.
