Using Terraform to put Kotlin in AWS Lambda
Some questions are just too big to answer. How do you design for a complex, uncertain and ever-evolving cloud world? Reduce the time-horizon, cleave off the bits that just arenāt ready yet and have a clear technical goal, then you can at least redefine the question:
How to use Terraform to automatically provision a static website and scalable dynamic (Kotlin) form processor to store made up data in a DynamoDB table?
That was the question I tried to answer in a proof-of-concept (PoC). Itās been published open source on GitHub. This post goes into a bit more detail on how the proof-of-concept came into being, why different design decisions were made and how the code works.
Background
Software Engineering courses in the mid-90s taught the mechanics of building good projects. In the main, they sought to introduce the software development lifecycle (SDLC) and familiarise students with it. Work out what youāre building, design it, build it, test it, ship. There are problems with this sort of waterfall SDLC, not least of all the bugs and limitations introduced by coding and supporting it all in-house. Perhaps design, build, test, delay, debug, delay, ship, patch?
By the middle of the following decade, the model had turned on its head. Engineers had become curators of code, sharing, selecting and refining an internet full of examples, frameworks, open source libraries, gists and StackOverflow answers to compose a solution. āNot invented hereā became a badge of honour, instead of a digital prejudice that slowed innovation.
The challenge
Traffic spikes
The nature and timing of major sporting events means that many customers place bets at the same time. During those peaks, Sky Betting and Gamingās servers receive many hundreds of requests per second, 445/s during the run up to this yearās Grand National at Aintree. Like almost any large web service, Sky Betting and Gaming need an adaptive and reactive infrastructure that can cope with the spikes, but scale back during the quite times.
Simple form processing and storage
Bets are complex to strike, so for this PoC Iāve simplified the model. Hereās a static HTML form that purports to gather credit card information. Because itās a PoC, it actually doesnāt gather any sensitive data.
<form name="addcard" action="${post_target}" method="post">
<p>Name: <input type="text" name="name" value="" /></p>
<p>Card number: <input class="readonly" type="text" name="number" value="" readonly="readonly" /></p>
<p>Expiry: <input class="readonly" type="text" name="expirymonth" value="" readonly="readonly" /> / <input class="readonly" type="text" name="expiryyear" value="" readonly="readonly" /></p>
<p>CVV: <input class="readonly" type="text" name="cvv" value="" readonly="readonly" /></p>
<input type="submit" name="submit" value="Store" />
</form>
Instead it generates random data and pushes it to the form processor for storing in a database, though the form target isnāt set until we create the form processor using Terraform.
// use JS to make up card details
function makeRandom(id, min, max, format) {
var output = '';
var separator = '-';
// split format up into XX blocks
var chunks = format.split(separator);
// loop through blocks and substitute
for (var i=0 ; i<chunks.length ; ++i) {
// 2nd, 3rd, nth chunk is preceded by a separator
if (i !== 0) {
output += separator;
}
var chunk = chunks[i];
// create a random number and pad to be the correct length
var chnum = pad(Math.floor((Math.random()*(max - min)) + min), chunk.length);
// append number to output
output += '' + chnum;
}
// apply to field
$(id).val(output);
return output;
}
function pad(n, width, z) {
z = z || '0';
n = n + '';
return n.length >= width ? n : new Array(width - n.length + 1).join(z) + n;
}
makeRandom('input[name=number]', 0, 9999, 'XXXX-XXXX-XXXX-XXXX');
makeRandom('input[name=expirymonth]', 01, 12, 'XX');
makeRandom('input[name=expiryyear]', 19, 26, 'XX');
makeRandom('input[name=cvv]', 0, 999, 'XXX');
The landscape
Several forces have shaped the demand for and form of this PoC.
As the pace of technology development has been accelerated, by more people doing it and more of them sharing it openly, great solutions to many of the problems landing on my desk were just āout thereā. To solve problem D, it was no longer necessary to solve problems A, B, and C - the generic (boring) precursors - to get to the good stuff. Coders got to concentrate on the top of the pyramid, the bit that was specific to the core business.
I saw Arun Gupta give a talk at Devoxx UK 2017. This table comes from his well-worth-watching talk entitled Deploy microservice using Amazon Web Services S3, API Gateway, Lambda and Couchbase.
IaaS | CaaS | PaaS | FaaS |
---|---|---|---|
Functions | Functions | Functions | Functions |
Applications | Applications | Applications | Applications |
Runtime | Runtime | Runtime | Runtime |
Containers | Containers | Containers | Containers |
Operating System | Operating System | Operating System | Operating System |
Virtualisation | Virtualisation | Virtualisation | Virtualisation |
Hardware | Hardware | Hardware | Hardware |
Key |
---|
Customer Managed |
Customer Scaled |
Vendor Managed |
Focussing on the value
Arun highlights the change that Iām talking about. Where before we had to solve the hardware problem, in that we had to build and operate hardware to run the apps, there was now a service whose total cost of ownership (TCO) was substantially lower. The same applies to the management of virtualisation layer, operating system, container, run-time environment (JRE). To say itās āsolvedā is a bit of an over-simplification, but the point is that there are managed services in each of these domains that reduce the amount of management we have to do of them.
Serverless computing is another example of a shift that allows us to focus on the top of the stack.
Advances in language tech
Thinking back to those Software Engineering courses from twenty years ago, many taught C, Java and sometimes Haskell. That language selection was pretty cutting edge at itās time. C for memory management, Java for OO design and Haskell for a functional future that we didnāt quite understand yet.
Sky Betting and Gaming has always been progressive in its attitude towards language selection, so anything that lets the engineering teams cut better code - cleaner, clearer, more reliable, more scalable - is welcomed into the supported tech stable. Kotlin is a fantastic divergence from Java so letās use that for the PoC. Itās not one of the AWS Lambda natively supported languages, but as itās JVM-based we can compile a fat JAR and run in the same way as Java.
Automated provisioning
On the infrastructure-as-code (IAC) side, multi-cloud languages like Terraform are making it easier to spin up and glue all the pieces together; and there are a whole lot of pieces. It also means the whole shooting match can be instantiated with a single terraform apply
, rather than an elaborate āopen this console window and tick this optionā tutorial.
The architecture
Gone are the days where you can stick a form handler in a single PHP script and fire requests at it. That AWS Lambda function needs some infrastructure wrapped around it to handle the NFRs.
The proof-of-concept code
All of these factors compound to create the unknowns that this proof-of-concept seeks to better understand. Broadly, what are the performance characteristics associated with using Kotlin-based Lambda functions to dynamically scale to meet demand?
Kotlin form processor
Weāll need a form processor to take the data and write to DynamoDB:
/**
* AWS Lambda handler function
* @param input Map Input fields
* @param context Context Lambda function context
* @return ApiGateWayResponse returned to gateway en route to requesting browser
*/
override fun handleRequest(input: Map<String, Any>, context: Context): ApiGatewayResponse {
LOG.info("received: " + input.keys.toString())
// decode using Spring
val bodystr: String = "http://www.example.com/index.html?" + input.get("body").toString()
val parameters = UriComponentsBuilder.fromUriString(bodystr).build().getQueryParams()
var name: String? = parameters.get("name")?.first()
var number: String? = parameters.get("number")?.first()
var expirymonth: String? = parameters.get("expirymonth")?.first()
var expiryyear: String? = parameters.get("expiryyear")?.first()
var cvv: String? = parameters.get("cvv")?.first()
// fill with defaults if not submitted
if (name == null) name = "MRS A. N. OTHER"
if (number == null) number = makeRandom(0, 9999, "XXXX-XXXX-XXXX-XXXX")
if (expirymonth == null) expirymonth = makeRandom(1, 12, "XX")
if (expiryyear == null) expiryyear = makeRandom(19, 26, "XX")
if (cvv == null) cvv = makeRandom(0, 999, "XXX")
// write values to database
writeToDB(name, number, "${expirymonth}/${expiryyear}", cvv)
// write only log reference back to response
val log_targets = String.format("Kotlin function executed successfully. log_group = %s, log_stream = %s", context.getLogGroupName(), context.getLogStreamName())
return ApiGatewayResponse.build {
statusCode = 200
objectBody = MsgResponse(log_targets)
headers = mapOf("X-Powered-By" to "AWS Lambda")
}
}
^ This also creates random defaults if for any reason the form fields arenāt set. This will be useful later when we come to load test it.
Now we just need some terraform to push the Lambda function:
resource "aws_lambda_function" "kardapi_func" {
description = "Kotlin CardAPI HTTP request handler"
function_name = "kardapi"
filename = "${path.root}/../kotlin/build/libs/paypoc-kotlin-0.1-all.jar"
source_code_hash = "${base64sha256(file("${path.root}/../kotlin/build/libs/paypoc-kotlin-0.1-all.jar"))}"
handler = "com.skybettingandgaming.demos.paypoc.kotlin.Handler::handleRequest"
role = "${aws_iam_role.lambda_exec.arn}"
runtime = "java8"
timeout = 30
memory_size = 256
}
an API gateway to field and forward requests to it:
resource "aws_api_gateway_rest_api" "kardapi" {
name = "Card API in Kotlin"
description = "Terraform with AWS Lambda"
}
resource "aws_api_gateway_resource" "kardapi_resource_proxy" {
rest_api_id = "${aws_api_gateway_rest_api.kardapi.id}"
parent_id = "${aws_api_gateway_rest_api.kardapi.root_resource_id}"
path_part = "{proxy+}"
}
resource "aws_api_gateway_method" "kardapi_method_proxy" {
rest_api_id = "${aws_api_gateway_rest_api.kardapi.id}"
resource_id = "${aws_api_gateway_resource.kardapi_resource_proxy.id}"
http_method = "ANY"
authorization = "NONE"
}
resource "aws_api_gateway_integration" "kardapi_integ" {
rest_api_id = "${aws_api_gateway_rest_api.kardapi.id}"
resource_id = "${aws_api_gateway_method.kardapi_method_proxy.resource_id}"
http_method = "${aws_api_gateway_method.kardapi_method_proxy.http_method}"
integration_http_method = "POST"
type = "AWS_PROXY"
uri = "${aws_lambda_function.kardapi_func.invoke_arn}"
}
resource "aws_api_gateway_deployment" "kardapi_deploy_alpha" {
depends_on = [
"aws_api_gateway_integration.kardapi_integ",
]
rest_api_id = "${aws_api_gateway_rest_api.kardapi.id}"
stage_name = "alpha"
}
resource "aws_lambda_permission" "kardapi_gw_perm" {
statement_id = "AllowAPIGatewayInvoke"
action = "lambda:InvokeFunction"
function_name = "${aws_lambda_function.kardapi_func.arn}"
principal = "apigateway.amazonaws.com"
# The /*/* portion grants access from any method on any resource
# within the API Gateway "REST API".
source_arn = "${aws_api_gateway_deployment.kardapi_deploy_alpha.execution_arn}/*/*"
}
and some permissions to give the gateway access to it:
# create iam role to empower the function(s) to do stuff
resource "aws_iam_role" "lambda_exec" {
name = "cardapi"
assume_role_policy = "${data.aws_iam_policy_document.lambda_exec-role-policy.json}"
}
data "aws_iam_policy_document" "lambda_exec-role-policy" {
statement {
actions = [ "sts:AssumeRole" ]
principals {
type = "Service"
identifiers = ["lambda.amazonaws.com"]
}
effect = "Allow"
sid = ""
}
# can't give Lambda log access here, so use attachment (below)
# statement {
# actions = [
# "logs:CreateLogGroup",
# "logs:CreateLogStream",
}
resource "aws_iam_policy_attachment" "lambda_exec-role-policy-attachment" {
name = "policy_atchmt"
roles = ["${aws_iam_role.lambda_exec.name}"]
policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaDynamoDBExecutionRole"
}
We can also use Terraform to set up the database table (in DynamoDB):
# create DynamoDB table to store cards in
resource "aws_dynamodb_table" "kard_table" {
name = "Kards"
read_capacity = 2
write_capacity = 200
hash_key = "UserId"
attribute {
name = "UserId"
type = "S"
}
}
resource "aws_iam_role_policy" "lambda_dynamo_kard_access" {
name = "DynamoDB-access"
role = "${aws_iam_role.lambda_exec.id}"
policy = <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "",
"Effect": "Allow",
"Action": [
"dynamodb:DeleteItem",
"dynamodb:GetItem",
"dynamodb:PutItem",
"dynamodb:UpdateItem"
],
"Resource": "${aws_dynamodb_table.kard_table.arn}"
}
]
}
EOF
}
and logging (in CloudWatch):
resource "aws_cloudwatch_log_group" "kardapi" {
name = "/aws/lambda/kardapi"
}
As this isnāt production code, Iāll also dump some identifiers into the output so we can readily find the log entries in CloudWatch:
// write only log reference back to response
val log_targets = String.format("Kotlin function executed successfully. log_group = %s, log_stream = %s", context.getLogGroupName(), context.getLogStreamName())
Static HTML website in S3
We need somewhere to host the <form> created earlier on. S3 is a good candidate, inexpensive and web accessible. We can use Terraform to substitute in the form target and send the static HTML to our new public bucket:
# create bucket
resource "aws_s3_bucket" "site_static" {
bucket = "${var.site_bucket_name}"
acl = "public-read"
cors_rule {
allowed_headers = ["*"]
allowed_methods = ["PUT","POST"]
allowed_origins = ["*"]
expose_headers = ["ETag"]
max_age_seconds = 3000
}
policy = <<EOF
{
"Id": "bucket_policy_site_static",
"Version": "2012-10-17",
"Statement": [
{
"Sid": "bucket_policy_site_static_main",
"Action": [
"s3:GetObject"
],
"Effect": "Allow",
"Resource": "arn:aws:s3:::${var.site_bucket_name}/*",
"Principal": "*"
}
]
}
EOF
website {
index_document = "index.html"
error_document = "error.html"
}
tags {
}
force_destroy = true
}
# substitute post_target into index.html
data "template_file" "template_index" {
template = "${file("${path.root}/../static/index.html")}"
vars {
post_target = "${var.post_target}"
}
}
# upload files, only when bucket exists
resource "aws_s3_bucket_object" "index" {
bucket = "${var.site_bucket_name}"
key = "index.html"
content = "${data.template_file.template_index.rendered}"
# source = "${path.root}/../static/index.html"
content_type = "text/html"
# etag = "${md5(file("${path.root}/../static/index.html"))}"
depends_on = ["aws_s3_bucket.site_static"]
}
Submitting the form
After a simple terragrunt apply
(prefer terragrunt as a wrapper to terraform) to push it all to AWS, we can submit the form and get the response back:
{
message: "Kotlin function executed successfully. log_group = /aws/lambda/kardapi, log_stream = 2018/05/10/[$LATEST]5f7356899dcd4997bd45a0bee9c08e78"
}
Viewing the logs and the DB
Our form post created an entry in the DynamoDB table:
and a stack of log data, accessible in near-real-time:
Performance
I broke out Apache Benchmark to do some crude load testing. First of all just a few dozen requests:
ab -n 100 -c 20 https://ys3dx2tl46.execute-api.eu-west-2.amazonaws.com/alpha/something
- 75% of the requests done within 1s, but the longest takes about 30s
Connection Times (ms)
min mean[+/-sd] median max
Connect: 79 122 55.0 96 247
Processing: 269 2658 6462.9 643 24108
Waiting: 269 2658 6463.0 643 24108
Total: 365 2780 6496.8 737 24335
Percentage of the requests served within a certain time (ms)
50% 737
66% 812
75% 912
80% 1039
90% 2301
95% 23528
98% 23855
99% 24335
100% 24335 (longest request)
What this shows I think is that the Kotlin Lambda function is quick to respond when Amazon has spun up a container for it (AWS Lambda uses containers under the hood), but slow when either there isnāt a container, or the existing containers are over-capacity and new ones need to be instantiated. Letās validate that at a larger scale:
ab -n 10000 -c 200 https://ys3dx2tl46.execute-api.eu-west-2.amazonaws.com/alpha/something
- 75% of the requests done within 3s, but the longest takes 30s
Connect: 76 714 433.6 764 2554
Processing: 193 3030 6561.6 994 29436
Waiting: 192 3010 6567.2 971 29435
Total: 283 3743 6437.8 1888 30033
Percentage of the requests served within a certain time (ms)
50% 1888
66% 2156
75% 2311
80% 2410
90% 6006
95% 23996
98% 29128
99% 29205
100% 30033
Itās important to mention that Iāve done zero optimisation on this, so itās a crude illustration of the Lambda starting point, not itās ultimate performance.
Future work
As with all good PoCs, thereās lots more to look at:
- I started on a Lambda-based (Node.js) load testing function, based on Amazonās reference example.
- There exists a DynamoDB āclick-to-encryptā type option, but at the time of writing itās not in Terraform yet. Iām confident it will be shortly, so Iād like to turn that on to more closely mimic a live environment.
- Alternatively, Iād like to create a KMS key pair and use it to do in-app encryption of the data, but thatād be more expensive because you pay for Lambda execution time.
- Iām in the habit of writing AWSpec tests for provisioned infrastructure, but havenāt worked out the best way to do that for Lambda functions yet. In fact Lambda test-driven development (TDD) is a whole area Iād like to investigate further.
Open source
All the code in this post is available in the GitHub repo. Thereās also a getting started guide to help you set up your AWS Environment and local/remote provisioning machine to bulid the PoC using terraform. Enjoy!