TL;DR: In this article, we present our implementation of serverless computing on top of Kubernetes. You will learn a lean way to implement serverless without any external dependencies. By the end, you will be able to create your own serverless architecture inside a Kubernetes cluster. If you're looking for a solution for your on-prem environment, want to stop relying on remote serverless providers, or just don't want to pay them for maintenance, this is the place for you.
Implementing Serverless
Serverless computing has become a common term among software development companies. It is a valid architectural solution for teams trying to use cloud services instead of setting up their own servers. Most major cloud platforms have their own solution for this type of development. Azure has Azure Functions, AWS offers AWS Lambda and so on.
What if we want to use the logic behind serverless computing without depending directly on these cloud platforms?
A viable solution is to keep the serverless idea while using Kubernetes as the platform to implement it.
In this article, I will take you through the thought process behind the decision to use serverless over kubernetes, and of course the implementation. Whether you are a developer looking for a solution for your project, or just an enthusiastic reader looking to learn something new, I believe there’s something here for everyone.
What is serverless?
Most developers, senior and junior alike, have faced the agony of spinning up a server just to test their work. These servers need to be maintained, which in most cases takes a lot of time. If you disagree, you are probably not the person responsible for this in your company, and trust me, someone's hair is getting grayer and grayer with every server failure.
Serverless architecture solves this exact issue. It takes server maintenance out of the dev team's hands and strips the problem down to the core functionality needed to execute a specific action. Meaning, I don't have to wait for a server to "wake up" to execute my code. All I need to do is send a message using a predefined API (an HTTP request in my case) and the platform will execute my request. No dependencies, no configurations, no wait time. Easy.
So why roll our own?
Although serverless computing in the cloud is very promising, it does have some limitations. At Valid Network, we develop a security platform for companies that use blockchain technology. We allow our clients to analyze their code for potential vulnerabilities. Since we want this to be repeatable, we used AWS's serverless technology, Lambda.
While using Lambda we started having issues scanning some customers' projects. A quick investigation uncovered that Lambda's /tmp folder has a 512MB limit. On top of that, AWS API Gateway has a timeout of 29 seconds. So if our process takes longer than that, tough luck.
This was about the time we started thinking of becoming available on premises as well. We had some clients who weren't willing to expose their Git repositories to our SaaS platform for security reasons, so we had to come up with a solution for them as well.
All of the above led us to start looking into implementing serverless on our own, keeping the advantages of the serverless model while accommodating it to our needs.
Assumptions
An important thing to note is that this is not a beginner's guide. The technologies used here are simply a result of our prior knowledge; the specific tech stack is not a constraint. If you're developing in a different language and/or database, you should be able to implement the same logic.
It's recommended to familiarize yourself with the following tech stack before jumping into the next section, to get a better understanding of the implementation.
Tech Stack
Kubernetes — used as the orchestrator
RabbitMQ — queue service to manage jobs
Node.js — JavaScript runtime for backend development
TypeScript — optional; adds type safety on top of Node.js
MongoDB — the database used to store and deliver job result data
General Terms
HTTP requests — a basic understanding of how HTTP requests work
Object oriented programming
The Solution
To implement serverless successfully, we used two different Kubernetes deployments running Node.js, RabbitMQ as a queue service, and a MongoDB database instance.
This can be achieved with most available queue services and NoSQL databases.
The Kubernetes deployments are divided in two: the request pod and the execution pod.
Request pod
Responsible for sending requests. Should be stable, can have many instances, and can use a persistent volume.
It connects to the queue service and adds requests to the queue. Other services that wish to run serverless requests send HTTP requests directly to the request service.
The pod caches every job it creates, monitoring responses on a separate channel in the queue where it acts as the recipient. As soon as it receives notice that a job it is responsible for is done, it queries the database for the response data of that specific job.
When the data is successfully received, the pod sends a response with the requested data back to the service that made the HTTP request.
Execution pod
Responsible for executing requests. Functions as a one-time pod: it dies as soon as it finishes processing a request, and cannot be connected to a persistent volume. Also, the image used to create the pod should be as slim as possible.
The execution pod registers to the queue and waits for a job request. Once it pulls a job with all its data, it executes it. As soon as it finishes, the result of the job is reported on two different channels:
A new entity is added to the database with the job id and the data generated from completing the request (if any).
A new channel is opened to send information back to the request pod through the queue service, alerting it that the job is done, along with the completion status (done, timeout, or fail).
When the execution pod has served its purpose, it dies. The execution deployment is responsible for creating a new execution pod, which immediately registers to the queue and waits for new jobs.
High-level design of the implementation
Deep Dive
Let's go over, piece by piece, what needs to be done to successfully implement serverless computing.
RabbitMQ
We're using RabbitMQ here since it is very easy to set up. All you have to do is create a Service and a Deployment for RabbitMQ.
Notice that the number of replicas should be just one. Since RabbitMQ functions as a "proxy" of sorts that forwards messages, there shouldn't be any issue with overloading it.
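As a sketch, a minimal RabbitMQ Deployment and Service could look like the following. The image tag, labels, and exposed ports are assumptions to adapt to your cluster; the Service name `rabbitmq` matches the host used in the connection strings later in this article.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rabbitmq
spec:
  replicas: 1            # a single replica is enough here
  selector:
    matchLabels:
      app: rabbitmq
  template:
    metadata:
      labels:
        app: rabbitmq
    spec:
      containers:
        - name: rabbitmq
          image: rabbitmq:3-management
          ports:
            - containerPort: 5672    # AMQP
            - containerPort: 15672   # management UI
---
apiVersion: v1
kind: Service
metadata:
  name: rabbitmq         # matches the "rabbitmq" host in the amqp:// connection string
spec:
  selector:
    app: rabbitmq
  ports:
    - name: amqp
      port: 5672
    - name: management
      port: 15672
```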
Request Entity
We're adding a deployment to our Kubernetes cluster to function as the request endpoint. The purpose of this endpoint is to receive HTTP requests from other services, create jobs out of these requests, and send them to RabbitMQ.
Using the express library, we receive HTTP requests and add jobs to RabbitMQ to execute them. We then wait for a response, and once we receive a finished-job response with the same job id, we access our database (MongoDB) to extract the data.
We only forward the job id through RabbitMQ, since forwarding messages over 128MB via RabbitMQ is not recommended, and in our use case we sent over 1GB of data.
Global variables
private readonly writeQueue: string = 'assignments';
private readQueue: string = ''; // empty name: populated with a server-generated queue name at init
private readonly exchange: string = 'responses';
// The channel in charge of sending job requests
private writeChannel: any;
// The channel in charge of receiving completed jobs
private readChannel: any;
// A hash table allowing us to track the jobs requested by this
// pod, supporting scaling with multiple replicas
// The Observable class is a helper class allowing to track
// timed-out jobs
private runningJobs: {
[jobId: string]: {
params: any,
finished: Observable;
}
} = {};
// A mongodb collection contains info about requested data
private mongoCollection: any
Initialization method
let username = process.env.rabbitmqUsername;
let password = process.env.rabbitmqPassword;
let connectionString = `amqp://${username}:${password}@rabbitmq:5672`;
let connection = await require('amqplib').connect(connectionString);
this.writeChannel = await connection.createChannel();
await this.writeChannel.assertQueue(this.writeQueue);
this.readChannel = await connection.createChannel();
await this.readChannel.assertExchange(this.exchange, 'topic');
// An empty queue name lets RabbitMQ generate a unique name for this pod
let { queue } = await this.readChannel.assertQueue(this.readQueue, {
exclusive: true,
autoDelete: true,
durable: false
});
this.readQueue = queue;
// We bind the read queue to the handleResponse method so that every
// response message will go directly there
await this.readChannel.bindQueue(this.readQueue, this.exchange, 'rabbitjob.responses');
this.readChannel.consume(this.readQueue, async (msg: any) => {
let queueResponse: QueueResponse = JSON.parse(msg.content.toString());
await this.handleResponse(queueResponse);
this.readChannel.ack(msg);
});
sendJob(functionName: string, params: any) method
// Generate unique random jobId
let jobId = IdGenerator.generate();
let queueRequest: QueueRequest = {
jobId,
params,
functionName
};
this.runningJobs[jobId] = {
params,
finished: new Observable(false)
};
console.log(`Sending job request: ${JSON.stringify(queueRequest)}`);
// Send to queue as buffer object
await this.writeChannel.sendToQueue(
this.writeQueue,
Buffer.from(JSON.stringify(queueRequest))
);
return jobId;
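The `IdGenerator` helper used above is not shown in the article; any collision-resistant id generator works. A minimal sketch using Node's built-in `crypto` module:

```typescript
import { randomUUID } from 'crypto';

// Hypothetical stand-in for the article's IdGenerator helper:
// produces a unique id used to correlate the queue request,
// the MongoDB record, and the completion response.
class IdGenerator {
  static generate(): string {
    return randomUUID();
  }
}
```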
The calling method sends the job, then waits for the execution pod to report completion before reading the result from MongoDB:
let jobId = await this.sendJob(functionName, params);
// Set a 2-minute timeout
let timeout: number = 1000 * 60 * 2;
try {
// The Observable helper class throws an error when the timeout is
// reached before the value changes
await this.runningJobs[jobId].finished.waitForValueToChange(
true, timeout);
}
catch (e) {
this.log.error(e);
throw new Error('timeout');
}
let result = await this.mongoCollection.findOne({ jobId });
return result.data;
handleResponse method
let jobId = response.jobId;
// Make sure the job was generated by this specific pod
if (this.runningJobs[jobId]) {
// Release the waiting caller in all cases
this.runningJobs[jobId].finished.setValue(true);
if (response.status === 'error') {
// Surface the error returned by the execution pod
throw new Error(JSON.stringify(response.error));
}
}
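The `Observable` helper class used above is also not shown in the article. A minimal sketch, assuming it wraps a boolean and lets a caller wait until the value changes to a target, rejecting on timeout:

```typescript
// Hypothetical sketch of the Observable helper: tracks a boolean value
// and resolves waiters when it changes to the value they wait for.
class Observable {
  private value: boolean;
  private waiters: Array<(v: boolean) => void> = [];

  constructor(initial: boolean) {
    this.value = initial;
  }

  setValue(v: boolean): void {
    this.value = v;
    // notify all current waiters of the change
    this.waiters.forEach(notify => notify(v));
    this.waiters = [];
  }

  waitForValueToChange(target: boolean, timeoutMs: number): Promise<void> {
    if (this.value === target) return Promise.resolve();
    return new Promise((resolve, reject) => {
      const timer = setTimeout(
        () => reject(new Error('timeout waiting for value change')),
        timeoutMs
      );
      this.waiters.push(v => {
        if (v === target) {
          clearTimeout(timer);
          resolve();
        }
      });
    });
  }
}
```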
Execution Entity
Now we need to add another deployment to our Kubernetes cluster to function as the execution endpoint. The purpose of this endpoint is to read jobs directly from RabbitMQ, execute them, write the results to MongoDB, and publish a completion notice on a separate response queue via RabbitMQ.
private readonly writeQueue: string = 'responses';
private readonly readQueue: string = 'assignments';
private readonly exchange: string = 'responses';
// The channel in charge of sending responses to requested jobs
private writeChannel: any;
// The channel in charge of receiving job requests
private readChannel: any;
// A wrapper exposing the MongoDB collections used to store job result data
private collections: any;
Initialization method
let username = process.env.rabbitmqUsername;
let password = process.env.rabbitmqPassword;
let connectionString = `amqp://${username}:${password}@rabbitmq:5672`;
let connection = await require('amqplib').connect(connectionString);
this.writeChannel = await connection.createChannel();
await this.writeChannel.assertExchange(this.exchange, 'topic');
await this.writeChannel.assertQueue(this.writeQueue);
this.readChannel = await connection.createChannel();
await this.readChannel.assertQueue(this.readQueue);
// Activate the handleRequest method when a new job is received
this.readChannel.consume(this.readQueue, async (msg: any) => {
try {
console.log('Received message from rabbitmq');
let queueResponse = JSON.parse(msg.content.toString());
await this.handleRequest(queueResponse);
}
finally {
// Notify rabbitmq the job is done
await this.readChannel.ack(msg);
// End the process so that kubernetes will create a new one
process.exit(0);
}
});
We're ending the process to allow a brand new one to be created without any "memory" of previous results (hence we're not using persistent volumes).
This is why it's highly recommended to run several instances of the execution pod, so that the other pods can manage the workload while the new one is booting.
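To illustrate, a sketch of the execution Deployment (names, image, and replica count are assumptions). Deployment pods use `restartPolicy: Always`, so after `process.exit(0)` the container is restarted fresh, with no memory of previous jobs, while the other replicas keep consuming from the queue:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: execution-service
spec:
  replicas: 3                  # several workers absorb the load while one restarts
  selector:
    matchLabels:
      app: execution-service
  template:
    metadata:
      labels:
        app: execution-service
    spec:
      containers:
        - name: worker
          image: your-registry/execution-service:latest  # assumption: your slim image
          env:
            - name: rabbitmqUsername
              valueFrom:
                secretKeyRef: { name: rabbitmq-credentials, key: username }
            - name: rabbitmqPassword
              valueFrom:
                secretKeyRef: { name: rabbitmq-credentials, key: password }
```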
handleRequest(request: any) method
let jobId = request.jobId;
console.log(`Analyzing job ${jobId}. Function name: ${request.functionName}`);
let response: any;
try {
switch (request.functionName) {
// here you will add your own list of function names to run and
// do whatever you wish
case 'foo': {
let data = foo();
// add the result to the database
await this.collections.queueCollection.insertOne({ createdAt: new Date(), jobId, data });
break;
}
}
// this is the response sent to the queue
// we're notifying the request pod the job is done
response = { jobId, status: 'finished' };
}
catch (e) {
response = { jobId, status: 'error', error: e.message };
}
console.log(`publishing result: ${JSON.stringify(response)}`);
// publish the result to the response queue
await this.writeChannel.publish(this.exchange, 'rabbitjob.responses', Buffer.from(JSON.stringify(response)));
Summary
In this article, we've created our own serverless solution on top of Kubernetes. Although many solutions out there solve the same problem, all of them depend on other services, which essentially makes the deployment process dependent on external providers. The solution presented here is lean, and can be implemented with little to no external dependencies.
About Valid Network
Valid Network’s blockchain security platform provides complete life cycle security for enterprise blockchains from initial development to active deployment and management. Based in Be'er Sheva, Israel, the company’s solutions enable enterprises to innovate with blockchain faster, providing complete visibility and control over their distributed applications and smart contract governance, compliance, and security posture through advanced platform capabilities.
Uncover risks & opportunities in crypto to maximize your gains.
Valid Data’s real-time and predictive insights are used by Cryptocurrency traders and exchanges, as well as investors and hedge funds, to make better investment and trading decisions, to protect the value of their digital assets, and to capitalize on market opportunities that only Valid Network’s technology can uncover.