Software Architecture Design and Engineering at a Startup

How to mitigate challenges in designing software at an early-stage startup

The great thing about starting a new project is that you get a clean slate. No baggage of design choices that you hated to look at every day in your last project. But how many times have you seen a shiny new project not turn into the same intractable mess?

It is more likely to happen in a fast-paced startup. The faster the pace, the sooner it happens. So how do you move fast, avoid analysis paralysis, and still keep technical debt at a manageable level?

You design for change. Ignore the refrain that prevention is better than cure. Instead of preventing the mess, you should embrace it and mitigate it when it happens. That’s what we have done at Slang Labs.

In this article, I discuss:

  • Startup Reality: forces and constraints in a startup.
  • Engineering Philosophy: our philosophy to manage that reality.
  • Slang Architecture: evolution of Slang microservices and SDKs guided by our philosophy.

Startup Reality

In a startup, you may have some idea and intuition about the product you want to build. However, you have to iterate rapidly to achieve Product-Market Fit.

The three axioms of early-stage startups are:

  1. Change is the only constant.
    As you figure out the business, the product requirements and features will change. Tech and cost structure will change.
  2. Mistakes will undoubtedly happen.
    You will be making decisions with incomplete information. Some of those decisions are bound to be wrong.
  3. Time and resources will never be enough.
    You will have to iterate fast. You will be forced to cut corners. Some of it will come back to bite you.

Engineering Philosophy

The three axioms guide our engineering philosophy. Some of it may appear counter-intuitive but has proven effective in our experience.

Do maximal design but minimal implementation.

Drawing on paper is cheaper than writing code. If building something will take a day, it is okay to spend 30 minutes thinking about it. If a task takes a week, you are better off investing half a day in thinking it through.

We design for what we want in the future in all its glory, but we implement only the parts that we need right now. That helps us in keeping an eye on the future without investing resources prematurely.

Make mistakes reversible.

A design exercise helps in mapping the terrain, evaluating alternatives, and identifying unknowns. We seek to limit the blast radius of a wrong choice:

  • Isolation: We carefully design APIs to encapsulate and isolate design choices. So the changes needed to reverse a design choice are localized. In addition, we take extra care in designing data storage models and object models because these are reused across subsystems and microservices.
  • Configuration: Whenever possible, we externalize the design choices as a configuration parameter. That allows us to plug in alternative implementations if needed.
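
The two points above can be sketched together in Python. This is an illustrative sketch, not Slang's code: the `AsrEngine` interface, both engine classes, and the `asr_engine` configuration key are hypothetical names. Callers depend only on the interface (isolation), and a configuration value decides which implementation sits behind it.

```python
from typing import Callable, Dict, Protocol


class AsrEngine(Protocol):
    """Interface that isolates callers from any particular speech engine."""

    def transcribe(self, audio: bytes) -> str: ...


class ThirdPartyAsr:
    """Hypothetical stand-in for an engine that forwards to an external API."""

    def transcribe(self, audio: bytes) -> str:
        return f"third-party transcript ({len(audio)} bytes)"


class InHouseAsr:
    """Hypothetical stand-in for an engine built later in-house."""

    def transcribe(self, audio: bytes) -> str:
        return f"in-house transcript ({len(audio)} bytes)"


# Registry mapping a configuration value to a factory for the implementation.
ENGINES: Dict[str, Callable[[], AsrEngine]] = {
    "third_party": ThirdPartyAsr,
    "in_house": InHouseAsr,
}


def make_asr_engine(config: Dict[str, str]) -> AsrEngine:
    """Build the engine named by the (assumed) 'asr_engine' config key."""
    name = config.get("asr_engine", "third_party")
    try:
        return ENGINES[name]()
    except KeyError:
        raise ValueError(f"unknown asr_engine: {name!r}") from None


if __name__ == "__main__":
    engine = make_asr_engine({"asr_engine": "in_house"})
    print(engine.transcribe(b"\x00\x01"))
```

Reversing the choice of engine then means changing one configuration value; no caller of `transcribe` has to change.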

Build and automate safeguards for speed.

Despite our best efforts, we may still have to re-implement significant parts of our system. Thus we invest in unit and integration test suites. Of course, it takes extra time and effort, but this safety net allows us to change code fearlessly and rapidly.

We have an automated CI/CD pipeline for effortless deployments and rollback on a Kubernetes cluster. The pipeline runs the necessary unit and integration tests for every code patch. It first deploys the new version of microservices on a staging tier, and if everything is okay, it promotes the patch and tags it as ready for prod deployment.

Slang Architecture

The Slang CONVA platform provides Voice Assistant as a Service (VAaaS) specialized for various domains (e.g., e-commerce). Slang has two main components:

  • Client SDK: An SDK for Android, web, and iOS apps with voice-to-action APIs. These APIs implement context-aware conversational flow for the typical user journeys for the app’s domain.
  • Microservices: Backend services that power the Client SDK APIs. These services manage the meta information of the app and train the speech recognition and natural language understanding models.

This section briefly describes the high-level designs of both. That will set the context to show how our engineering philosophies have helped evolve our code. As requirements changed with better market understanding, these philosophies helped us manage technical debt without sacrificing velocity.

Client SDKs

The Client SDK provides a simple programming model of the User Journeys and the App States:

  • A User Journey is an application’s workflow that a user follows, from initiating conversation to completing the intended action.
  • An App State is a point in a User Journey that captures the app’s context at that point.

The Client SDK has a state machine for voice-to-action flow and an event bus for communication between the state machine and various sub-systems. Each subsystem implements only the part of the state machine that is relevant to it.

The state machine and event bus design decouples all subsystems and eliminates the need for an all-knowing orchestrator that handles all possible permutations of user UI and speech actions, communications with backend services, and the conversation stage.

States in the Client SDK State Machine represent the conversation stage in the life of the application:

  • INIT: SDK initialization and handshake with the server
  • READY: SDK is ready to start a user journey
  • USER_READY: SDK is ready to listen to the user
  • TEXT_READY: Speech Recognition is completed, and the transcription text is ready
  • INTENT_RECIEVED: The user’s utterance has been processed, and intent is understood.
  • INTENT_COMPLETE: Any missing information required for the intent for the user journey has been prompted from the user and filled from the application’s context.
  • ACTION_COMPLETE: Action has been taken to fulfill the user’s intent.

Client SDK State Machine and all subsystems:

  • Consume an event from the event bus,
  • Perform the operations triggered by that event,
  • Transition to the next state, and
  • Emit an event to the event bus if necessary
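
The consume, perform, transition, emit loop above can be sketched in a few lines of Python. This is a simplified illustration, not the SDK's implementation: the `EventBus` class, the two toy subsystems, and the convention that each event is named after the state it leads to are all assumptions. State names are spelled as in the list above.

```python
from collections import defaultdict, deque
from typing import Callable, DefaultDict, Deque, List, Optional, Tuple

Event = Tuple[str, str]  # (event name, payload)
Handler = Callable[[str], Optional[Event]]


class EventBus:
    """Minimal synchronous publish/subscribe bus that also tracks the state."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self._queue: Deque[Event] = deque()
        self.state = "INIT"
        self.history: List[str] = []

    def subscribe(self, name: str, handler: Handler) -> None:
        self._handlers[name].append(handler)

    def emit(self, name: str, payload: str = "") -> None:
        self._queue.append((name, payload))

    def run(self) -> None:
        """Deliver queued events until no subsystem emits a new one."""
        while self._queue:
            name, payload = self._queue.popleft()
            self.state = name  # assumption: an event names the state reached
            self.history.append(name)
            for handler in self._handlers[name]:
                follow_up = handler(payload)
                if follow_up is not None:
                    self.emit(*follow_up)


def asr_subsystem(audio: str) -> Optional[Event]:
    """Toy speech recognition: reacts to USER_READY, emits TEXT_READY."""
    return ("TEXT_READY", "show me red shoes")


def nlu_subsystem(text: str) -> Optional[Event]:
    """Toy language understanding: reacts to TEXT_READY, emits the intent."""
    return ("INTENT_RECIEVED", f"search:{text}")


def demo() -> EventBus:
    bus = EventBus()
    bus.subscribe("USER_READY", asr_subsystem)
    bus.subscribe("TEXT_READY", nlu_subsystem)
    bus.emit("USER_READY", "raw-audio")
    bus.run()
    return bus


if __name__ == "__main__":
    print(demo().history)
```

Neither subsystem knows about the other. Each one only names the event it reacts to and the event it emits, which is what removes the need for a central orchestrator.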

Each subsystem handles only one aspect, such as:

  • Initialization
  • Automatic Speech Recognition
  • Natural Language Understanding
  • Text-to-Speech
  • User Interface
  • Analytics
  • Android/Web/iOS Platform

The state machine and subsystems interact with backend microservices through a utility layer.

Slang Client SDK Design

Microservices

While the Client SDK is responsible for easy-to-use programming APIs with a small footprint, it delegates tasks like automatic speech recognition (ASR), natural language understanding (NLU), Text-to-Speech (TTS), and Machine Translation (MT) to backend microservices.

A conversational voice assistant for a customer application can be created and configured using the Slang Console. Here is the life cycle of a voice assistant and the corresponding microservices:

Here is a summary of backend microservices:

  • Console: When a customer signs in and creates or configures a voice assistant, the console microservice orchestrates saving voice assistant schemas and training the necessary models.
  • Schema: The Schema service is responsible for managing voice assistant specifications (user journeys, intent-and-entities in those journeys, prompts and statements in various conversation flows, etc.). We provide canned ready-to-use specifications for different business domains.
  • ASR, NLU, TTS, MT: There are two versions: training and inference. Whenever a customer updates a voice assistant, these microservices retrain the corresponding models. When a user speaks to the app, the Client SDK uses the inference services to understand the utterance and take the needed actions. These services are designed to plug in multiple engines of ASR, NLU, TTS, and MT.
  • Others: There are also microservices for auth, accounting, analytics, etc.

The microservices are written in Python, JavaScript, and Go. The production deployment is on the Google Cloud Platform (GCP). In addition, we utilize GCP’s logging and monitoring services.

Following is a simplified version of our microservices and their interactions.

Analytics

Today, we are all so accustomed to “touch” gestures (e.g., tap, press and hold, swipe) that they are almost standard. It wasn’t so when touchscreen phones were introduced. It took experimentation and implicit user training.

Voice is a newer medium of Computer-Human Interaction. Similar interaction standards for voice will evolve too. In the case of Slang VAaaS, it is a multi-modal experience. A user may interact through voice as well as touch.

Understanding user behavior is critical for our success. Therefore, we have devised In-app Voice Assistant Analytics. We collect a small number of key events from the SDK related to voice-specific user interactions, such as:

  • the start of a user journey with tapping on the mic button
  • to and fro conversation between user and app
  • action to complete the user’s journey
  • user canceling it midway

We have a serverless analytics pipeline on Google Cloud:

  • Collection: SDK collects key events and sends them periodically.
  • Ingestion: Events are pushed to PubSub.
  • Processing: A DataFlow job validates and persists them in a BigQuery warehouse.
  • Computation: An SQL pipeline computes and saves aggregations periodically.
  • Presentation: Analytics is accessible on a DataStudio dashboard. BigQuery console is used for running queries for custom analytics and exploration.
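
The Processing and Computation stages can be illustrated with a plain-Python stand-in. This is not the DataFlow job or the SQL pipeline: the event names, the required fields, and the completion-rate metric are hypothetical and chosen only to show validation followed by aggregation.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical event names, loosely based on the interactions listed earlier.
KNOWN_EVENTS = {
    "journey_started",
    "utterance",
    "action_completed",
    "journey_cancelled",
}
REQUIRED_FIELDS = {"event", "journey_id", "timestamp"}


def is_valid(event: Dict[str, object]) -> bool:
    """Processing step: accept only well-formed events of a known type."""
    return REQUIRED_FIELDS.issubset(event) and event["event"] in KNOWN_EVENTS


def completion_rate(events: Iterable[Dict[str, object]]) -> float:
    """Computation step: share of started journeys that reached an action."""
    counts = Counter(e["event"] for e in events if is_valid(e))
    started = counts["journey_started"]
    return counts["action_completed"] / started if started else 0.0


if __name__ == "__main__":
    sample = [
        {"event": "journey_started", "journey_id": "a", "timestamp": 1},
        {"event": "action_completed", "journey_id": "a", "timestamp": 2},
        {"event": "journey_started", "journey_id": "b", "timestamp": 3},
        {"event": "journey_cancelled", "journey_id": "b", "timestamp": 4},
        {"event": "action_completed", "journey_id": "c"},  # dropped: no timestamp
    ]
    print(completion_rate(sample))
```

Invalid events are dropped before aggregation, so a malformed payload from an older SDK version cannot skew the computed metric.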

Slang Data Analytics Pipeline

Evolution

The design explained above is neither the first design we created, nor will it be the last. It has evolved through several iterations:

  • The current Client SDK design with an event bus is the third version.
  • While the overall Microservice Architecture is quite similar to how it started, several services have been refactored into services of their own and even exposed internally. There is now a multi-level cache to improve performance.
  • Analytics has also gone through significant changes. The event collection is now lean and clever. It collects fewer but richer events. The pipeline is now hosted on DataFlow, which allows horizontal scaling.
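
The article does not describe the levels of the cache, so the following Python sketch only illustrates the general idea under stated assumptions: a per-process dictionary as the first level, a shared dictionary standing in for a cache shared across service instances as the second level, and a `load` callable as the slow source of truth. The class name is hypothetical, and eviction and invalidation are left out.

```python
from typing import Callable, Dict, Generic, Hashable, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class TwoLevelCache(Generic[K, V]):
    """Look up in local memory, then a shared cache, then the slow source."""

    def __init__(self, shared: Dict[K, V], load: Callable[[K], V]) -> None:
        self._local: Dict[K, V] = {}  # level 1: per-process memory
        self._shared = shared         # level 2: stand-in for a shared cache
        self._load = load             # slow source of truth

    def get(self, key: K) -> V:
        if key in self._local:
            return self._local[key]
        if key in self._shared:
            value = self._shared[key]
        else:
            value = self._load(key)
            self._shared[key] = value  # fill level 2 for other instances
        self._local[key] = value       # fill level 1 for the next call
        return value


if __name__ == "__main__":
    calls = []

    def slow_load(key: str) -> str:
        calls.append(key)
        return key.upper()

    shared: Dict[str, str] = {}
    first = TwoLevelCache(shared, slow_load)
    second = TwoLevelCache(shared, slow_load)
    first.get("schema")
    second.get("schema")  # served from the shared level, no second load
    print(calls)
```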

Let’s examine how our philosophy has facilitated this evolution.

Do maximal design but minimal implementation.

We drew a version of the microservices architecture at the very beginning, when we had only a demo app. We knew that we would build each part only when we needed it. But drawing it made it easy to discuss and kept us always aware of future needs.

This approach of keeping an eye on the future led us to design a configurable plug-and-play system. Following our minimal implementation rule, some of our microservices initially routed to third-party services. That allowed us to go to market quickly. Later, when we developed our own more suitable implementations, we swapped out these third-party services with zero impact on our customers.

Make mistakes reversible.

In low-level design, we are always mindful of the blast radius in case we have to change it radically. It goes hand-in-hand with “maximal design but minimal implementation.”

While the analytics pipeline was designed to hyper-scale (“maximal design”), we did not have enough traffic initially to justify the cost. We crafted a cheap ingestion and processing implementation using scheduled cloud functions. We knew that we would have to replace it with a different implementation. But the impact was limited to only one part of the pipeline.

Build and automate safeguards for speed.

We could fearlessly redesign the Client SDK and almost rewrite it because we had a battery of automated tests as a safety net.

Summary

Any organization trying to ship products fast and learn from customer response will face the same three axioms:

  1. Change is the only constant.
  2. Mistakes will undoubtedly happen.
  3. Time and resources will never be enough.

This article showed how a coherent set of philosophies for handling these axioms has shaped our system’s architecture.