Microservices Architecture in Artificial Intelligence

Shivakumar Goniwada Rudrappa
Jun 26, 2019


You probably already know that some of the applications you use in day-to-day life rely on Artificial Intelligence, such as Alexa, Google Home, Spotify, and Siri. These AIs are sets of programs developed to perform specific tasks, and they are used across a wide range of activities.

As you can see, the intelligence in AI is composed of five capabilities: 1. Reasoning, 2. Learning, 3. Problem Solving, 4. Perception, and 5. Linguistic Intelligence. When you design for this set of characteristics, the architecture of the AI must follow the principle of separation of concerns; otherwise the result is a single, all-in-one monolithic application. Too often an application is built with a monolithic approach to system design and implementation, and the same approach is then carried over to the provisioning of AI services.

In a monolithic approach, the services are developed and updated separately and recurrently, yet every change requires the whole application to be rebuilt and redeployed, which increases the overall cost of development and deployment. This cost can be minimized by distributing the functionality of an AI application across a group of smaller services. A better architecture for AI is therefore based on the microservices architectural style.

In the era of digital transformation, Artificial Intelligence is advancing with improved data-collection methods, richer training datasets, advanced data-processing mechanisms, enhanced analytic techniques, and modern service platforms. If you separate the implementation into groups of similar concerns, each group can be developed or torn down easily and without much risk.

The aim of this article is to show how microservices can be used in AI systems: AI in marketing, AI in banking and finance to predict future data, AI in agriculture for challenges such as climate change and population growth, AI in healthcare for medical care and clinical decision support systems, AI in gaming, AI in autonomous vehicles, AI in social media, and more.

For all these kinds of AI, microservices can be used for tasks such as data processing, data aggregation, and data transformation, connected through an event-driven pipeline. A comparison of these tasks shows that building microservices for them works better than a single monolithic application.

AI Subcategories

Broadly, the Artificial Intelligence (AI) field consists of the following subcategories, and each subcategory has its own architectural components.

The horizontally layered platforms do not address the issues of specific AI domains; instead, they supply the technical solutions needed across the platform. The vertical components, in contrast, are used to solve domain-specific problems. The vertical components are described below.

Usually, the architecture of an AI component is modular and layered. The first layer ingests the data (data processing, data aggregation, data transformation, etc.); the second layer is the execution layer (ML models and an execution layer for machine learning, a foundation model and cognitive services for an intelligent agent, an automation engine and recognition engine for robotics, etc.); the third layer exposes the APIs (Text/Speech API, Image API, Predictive API, Ecosystem API, etc.).
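To make the layering concrete, here is a minimal Python sketch of the three layers. The class names (DataLayer, ExecutionLayer, SpeechAPI) are illustrative assumptions of mine, not part of any specific platform.

```python
# Minimal sketch of the three-layer AI component described above.
# Class names and behavior are illustrative, not a real framework.

class DataLayer:
    """Layer 1: read and prepare data."""
    def ingest(self, raw):
        # aggregate / transform raw input into model-ready features
        return {"features": raw}

class ExecutionLayer:
    """Layer 2: run the model (ML model, cognitive service, etc.)."""
    def run(self, features):
        # placeholder for real model inference
        return {"prediction": len(features["features"])}

class SpeechAPI:
    """Layer 3: expose the capability as an API."""
    def __init__(self):
        self.data = DataLayer()
        self.engine = ExecutionLayer()

    def handle(self, request):
        return self.engine.run(self.data.ingest(request))
```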

The vertical components can be developed using microservice principles and architecture, which improves the modularity, extensibility, availability, scalability, and resilience of the AI services.

AI is becoming the de facto standard and usage pattern for a growing number of objects in industry, and there is no looking back; it will keep growing year on year. This growth increases the number of objects and use cases, which creates many challenges and opportunities for enterprises and software engineers. In the broader vision of AI applications, every connected object will be reused across multiple AI application domains to enhance the smartness and intelligence of those applications.

Microservice Vertical Components — Speech AI

Let's detail the vertical components, i.e., the microservices, for one entry in the above list of AI usages. In this article I consider a Speech AI implementation; a similar approach can be followed for the other AI implementations, and I will write a series of articles covering them.

Speech recognition is an area of Natural Language Processing and Artificial Intelligence. To achieve good accuracy and efficiency in an automatic speech recognition system across various languages, the challenges include morphology, language barriers, and different dialects.

You may already know some of the speech recognition engines in use today. Below are a few AI-based speech implementations:

1. Comparing Google's AI speech recognition to human captioning for television news

2. AI that can understand the meaning of a baby's different cries

3. A computer that can tell if a baby is hungry, tired, or in pain by listening to its cries

4. Transcription performance milestone on automatic broadcast news captioning

5. Alexa for Business Review

6. Google updates Maps, Search, and Assistant so you can order food without an app

7. AI Voice Assistants Reinforce Gender biases

8. Google AI can help you speak another language in your own voice

9. A DIY kids' smart speaker that features a private voice assistant

10. Google AI Translatotron can make anyone a real-time polyglot

If you search Google, you can find hundreds of different AI-based speech applications and implementations. Do you think all of these applications have architectures that are reusable, resilient, easy to deploy, easy to discard, and so on? If not, how do we create new AI-enabled speech software that has all of these characteristics?

A speech recognition program works using algorithms for two kinds of modeling: 1. acoustic and 2. linguistic. The modeling diagram below details the various components in a speech recognition AI.

In the diagram below, each blue box has its own responsibility and can be developed with a polylith and polyglot architecture; each box can scale and be deployed independently, so hypothetically each box is a candidate for a microservice.

For example, consider the Indo-Aryan languages, many of which use the Devanagari script; each is spoken by more than a million people. Each language in India is morphologically rich and has a complex syntactic structure. The diversity of dialects, in the form of pronunciation, grammar, and vocabulary, makes speech recognition more complex. Speech-to-text conversion for Indian languages also has major accuracy limitations, owing in part to the bulk of the corpora required.

The basic architecture for speech processing, recognition, and modeling requires a well-thought-out, reusable design model in order to process many languages, achieve accuracy, and encapsulate data.

The general architecture of speech recognition and speech-to-text consists of several functions, and each functional box described below can be developed as a microservice.

Speech Pre-Processing — Input speech data is captured through a recording tool and stored in .wav format. This microservice is designed to (see the sketch after this list):

A. Provide an external interface to read .wav, AIFF, AU, or PCM files (either in batch or in near real time with an event-driven approach)

B. Check the quality, frame size, etc.

C. Store the data in the database, handling all kinds of interfaces: batch, event-driven, etc.
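Here is a minimal sketch of steps A and B, handling only the .wav case and using just the Python standard library; the function name and quality thresholds are illustrative assumptions.

```python
# Minimal sketch of the pre-processing interface for .wav input.
# The expected sample rate and mono-only check are assumed thresholds.
import wave

def read_and_check(path, expected_rate=16000):
    """Read a .wav file and run basic quality checks before storage."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        n_frames = wav.getnframes()
        pcm = wav.readframes(n_frames)  # raw PCM bytes
    if rate != expected_rate:
        raise ValueError(f"unexpected sample rate: {rate}")
    if channels != 1:
        raise ValueError("expected mono audio")
    return {"rate": rate, "frames": n_frames, "pcm": pcm}
```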

Intelligent Agent — This microservice provides cognitive and recognition capabilities: it communicates with humans in natural language and provides advice using machine learning and NLP. It offers a set of services, such as NLP, speech-to-text, and text-to-speech, that enable an agent to see, hear, speak, converse, understand, and interpret natural methods of communication. There are two components in the NLP (a minimal sketch of both follows this list):

A. Natural Language Understanding — Maps the given natural-language input into useful representations and analyzes the different aspects of the language

B. Natural Language Generation — Produces meaningful phrases and sentences in natural language, covering text planning, sentence planning, etc.
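As an illustration of this split, here is a minimal, hypothetical sketch of the two components; the intent labels and responses are invented for the example and do not come from any specific library.

```python
# Illustrative stubs for the two NLP components. The "get_weather"
# intent and template responses are hypothetical examples.

def understand(utterance: str) -> dict:
    """NLU: map natural-language input into a useful representation."""
    text = utterance.lower()
    if "weather" in text:
        return {"intent": "get_weather", "slots": {}}
    return {"intent": "unknown", "slots": {}}

def generate(representation: dict) -> str:
    """NLG: produce a meaningful sentence from a representation."""
    if representation["intent"] == "get_weather":
        return "Here is today's weather forecast."
    return "Sorry, I did not understand that."
```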

Acoustic Analysis — Feature extraction is used to extract related information such as pitch, frequency, and environment, building on the intelligent agent.

Speech Modeling — This microservice generates the models for the speech-generation process and, with the help of various machine learning models, produces the final text output for each voice input.

ML Modeling — This microservice is used to make predictions about data. An algorithm together with the training data generates an ML model.

Architecture Diagram

The diagrams below provide a high-level view of the microservice architecture with its domain objects and show the flow of events and objects.

Speech Pre-Processing Microservice: This microservice is developed in Python. Its components read various types of files and segment the raw speech signal into evenly spaced frames. The frame size is selected to capture rapid transitions while keeping enough resolution in frequency. This microservice uses the Intelligent Agent microservice for the NLP process.
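A minimal framing sketch in NumPy follows; the 25 ms window and 10 ms step are common defaults in speech processing, assumed here rather than taken from the article.

```python
# Segment a raw signal into evenly spaced, overlapping frames.
# Window/step sizes are assumed defaults, not values from the article.
import numpy as np

def frame_signal(signal: np.ndarray, rate: int,
                 frame_ms: float = 25.0, step_ms: float = 10.0) -> np.ndarray:
    """Return an array of shape (n_frames, frame_len)."""
    frame_len = int(rate * frame_ms / 1000)
    step = int(rate * step_ms / 1000)
    n_frames = 1 + max(0, (len(signal) - frame_len) // step)
    return np.stack([signal[i * step : i * step + frame_len]
                     for i in range(n_frames)])
```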

Intelligent Agent Microservice — As the name suggests, this microservice performs the intelligent processing using NLP. It uses an open-source NLP library such as Apache OpenNLP (http://opennlp.apache.org/), which supports tokenization, sentence segmentation, tagging, chunking, etc. The microservice follows these steps to provide NLP capability: a. Lexical Analysis, identifying and analyzing the structure of words; b. Syntactic Analysis, covering grammar and the arrangement of words; c. Semantic, Discourse, and Pragmatic Analysis (https://www.tutorialspoint.com/artificial_intelligence/artificial_intelligence_natural_language_processing.htm).
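Since OpenNLP is a Java library and the pre-processing service here is written in Python, the sketch below uses NLTK as a stand-in to show the same tokenization, segmentation, and tagging steps; this is my substitution, not the article's choice.

```python
# Tokenization, sentence segmentation, and POS tagging with NLTK,
# used here as a Python stand-in for Apache OpenNLP.
import nltk
nltk.download("punkt")                        # tokenizer models
nltk.download("averaged_perceptron_tagger")   # POS tagger model

text = "Speech AI is evolving. Microservices help it scale."
sentences = nltk.sent_tokenize(text)                  # sentence segmentation
tokens = [nltk.word_tokenize(s) for s in sentences]   # tokenization
tags = [nltk.pos_tag(t) for t in tokens]              # tagging
print(tags)
```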

Acoustic Analysis Microservice: This microservice extends the details provided by the Intelligent Agent microservice with pitch, frequency, and environment information for statistical analysis. MFCC cepstral analysis (http://practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs/) is a standard technique used for feature extraction. All input files are converted into .mfcc files, each containing the list of cepstrum coefficients.
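As a sketch, MFCC extraction might look like the following; librosa is one common library choice, assumed here since the article does not name an implementation, and input.wav is a placeholder file name.

```python
# Minimal MFCC extraction sketch; library and file name are assumptions.
import librosa

signal, rate = librosa.load("input.wav", sr=16000)          # placeholder file
mfccs = librosa.feature.mfcc(y=signal, sr=rate, n_mfcc=13)  # 13 coefficients per frame
print(mfccs.shape)  # (13, n_frames)
```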

Speech Modeling Microservice: Initialization of the speech recognition system uses an HMM (Hidden Markov Model, representable as the simplest dynamic Bayesian network: https://en.wikipedia.org/wiki/Hidden_Markov_model). Speech signals are quasi-stationary, stable only for short periods of time, and this short-term stability can be viewed as the states of an HMM topology. HMM prototype tools (https://labrosa.ee.columbia.edu/doc/HTKBook21/node105.html) define one HMM for each individual word.
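A minimal sketch of fitting one such model with the hmmlearn library follows; the library choice, the 5-state topology, and the random stand-in features are all assumptions for illustration.

```python
# Fit a 5-state Gaussian HMM to MFCC-like frames with hmmlearn.
# The state count and random features are illustrative assumptions.
import numpy as np
from hmmlearn import hmm

frames = np.random.randn(200, 13)  # stand-in for (n_frames, n_mfcc) features
model = hmm.GaussianHMM(n_components=5, covariance_type="diag", n_iter=50)
model.fit(frames)                  # EM training over the frame sequence
print(model.score(frames))         # log-likelihood of the sequence
```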

ML Model Microservice: The ML model microservice is an extension of the speech modeling process, with the algorithms split across microservices. It provides the capability of re-estimating the model: the Baum-Welch algorithm (https://en.wikipedia.org/wiki/Baum%E2%80%93Welch_algorithm) is used to update the active states with optimal values for the HMM parameters. The pronunciation model is used to develop the correspondence between different HMMs to form a model for each input, and decoding concatenates the sub-word models into a decoding network.
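To show how re-estimation and model selection fit together, here is a sketch of isolated-word recognition over per-word HMMs; hmmlearn's fit() performs the EM (Baum-Welch) re-estimation described above, and the helper names are hypothetical.

```python
# Sketch of isolated-word recognition with one HMM per word.
# Helper names (train_word_model, recognize) are illustrative, not a known API.
import numpy as np
from hmmlearn import hmm

def train_word_model(examples):
    """Fit one HMM from a list of (n_frames, n_mfcc) example arrays."""
    X = np.vstack(examples)
    lengths = [len(e) for e in examples]
    model = hmm.GaussianHMM(n_components=5, covariance_type="diag", n_iter=50)
    model.fit(X, lengths)  # Baum-Welch (EM) re-estimation of HMM parameters
    return model

def recognize(features, word_models):
    """Pick the word whose HMM assigns the highest log-likelihood."""
    return max(word_models, key=lambda w: word_models[w].score(features))
```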

Conclusion

Accuracy and performance are the primary concerns of speech recognition software. The complexity of the entire system can be minimized by adopting microservices, together with AI and event-style patterns.
