Apollo Documentation

Apollo Engineering Kit

Discover, develop, and deploy conversational, vision, and audio AI applications. The Apollo Engineering Kit is ready to be the all-in-one solution for all of your embedded AI needs.

With an abundance of image-based data readily available, Apollo helps you take the next step by augmenting your image data with audio data and subsequent conversational analytics to improve your insights, efficiency, and processes. Whether you need a tool to automate transcription, minute taking, and summarisation for meetings, to perform abnormal sound classification and speaker verification to elevate your security system, to analyze incoming textual data, or even to generate conversation and music, Apollo has all the necessary hardware and software to rise to the occasion.

Apollo Engineering Kit Image
  • Multifaceted software capabilities: Using JetPack, Apollo is capable of running NVIDIA’s DeepStream and the newly released RIVA.

  • Perfect for all coding levels: Use ready-to-use demos or create your own custom applications from scratch.

  • Real-time information gathering: Harness the combined software and hardware to obtain multimodal data in real-time.

  • No AI is off-limits: Use Apollo’s hardware to deploy AI models from any framework and optimize them for your cause.

  • Reduced development time: Take advantage of SmartCow's readily available resources to speed up your workflow.

With Ultron, we want to change the development and deployment of automation and autonomous infrastructure with a state-of-the-art compute and sensor fusion platform that goes beyond the capabilities of traditional industry-grade PLCs.

Get Started

All tools and software to get you started are already included in the engineering kit as part of the Apollo SDK. When new examples and applications are added, simply update the SDK through apt to pull the changes to your engineering kit.

Details

Apollo's integrated hardware includes a base frame that allows the device to stand upright, four microphones, two speaker terminals, an 8MP camera module, a 2.08 inch OLED display, and a 128GB NVMe SSD, all in one small package. Additionally, the board has two programmable buttons for adding custom applications. By default, one of the buttons is set to "One-Key Recovery", enabling you to effortlessly update your device. The second button is left unconfigured so that you can assign your own application to it.

Apollo Annotated Diagram

Note: The model shown above is a pre-production model. The final kit's contents and appearance may vary slightly.

Specifications

NVIDIA® Jetson Xavier™ NX

CPU

6-core NVIDIA Carmel ARM®v8.2 64-bit CPU 6MB L2 + 4MB L3 processor

GPU

NVIDIA Volta™ architecture with 384 NVIDIA CUDA® cores and 48 Tensor cores

Memory

16 GB 128-bit LPDDR4x @ 59.7GB/s

Storage

16 GB eMMC 5.1

Physical I/O

Display

- 1x Mini DP

Ethernet

- 1x RJ45 GbE (10/100/1000)

Tact Switch

Recovery / Reset / Power / Programmable Buttons x2

USB

- 1x USB3.2 Gen1 Type A
- 1x USB2.0 Micro B (OTG only)

Line in

- 1x 3.5mm Phone Jack

Line out

- 1x 3.5mm Phone Jack

Internal Connector

Speaker out

- 2x pin header (2-pin with 2.54 mm pitch)

Mic in

- 4x MEMS Microphone

Camera

- 1x 15-pin FPC for MIPI CSI-2, 2 lanes

2-pin header

- 1x RTC with CR2032 Battery

4-pin header

- 1x Fan

7-pin header

- 1x SPI for OLED

12-pin header

- 1x UART (Debug only)
- 1x Power LED
- 1x Auto-power-on
- 1x Reset
- 1x Recovery
- 1x Power

40-pin header

- 1x UART
- 2x I2C
- 1x SPI

Expansion slots

- 1x M.2 2280 Key M
- 1x M.2 2230 Key E

Environment

Power input

12V DC input with DC Jack

Dimension

PCBA with base frame: 81 x 69 x 125 mm (L x W x H)

Operating Temp

0°C ~ 50°C

Storage Temp

-25°C ~+80°C

Storage Humidity

95% @ 40 °C (non-condensing)

Programmable Buttons

Apart from the Reset, Force Recovery, and Power-on buttons, Apollo also features two user-programmable buttons. By default, one of them is set to "One-Key Recovery", enabling you to easily reflash or restore Apollo to its default state. The second button is left unconfigured so that you can assign your own application to it.

One-Key Recovery

Unlike other Jetson development kits which require a host Ubuntu machine to upgrade the device firmware, Apollo features a one-touch recovery mode which performs the upgrade through a Firmware-Over-The-Air (FOTA) mechanism, similar to how your mobile phone updates itself.

At startup, Apollo automatically checks whether a new firmware update is available. If there is one, the user is prompted on the OLED display to either confirm the upgrade to the new release or skip the update. If the upgrade is confirmed, Apollo automatically downloads and applies it, and the user is prompted to reboot the Apollo board once it has completed. Users can also disable this update check for a period of 30 days, or re-enable it at any time, by running the relevant scripts, as described in the README file located at:

/etc/apollo/services/onetouch/README.md

One-Key Recovery offers users a simpler way to keep their Apollo kit up to date with the latest firmware. Compared with other Jetson kits:

Apollo firmware update

Packing List

Item | Description | Quantity
Base Frame | 85 x 69 x 122 mm (L x W x H), T=1.5mm, SGCC, yellow | 1
Fan | Fan of NX module | 1
OLED | 2.08 inch, SPI interface, white | 1
Camera | IMX179 8MP camera module | 1
M.2 SSD | M.2 128G NVMe SSD, -20°C ~ +75°C | 1
Power Adapter | DC 40W 12V | 1
Power Cord | EU Power Cord | 1

Ordering Information

Part Number | Description
NVDK-AK1V | Carrier board with 8G NX and Fan, including base frame, 2.08 inch OLED, IMX179, 128G NVMe SSD, power adapter, EU power cord, 12V DC in, 0°C ~ 50°C
NVDK-AK1W | Carrier board with 16G NX and Fan, including base frame, 2.08 inch OLED, IMX179, 128G NVMe SSD, power adapter, EU power cord, 12V DC in, 0°C ~ 50°C

Overview

Apollo is a professional engineering kit that aims to help developers succeed with next-level edge AI applications in vision and conversational AI. SmartCow has prepackaged the NVIDIA DeepStream and RIVA SDKs on a 128GB NVMe SSD, together with various NLP examples including text-independent speaker recognition, speech-to-text and sentiment analysis, language translation and speaker diarization, and applications for abnormal sound and surveillance. Furthermore, SmartCow has defined a programmable button as a "One-Key Recovery" button to simplify the process of device recovery, BSP update, and development.

Intro

Apollo is an audio/video AI engineering kit that comes preloaded with NVIDIA's JetPack, DeepStream, and RIVA SDKs. Running Ubuntu, a complete Linux operating system, Apollo makes it easier to get started with image, conversational, and audio AI on an embedded system.

Conversational AI is rapidly becoming ingrained in our daily lives through applications such as predictive text, text and data analytics, language translation, and search engines. Language, on the other hand, is not only written, but also spoken, and Audio AI goes hand in hand with language, bringing with it tasks such as speech-to-text, text-to-speech, and sound classification, processing and generation.

Combining language and audio has resulted in fully voice-controlled, interactive smart assistants such as Siri and Alexa, which can recognize the user's voice and perform a variety of tasks, including checking reports, retrieving data, providing social interaction, and controlling other smart devices.

Tutorials

Note

All tutorials are based on Python and use the same program structure and common libraries (PyAudio over ALSA).

Note

More examples of NLP and DeepStream will be released on a regular basis.

Hardware Setup

  • Attaching a speaker to Apollo

  • Initializing the onboard microphones and speaker

  • Command line instructions for using microphones and speaker

While you can use Apollo without any other peripherals, if you want to output sound, you can connect an external speaker to the engineering kit. The inbuilt amplifier can accommodate a variety of speakers. However, we recommend using a 1 Watt, 4 Ohm speaker like the one shown below.

Apollo speaker

To attach the speaker, simply insert its connector into Apollo as shown below.

Apollo speaker wire placement

Before you proceed, ensure that the speaker's red wire is facing the near edge of Apollo. Reversing the polarity of the speaker may cause damage to the speaker. To test the speaker, use the following commands to play one of the included .wav files and output it to the speaker. Alternatively, you can record your own voice using the inbuilt microphones and play it back through the speaker.

Getting Started with Apollo RIVA Installation

Follow these steps to install NVIDIA RIVA.

RIVA takes approximately 3.5GB to download and install without any active services running.

  1. Create a new NVIDIA account or sign in to an existing account. https://ngc.nvidia.com/signin

  2. Generate an NGC API key.

    • Sign in to your NVIDIA account.

    • On the top-right corner of the page, click display_name>Setup.

    • Click Get API Key, and carefully read the on-screen instructions.

    • Click Generate API Key.

    • Click Confirm.

    • After the API key is successfully generated, the key is displayed on the API Key page.
      Important: This is the only time your API Key is displayed. Keep your API Key secret. Do not share it or store it in a place where others can see or copy it. If you lose the API key, you can generate it again; however, the old API key becomes invalid.

      api key
  3. Download and install the NVIDIA GPU Cloud (NGC) CLI based on your operating system. https://ngc.nvidia.com/setup/installers/cli

    To download and install NGC CLI on Apollo, follow the instructions on the ARM64 Linux tab.

    CLI install

    At the Linux command line, run the following commands.

    • Download, unzip, and install from the command line by moving to a directory where you have execute permissions and then running the following command.

      wget -O ngccli_arm64.zip https://ngc.nvidia.com/downloads/ngccli_arm64.zip \
        && unzip -o ngccli_arm64.zip \
        && chmod u+x ngc

    • Check the binary's md5 hash to ensure the file was not corrupted during download.

      md5sum -c ngc.md5

    • Add your current directory to path.

      echo "export PATH=\"\$PATH:$(pwd)\"" >> ~/.bash_profile && source ~/.bash_profile

    • Type the following command, including your API key when prompted.

      ngc config set

    • After providing the API key, the system prompts you to specify the details listed in the following table.

      Prompt | Type
      Enter CLI output format type | ascii
      Enter org | Type the string listed under Choices.
      Enter team | no-team
      Enter ace | no-ace

  4. To download RIVA for ARM64-based systems like Apollo, run the following commands at the Linux command line.

    ngc registry resource download-version nvidia/riva/riva_quickstart_arm64:2.0.0
    cd riva_quickstart_arm64_v2.0.0
    bash riva_init.sh
    bash riva_start.sh

  5. At the Docker container CLI, perform the following configuration steps.

    jupyter notebook --generate-config
    jupyter notebook password

    Note: You can specify any password you want.

  6. Launch the Jupyter Notebook session for asr-python-basics, asr-python-boosting, and tts-python-basics by running the following command.

    jupyter notebook --allow-root --notebook-dir=/work/notebooks

    You are prompted to launch your web browser using a link in the terminal and also prompted to type the password you previously used.

    You should now have access to the notebooks.

Getting Started with Apollo: Audio

Working with Apollo can simplify and streamline your processes. However, you must make sure that Apollo is correctly configured, which includes initializing its microphones and speakers. Apollo uses the I2S interface to communicate with these devices. The I2S interface connected to the microphones is always active, but you must initialize the I2S interface connected to the speaker at least once during the lifetime of the device. To do so, run the Jetson-IO tool:

sudo /opt/nvidia/jetson-io/jetson-io.py

If you have enabled the I2S function correctly, the device prompts you to reboot. For more information, refer to the following article that walks you through the process step by step:

A Jetson Device For All Your Audio Applications

After rebooting, configure the sound card to access the microphone and speaker on the appropriate circuitry, and you're good to go! To complete these tasks, visit our Apollo-Getting-Started repository.

The next step is to use the PyAudio library, which enables you to control and manipulate ingoing and outgoing audio from a Python environment.

You can use PyAudio for the microphone and the speaker. For the microphone, in particular, PyAudio is capable of both recording .wav files and streaming audio, making it a versatile and useful tool. PyAudio can be easily configured to generate mono or stereo files with varying bit depths and bit rates.
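
As a rough illustration of recording with PyAudio, the sketch below captures a few seconds from the default input device and writes them to a .wav file. It is not the code from the Apollo repositories: the function names, the 48 kHz stereo settings, the 32-bit sample format, and the use of the default input device are all assumptions chosen for illustration, so adjust them to your own setup.

```python
import wave


def save_wav(path, frames, channels, sample_width, rate):
    """Write raw PCM chunks (a list of bytes objects) to a .wav file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)  # bytes per sample, e.g. 4 for 32-bit
        wf.setframerate(rate)
        wf.writeframes(b"".join(frames))


def record(path, seconds=5, rate=48000, channels=2, chunk=1024):
    """Record from the default input device and save the result as a .wav file."""
    import pyaudio  # imported here so save_wav() can be used without PyAudio

    pa = pyaudio.PyAudio()
    sample_format = pyaudio.paInt32  # assumption: 32-bit sample containers
    stream = pa.open(format=sample_format, channels=channels, rate=rate,
                     input=True, frames_per_buffer=chunk)
    try:
        # Each read() returns `chunk` frames of interleaved samples as bytes.
        frames = [stream.read(chunk) for _ in range(int(rate / chunk * seconds))]
    finally:
        stream.stop_stream()
        stream.close()
        sample_width = pa.get_sample_size(sample_format)
        pa.terminate()
    save_wav(path, frames, channels, sample_width, rate)
```

Calling `record("my_voice.wav", seconds=10)` on a device with a working input would produce a 10 second file; the same read loop can feed a streaming pipeline instead of a list.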

Typically, audio AI models are designed to receive 16 bit data. However, the Apollo microphones return 24 useful bits of data. To address this, Apollo software includes a relevant operation that converts microphone data to its 16 bit equivalent while streaming. This enables AI audio models to run efficiently in real time. Later on, there are examples of real-time audio processing for streaming applications, such as changing the volume during the stream.
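
The conversion described above can be sketched as follows. This is not Apollo's own implementation; it assumes that each sample arrives as a little-endian signed 32-bit word with the useful bits in the most significant positions, so that keeping the top 16 bits yields the 16-bit equivalent. The function names and the gain helper are hypothetical.

```python
import struct


def s32_to_s16(chunk):
    """Convert little-endian signed 32-bit PCM bytes to signed 16-bit PCM bytes."""
    count = len(chunk) // 4
    samples = struct.unpack("<%di" % count, chunk[:count * 4])
    # Assumption: the useful bits are left-justified, so an arithmetic shift
    # by 16 keeps the 16 most significant bits of every sample.
    return struct.pack("<%dh" % count, *(s >> 16 for s in samples))


def apply_gain(chunk16, gain):
    """Scale 16-bit PCM bytes by `gain`, clipping to the valid 16-bit range."""
    count = len(chunk16) // 2
    samples = struct.unpack("<%dh" % count, chunk16[:count * 2])
    scaled = (max(-32768, min(32767, int(s * gain))) for s in samples)
    return struct.pack("<%dh" % count, *scaled)
```

Both helpers work on one chunk at a time, so they can be applied to each buffer as it is read from a stream before it is passed to a model or written out.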

Feel free to record yourself speaking using either the command line (as discussed earlier) or the PyAudio scripts. If you speak English, save the file, as you'll be able to use it later for speech-to-text testing.

Check out the Apollo-Audio repository to get started with recording and playing .wav files with PyAudio, and other applications such as a volume meter and a small audio library to delve into some minor processing tasks.

Getting Started with NLP: NLP Tasks

Natural Language Processing is geared towards allowing machines to interpret and respond to textual data. Natural Language processing combines computational linguistics, statistics, and machine and deep learning together. Common applications include text classification, language translation, and text analytics such as named entity recognition and sentiment analysis.

Named Entity Recognition (NER) locates words or phrases that refer to important entities, such as people and places, and labels them. Sentiment analysis attempts to quantify subjective feelings, such as whether a piece of text is positive, negative, or neutral.

NLP processes are rapidly becoming commonplace with various businesses using several tasks such as question answering, text summarization, and speaker diarization to streamline their workflows and gain insight to what people are saying about their products and services, especially on social media.

For effective machine learning, preventing overfitting is imperative, and for this reason a good dataset would be one with about 10x as much data as there are dimensions. Unfortunately language, whether written or spoken, is typically messy: punctuation, excessive whitespaces, repeating words, words with little to no contribution to the meaning of the sentence, and various other factors all serve to make our machine learning tasks more difficult. This makes language-based datasets extracted from pieces of text highly dimensional, and hence more likely to overfit. For this reason, text preprocessing is imperative, and these relatively simple tasks make the difference between good models and excellent models.

Text preprocessing is simply the process of cleaning the text, and only leaving tokens of high value and importance to a task. Capital letters, punctuation, numbers, whitespaces, and certain commonly occurring words (such as ‘the’, ‘a’, and ‘and’) help make a sentence grammatically correct, but are not typically of significant enough value to a machine learning model, and hence they should be removed.
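
A minimal sketch of such a cleaning step, using only the Python standard library, might look like the following. The stop-word list is a tiny illustrative sample rather than a complete one, the function name is hypothetical, and the approach only handles unaccented Latin letters.

```python
import re

# Illustrative subset only; real stop-word lists are much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is"}


def preprocess(text):
    """Lowercase the text, drop punctuation and digits, and remove stop words."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # anything that is not a letter or space
    return [token for token in text.split() if token not in STOP_WORDS]
```

For example, `preprocess("The meeting, at 10:30, was a GREAT success!")` returns `['meeting', 'at', 'was', 'great', 'success']`; extending `STOP_WORDS` would remove "at" and "was" as well.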

When creating a language-based dataset, the dimensions (columns) of the dataset are typically the words themselves. When plotting a frequency distribution of the words in a text, we would typically note that there are many words which are seldom mentioned, and hence result in very sparse dimensions, most likely leading to overfitting. Not only is it good practice to remove excessively common words, but it is also advisable to remove words that are mentioned very sparingly.
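
One way to prune both ends of the frequency distribution is sketched below. The thresholds (`min_count`, `max_doc_ratio`) and the function name are assumptions for illustration; suitable values depend on the size and nature of the dataset.

```python
from collections import Counter


def filter_vocabulary(documents, min_count=2, max_doc_ratio=0.9):
    """Drop rare and overly common tokens from a list of tokenized documents.

    A token is kept if it occurs at least `min_count` times overall and
    appears in at most `max_doc_ratio` of the documents.
    """
    total = Counter(tok for doc in documents for tok in doc)
    doc_freq = Counter(tok for doc in documents for tok in set(doc))
    n_docs = len(documents)
    vocab = {
        tok for tok in total
        if total[tok] >= min_count and doc_freq[tok] / n_docs <= max_doc_ratio
    }
    filtered = [[tok for tok in doc if tok in vocab] for doc in documents]
    return filtered, vocab
```

The returned vocabulary then defines the columns of the dataset, so every remaining dimension is backed by at least a few observations.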

Another method to combat language sparsity is to eliminate word variations while retaining their root/base forms. An example of this would be reducing the words "running", "runs", and "runner" all to "run". This reduces the number of dimensions while also making the remaining dimensions less sparse.
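
The idea can be illustrated with a deliberately naive suffix stripper. This is a toy, not an established stemming algorithm: the suffix list and rules are assumptions made up for this example, and it will mis-stem many real words, which is why established stemmers or lemmatizers are normally preferred.

```python
# Illustrative subset of English suffixes, checked in this order.
SUFFIXES = ("ing", "er", "s")


def naive_stem(word):
    """Strip one common suffix, then collapse a doubled final consonant."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Only strip when at least three characters of stem would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # "running" -> "runn" -> "run"
    if len(word) >= 3 and word[-1] == word[-2] and word[-1] not in "aeiou":
        word = word[:-1]
    return word
```

With these rules, "running", "runs", and "runner" all map to "run", so the three columns collapse into one.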

With Apollo, we have included an Apollo-NLP library to get you started with some of these functions as prerequisite techniques to train your own language models, followed by a small demo for your enjoyment.

NLP Use Case: Chatbots

Chatbots are NLP-based software applications that can be used to converse with humans or other chatbots. Chatbots, which are typically used to interact with customers and answer frequently asked questions on websites, are expanding in scope and are expected to rise in popularity. Chatbot complexity varies greatly, with simple bots scanning for key phrases and more complex bots utilizing cutting-edge NLP pipelines.

There are various forms of chatbots, with many designed for business applications and others purely for social conversation. A common type of chatbot is the intent based chatbot. This bot, while conversing with a person, attempts to map what the person is saying to a list of predefined intentions. Common intentions are "greeting", "weather", and "directions". When the intent is determined to be "weather," the chatbot downloads the most recent weather reports before summarizing them using an NLP summarization model and reading the summary out loud using text to speech. Similar tasks are carried out for other intentions.

Training an intent-based chatbot is also relatively straightforward; with only a few example phrases necessary for each intent (more examples never hurt), the underlying model can begin to identify similar sentences by either a bag-of-words model or via text embeddings. This means it can pick up on small variations and reliably and continuously identify the intentions behind user queries.
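
To make the bag-of-words variant concrete, here is a small sketch that matches a query against a few example phrases per intent using cosine similarity. The example phrases, the threshold, and all function names are hypothetical; this is not the Apollo-Chatbot code, and a practical bot would typically add text preprocessing or embeddings.

```python
import math
from collections import Counter

# Hypothetical training phrases: a few examples per intent.
INTENT_EXAMPLES = {
    "greeting": ["hello there", "hi how are you", "good morning"],
    "weather": ["what is the weather today", "will it rain tomorrow", "weather forecast"],
    "directions": ["how do i get to the office", "directions to the airport", "show me the route"],
}


def bag_of_words(text):
    """Represent a text as word counts, ignoring word order."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify_intent(query, examples=INTENT_EXAMPLES, threshold=0.3):
    """Return the intent of the most similar example, or None if nothing is close."""
    q = bag_of_words(query)
    best_intent, best_score = None, 0.0
    for intent, phrases in examples.items():
        for phrase in phrases:
            score = cosine(q, bag_of_words(phrase))
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent if best_score >= threshold else None
```

Here `classify_intent("is it going to rain today")` resolves to "weather", while an unrelated query such as "play some jazz" falls below the threshold and returns `None`, which a bot could treat as a cue to ask the user to rephrase.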

Other chatbots may not be trained so easily. Social chatbots, for example, typically require extensive data in the form of a query and an appropriate response, and are commonly trained using many movie and television scripts. Chatbots can also be developed using reinforcement learning, but this approach is often not as straightforward as intent style training or even script training, as previously mentioned.

Chatbots can effectively be combined with speech-to-text, text-to-speech, and smart devices to create digital assistants, allowing a user to have effective voice control over various appliances both in their house and in remote locations. Consider asking your custom digital assistant to prepare your favorite coffee by communicating with the bluetooth-controlled coffee machine in the office kitchen, or instructing it to check traffic reports to determine the quickest route to your next destination.

If you are interested in learning more about chatbots, we have put together a small project to get you started. We are developing an Apollo-Chatbot so that we can eventually have our own SmartCow Digital Assistant.

Getting Started with Audio Processing: Audio Recognition

Audio recognition is a growing field that has common applications such as music and speech recognition, animal species identification, and alarm detection, among others.

Identifying an entity or phenomenon by sound computation is an interesting task, which has received much attention in recent years. Previously, rule-based approaches were limited, but with the advent of machine learning and deep learning, significant progress has been made.

Machine learning approaches require developers to compute features, and for audio, these would typically be the short term power spectrum of a sound, the predominant frequencies occurring in the sound, and others of a similar nature.

Deep learning approaches have made frequent use of the spectrogram. A spectrogram is generated from a time domain signal by calculating the Fourier transform for overlapping chunks of the signal. The resulting Fourier transforms are then joined together to effectively form a 3-dimensional plot of frequency (y-axis) against time (x-axis), with amplitude represented as color.
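
The construction can be sketched in plain Python as below. To stay dependency-free it uses a direct discrete Fourier transform, which is far slower than the FFT routines a real application would use; the frame size, hop length, Hann window, and function name are illustrative assumptions.

```python
import cmath
import math


def spectrogram(signal, frame_size=64, hop=32):
    """Return one list of magnitudes (bins 0..frame_size//2) per overlapping frame."""
    # A Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size) for n in range(frame_size)]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_size)]
        magnitudes = []
        for k in range(frame_size // 2 + 1):
            # Direct DFT of the windowed chunk at frequency bin k.
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames
```

Each inner list is one column of the image (one moment in time), and bin `k` corresponds to the frequency `k * sample_rate / frame_size`. A 1000 Hz tone sampled at 8000 Hz with 64-sample frames therefore peaks in bin 8 of every frame.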

Spectrograms were transformative in the field of audio recognition, effectively converting audio signals to pictures, and hence allowing the use of AI vision techniques such as convolutional neural networks. This provided a basis for speech style transfer using neural style transfer.

For anyone interested in experimenting with audio recognition using spectrograms, there is an Apollo-friendly version of a simple audio recognition program to try out.

Audio Processing Use Case: Speaker Verification

Speaker verification is the process of identifying a person based on the characteristics of their voice. Each person's voice has a unique set of characteristics, and these can be leveraged to distinguish between people. This is a very good use case for problems relating to sound classification.

The primary application of speaker verification is to verify a person's identity. A user is asked to identify themselves and then speak. If the system confirms the user's identity, the user is granted access. A typical system would have two stages: registration/enrollment and login/verification. Because speaker verification cannot verify the identity of someone who is not in its database, a registration process is usually required.
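
The two stages can be outlined as follows. The sketch assumes that some model (not shown, and not specified here) turns an utterance into a fixed-length embedding vector; the class, the method names, and the 0.8 threshold are hypothetical and are not taken from the Apollo-Speaker-Verification demo.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SpeakerVerifier:
    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumption: tuned on validation data in practice
        self.enrolled = {}          # user id -> reference embedding

    def enroll(self, user_id, embeddings):
        """Registration: store the mean of several enrollment embeddings."""
        dim = len(embeddings[0])
        self.enrolled[user_id] = [
            sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)
        ]

    def verify(self, user_id, embedding):
        """Verification: compare a new embedding with the claimed user's reference."""
        reference = self.enrolled.get(user_id)
        if reference is None:
            return False  # users who never registered cannot be verified
        return cosine_similarity(reference, embedding) >= self.threshold
```

The user first claims an identity (`user_id`) and then speaks; access is granted only if the new embedding is close enough to the reference stored for that identity.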

For multi-factor authentication, speaker verification and speech recognition can be combined, and each speaker can have their own password that the system would have to approve, making system entry text-dependent.

With Apollo, we have included an Apollo-Speaker-Verification demo so you can try it out and use it as a template if you want to pursue such a project further!

Review Series

Ultron can be configured to use any of NVIDIA’s Jetson SoCs available in the SODIMM form factor, including the JetsonNano™️, TX2 NX™️, and Xavier™️ NX.

Tutorial Series

Are you wondering how to get started with your Apollo Engineering Kit, or what capabilities and applications Apollo makes possible? Look no further: SmartCow AIoT engineers will lead you step by step through setting up Apollo, audio software processing, NVIDIA® DeepStream and RIVA application development, and more. At SmartCow, we love to share the latest AI advancements and build audio-visual projects together to enable greater possibilities in the edge AI industries.

Episode 1: Getting Started

Congratulations on getting your hands on the Apollo Engineering Kit! The SmartCow team will work with you to explore the endless potential of AI. In the first video of the Apollo tutorials series, we will help you set up your device and get started with Apollo for your audio and visual applications. SmartCow AIoT Engineer Ryan will walk us through the first steps of getting started with Apollo. Don't hesitate to leave us some comments and show us your work!

Episode 2: Using PyAudio with Apollo

In the second episode of the Apollo tutorials, SmartCow AIoT Engineer Luke Abela shows us how to use PyAudio over alsa-utils in conjunction with the Apollo speaker and microphones. PyAudio is a set of Python bindings for PortAudio, a C library for interfacing with audio drivers. The audio codec and 4x MEMS microphones on Apollo can be used to demonstrate the capabilities of PyAudio to control audio. How do we do that? Let us walk you through the process with the tutorial video below.

Stay Tuned for Episode 3: OLED Display on Apollo

The video will show you how to create your own customized OLED display on Apollo.

Conclusion

To date, Vision AI and Conversational AI have been widely used with very impressive results in a variety of scenarios. SmartCow specializes in video analytics and has extensive experience developing optimized software. At SmartCow, we enjoy building end-to-end AIoT solutions because we are confident in their ability to transform the way the world functions.

As developers, we are all curious, and we never give up on trying new things and expanding our horizons. Conversational AI and natural language processing are expected to be the next advancements in AI technology. The next focus of the AI market will be on how audio AI works and how it interacts with vision AI. With this as our vision, SmartCow built Apollo to enable engineers to create next-generation Audio/Visual AI solutions. We are excited to be a part of your journey!

Troubleshooting

In this section, we have included some frequently encountered hardware and software configuration issues when working with Apollo. Along with the issues, we have provided some suggestions on ways to resolve them.

My speaker is not generating any sound, what should I do?

If your speaker is not working, we suggest the following troubleshooting steps:

  1. Open the Jetson-IO tool to see which peripherals are enabled by typing the following into the command line:

    sudo /opt/nvidia/jetson-io/jetson-io.py

  2. Ensure that i2s5 is enabled. If it is not, select the option to configure the 40-pin expansion header and enable i2s5. You will be prompted to save and reboot.

  3. Once your device has been rebooted, be sure to configure the soundcard appropriately with the following command:

    amixer -c jetsonxaviernxa sset "I2S5 Mux" ADMAIF2

  4. Finally run a speaker test with this command:

    speaker-test -D hw:jetsonxaviernxa,1 -c 2 -r 48000 -F S16_LE -t sine -f 500

My Microphone isn’t recording anything, what should I do?

If you are struggling to record with your microphone, follow these steps:

  1. The I2S peripheral that connects to the microphones is constantly running, so there is no need to configure it.

  2. Reconfigure the sound card subdevice appropriately with the following command:

    amixer -c jetsonxaviernxa cset name='ADMAIF1 Mux' I2S3

  3. Record using the following command:

    arecord -D hw:jetsonxaviernxa,0 -c 2 -d 10 -r 48000 -f S32_LE test.wav
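
To check whether a recording such as test.wav actually contains a signal, a quick look at its peak level can help. The sketch below is a hypothetical helper, not part of the Apollo SDK; it assumes the file is plain 16-bit or 32-bit PCM that Python's `wave` module can read.

```python
import struct
import wave


def peak_level(path):
    """Return the peak absolute sample of a .wav file as a fraction of full scale."""
    with wave.open(path, "rb") as wf:
        width = wf.getsampwidth()
        data = wf.readframes(wf.getnframes())
    fmt = {2: "h", 4: "i"}.get(width)
    if fmt is None:
        raise ValueError("only 16-bit and 32-bit PCM are handled in this sketch")
    count = len(data) // width
    samples = struct.unpack("<%d%s" % (count, fmt), data[:count * width])
    full_scale = float(2 ** (8 * width - 1))
    return max((abs(s) for s in samples), default=0) / full_scale
```

A result of exactly 0.0 means the file holds only silence, which points to a routing or configuration problem rather than a quiet room.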

My camera is not working, what can I do to troubleshoot?

If your camera is not functioning after being connected to Apollo, try the following commands to verify whether it is working:

  1. ls /dev/video*

  2. nvgstcapture-1.0 --camsrc=0 --cap-dev-node=<N>

Note: <N> is the number N of the /dev/videoN device node that the first command lists for your camera.

My device is heating and/or the fan has stopped working, how can I reset the fan?

Simply execute the following command:

sudo jetson_clocks --fan

What to do if I am having software issues with Apollo?

If you are having software issues, you can post your query with the tag #Apollo on the NVIDIA Developer Forums using the link below:

https://forums.developer.nvidia.com/c/agx-autonomous-machines/jetson-embedded-systems/70

The SmartCow team will look into your issue and get back to you as soon as possible.