Scope creep - K Robot Lab

Published: June 21, 2023

Table of contents

People like robots
Training technique for Transformers
Launch strategy

Something interesting always starts with something very stupid

Here's a list of features I planned for the robot originally

Cute
Talk jokes
Finds users face (camera, IR sensor)
Detects emotions
Keep up the conversation
Recognizes speech
Voice generation
Movements (maybe just waving or driving on a flat surface)

But then I estimated that with current advance in Machine Learning it will not be possible to put all these features (read: AI models) on a low power, low cost board (<$100). That's why I started reviewing my idea, and steps to finish it looked simple enough:

Order components: servo, DC, brushless motors, sensors, battery, logic board
Manufacture PCB for motor/power/sensor board
Design 3D body
Face animations on LCD

I left only one feature on the table - face animations (=cute). This was very sad for me, and I also asked myself, what people might like in robots.

People like robots

What do people find interesting about robots? In case the robot is just a toy, then probably

emotional connection
interactivity
technological fascination
novelty

People would love to explore capabilities of robots. I guess many designs are trying to provide as mush as possible different capabilities and its combinations. But is there a way to make designs that offer unlimited amount of capability combinations?

modular design
programmability
integrations
interconnectivity (unlock functionality with more than one robot)
adaptable to the environment (with the help of machine learning)

Also feasible to make a series of models that will stimulate people to get a full collection. For example, I will do different robot designs: one will look like a Viking, another like a cat. But what collective activities this will imply?

Training technique for Transformers

Autonomous navigation is a good example of decision-making. But it’s unclear why the robot needs to follow the path? What’s his goal?

delivery
agricultural
exploration (underwater, planetary surface, territory after disaster)

I like the agricultural robot example. So let’s assume the robot needs to sow, water the fields, locate and remove weeds, harvest. Let’s start with the basics: how does it know when to do all these tasks? I’d apply transfer learning technique. This way we can train the robot on one specie like potato and it will figure out how to do similar things on tomatoes and strawberries and such. With trial and error of course, but that’s where the following techniques will help.

scheduling based on agricultural best practices based on a crop type, growth stage, weather conditions, and regional guidelines
sensor-based monitoring such as soil moisture, temperature, humidity, and light
computer vision to identify the growth stage and health by analyzing leaf size, color, fruit development
find patterns in historical data with machine learning

So how this transfer learning is implemented is a question. Can we load the robot with knowledge every agronomist must know? It will be semantic knowledge and we need to find a way to use it in neural networks. LLMs trained on big corpus of knowledge. And if some general purpose models available online not specialize on agriculture, but they can be fine-tuned on agriculturist’s expertise. And technically LLMs are neural networks, thus we don’t need to worry about transforming semantic data into neural features.

The main question is what training method to choose for Transformers. This of course relates to other neural networks as well. The gradient descent is very slow process. Spiking networks are more efficient and less resource intensive. But how to apply it to Transformers?

And while we theorize about training technique, let's focus on methods that do not require thousands of training patterns. So how about one-shot training? What one-shot learning techniques will be more suitable for Transformers? Is there a method that will not require data augmentation and in case of one-shot learning the training process will end after one sample presented to the model?

Metric Learning techniques attempt to create embeddings that bring similar examples closer together in the embedding space while pushing dissimilar examples further apart. For example, using the Triplet Loss. We will need an anchor sample, positive and negative sample.

TripletLoss = max(0, distance(anchor, negative) - distance(anchor, positive) + margin)

Memory-Augmented Neural Networks (MANN) use the memory to store information from previously seen examples and then access this stored knowledge to make predictions on new, unseen examples, even with limited data. One popular memory-augmented neural network architecture is the Neural Turing Machine introduced by Alex Graves et al. in 2014 (related: Recurrent Entity Networks).