
AI Robot Head Tracking Using Browser Vision (No Training, No Server)

Turn your head left, right, up, and down.

Download FreeCAD project and STL files

Introduction

Most “AI robot face” demos rely on heavy machine-learning models, cloud APIs, or large datasets.
This project explores a different approach:

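Using simple geometry on top of browser-based face landmarks to create an expressive robot head that reacts to human movement in real time. The landmark detection comes from MediaPipe Face Mesh, which ships pre-trained and runs locally, so nothing has to be trained or hosted.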

The robot head tracks left, right, up, and down head motion and mirrors it using animated eyes displayed on a small OLED screen — all controlled directly from a web browser.

No dataset.
No model training.
No server.

What This Project Does

  • Detects a human face using the browser camera
  • Estimates head orientation (yaw & pitch)
  • Sends motion data to an Arduino using Web Serial
  • Animates eye movement on a 0.96″ OLED display
  • Allows direction inversion (mirror correction) at runtime

The result is a small robot head that feels responsive and alive.

System Architecture

Browser Camera
      ↓
Face Geometry (MediaPipe)
      ↓
Yaw & Pitch Calculation
      ↓
Web Serial (USB)
      ↓
Arduino UNO
      ↓
OLED Eye Animation

All vision processing happens locally in the browser; nothing is sent to a server.
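
The browser-to-Arduino hop in the pipeline above could look roughly like the sketch below. The wire format (`yaw,pitch` as integers from -100 to 100, terminated by a newline), the function names, and the 115200 baud rate are assumptions for illustration and are not taken from the project. `navigator.serial.requestPort()`, `port.open()`, and `port.writable.getWriter()` are standard Web Serial calls.

```javascript
// Hypothetical wire format: "yaw,pitch\n", each value an integer in -100..100.
// Inputs are assumed to be normalized to the range -1..1.
function encodePose(yaw, pitch) {
  const scale = (v) => Math.max(-100, Math.min(100, Math.round(v * 100)));
  return new TextEncoder().encode(`${scale(yaw)},${scale(pitch)}\n`);
}

// Asks the user to pick a serial port, opens it, and returns a send function.
// Must be called from a user gesture (for example a button click) in a
// browser that implements Web Serial. The baud rate is an assumption and
// has to match the value used in the Arduino sketch.
async function connectRobot(baudRate = 115200) {
  const port = await navigator.serial.requestPort();
  await port.open({ baudRate });
  const writer = port.writable.getWriter();
  return (yaw, pitch) => writer.write(encodePose(yaw, pitch));
}
```

A short text line per frame keeps parsing on the Arduino side simple, at the cost of a few more bytes than a binary format would need.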

Hardware Used

  • Arduino UNO
  • 0.96″ OLED (SSD1306, I2C)
  • USB cable
  • 3D-printed enclosure

Software Stack

  • HTML + JavaScript
  • MediaPipe Face Mesh
  • Web Serial API (available in Chromium-based browsers such as Chrome and Edge)
  • Arduino (C++)

Key Design Decisions

1. Pupils Move, Not Eyeballs

Moving only the pupils inside fixed eyeballs makes the face feel more natural and expressive.
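
As a rough sketch of this idea, the pupil position can be computed as an offset from a fixed eyeball centre, limited so the pupil never leaves the eyeball outline. The eye dimensions and the `pupilCenter` helper are hypothetical, yaw and pitch are assumed to be normalized to -1..1, and the project itself may perform this step in the Arduino sketch rather than in JavaScript.

```javascript
// Returns the pixel position of the pupil centre for one eye.
// `eye` is a hypothetical layout object: eyeball centre (cx, cy),
// eyeball radius, and pupil radius, all in pixels.
function pupilCenter(eye, yaw, pitch) {
  // Largest offset that still keeps the whole pupil inside the eyeball.
  const travel = eye.radius - eye.pupilRadius;
  let dx = yaw * travel;
  let dy = pitch * travel;
  // Limit the offset to a circle so diagonal looks do not escape the eyeball.
  const length = Math.hypot(dx, dy);
  if (length > travel) {
    dx *= travel / length;
    dy *= travel / length;
  }
  return { x: Math.round(eye.cx + dx), y: Math.round(eye.cy + dy) };
}

// Example layout for one eye (values chosen only for illustration).
const leftEye = { cx: 32, cy: 32, radius: 20, pupilRadius: 8 };
console.log(pupilCenter(leftEye, 1, 0)); // pupil pushed fully to one side
```

Because the eyeball outline never moves, only the small pupil disc has to be redrawn each frame.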

2. Face-Relative Geometry

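Head motion is measured relative to facial landmarks, not camera pixels.
Because the nose position is compared against other points on the same face, the estimate does not depend on where the face sits in the frame or how far it is from the camera. This makes movement symmetric and stable.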
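
One way to sketch face-relative measurement is to compare the nose tip against the midpoint of the eyes (for yaw) and the midpoint between forehead and chin (for pitch), then divide by the face's own width and height. The landmark indices below (1, 33, 263, 10, 152) are the ones commonly cited for MediaPipe Face Mesh, but they are an assumption here and should be checked against the mesh map. The function name and the -1..1 output range are also illustrative, not the project's actual code.

```javascript
// Assumed MediaPipe Face Mesh indices (verify against the mesh map).
const NOSE_TIP = 1, EYE_A = 33, EYE_B = 263, FOREHEAD = 10, CHIN = 152;

const clampUnit = (v) => Math.max(-1, Math.min(1, v));

// `landmarks` is an array of points with normalized x and y coordinates.
// Returns yaw and pitch as unitless values in -1..1, not degrees.
function estimateHeadPose(landmarks) {
  const nose = landmarks[NOSE_TIP];
  const eyeA = landmarks[EYE_A];
  const eyeB = landmarks[EYE_B];
  const top = landmarks[FOREHEAD];
  const chin = landmarks[CHIN];

  const halfWidth = Math.abs(eyeB.x - eyeA.x) / 2;
  const halfHeight = Math.abs(chin.y - top.y) / 2;
  if (halfWidth === 0 || halfHeight === 0) return { yaw: 0, pitch: 0 };

  // Offsets are divided by the face's own size, so the result does not
  // change when the face moves around the frame or closer to the camera.
  const yaw = (nose.x - (eyeA.x + eyeB.x) / 2) / halfWidth;
  // Image y grows downward, so a positive pitch means the nose sits lower.
  const pitch = (nose.y - (top.y + chin.y) / 2) / halfHeight;
  return { yaw: clampUnit(yaw), pitch: clampUnit(pitch) };
}
```

A relaxed face rarely produces exactly zero with this approach, so subtracting a calibration offset captured while looking straight ahead is a reasonable next step.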

3. Runtime Direction Flip

A toggle button allows instant correction for mirrored cameras without changing code.
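
A minimal sketch of such a toggle keeps one boolean and negates the yaw value while it is set. The `createDirectionFlip` helper and the button id in the comment are hypothetical names used only for illustration.

```javascript
// Holds the mirror state and applies it to yaw values.
function createDirectionFlip() {
  let inverted = false;
  return {
    // Switches the direction and returns the new state.
    toggle() {
      inverted = !inverted;
      return inverted;
    },
    // Negates yaw while the flip is active; pitch is left untouched
    // because mirroring only swaps left and right.
    apply(yaw) {
      return inverted ? -yaw : yaw;
    },
  };
}

const flip = createDirectionFlip();
// In the page, the toggle could be wired to a button, for example:
//   document.getElementById("flip-button").onclick = () => flip.toggle();
```

Applying the flip in the browser means the Arduino sketch never needs to know whether the camera image is mirrored.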


Educational Value

This project can be used to teach:

  • Coordinate systems
  • Geometry-based tracking
  • Browser ↔ hardware communication
  • Human-centered design

It is suitable for classrooms, labs, and exhibitions.


Conclusion

This robot head demonstrates that intelligence is not just about models, but about understanding interaction.

By combining browser vision, simple math, and embedded hardware, we can build systems that feel responsive, expressive, and intuitive, without custom training or server infrastructure.


Try It Live

Allow camera access, connect the Arduino, and move your head left, right, up, and down.


What Are Computer Vision Libraries?

Computer vision is a rapidly growing field concerned with enabling machines to interpret, analyze, and understand digital images and videos. Here are some widely used libraries that can help developers build computer vision applications.

OpenCV

OpenCV is a widely used open-source computer vision library that provides developers with a range of tools for image and video analysis, object detection, face recognition, and more. OpenCV is written in C++ and offers interfaces for other languages such as Python, Java, and MATLAB.

  • Official website: https://opencv.org/
  • User-friendliness: Easy to use with extensive documentation and tutorials.
  • Community support: Large and active community with frequent updates and contributions.

TensorFlow

TensorFlow is an open-source machine learning framework that includes a range of tools for image recognition, object detection, and classification. TensorFlow supports multiple programming languages, including Python, C++, and Java.

  • Official website: https://www.tensorflow.org/
  • User-friendliness: Easy to use with extensive documentation and tutorials.
  • Community support: Large and active community with frequent updates and contributions.

PyTorch

PyTorch is an open-source machine-learning library that includes a range of tools for image recognition, object detection, and segmentation. PyTorch supports multiple programming languages, including Python, C++, and Java.

  • Official website: https://pytorch.org/
  • User-friendliness: Easy to use with extensive documentation and tutorials.
  • Community support: Large and active community with frequent updates and contributions.

Caffe

Caffe is a deep learning framework that includes tools for image classification, segmentation, and detection. Caffe is written in C++ and supports multiple programming languages such as Python and MATLAB.

  • Official website: http://caffe.berkeleyvision.org/
  • User-friendliness: Moderate difficulty with a learning curve.
  • Community support: Medium-sized community. The project is no longer under active development, so updates are infrequent.

Keras

Keras is an open-source deep-learning library that provides tools for image recognition, object detection, and classification. Keras is written in Python, and an R interface is also available.

  • Official website: https://keras.io/
  • User-friendliness: Easy to use with extensive documentation and tutorials.
  • Community support: Large and active community with frequent updates and contributions.

scikit-image

scikit-image is a Python library that provides tools for image processing, including filtering, segmentation, and feature extraction.

  • Official website: https://scikit-image.org/
  • User-friendliness: Easy to use with extensive documentation and tutorials.
  • Community support: Large and active community with frequent updates and contributions.

These computer vision libraries offer a wide range of tools and functionalities for developers to work with. Choosing the right library largely depends on the requirements and specific use cases of the project.