Amazon has filed a patent for AR glasses that use foveated beamforming. The technology relies on eye-tracking data to identify what the user is looking at, then isolates and enhances audio from that location while suppressing background noise, so the audio focus follows the user's gaze. The system aims to make it easier to hear in noisy places, such as picking out a single speaker in a crowd or focusing on sound from a specific device.
How the Gaze-Activated Audio System Works
- The AR glasses integrate an array of microphones and utilize infrared or visible-spectrum cameras for high-precision eye tracking. The eye-tracking sensors continuously monitor pupil movement and direction to extract real-time gaze coordinates, while the microphones capture spatial audio signals from the environment. Together, these components enable the system to distinguish between relevant sounds and background noise based on the user’s focus.
- The system calculates a spatial target area corresponding to the user’s gaze point using the extracted eye movement data. It then applies real-time digital beamforming to steer the microphones’ sensitivity toward the focus area, pinpointing sounds originating from the user’s line of sight.
- Acoustic signals from the user’s gaze-aligned focus area are digitally amplified and filtered for clarity, while adaptive noise cancellation algorithms minimize interference from other directions. This process enhances intelligibility and allows the wearer to perceive target sounds more distinctly.
- The patent specifies that the system modulates AR application behaviors by correlating gaze information with audio focus, allowing app functions to dynamically adapt to user attention and contextual audio cues.
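The core of the steps above is steering a microphone array's sensitivity toward the gaze direction. The patent does not publish its algorithm, but the classic technique it describes is delay-and-sum beamforming: advance each microphone's signal by the travel-time offset for the look direction so sound from that direction adds coherently while sound from elsewhere partially cancels. A minimal sketch, assuming a far-field source, a 2-D microphone layout, and a gaze direction reduced to an azimuth angle:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def steering_delays(mic_positions, azimuth):
    """Per-microphone steering delays (seconds) for a far-field source
    at the given azimuth (radians, 0 = straight ahead along +x)."""
    look_dir = np.array([np.cos(azimuth), np.sin(azimuth)])
    return mic_positions @ look_dir / SPEED_OF_SOUND

def delay_and_sum(signals, mic_positions, gaze_azimuth, fs):
    """Steer the array toward the gaze azimuth: advance each channel by
    its steering delay (a fractional-sample shift applied in the
    frequency domain), then average. Sound arriving from the gaze
    direction sums coherently; off-axis sound partially cancels."""
    n = signals.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    delays = steering_delays(mic_positions, gaze_azimuth)
    out = np.zeros(n)
    for channel, tau in zip(signals, delays):
        spectrum = np.fft.rfft(channel) * np.exp(2j * np.pi * freqs * tau)
        out += np.fft.irfft(spectrum, n)
    return out / len(signals)
```

In a real headset the azimuth would be re-derived from the eye tracker every frame, and the simple average would be replaced by adaptive weights (the patent's adaptive noise cancellation), but the steering principle is the same.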
Key Features And Applications
- Users can utilize this system to concentrate on a specific speaker or sound source, similar to a hearing aid enhanced with AR visual support.
- Amazon’s broader AI plan also entails recognizing environmental sounds, such as sirens and household noises, and building custom sound models that understand their context.
- The technology integrates gaze direction with detected objects to enable actions such as activating a device by looking at it or focusing on its corresponding sound.
- The system can also work with AI assistants, making it easier to take voice commands from specific people even in noisy environments.
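The "look at a device to activate it" feature above amounts to matching the gaze direction against the bearings of detected objects. The patent gives no API for this, so the following is an illustrative sketch with hypothetical names, assuming object detection has already produced an azimuth (in radians) for each labeled object:

```python
import math

def select_gaze_target(gaze_azimuth, objects, tolerance=math.radians(10)):
    """Map the wearer's gaze direction onto a detected object so a later
    stage can focus audio on it or trigger it. `objects` maps a label to
    its azimuth in radians relative to the wearer. Returns None when no
    detected object lies within the angular tolerance."""
    def angular_gap(a, b):
        # Wrap the difference into [-pi, pi] so 359 deg and 1 deg count as close.
        return abs((a - b + math.pi) % (2 * math.pi) - math.pi)

    label, azimuth = min(objects.items(),
                         key=lambda kv: angular_gap(gaze_azimuth, kv[1]))
    return label if angular_gap(gaze_azimuth, azimuth) <= tolerance else None
```

The returned label could then gate both the beamformer (focus on that object's sound) and any device-control action.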
This technology likely relates to Amazon’s ongoing development of Echo Frames and other wearable devices designed to enhance users’ visual and auditory experiences.
A gaze-tracking technique uses a head-mounted device that sends data to a server. The device captures images of what the user sees and information about where the user is looking. The server runs an image recognition algorithm to identify the viewed items and creates a log of them.
Technical Field
This disclosure relates to client-server computer processing techniques, and more particularly to a gaze-tracking system.
Background Information
Eye tracking systems use cameras to measure where a person is looking by tracking eye movement and position. These systems have been used in human-computer interaction, psychology, and other research fields. Several methods exist for measuring eye movement; one is analyzing video images to find eye position. So far, most eye tracking systems have been used for research, and they are often intrusive, expensive, or unreliable. A reliable, affordable, and easy-to-use system could have many practical, everyday uses.
Summary
This disclosure describes different ways to implement a gaze-tracking system. In one example, the method includes receiving images of what the user sees from a head-mounted device and sending them to a server over a network. The server also gets information about where the user is looking. It uses image recognition to find items in images and logs what the user viewed.
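The server-side flow in this example — receive a scene frame plus gaze coordinates, identify the viewed item, append it to a log — can be sketched as below. The disclosure does not name a specific image-recognition algorithm, so the recognizer is injected as a plain callable, and all names here are illustrative:

```python
from dataclasses import dataclass, field
from typing import Callable, Tuple

@dataclass
class GazeLogServer:
    """Minimal server-side sketch: for each frame uploaded by the
    head-mounted device, run recognition at the gaze point and record
    the identified item in a viewing log."""
    recognize: Callable[[bytes, Tuple[int, int]], str]
    log: list = field(default_factory=list)

    def handle_frame(self, frame: bytes, gaze_xy: Tuple[int, int],
                     timestamp: float) -> str:
        item = self.recognize(frame, gaze_xy)  # what the gaze lands on
        self.log.append({"t": timestamp, "gaze": gaze_xy, "item": item})
        return item
```

A production system would add the network transport, batching, and a real vision model, but the logged record — timestamp, gaze point, identified item — mirrors what the summary describes.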
Another example involves capturing real-time images of what the user sees with a forward-facing camera built into the eyeglasses. A separate gaze-tracking camera on the glasses records images of the user’s eye. The system uses these eye images to determine where the user is looking, then identifies which item in the scene the user is focusing on.
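Turning an eye-camera image into a point in the forward-facing camera's frame requires a calibrated mapping between the two. The disclosure does not specify the mapping, so as one common, minimal choice, here is a least-squares affine fit from pupil positions to scene-pixel positions, as would be collected during a calibration pass where the wearer fixates known scene points:

```python
import numpy as np

def fit_gaze_mapping(pupil_xy, scene_xy):
    """Fit an affine map (2x2 linear part plus bias) from pupil positions
    in the eye-camera image to pixel positions in the scene camera,
    using calibration pairs gathered while the wearer fixates targets."""
    pupil_xy = np.asarray(pupil_xy, dtype=float)
    design = np.hstack([pupil_xy, np.ones((len(pupil_xy), 1))])
    coef, *_ = np.linalg.lstsq(design, np.asarray(scene_xy, dtype=float),
                               rcond=None)
    return coef  # shape (3, 2): linear terms plus bias, per output axis

def map_gaze(coef, pupil_point):
    """Project a new pupil position into scene-camera coordinates."""
    x, y = pupil_point
    return np.array([x, y, 1.0]) @ coef
```

With the gaze point expressed in scene-camera pixels, identifying the fixated item reduces to asking the recognizer what lies at that pixel.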
Another version of the system uses an eyeglasses frame with sidearms that rest on the user's ears and lenses that are partly transparent and partly reflective. A forward-facing camera on the frame records images of what the user sees, while another camera records the user's eye via its reflection off the lens. A processing system connects to both cameras to match the eye image with the scene image, helping track what or whom the user is looking at.
Further details and other examples are provided in the drawings, description, and claims.
Source: Gaze tracking system