The key idea of the zero velocity update (ZUPT) is to allow the system to reduce its uncertainty by leveraging motion knowledge (i.e., the fact that the system is stationary). This is of particular importance in cases where we have a monocular system without any temporal SLAM features. In this case, if we are stationary, we will be unable to triangulate features and thus unable to update the system. This can be avoided by using either a stereo system or temporal SLAM features. One problem that neither of these solves, however, is dynamic objects in the environment. In a typical autonomous car scenario, the sensor system will become stationary at stop lights, where dynamic objects such as other cars crossing the intersection can quickly corrupt the system. A zero velocity update combined with skipping feature tracking can address these issues, provided we are able to classify the cases where the sensor system is at rest.
To perform the update, we create a synthetic "measurement" which says that the current true acceleration and angular velocity are zero. Compared to directly asserting that the velocity is zero, this lets us model the uncertainty of the measurement using the noise of our inertial measurement unit.

$$
\begin{aligned}
\mathbf{a} &= \mathbf{0} \\
\boldsymbol{\omega} &= \mathbf{0}
\end{aligned}
$$
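It is important to realize that this does not strictly enforce zero velocity, only constant velocity (zero acceleration and zero angular velocity). A false detection can therefore occur during constant-velocity motion, but this is easily addressed with a velocity magnitude check. The following measurement equations relate the synthetic "measurement" above to the currently recorded inertial readings:

$$
\begin{aligned}
\mathbf{a}_m &= \mathbf{a} + {}^{I_k}_{G}\mathbf{R}\,{}^{G}\mathbf{g} + \mathbf{b}_a + \mathbf{n}_a \\
\boldsymbol{\omega}_m &= \boldsymbol{\omega} + \mathbf{b}_g + \mathbf{n}_g
\end{aligned}
$$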
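It is important to note that here our actual measurement is the true $\mathbf{a}$ and $\boldsymbol{\omega}$, so we form the residual by subtracting our measurement function from the synthetic "measurement":

$$
\tilde{\mathbf{z}} =
\begin{bmatrix}
\mathbf{a} - \left(\mathbf{a}_m - \mathbf{b}_a - {}^{I_k}_{G}\mathbf{R}\,{}^{G}\mathbf{g} - \mathbf{n}_a\right) \\
\boldsymbol{\omega} - \left(\boldsymbol{\omega}_m - \mathbf{b}_g - \mathbf{n}_g\right)
\end{bmatrix}
=
\begin{bmatrix}
-\left(\mathbf{a}_m - \mathbf{b}_a - {}^{I_k}_{G}\mathbf{R}\,{}^{G}\mathbf{g} - \mathbf{n}_a\right) \\
-\left(\boldsymbol{\omega}_m - \mathbf{b}_g - \mathbf{n}_g\right)
\end{bmatrix}
$$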
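Writing the noise-free measurement function as $\mathbf{h}(\mathbf{x})$, we have the following Jacobians with respect to our state, for an orientation error defined through ${}^{I_k}_{G}\mathbf{R} \approx (\mathbf{I}_{3\times3} - \lfloor\delta\boldsymbol{\theta}\times\rfloor)\,{}^{I_k}_{G}\hat{\mathbf{R}}$:

$$
\begin{aligned}
\mathbf{h}(\mathbf{x}) &=
\begin{bmatrix}
\mathbf{a}_m - \mathbf{b}_a - {}^{I_k}_{G}\mathbf{R}\,{}^{G}\mathbf{g} \\
\boldsymbol{\omega}_m - \mathbf{b}_g
\end{bmatrix} \\
\frac{\partial \mathbf{h}}{\partial\, {}^{I_k}_{G}\mathbf{R}} &=
\begin{bmatrix}
-\left\lfloor {}^{I_k}_{G}\mathbf{R}\,{}^{G}\mathbf{g}\times\right\rfloor \\
\mathbf{0}_{3\times3}
\end{bmatrix},
\quad
\frac{\partial \mathbf{h}}{\partial \mathbf{b}_a} =
\begin{bmatrix}
-\mathbf{I}_{3\times3} \\
\mathbf{0}_{3\times3}
\end{bmatrix},
\quad
\frac{\partial \mathbf{h}}{\partial \mathbf{b}_g} =
\begin{bmatrix}
\mathbf{0}_{3\times3} \\
-\mathbf{I}_{3\times3}
\end{bmatrix}
\end{aligned}
$$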
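The residual and Jacobian above can be sketched in plain Python as follows. This is a minimal illustration, not the reference implementation: it assumes ${}^{G}\mathbf{g} = [0, 0, 9.81]$, the orientation error convention stated above, and a hypothetical error-state column ordering `[dtheta, b_a, b_g]`; the names `zupt_residual` and `zupt_jacobian` are illustrative only.

```python
from typing import List

Vec = List[float]
Mat = List[List[float]]

# Assumed gravity vector {}^G g in the global frame (m/s^2)
GRAVITY: Vec = [0.0, 0.0, 9.81]


def matvec(A: Mat, x: Vec) -> Vec:
    """Matrix-vector product A @ x."""
    return [sum(A[r][c] * x[c] for c in range(len(x))) for r in range(len(A))]


def skew(v: Vec) -> Mat:
    """Skew-symmetric matrix with skew(v) @ u == cross(v, u)."""
    return [[0.0, -v[2], v[1]],
            [v[2], 0.0, -v[0]],
            [-v[1], v[0], 0.0]]


def zupt_residual(R_GtoI: Mat, b_a: Vec, b_g: Vec, a_m: Vec, w_m: Vec) -> Vec:
    """6x1 residual: synthetic zero "measurement" minus h(x).

    h(x) = [a_m - b_a - R g; w_m - b_g] is the noise-free estimate of the
    true acceleration and angular velocity.
    """
    Rg = matvec(R_GtoI, GRAVITY)
    h_acc = [a_m[i] - b_a[i] - Rg[i] for i in range(3)]
    h_gyr = [w_m[i] - b_g[i] for i in range(3)]
    return [0.0 - h for h in h_acc + h_gyr]


def zupt_jacobian(R_GtoI: Mat) -> Mat:
    """6x9 Jacobian of h(x) w.r.t. the hypothetical ordering [dtheta, b_a, b_g].

    Assumes the orientation error convention R = (I - skew(dtheta)) R_hat.
    """
    S = skew(matvec(R_GtoI, GRAVITY))
    H = [[0.0] * 9 for _ in range(6)]
    for r in range(3):
        for c in range(3):
            H[r][c] = -S[r][c]      # accel rows vs. orientation error
        H[r][3 + r] = -1.0          # accel rows vs. accelerometer bias
        H[3 + r][6 + r] = -1.0      # gyro rows vs. gyroscope bias
    return H
```

For a level, bias-free IMU at rest (identity rotation, `a_m = [0, 0, 9.81]`, `w_m = [0, 0, 0]`) the residual is exactly zero. Because $\mathbf{h}$ is linear in the rotation and the biases, the Jacobian can be verified with simple finite differences.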
Zero velocity detection is itself a challenging problem that many different works have tried to address [Wagstaff2017IPIN], [Ramanandan2011TITS], [Davidson2009ENC]. Most approaches boil down to simple thresholding, where the goal is to determine the threshold that best classifies the zero velocity update (ZUPT) portions of a trajectory. Other works, such as [Wagstaff2017IPIN] and [Ramanandan2011TITS], have looked at more sophisticated methods that address the fact that this threshold can depend on the type of motion (such as running vs. walking) and on the characteristics of the platform the sensor is mounted on (we want to ignore vehicle engine vibrations and other non-essential observed vibrations).
We approach this detection problem by tuning a $\chi^2$ (chi-squared) threshold based on the measurement model above. It is important to note that we also have a velocity magnitude check, which rejects constant-velocity cases with a non-zero velocity magnitude. More specifically, we perform the following threshold check to see if we are currently at zero velocity:

$$
\tilde{\mathbf{z}}^\top\left(\mathbf{H}\mathbf{P}\mathbf{H}^\top + \alpha\mathbf{R}\right)^{-1}\tilde{\mathbf{z}} < \chi^2
$$

where $\mathbf{H}$ is the measurement Jacobian, $\mathbf{P}$ the current state covariance, $\mathbf{R}$ the inertial measurement noise covariance, and $\alpha$ a noise inflation factor.
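The inertial test can be sketched as below. This is a hedged, dependency-free illustration rather than the actual implementation: the function name `zupt_chi2_check`, the hypothetical `max_speed` limit, and passing the $\chi^2$ threshold in as a number are all assumptions (for a 6-dimensional residual, the 95% quantile is about 12.59). The matrix inverse is replaced by a linear solve, which is the usual way to evaluate a Mahalanobis distance.

```python
from typing import List, Tuple

Vec = List[float]
Mat = List[List[float]]


def solve(A: Mat, b: Vec) -> Vec:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def zupt_chi2_check(z: Vec, H: Mat, P: Mat, R: Mat, alpha: float,
                    chi2_threshold: float, vel: Vec,
                    max_speed: float) -> Tuple[bool, float]:
    """Return (is_zero_velocity, test statistic)."""
    m, n = len(z), len(P)
    # S = H P H^T + alpha * R (innovation covariance with inflated IMU noise)
    HP = [[sum(H[i][k] * P[k][j] for k in range(n)) for j in range(n)]
          for i in range(m)]
    S = [[sum(HP[i][k] * H[j][k] for k in range(n)) + alpha * R[i][j]
          for j in range(m)] for i in range(m)]
    # Mahalanobis distance z^T S^{-1} z, evaluated through a linear solve
    stat = sum(zi * yi for zi, yi in zip(z, solve(S, z)))
    # Velocity magnitude check rejects constant, non-zero velocity
    speed = sum(v * v for v in vel) ** 0.5
    return stat < chi2_threshold and speed < max_speed, stat
```

Both conditions must hold: a small statistic alone only indicates constant velocity, so the speed check is what separates "at rest" from "cruising".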
In real-world experiments, we found that the inertial measurement noise $\mathbf{R}$ typically needs to be inflated by a factor of $\alpha\in[50,100]$ to allow for proper detection. This may indicate that our inertial noise values are overconfident, or that additional frequencies (such as motor vibrations) inject additional noise.
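As a rough illustration of what the inflation does: if $\mathbf{H}\mathbf{P}\mathbf{H}^\top$ is small compared to $\alpha\mathbf{R}$ (an assumption, e.g. once the biases have converged), the test statistic reduces to a scaled Mahalanobis distance,

$$
\tilde{\mathbf{z}}^\top\left(\mathbf{H}\mathbf{P}\mathbf{H}^\top + \alpha\mathbf{R}\right)^{-1}\tilde{\mathbf{z}}
\approx \frac{1}{\alpha}\,\tilde{\mathbf{z}}^\top\mathbf{R}^{-1}\tilde{\mathbf{z}}
$$

so with $\alpha = 50$ the statistic of a given residual drops by roughly a factor of 50, which lets small vibration-induced residuals pass the same $\chi^2$ threshold.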
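We additionally have a detection method that leverages the visual feature tracks. Given two sequential images, the assumption is that if there is very little disparity between the feature tracks, then the system is stationary. We therefore calculate the average disparity over the $N$ tracked features and threshold on this value:

$$
\frac{1}{N}\sum_{i=1}^{N}\left\lVert \mathbf{uv}_{k,i} - \mathbf{uv}_{k-1,i} \right\rVert < \Delta d
$$

where $\mathbf{uv}_{k,i}$ is the pixel coordinate of feature $i$ in image $k$ and $\Delta d$ is the disparity threshold.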
The disparity-based method works reasonably well, but can fail if the environment is dynamic in nature; in very dynamic environments it can therefore be useful to use the inertial and disparity-based methods together.
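The average-disparity test can be sketched as follows. This is a minimal illustration assuming that each feature has already been matched between images $k-1$ and $k$ by the tracker and is given as a `(u, v)` pixel pair; the function names and the `delta_d` pixel threshold are hypothetical.

```python
import math
from typing import List, Tuple

Pixel = Tuple[float, float]


def average_disparity(prev: List[Pixel], curr: List[Pixel]) -> float:
    """Mean pixel displacement of matched features between two images."""
    if len(prev) != len(curr) or not prev:
        raise ValueError("need the same, non-zero number of tracked features")
    total = sum(math.hypot(c[0] - p[0], c[1] - p[1])
                for p, c in zip(prev, curr))
    return total / len(prev)


def is_stationary_by_disparity(prev: List[Pixel], curr: List[Pixel],
                               delta_d: float) -> bool:
    """Declare zero velocity if the average disparity is below delta_d."""
    return average_disparity(prev, curr) < delta_d
```

For example, two tracks that move by 5 px and 0 px give an average disparity of 2.5 px, which passes a 3 px threshold but fails a 2 px one.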