A picture is worth 1000 words when trying to explain a computer vision algorithm. Let’s look into optical flow.
Optical flow allows automated detection of motion in an image by comparing pixel intensity over time. The default parameters in OpenCV are useful for most cases, but sometimes you need to fine-tune them for a specific use case.
While working on a project to translate motion into LED color, I wrote a simple script to compare the effects of various parameter combinations so that I could get the resolution I needed. Hopefully sharing these results will help others who are equally perplexed.
Let’s compare the parameters used in the iterative Lucas-Kanade method with pyramids.
First, detect features using OpenCV’s goodFeaturesToTrack with the Shi-Tomasi algorithm, which uses the eigenvalues of the second-moment matrix:
```python
# params for Shi-Tomasi corner detection
feature_params = dict(maxCorners=100,
                      qualityLevel=0.3,
                      minDistance=7,
                      blockSize=7)
```
All videos were run with these parameters unless otherwise noted.
For comparison, I chose a video clip of a crowd that is available for public use and has motion at various depths. Some things to notice when comparing the videos are how many features are (accurately) detected, how consistently the features are tracked with optical flow, and how resilient the tracking is to changes in lighting or local contrast. Notice which features are (and are not) detected, and when optical flow does not behave as you expect.
The most intuitive parameter is maxCorners, which controls how many corners are detected in each frame. A low maxCorners value leaves features unidentified:
Increasing this value increases the number of features detected, here up to 500:
qualityLevel is a value between 0 and 1 that sets the minimal accepted corner quality, measured relative to the strongest corner found in the frame; features below this threshold are rejected. A lower value admits more features (qualityLevel = 0.1):
Whereas a higher quality level is very restrictive, e.g., qualityLevel = 0.95:
```python
# params for optical flow
lk_params = dict(winSize=(15, 15),
                 maxLevel=2,
                 criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))
```
Detect the motion of specific points or the aggregated motion of regions by modifying the winSize argument, which determines the size of the integration window around each point. Small windows are more sensitive to noise and may miss larger motions; large windows are more likely to “survive” an occlusion.
The tracking appears smoother with a larger window size (here 400x400):
In the other direction, a narrow window size of 3x3 is more precise and “sticks” to the objects:
criteria has two interesting parameters here: the maximum number of iterations (10 above) and epsilon (0.03 above). The search stops when it hits the iteration limit or when the window moves by less than epsilon, so more iterations means a more exhaustive search, and a larger epsilon finishes earlier at lower precision. These primarily trade speed against accuracy, and can usually stay at their defaults.
When maxLevel is 0, no pyramids are used and this reduces to the plain Lucas-Kanade algorithm on the full-resolution image (i.e., calcOpticalFlowLK). Pyramids allow finding optical flow at multiple resolutions of the image, so larger motions can be captured at the coarser levels.
I hope this brief explanation of parameters for optical flow clears up some of their mystery.