Foundation AI helps Translutions track vehicles and pedestrians in real-time video

Foundation AI is an Artificial Intelligence Solutions Provider. We help organizations process, manage, and leverage their unstructured data to automate labor-intensive tasks, make better data-driven decisions, and drive real business value.

Translutions is a Transportation Planning and Traffic Engineering company that provides context-sensitive transportation planning and traffic engineering services. Its background in architecture, urban planning, and civil engineering equips it to address the traffic engineering and transportation planning needs of both public and private sector clients.


Goals

  • To build a solution that processes video footage of intersections of interest.
  • To count the number of cars traveling through the intersection, segmented by:
    • Time of Day
    • Direction of Travel
    • Turning Movement
    • Type of Vehicle (Passenger Vehicle and Truck)
  • To calculate the average wait time for vehicles based on:
    • Time of Day
    • Direction of Travel
    • Turning Movement
  • To count the number of pedestrians crossing the intersection, segmented by:
    • Time of Day
    • Direction of Travel

Approach

  • Used our Extract Vision Object Recognition model to detect different types of vehicles and pedestrians.
  • Applied the Kalman filtering technique to track vehicles and pedestrians through obstructed frames.
  • Performed a one-time calibration for each camera by labeling street corners, so that the system can automatically identify each vehicle’s direction of travel.

Results

  • The object recognition and tracking pipeline runs in real time.
  • The pipeline drastically reduced the manpower needed to quantify traffic flows.
  • Vehicle detection accuracy: 92%
  • Vehicle counting accuracy by traffic direction: 85%

Background

Under the California Environmental Quality Act (CEQA), most new development in the state of California requires that the governing municipality estimate the new development’s impact on traffic in the surrounding area. Translutions is often hired to conduct these studies.

The first part of any such traffic impact analysis is to determine the current state of traffic in a given area. To do this, Translutions either sends people to these intersections to count and classify traffic manually, or places cameras at the intersections and has people manually review the video footage afterward.

Challenge

Manual traffic counting, whether in person or through video review, is time-consuming, expensive, and error-prone. Translutions’ customers rely on its quick turnaround on traffic impact studies, but manual counting can significantly delay their completion.

Translutions approached Foundation AI to configure the Extract Video Search solution to help automate its traffic counting process. Translutions contracts to have cameras placed at intersections of interest, then runs the video feed through Extract Video Search to:

  • Count the number of cars traveling through the intersection, segmented by:
    • Time of Day
    • Direction of Travel
    • Turning Movement
    • Type of Vehicle (Passenger Vehicle and Truck)
  • Calculate the average wait time for vehicles based on:
    • Time of Day
    • Direction of Travel
    • Turning Movement
  • Count the number of pedestrians crossing the intersection, segmented by:
    • Time of Day
    • Direction of Travel

Solution

The solution we designed consists of three parts: vehicle and pedestrian detection, object tracking, and direction-of-travel identification.

For vehicle and pedestrian detection, we used our Extract Vision Object Recognition model. The model performs well at detecting objects that appear at very different scales; a bus far from the camera and a car very close to it, for example, occupy very different portions of the frame. Extract Vision can also differentiate between vehicle types, such as passenger vehicles, buses, and trucks.
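
Extract Vision itself is proprietary, but as a rough, hypothetical illustration of this detection step, the sketch below substitutes torchvision’s pretrained Faster R-CNN for the production model to pick out pedestrians, cars, buses, and trucks in a single frame:

    # Hypothetical sketch: vehicle/pedestrian detection on one video frame.
    # torchvision's Faster R-CNN stands in for the proprietary Extract Vision model.
    import torch
    import torchvision
    from torchvision.transforms.functional import to_tensor

    # COCO category IDs for the object classes this case study cares about.
    CLASSES_OF_INTEREST = {1: "pedestrian", 3: "car", 6: "bus", 8: "truck"}

    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    model.eval()

    def detect(frame_rgb, score_threshold=0.5):
        """Return (label, score, box) tuples for vehicles and pedestrians in a frame."""
        with torch.no_grad():
            output = model([to_tensor(frame_rgb)])[0]
        detections = []
        for label, score, box in zip(output["labels"], output["scores"], output["boxes"]):
            if score >= score_threshold and label.item() in CLASSES_OF_INTEREST:
                detections.append(
                    (CLASSES_OF_INTEREST[label.item()], score.item(), box.tolist()))
        return detections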

Once we have detected an object of interest (a vehicle or a pedestrian), the next task is to track it from the moment it enters the camera’s view until it leaves. We assign each object a unique ID when it first appears in the frame and track it under that ID until it exits; a simple form of this ID assignment is sketched below. The major problem in tracking is the ID switch: if one pedestrian walks in front of another, the ID assigned to each pedestrian should not change. Similarly, if one vehicle obstructs the camera’s view of another traveling in the opposite direction, neither ID should change after the obstruction.
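
As a minimal, hypothetical sketch of the ID-assignment step (the names iou and assign_ids are illustrative, not our production API), new detections can be matched greedily to existing tracks by bounding-box overlap, with unmatched detections receiving fresh IDs:

    # Hypothetical sketch: greedy ID assignment by bounding-box overlap (IoU).
    from itertools import count

    _next_id = count(1)

    def iou(a, b):
        """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter + 1e-9)

    def assign_ids(tracks, detections, threshold=0.3):
        """Match detections (list of boxes) to tracks (dict of id -> last box).
        Unmatched detections receive new IDs; returns dict of id -> box."""
        assigned, unmatched = {}, dict(tracks)
        for box in detections:
            best_id, best_iou = None, threshold
            for tid, prev_box in unmatched.items():
                score = iou(box, prev_box)
                if score > best_iou:
                    best_id, best_iou = tid, score
            if best_id is None:
                best_id = next(_next_id)
            else:
                del unmatched[best_id]
            assigned[best_id] = box
        return assigned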

To track an object through obstructed frames, we initially tried projecting each object’s path by extrapolation. This approach did not give good results because obstruction often occurs as objects enter the frame; when that happens, the system does not yet have enough motion history to extrapolate the object’s path reliably.

To solve this, we instead turned to the Kalman filtering technique. The Kalman filter summarizes an object’s motion in a compact representation called the object’s state. From the current state, the algorithm predicts the object’s position in the next frame; the prediction is then compared against the observed position, the state is updated accordingly, and the process repeats for the next frame. One advantage of this algorithm is that each update needs only the current state and the newest observed position, so it runs quickly, whereas extrapolation recomputes the next location from every position the object has traced, which is far more time-consuming. Another major advantage is the algorithm’s probabilistic nature: at every iteration, the Kalman filter outputs the probability that the object is in a given location, so even when an object is occluded mid-motion, we can check for it in the most probable locations.
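
The sketch below shows a minimal constant-velocity Kalman filter tracking a box centre in pixel coordinates. It illustrates the general predict/update cycle described above, not our production tracker, and the noise values are illustrative guesses:

    # Minimal constant-velocity Kalman filter for 2-D position tracking (a sketch).
    import numpy as np

    class KalmanTrack:
        def __init__(self, x, y, dt=1.0):
            self.state = np.array([x, y, 0.0, 0.0])   # [x, y, vx, vy]
            self.P = np.eye(4) * 10.0                 # state covariance
            self.F = np.array([[1, 0, dt, 0],         # constant-velocity motion model
                               [0, 1, 0, dt],
                               [0, 0, 1, 0],
                               [0, 0, 0, 1]], dtype=float)
            self.H = np.array([[1, 0, 0, 0],          # we observe position only
                               [0, 1, 0, 0]], dtype=float)
            self.Q = np.eye(4) * 0.01                 # process noise (illustrative)
            self.R = np.eye(2) * 1.0                  # measurement noise (illustrative)

        def predict(self):
            """Predict the next position; used to search for occluded objects."""
            self.state = self.F @ self.state
            self.P = self.F @ self.P @ self.F.T + self.Q
            return self.state[:2]

        def update(self, measured_xy):
            """Correct the state with a newly detected position."""
            z = np.asarray(measured_xy, dtype=float)
            y = z - self.H @ self.state               # innovation
            S = self.H @ self.P @ self.H.T + self.R   # innovation covariance
            K = self.P @ self.H.T @ np.linalg.inv(S)  # Kalman gain
            self.state = self.state + K @ y
            self.P = (np.eye(4) - K @ self.H) @ self.P

Each frame, predict() gives the most probable location at which to look for the object; when a matching detection is found, update() folds the new measurement back into the state.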

The final task is to count the number of vehicles moving in each direction. For this, we calibrate each camera’s footage by tagging the corners of the intersection. Joined together, these corners form a boundary line for each road passing through the intersection. Every vehicle that passes through the frame crosses two of these boundaries, and the pair of boundaries it crosses identifies its direction of travel and turning movement, as sketched below.
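
A minimal sketch of this boundary-crossing test follows, assuming each calibrated boundary is stored as a labelled line segment; the names crosses and movement are illustrative:

    # Hypothetical sketch: identify direction of travel from boundary crossings.
    def _side(p, a, b):
        """Sign of the cross product: which side of segment a-b point p lies on."""
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

    def crosses(p1, p2, a, b):
        """True if the step from p1 to p2 crosses the segment a-b."""
        return (_side(p1, a, b) * _side(p2, a, b) < 0 and
                _side(a, p1, p2) * _side(b, p1, p2) < 0)

    def movement(track, boundaries):
        """track: list of (x, y) object centres over time.
        boundaries: dict mapping a leg name (e.g. "north") to a segment (a, b).
        Returns (entry_leg, exit_leg), e.g. ("north", "east"), from which the
        turning movement follows, or None if fewer than two legs were crossed."""
        crossed = []
        for p1, p2 in zip(track, track[1:]):
            for leg, (a, b) in boundaries.items():
                if crosses(p1, p2, a, b) and leg not in crossed:
                    crossed.append(leg)
        if len(crossed) >= 2:
            return crossed[0], crossed[-1]
        return None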

Results

Our solution made a major impact on the time and effort needed to quantify vehicle movements manually. Our object detection model runs in real time. The Kalman filter’s runtime varies with the number of objects detected, so by adapting the frame rate to that load, we are able to track vehicles in real time. We achieved 92% accuracy in vehicle detection and 85% accuracy in vehicle counts by traffic direction.

Using this tool, Translutions was able to drastically reduce the manpower needed for this task.

Artificial Intelligence for the Real World
© 2021 Foundation AI