Fall Detection

The aim of this project is to detect dangerous events in live video streams where people end up lying on the ground for whatever reason (strokes, heart attacks, collisions with cars...). Such an event should trigger an alert so that users check the image and call for help if needed. It's terrible that even though they're right on camera, people die or suffer brain damage lying there low on oxygen just because no human is watching right then. To save lives, that kind of emergency recognition should be a freely available, easy-to-use and low-overhead (ideally local) feature for everybody who live-streams and monitors surveillance camera footage, and of course for those who write and distribute software to that end.

Computer vision and machine learning do come with a heavy academic workload, but if you focus on the working, up-to-date solutions on offer you can still learn and implement pretty much in parallel, and that is a lot of fun. This fall detector is a good opportunity to look at anything from more traditional computer vision to the latest neural-network-based visual recognition and just try whatever fits, without too much up-front study. Unfortunately I stalled a few weeks ago, mainly because I don't have enough images to train the neural network to a higher accuracy (with far fewer false positives) and can't think of an easy way to get them right now.

So here is the flow so far: the Python code streams frames, at first to track motion. At that stage it would probably be a good idea to make sure we are looking at a moving human, but only if that's reliable and cheap enough for real-time evaluation. The HOG detector for recognizing passersby that comes with OpenCV is fast but misses too much. Tapping a winning ImageNet model to whitelist categories of persons could work, but I didn't have enough memory to implement it (those models are huge) and I doubt that it would deliver in real time.

Now assuming it's a person we are tracking, the fact they are moving usually implies they're walking on their feet and the rectangle surrounding their motion has a "tall" ratio (sides longer than base and top). Another assumption is that falling down should be a significant and sudden movement indicated by a somewhat large numerical difference between two frames.

So first we wait until two frames differ a lot, because this might be due to a person falling. The next test is for ratio changes: if they fall down and the motion tracking works perfectly and without interruption, their once tall ratio changes such that top and bottom become longer than the sides as they collapse and hit the ground. Or the motion tracking fails and the original rectangle decays into a weird cluster of scattered rectangles of different ratios while the person is going down. In that case chances are very high that at least one of those noisy rectangles has top and bottom longer than the sides, which is also good enough.
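
The two-stage trigger described above can be sketched in a few lines of plain Python. This is only an illustration of the logic, not the code used in the script further down: the class name `FallTrigger`, its method and the jump factor of 2 are assumptions made for the example, and boxes are assumed to be `(x, y, w, h)` tuples.

```python
class FallTrigger:
    """Arms on a sudden jump in frame difference, fires on the next wide box."""

    def __init__(self, factor=2.0):
        self.factor = factor      # assumed: how big a jump counts as "a lot"
        self.prev_diff = None     # summed frame difference of the previous frame
        self.armed = False        # True once a big motion has been seen

    def update(self, frame_diff_sum, boxes):
        """Feed one frame; return True if a crop should go to the classifier."""
        if self.prev_diff is not None and frame_diff_sum > self.prev_diff * self.factor:
            self.armed = True
        self.prev_diff = frame_diff_sum
        # A box is "wide" when top and bottom are longer than the sides.
        if self.armed and any(h > 0 and w / h > 1 for (_x, _y, w, h) in boxes):
            self.armed = False
            return True
        return False
```

Note that the trigger stays armed across frames until a wide box shows up, which is what lets the noisy cluster of rectangles during a fall still count.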

Partly relying on pure coincidence is not exactly an engineering feat but I suspect it will work just fine compared to adding any number of more sophisticated layers of hard coded tests you could think of - whatever you do adds some kind of error you'll be trying to cancel in the next layer and so on, and - come to think of it - that's precisely what machine learning is for.

But still you can't just have every frame checked by the neural network because even simple model estimates are too slow for that. And even if they weren't, many many false positives on frames you should not be evaluating in the first place would soon kill the vibe.

So only the image section where a suspicious ratio change following a big motion happened gets cropped and then evaluated by a neural network that is trained to recognize people lying on the ground. If our threshold is exceeded, we fire an alert. This can be done in real time.
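
As a rough sketch of that gating step, assuming the suspicious region is given as an `(x, y, w, h)` box: the helper names, the 25% padding and the list-of-rows frame are hypothetical, and `classify` stands in for whatever model returns a score between 0 and 1. The 0.3 threshold is the one used in the script further down, which crops rough quarters of the frame rather than a padded box.

```python
def padded_crop(x, y, w, h, frame_w, frame_h, pad=0.25):
    """Return (x0, y0, x1, y1): the box grown by `pad` per side, clamped to the frame."""
    dx, dy = int(w * pad), int(h * pad)
    x0, y0 = max(0, x - dx), max(0, y - dy)
    x1, y1 = min(frame_w, x + w + dx), min(frame_h, y + h + dy)
    return x0, y0, x1, y1


def check_region(frame_rows, box, classify, threshold=0.3):
    """Crop `box` from a frame (a list of pixel rows) and compare the score to the threshold."""
    frame_h, frame_w = len(frame_rows), len(frame_rows[0])
    x0, y0, x1, y1 = padded_crop(*box, frame_w=frame_w, frame_h=frame_h)
    crop = [row[x0:x1] for row in frame_rows[y0:y1]]
    return classify(crop) >= threshold
```

Only frames that already passed the motion and ratio tests would reach `check_region`, so the slow classifier runs on a small fraction of the stream.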

You can see true positives in the video below - the thing is actually working and that is something I am genuinely happy about.

But I have a hunch that in the real world false positives would make most users switch the recognition off, seeing how those errors would propagate through a sequence of hundreds if not thousands of frames, so I think it is still useless practically.
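
To put a rough number on that hunch: if each evaluated crop independently raised a false alarm with probability p, the chance of at least one false alarm in N evaluations would be 1 - (1 - p)^N. Independence is an assumption (consecutive frames are strongly correlated), and the rates in the loop below are made-up illustrations, not measurements of this model.

```python
def prob_any_false_alarm(p, n):
    """Chance of at least one false alarm in n independent evaluations at rate p."""
    return 1.0 - (1.0 - p) ** n


# Illustrative per-evaluation false alarm rates over 1000 evaluated crops.
for p in (0.2, 0.01, 0.001):
    print(p, round(prob_any_false_alarm(p, 1000), 3))
```

Even a per-crop rate that sounds small adds up quickly over a long sequence, which is why the accuracy has to be much higher before this is practical.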

The network just isn't accurate enough. It already tests over 80%, but that's on an insignificantly small number of images I reserved for testing. I only have a few hundred training images, pretty much everything Google Images finds for the keywords I could come up with. Aggressive augmentation did not cut it either. Maybe the diversity of positives is just hard to generalize over. People can lie there at many different angles and in many directions, with or without occlusion by things, passersby and helpers, face down or face up and what not.
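
How little an accuracy figure from a small test set says can be estimated with a confidence interval. The sketch below uses the Wilson score interval for a proportion. The function name is hypothetical, and the test set size of 100 in the usage note is an assumed round number for illustration, not the actual size of my test set.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for an accuracy of correct/total (z=1.96 is about 95%)."""
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half
```

For example, `wilson_interval(80, 100)` gives a range of roughly 0.71 to 0.87, so an observed 80% on a hundred images is compatible with a true accuracy anywhere in that band.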

By the way, before building the neural network I did train Haar-like and LBP classifiers with mostly the same images, resulting in a clearly prohibitive frequency of false positives. But I remember the Haar-like classifier, when it did hit a true positive, seemed sensitive to how unconscious people on the ground spread their arms and legs - that group of features still seems promising to me.

I was thinking about buying a camera and feeding pictures of myself to the network, but those would not generalize too well to the rest of mankind. Manually taking pictures of others I might ask to lie down against different backgrounds would take quite a lot of real-life time. I also expect diminishing (accuracy) returns from adding those new pictures to the model. And probably somewhere in the tens of thousands of images I would start needing a proper GPU.

Anyway, here is the Python code that learns the model (a 'vanilla' convolutional neural network via Keras; if you are working on 32-bit Windows you'll want the Theano backend). I trained only briefly, maybe less than 20-30 minutes on a slow machine, trying to find the number of epochs right before overfitting starts, with different settings each competing for the highest test accuracy.

# -*- coding: utf-8 -*-

from __future__ import print_function
import keras
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.models import load_model
from keras import backend as K
import os

batch_size = 32
epochs = 8
data_augmentation = False

save_dir = os.path.join(os.getcwd(), 'saved_models')
model_name = 'whatever.h5'

# Note: the arguments to both generators were lost from this listing;
# the defaults are used here as an assumption.
datagen = ImageDataGenerator()

train_datagen = ImageDataGenerator()

train_generator = train_datagen.flow_from_directory(
   'PATHTODIRECTORY', # this is the target directory
   target_size=(128, 128),
   class_mode='binary') # since we use binary_crossentropy loss, we need binary labels
validate_generator = datagen.flow_from_directory(
   'PATHTODIRECTORY', # this is the target directory
   target_size=(128, 128),
   class_mode='binary')
test_generator = datagen.flow_from_directory(
   'C:/Users/owner/Desktop/data/test', # this is the target directory
   target_size=(128, 128),
   class_mode='binary')

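model = Sequential()

# Note: the activation, dropout, flatten and dense layers were lost from this listing.
# They are filled in following the stock Keras CNN example that matches the imports above;
# the dropout rates and the dense layer size are assumptions.
model.add(Conv2D(32, (3, 3), padding='same',
   input_shape=(3, 128, 128))) # channels-first, so image_data_format must be 'channels_first'
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1)) # one output: lying on the ground or not
model.add(Activation('sigmoid'))

# initiate optimizer
opt = keras.optimizers.Nadam()
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
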

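if not data_augmentation:
  print('Not using data augmentation.')
  # Note: apart from steps_per_epoch the fit call was lost from this listing; the other arguments are assumptions.
  model.fit_generator(
   train_generator,
   steps_per_epoch = 1294//batch_size,
   epochs=epochs,
   validation_data=validate_generator,
   validation_steps=max(1, validate_generator.samples//batch_size))
else:
  print('Using real-time data augmentation.')
  # This will do preprocessing and realtime data augmentation:
  datagen = ImageDataGenerator(
   featurewise_center=False, # set input mean to 0 over the dataset
   samplewise_center=False, # set each sample mean to 0
   featurewise_std_normalization=False, # divide inputs by std of the dataset
   samplewise_std_normalization=False, # divide each input by its std
   zca_whitening=False, # apply ZCA whitening
   rotation_range=0, # randomly rotate images in the range (degrees, 0 to 180)
   width_shift_range=0.1, # randomly shift images horizontally (fraction of total width)
   height_shift_range=0.1, # randomly shift images vertically (fraction of total height)
   horizontal_flip=True, # randomly flip images horizontally
   vertical_flip=False) # randomly flip images vertically
  # Note: the fit call of this branch was lost as well; rebuilt by analogy with the branch above.
  model.fit_generator(
   datagen.flow_from_directory('PATHTODIRECTORY', target_size=(128, 128), class_mode='binary'),
   steps_per_epoch = 1294//batch_size,
   epochs=epochs,
   validation_data=validate_generator,
   validation_steps=max(1, validate_generator.samples//batch_size))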

# Score trained model.
scores = model.evaluate_generator(test_generator, 154)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])
# Save model and weights
if not os.path.isdir(save_dir):
  os.makedirs(save_dir)
model_path = os.path.join(save_dir, model_name)
model.save(model_path)
print('Saved trained model at %s ' % model_path)
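
The search for the number of epochs right before overfitting starts can be made a little less manual. A minimal sketch, assuming the per-epoch validation accuracies were recorded (Keras returns them in the history object of the fit call, under `val_acc` or `val_accuracy` depending on the version). The function name and the numbers in the usage note are made up for illustration.

```python
def best_epoch(val_accuracy):
    """Return the 1-based epoch with the highest validation accuracy (first one on ties)."""
    best_index = max(range(len(val_accuracy)), key=lambda i: val_accuracy[i])
    return best_index + 1
```

With an invented history such as `[0.60, 0.70, 0.78, 0.81, 0.80, 0.79]`, `best_epoch` returns 4, the last epoch before validation accuracy starts to drop.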

And here is the dirty and dangerously simplistic Python that runs crops of suspicious frames through the model. OpenCV is the obvious choice to manipulate the images.

# -*- coding: utf-8 -*-
import cv2
import numpy as np
from pathlib import Path
from keras import backend as K
from keras.models import load_model
from keras.preprocessing import image as image_utils
model = load_model('YOURPATHTOMODEL')

cam = cv2.VideoCapture('YOURPATHTOVIDEO')
cam.set(cv2.CAP_PROP_FPS, 32)
fgbg = cv2.bgsegm.createBackgroundSubtractorMOG()

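buffer = 30
checkAspect = False
# Note: the definitions of these three constants were lost from this listing.
# The values are placeholders taken from OpenCV's motempl sample, which measures
# time in seconds; they will likely need tuning because timestamp counts frames here.
MHI_DURATION = 0.5
MAX_TIME_DELTA = 0.25
MIN_TIME_DELTA = 0.05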

#Utility to check if rectangles overlap
def rectOverlaps(x0, y0, w0, h0,x1,y1,w1,h1):
  #horizontal overlap
  horizontal = (
    ((x0 - buffer <= x1 + buffer)&((x0+w0)>(x1+w1)))|
    ((x0 + buffer >x1 - buffer )&((x0+w0)>(x1+w1)))|
    ((x0 - buffer <=x1+buffer)&((x0+w0)<=(x1+w1))))
  #vertical overlap (this half was lost from the listing; mirrored from the horizontal test as an assumption)
  vertical = (
    ((y0 - buffer <= y1 + buffer)&((y0+h0)>(y1+h1)))|
    ((y0 + buffer >y1 - buffer )&((y0+h0)>(y1+h1)))|
    ((y0 - buffer <=y1+buffer)&((y0+h0)<=(y1+h1))))
  return bool(horizontal & vertical)
#draws geometry where motion is found, angled rectangle etc
def draw_motion_comp(vis, x, y, w, h, angle, color):
  cv2.rectangle(vis, (int(x), int(y)), (int(x+w), int(y+h)), (0, 255, 0))
  r = min(w/2, h/2)
  cx, cy = x+w/2, y+h/2
  angle = angle*np.pi/180
  cv2.circle(vis, (int(cx), int(cy)), int(r), color, 3)
  cv2.line(vis, (int(cx), int(cy)), (int(cx+np.cos(angle)*r), int(cy+np.sin(angle)*r)), color, 3)
  return True

timestamp = 0
ret, frame = cam.read()
frame = cv2.resize(frame,(256,256)) #low resolution is OK
grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
this_diff = grey.sum()
while True:
  timestamp = timestamp+1
  timer = cv2.getTickCount()
  prev_frame = frame.copy()
  ret, frame = cam.read()
  if not ret: #stop when the stream ends
    break
  frame = cv2.resize(frame,(256,256))
  h, w = frame.shape[:2]
  motion_history = np.zeros((h, w), np.float32)
  prev_diff = this_diff

  frame_diff = cv2.absdiff(frame, prev_frame)
  gray_diff = cv2.cvtColor(frame_diff, cv2.COLOR_BGR2GRAY)
  this_diff = gray_diff.sum()
  #check aspect ratio if there was a lot of movement, indicated by difference between two frames
  if ((this_diff > (prev_diff*2))):
    print("lot of Movement")
    checkAspect = True
  prev_diff = this_diff

  ret, historymask = cv2.threshold(gray_diff, 32, 1, cv2.THRESH_BINARY)
  cv2.motempl.updateMotionHistory(historymask, motion_history, timestamp, 10)
  mg_mask, mg_orient = cv2.motempl.calcMotionGradient( motion_history, MAX_TIME_DELTA, MIN_TIME_DELTA, apertureSize=5 )
  seg_mask, seg_bounds = cv2.motempl.segmentMotion(motion_history, timestamp, MAX_TIME_DELTA)

  for i, rect in enumerate([(0, 0, w, h)] + list(seg_bounds)):
    x, y, rw, rh = rect
    area = rw*rh
    if area < 64**2: #skip small regions
      continue
    silh_roi = historymask [y:y+rh,x:x+rw]
    orient_roi = mg_orient [y:y+rh,x:x+rw]
    mask_roi = mg_mask [y:y+rh,x:x+rw]
    mhi_roi = motion_history[y:y+rh,x:x+rw]
    if cv2.norm(silh_roi, cv2.NORM_L1) < area*0.05: #skip regions with hardly any motion
      continue
    angle = cv2.motempl.calcGlobalOrientation(orient_roi, mask_roi, mhi_roi, timestamp, MHI_DURATION)
    color = ((255, 0, 0), (255, 0, 0))[i == 0]
    draw_motion_comp(motion_history, rect[0], rect[1], rect[2], rect[3], angle, color)
  cv2.imshow('motempl', motion_history)
  timestamp += 1
  fgmask = fgbg.apply(frame)
  thresh = cv2.threshold(fgmask,125,255,cv2.THRESH_BINARY)[1]
  #[-2] picks the contours from both the 3-value (OpenCV 3) and the 2-value (OpenCV 4) return
  cnts = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL,cv2.CHAIN_APPROX_SIMPLE)[-2]
  count = 0
  showframe = frame.copy()
  for c in cnts:#this is actually a huge list of points. It can be processed to bounding Rectangles by calling boundingRect
    if (cv2.contourArea(c)>50):
      M = cv2.moments(c)
      cX = int(M["m10"] / M["m00"])
      cY = int(M["m01"] / M["m00"])
      epsilon = 0.1*cv2.arcLength(c,True)

      [x,y,w,h] = cv2.boundingRect(c)
      #the checkAspect test is an assumption: the listing sets the flag, but the line reading it is missing
      if checkAspect and ((w/h)>1):
        print("aspect Ratio weird, fallen down?")
        checkAspect = False
        height, width, channels = frame.shape
        testImage = frame.copy()
        #crop image, just fuzzy quarters not very accurate here. Must improve that, would miss a lot of important stuff on borders
        if (x > width/2):
          if (y < height/2): #top right
            lowerBorder = height/2
            if(y+h>height/2):lowerBorder = y+h
            testImage = frame[0:int(lowerBorder), int(width/2):int(width)].copy()
          if (y >= height/2): #bottom right
            testImage = frame[int(height/2):int(height),int(width/2):int(width)].copy()
        if (x <= width/2):
          rightBorder = width/2
          if((x+w)>(width/2)):rightBorder = x+w
          if (y < height/2): #top left
            lowerBorder = height/2
            if(y+h>height/2):lowerBorder = y+h
            testImage = frame[0:int(lowerBorder), 0:int(rightBorder)].copy()
          if (y >= height/2): #bottom left
            testImage = frame[int(height/2):int(height),0:int(rightBorder)].copy()

        testImage = cv2.resize(testImage,(128,128))
        #and there's the test image. Now convert for keras predictions:
        testImage = image_utils.img_to_array(testImage)
        classes = model.predict(testImage[None,:,:,:])#the leading None adds the batch axis that predict expects
        if (classes>=0.3): #at last the alert
          cv2.putText(showframe, "MAN DOWN!", (100,50), cv2.FONT_HERSHEY_SIMPLEX, 0.75, (0,0,255), 2)
      rect = cv2.minAreaRect(c)
      rotbox = cv2.boxPoints(rect)
      rotbox = rotbox.astype(int) #np.int0 is gone in NumPy 2.0

      if (len(c)>5):#actually not needed. Thought watching ellipses might inspire better conditions
        ellipse = cv2.fitEllipse(c)
        (x,y),(MA,ma),angle = cv2.fitEllipse(c)
        cv2.ellipse(showframe,ellipse,(0,255,255),2)
        cv2.circle(showframe, (cX, cY), 2, (255, 255, 255), -1)
      count = count+1
  mh = np.float32(np.clip((motion_history-(timestamp-MHI_DURATION)) / MHI_DURATION, 0, 1)*255)
  mh = cv2.cvtColor(mh, cv2.COLOR_GRAY2BGR)
  cv2.imshow('motempl', motion_history)
  #assumption: the call that displays showframe is missing from the listing; the window name is made up
  cv2.imshow('fall detection', showframe)

  prev_frame = frame.copy()
  k = cv2.waitKey(30) & 0xff
  if k == 27: #Esc quits
    break

cam.release()
cv2.destroyAllWindows()

I don't want to share the images because I don't own them - if you scrape the internet for images yourself you'll probably end up with more and higher quality images than I did, without much trouble. But be prepared to run into some rather terrifying footage, not a task to pursue for hours on end. Here's one version of the model to play with. Again, I don't think this is mature for actual use yet.
Download the model 

Copyright © 2016-2024