Visual Human Activity Recognition (HAR), by means of an object detection algorithm, can be used to localize people and monitor their states with little to no obstruction. This paper discusses how to train a Single Shot Detector (SSD) model to localize underground miners and capture their states, specifically to distinguish an injured miner from a non-injured one (lying down vs. standing up). TensorFlow serves as the abstraction layer for implementing the machine learning algorithm; although it uses Python to work with nodes and tensors, the actual algorithms run on C++ libraries, providing a good balance between performance and speed of development. The paper further discusses evaluation methods for assessing the accuracy of the trained model. As future work, data fusion is proposed to improve the accuracy of the detected activity or state of people in a mining environment.