
AUDIO VIDEO FORENSIC ANALYST - Career Preparation

As an audio video forensic analyst, you will spend the majority of your work improving the perceived audio or visual clarity of digital recordings. However, your work will also require authenticity testing, measurements, cross-referencing data, a high standard of ethics, and the application of peer-reviewed methodology in preparing your expert testimony in the service of justice. You will be expected to understand industry best practices, stay atop innovative peer-reviewed technologies and methods, adapt existing knowledge to unexpected circumstances, and understand the rules of evidence applicable to each case that you serve.

Before you can authenticate or enhance a recording, you need to understand how recorded data is retained. Consider the simpler example of audio recordings, where sound is represented by amplitude (intensity) values, each typically one of thousands of possible levels, captured at a fixed time interval (typically thousands of samples per second), usually in one (mono) or two (stereo) channels. The resulting recording will likely be compressed so that it retains only a small fraction of the potential values.
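To make this concrete, here is a minimal Python/NumPy sketch of how a tone becomes digital audio: amplitude values sampled at a fixed interval and rounded to integer levels, in two channels. The 44,100 Hz rate, the 16-bit depth, and the `sample_tone` helper are illustrative assumptions, not the settings of any particular recorder.

```python
import numpy as np

SAMPLE_RATE = 44_100  # samples per second (assumed, CD-style rate)
BIT_DEPTH = 16        # bits per sample, i.e. 2**16 = 65,536 possible levels (assumed)


def sample_tone(freq_hz, duration_s, rate=SAMPLE_RATE, bits=BIT_DEPTH, level=0.8):
    """Hypothetical helper: sample a sine tone at a fixed interval and quantize it (bits <= 16)."""
    n = int(round(duration_s * rate))
    t = np.arange(n) / rate                      # one sample every 1/rate seconds
    analog = level * np.sin(2 * np.pi * freq_hz * t)
    max_code = 2 ** (bits - 1) - 1               # 32,767 for 16-bit signed audio
    return np.round(analog * max_code).astype(np.int16)


left = sample_tone(440, 1.0)
right = sample_tone(660, 1.0)
stereo = np.stack([left, right], axis=1)         # shape (samples, 2 channels)
print(stereo.shape, stereo.dtype)                # (44100, 2) int16
```

Stored uncompressed, this works out to 44,100 samples × 2 bytes × 2 channels, or 176,400 bytes per second, which is why recorders go on to compress it.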

It is known that the frequency range of most recording microphones (about 20 Hz to 20 kHz) exceeds the frequency range of what we typically hear (approximately 100 Hz to 6 kHz), with human speech being an even smaller subset. To reduce file size, digital audio recorders discard much of this unused headroom before saving the recording as an electronic file. File size reduction is a balancing act: removing too much of the high-frequency range will result in voice pitch distortion, and if too few bits are used to represent each data point (sample), the signal-to-noise ratio is reduced.
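The bit-depth side of this trade-off can be checked numerically. This sketch quantizes a full-scale sine with a plain rounding quantizer (no dither, an assumption) and measures the signal-to-noise ratio; the result tracks the well-known rule of thumb of roughly 6 dB per bit (about 6.02·N + 1.76 dB for a full-scale sine). The 997 Hz test tone and 48 kHz rate are arbitrary choices.

```python
import numpy as np


def quantization_snr_db(bits, rate=48_000, freq_hz=997):
    """Measure the SNR of a full-scale sine after uniform quantization to `bits` bits."""
    t = np.arange(rate) / rate                   # one second of audio
    x = np.sin(2 * np.pi * freq_hz * t)          # full-scale test tone in [-1, 1]
    step = 2.0 / 2 ** bits                       # width of one quantization level
    quantized = np.round(x / step) * step
    noise = quantized - x                        # the error introduced by rounding
    return 10 * np.log10(np.mean(x ** 2) / np.mean(noise ** 2))


for bits in (16, 12, 8):
    print(bits, "bits:", round(quantization_snr_db(bits), 1), "dB")
```

At 8 bits the result lands near 50 dB, versus roughly 98 dB at 16 bits.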

If the amplitude of the recording exceeds the allowed values (measured in decibels), then some data will be clipped, causing distortion. The recording may also suffer from reverberation (successively fading echoes). Individually, each of these issues may not seem substantial, but collectively they can make words unintelligible and inhibit audio enhancement. Many of these issues become obvious during the analyst's initial critical listening. To avoid introducing additional data defects, the analyst must maintain a high-quality lossless format throughout the entire analysis and enhancement process.
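Critical listening can be backed up with a quick numeric check. One common heuristic, assumed here rather than taken from any particular forensic tool, is to flag runs of consecutive samples pinned at full scale. The `find_clipped_runs` helper, its three-sample minimum, and its tolerance are hypothetical choices.

```python
import numpy as np


def find_clipped_runs(x, full_scale=1.0, min_run=3, tol=1e-6):
    """Return (start, end) index pairs where |x| sits at full scale for >= min_run samples."""
    at_limit = np.abs(x) >= full_scale - tol
    runs, start = [], None
    for i, flag in enumerate(at_limit):
        if flag and start is None:
            start = i                            # a flat run begins
        elif not flag and start is not None:
            if i - start >= min_run:
                runs.append((start, i))
            start = None
    if start is not None and len(x) - start >= min_run:
        runs.append((start, len(x)))             # run continues to the end of the signal
    return runs


fs = 8000
t = np.arange(800) / fs                          # 0.1 s, ten cycles of a 100 Hz tone
hot = np.clip(1.5 * np.sin(2 * np.pi * 100 * t), -1.0, 1.0)   # recorded too loud
print(len(find_clipped_runs(hot)))               # 20: one flat run per positive and negative peak
```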

As the analyst listens to the recording, they will gather clues regarding which processes and enhancement filters should be applied. For example, while a steady, narrow-band parasitic sound (such as a constant hum) can simply be attenuated with a notch filter, adaptive time-frequency filters are generally the better choice to suppress an unwanted dynamic sound.
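For the steady-parasite case, here is a minimal notch filter sketch, assuming a 60 Hz hum and a Q of 30 (both illustrative). The coefficients follow the widely used RBJ Audio EQ Cookbook biquad, and the sample loop is written out for readability; in practice you would more likely call a library routine such as SciPy's `scipy.signal.iirnotch` with `scipy.signal.lfilter`. An adaptive time-frequency filter for dynamic sounds is beyond a short example.

```python
import math
import numpy as np


def notch_filter(x, fs, f0, q=30.0):
    """Second-order IIR notch centered on f0 Hz (coefficients from the RBJ Audio EQ Cookbook)."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    # Zeros sit on the unit circle at +/- w0; poles sit just inside it, so only a narrow band is removed.
    b0, b1, b2 = 1 / a0, -2 * math.cos(w0) / a0, 1 / a0
    a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
    y = np.zeros(len(x))
    x1 = x2 = y1 = y2 = 0.0
    for n, xn in enumerate(x):
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y[n] = yn
    return y


def tone_amplitude(y, fs, f):
    """Amplitude of the f Hz component (the window must hold a whole number of cycles)."""
    n = np.arange(len(y))
    return 2 * abs(np.mean(y * np.exp(-2j * np.pi * f * n / fs)))


fs = 8000
t = np.arange(3 * fs) / fs
wanted = np.sin(2 * np.pi * 1000 * t)          # stand-in for the signal of interest
hum = 0.5 * np.sin(2 * np.pi * 60 * t)         # steady 60 Hz parasite (assumed mains hum)
cleaned = notch_filter(wanted + hum, fs, f0=60)
tail = cleaned[-fs:]                           # skip the filter's start-up transient
print(tone_amplitude(tail, fs, 60), tone_amplitude(tail, fs, 1000))
```

Because the notch is only a couple of hertz wide, the 1 kHz stand-in passes almost untouched while the hum is driven toward zero once the filter settles.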

A similar logic applies to video recordings, but here the data set is far more complex. For example, most surveillance videos are interlaced: each frame combines two fields, captured at sequential moments in time, on alternating scan lines. Before applying enhancement or performing measurements, these unique moments (fields) must be separated, which doubles the video's total frame count, frame rate, and aspect ratio (width to height), since each field holds only half of the frame's lines. Following the old adage of "garbage in, garbage out", to achieve the greatest enhancement clarity the expert must start with the native, and likely proprietary, recording in order to preserve field integrity and minimize the initial compression losses.
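Field separation itself is simple array slicing. This NumPy sketch assumes the frames are already decoded into an array of shape (frames, height, width), that the height is even, and that the top field was captured first; field order varies between systems and should be verified for each recording. `separate_fields` is a hypothetical helper.

```python
import numpy as np


def separate_fields(frames, top_field_first=True):
    """Split interlaced frames (n, height, width[, channels]) into 2n half-height fields in time order."""
    top = frames[:, 0::2]                        # even scan lines
    bottom = frames[:, 1::2]                     # odd scan lines
    first, second = (top, bottom) if top_field_first else (bottom, top)
    fields = np.empty((frames.shape[0] * 2,) + first.shape[1:], dtype=frames.dtype)
    fields[0::2] = first                         # earlier moment of each frame
    fields[1::2] = second                        # later moment of each frame
    return fields


rng = np.random.default_rng(0)
frames = rng.integers(0, 256, size=(10, 480, 720)).astype(np.uint8)
fields = separate_fields(frames)
print(frames.shape, "->", fields.shape)          # (10, 480, 720) -> (20, 240, 720)
```

For a 720×480 source, each 720×240 field has twice the width-to-height ratio, and a 29.97 frame-per-second video becomes 59.94 fields per second.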

Modern videos are compressed in stages. One of the lossless stages uses tokens to define motion and pixel blocks, saved as "P" and "B" frames, that reference specific full video moments ("I" frames). Quantization tables are then used to squeeze out seemingly imperceptible visual data, but this process is lossy and will cause visual distortion, especially if the codec's compression is applied excessively. The resulting video is saved in either an open or a proprietary format. Some third-party programs (e.g. VideoCleaner) can convert certain proprietary video streams into lossless open formats, but the original metadata may only be available in the originating file. There are also tools that can check for post-production content manipulation or steganography (e.g. Openpuff).
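The lossy quantization stage can be illustrated on a single 8×8 block. This sketch uses an orthonormal DCT and the 128 level shift familiar from baseline JPEG, but it simplifies the real process by dividing every coefficient by one uniform step instead of a per-frequency quantization table. `dct_matrix` and `quantize_block` are hypothetical helpers, and the random block is test data.

```python
import numpy as np


def dct_matrix(n=8):
    """Orthonormal DCT-II basis: row k holds the k-th cosine pattern."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1 / n)
    return c


def quantize_block(block, step):
    """Transform, quantize with one uniform step (a simplification), and reconstruct."""
    c = dct_matrix(block.shape[0])
    coeffs = c @ (block - 128.0) @ c.T           # 2-D DCT of the level-shifted block
    quantized = np.round(coeffs / step)          # the lossy step: fractions are discarded
    restored = c.T @ (quantized * step) @ c + 128.0
    return quantized, restored


rng = np.random.default_rng(0)
block = rng.integers(0, 256, size=(8, 8)).astype(float)
for step in (1, 10, 40):
    q, restored = quantize_block(block, step)
    mse = np.mean((restored - block) ** 2)
    print(f"step {step:>2}: {np.count_nonzero(q)} nonzero coefficients, MSE {mse:.2f}")
```

Larger steps zero out more coefficients and increase the reconstruction error, which is the visual distortion that excessive compression produces.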

Proprietary surveillance videos commonly use a variable frame rate, and if direct extraction is not possible, the expert must use some method to recapture the visual contents of the originating video. Since the capture process runs at a fixed frame rate, the frame rates will not match. If the capture rate is set too low, some of the originating frames will be missed. If the capture rate is set too high, duplicate frames will be acquired. Depending on the screen capture method used, the expert may also unintentionally record blended frames when a newer moment has not finished refreshing the screen. For these reasons, the expert will want to use the lowest capture frame rate that ensures no unique moments are missed.
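The capture trade-off can be simulated. This sketch assumes a hypothetical list of display times for the unique moments (the first at t = 0), a capture that starts exactly at t = 0, and no blended frames. Because the phase between the capture clock and the source is usually unknown, the hypothetical `minimum_safe_rate` helper picks the smallest whole frame rate whose capture interval is no longer than the shortest-displayed frame.

```python
import bisect
import math


def simulate_capture(timestamps, end, rate):
    """Sample a variable-frame-rate display at a fixed `rate` (fps) starting at t = 0.

    timestamps: sorted display start times in seconds (first must be 0.0); end: when playback stops.
    Returns (missed_frames, duplicate_captures).
    """
    captured = []
    k = 0
    while k / rate < end:
        # Index of the source frame on screen at capture time k / rate.
        captured.append(bisect.bisect_right(timestamps, k / rate) - 1)
        k += 1
    unique = set(captured)
    return len(timestamps) - len(unique), len(captured) - len(unique)


def minimum_safe_rate(timestamps, end):
    """Smallest whole fps whose interval fits inside the shortest frame, so no frame can be skipped."""
    durations = [b - a for a, b in zip(timestamps, timestamps[1:] + [end])]
    return math.ceil(1 / min(durations))


# Hypothetical display times (seconds) for five unique moments.
frames = [0.0, 0.11, 0.17, 0.33, 0.52]
end = 0.61
for fps in (5, minimum_safe_rate(frames, end)):
    missed, dupes = simulate_capture(frames, end, fps)
    print(f"{fps:>3} fps: {missed} missed, {dupes} duplicates")
```

At 5 fps one moment is lost; at the computed 17 fps every moment is captured, at the cost of six duplicate captures.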

Audio and video recordings are typically viewed as amplitude relative to time, which is perceived as volume in audio and as brightness in video. Another perspective is to view a recording as frequency relative to time, using a Fast Fourier Transform (FFT). An FFT-domain filter enables the analyst to more easily detect content tampering or to remove transitory audio or visual defects that obscure details. FFT filters are extremely effective, and forensic software can automate the process to remove judgment-based errors. Even so, the analyst must remain vigilant, because any filter that remedies one issue will have a negative impact on the remaining data, even if in nearly imperceptible ways. This is a perfect example of Locard's exchange principle, whereby every action that affects something also leaves some trace behind. If the evidence being examined was tampered with, that action will also have left some trace, which the analyst can use to determine what actions occurred.
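Viewing frequency relative to time usually means a short-time FFT, also called a spectrogram. This NumPy sketch uses a Hann window with 1,024-sample frames and 50% overlap (assumed settings) and reports the strongest frequency in each frame of a synthetic splice, where a 440 Hz tone is abruptly replaced by a 1,000 Hz tone.

```python
import numpy as np


def spectrogram(x, frame_len=1024, hop=512):
    """Magnitude short-time FFT: rows are time frames, columns are frequency bins."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))


fs = 8000
t = np.arange(fs) / fs
# Synthetic splice: a 440 Hz tone abruptly replaced by a 1000 Hz tone at 0.5 s.
x = np.where(t < 0.5, np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 1000 * t))
S = spectrogram(x)
freqs = np.fft.rfftfreq(1024, d=1 / fs)
dominant = freqs[S.argmax(axis=1)]               # strongest frequency in each time frame
print(dominant)
```

The frames that straddle the 0.5 s splice point mix both tones, and an abrupt spectral discontinuity like this is one kind of trace an analyst looks for.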

Anyone performing authentication testing on audio, video, or image files is expected to follow a standardized suite of tests (e.g. the popular MAT form). Some authentication tests are conclusive (e.g. proprietary, structural metadata), while others (e.g. critical analysis, DCT) must be weighed in order to form a final opinion. For example, Video Error Level Analysis (VELA) can potentially draw attention to a cropped video or a removed object, but it can also produce a false positive if you don't understand the contrast correlation in the results.

Another example is a subsonic audio impulse found below the frequency range of the recording microphone. This impulse could have originated from someone pausing the recorder, or it could have been caused by an electrical issue. The analyst needs a strong understanding of each test and its possible results, and this variance is why their summary opinion may range from a reasonable to a definitive level of confidence, but will never be expressed as a scientific certainty.
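Locating such an impulse can be sketched by keeping only the spectral content below the microphone's lower limit. Here the 20 Hz cutoff comes from the typical microphone range given earlier, the brick-wall FFT mask is a deliberately crude stand-in for a proper filter design, and the Gaussian "thump" at 1.2 s is synthetic test data.

```python
import numpy as np


def subsonic_component(x, fs, cutoff_hz=20.0):
    """Keep only content below cutoff_hz by zeroing higher FFT bins (a crude brick-wall low-pass)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    spectrum[freqs > cutoff_hz] = 0
    return np.fft.irfft(spectrum, n=len(x))


fs = 8000
t = np.arange(2 * fs) / fs
audio = 0.5 * np.sin(2 * np.pi * 1000 * t)                # audible content
thump = 0.2 * np.exp(-0.5 * ((t - 1.2) / 0.05) ** 2)       # synthetic subsonic event at 1.2 s
low = subsonic_component(audio + thump, fs)
print(f"largest subsonic excursion at {t[np.argmax(np.abs(low))]:.3f} s")
```

The sketch only shows where the impulse is; deciding whether it came from a pause or an electrical fault still requires weighing it against the other test results.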

There are numerous articles and classes available for becoming skilled at enhancement (e.g. here), so I will not belabor those points. The actual results of audio or video enhancement will depend upon the analyst's methods, the software tools applied, and the quality of the analyst's vision and hearing (both of which should be routinely tested). The lack of industry enhancement standardization stems from the extensive variety of capturing equipment and recording devices in use. Those variances, and manufacturers' propensity to maintain their own proprietary compression methods, prevent the development of a one-size-fits-all enhancement guideline or solution.

As a forensic analyst, you are expected to understand the procedures and rules of evidence applicable to the jurisdiction of your case. You will be expected to maintain data integrity through the use of hash values and/or chain-of-custody control, and to keep detailed notes of your activity on each case. Although you will communicate with and work at the direction of whoever hires you, you work solely for the evidence and in accordance with the highest ethics. If you calculate the hash value of each file, then everyone can use that value to validate evidentiary integrity, regardless of how those files are shared from that point forward.
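Calculating a hash value takes only a few lines with Python's standard `hashlib` module. SHA-256 is assumed here because it is widely used; the text does not name an algorithm, so follow whatever your lab or jurisdiction specifies. Reading the file in chunks keeps memory use flat for large video exports.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute a file's SHA-256 digest in 1 MiB chunks so large recordings need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Record the returned hex string in your case notes when a file is received; recomputing it later and getting the same value shows the copy is bit-for-bit identical.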

It is important for your report and testimony to detail all of your tests and results, including those that may conflict with each other or with the objectives of your engagement. You have significant discretion as to which steps you perform and which exhibits you produce, but you are expected to fully disclose and support your choices. If you want your expert work and opinions to survive a Frye challenge, they must be your own opinions. If you want your CV to survive a Daubert challenge, then it must support your qualifications to form your opinions. It is your job to draw opinions only from within your area of expertise, and it is the presiding court that will determine whether your opinions will be entered into the record.

Let’s say that you are hired to determine someone’s height and you only have a single camera view to work with. Within that video, you must find a reference object of a definable size (e.g. a doorway of known height that the subject walks through) and a video still depicting when the subject walks through that doorway. Multiply the height of that doorway by the pixel height of the person in your still, and then divide that result by the pixel height of the doorway to determine the actual height of the person in question. Using a reference object to measure the size of people or things in the scene is called Photogrammetry, just as measuring speed or acceleration is called Videogrammetry.
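The proportion above translates directly into code. This sketch assumes, as the example does, that the doorway and the subject are at the same distance from the camera and that there is no lens distortion; the pixel measurements are hypothetical.

```python
def estimate_height(reference_height, reference_px, subject_px):
    """Scale the reference object's known height by the ratio of pixel heights."""
    return reference_height * subject_px / reference_px


def to_feet_inches(inches):
    """Format a height in inches as a feet-and-inches string."""
    feet, rest = divmod(inches, 12)
    return f"{int(feet)}'{rest:g}\""


# Hypothetical still: an 80-inch doorway spans 400 px and the subject spans 350 px.
height_in = estimate_height(80, 400, 350)
print(height_in, to_feet_inches(height_in))  # 70.0 5'10"
```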

The court will expect measurement work to be scientific, and for that you must include a margin of error. For the above example, this means dividing some constant by the measured pixel height of the person. Thus, if you calculated the person's height as 5’10¼” with a ¼” margin of error, then you can be 68% confident (1 sigma) that the height is between 5’10” and 5’10½”, or 99.7% confident (3 sigma, i.e. three standard deviations) that it is between 5’9½” and 5’11”.

This height example is a simplified scenario, as you would need to apply corrective geometry if the reference object were not directly in line with the person (e.g. a nearby sign or wall). As for the constant, my peers use 24 to compute an answer in inches. This constant is the result of data from hundreds of actual cases, data on human heights, and a very conservative expectation of increased resolution from video enhancement. As you advance in your career, your experience will help you develop new methods and refine existing models.
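The margin-of-error rule can be sketched the same way, using the constant of 24 given above. The 96-pixel subject height is back-calculated from the ¼” example (24 ÷ 96 = ¼) rather than stated in the text, and the 68% and 99.7% coverage figures assume normally distributed error.

```python
MOE_CONSTANT_IN = 24.0  # constant used by the author's peers (from the text); yields inches


def height_interval(estimate_in, subject_px, sigmas=1, constant=MOE_CONSTANT_IN):
    """Return (low, high) bounds: margin of error = constant / subject pixel height, times sigmas."""
    margin = constant / subject_px
    return estimate_in - sigmas * margin, estimate_in + sigmas * margin


# The text's example: 5'10 1/4" (70.25 in) with a 1/4" margin implies a 96-pixel-tall subject.
print(height_interval(70.25, 96))            # (70.0, 70.5)  about 68% at 1 sigma
print(height_interval(70.25, 96, sigmas=3))  # (69.5, 71.0)  about 99.7% at 3 sigma
```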

Isolating unique data or artifacts can help the forensic analyst find new information. For example, the ever-changing, silent electrical network frequency (ENF) generated by our nation’s power grid has been documented for decades, so isolating the ENF from a recording can help determine approximately when and where the recording originated. Those details can then be compared with known case facts and the file’s metadata to determine evidentiary authenticity. Even the interfering noise embedded within a file can be used to identify the specific equipment or handling that produced the recording.
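A minimal ENF extraction can be sketched as tracking the mains-hum peak over time. This sketch assumes a 60 Hz nominal grid frequency (50 Hz in many other countries), 2-second windows, and a zero-padded FFT to resolve changes of a few hundredths of a hertz. The recording is synthetic, and the step of comparing the trace against a reference ENF database is not shown.

```python
import numpy as np


def enf_trace(x, fs, nominal_hz=60.0, window_s=2.0, band_hz=1.0, nfft=2 ** 17):
    """Return the dominant frequency near nominal_hz for each non-overlapping window."""
    n = int(window_s * fs)
    window = np.hanning(n)
    freqs = np.fft.rfftfreq(nfft, d=1 / fs)      # zero-padding gives ~0.008 Hz bins at fs=1000
    in_band = (freqs >= nominal_hz - band_hz) & (freqs <= nominal_hz + band_hz)
    trace = []
    for start in range(0, len(x) - n + 1, n):
        spectrum = np.abs(np.fft.rfft(x[start:start + n] * window, n=nfft))
        trace.append(freqs[in_band][np.argmax(spectrum[in_band])])
    return np.array(trace)


fs = 1000                                        # hum is low-frequency, so a low rate suffices
t = np.arange(20 * fs) / fs
grid_hz = np.where(t < 10, 60.02, 59.98)         # synthetic grid drifting between two values
hum = 0.1 * np.sin(2 * np.pi * grid_hz * t)
print(np.round(enf_trace(hum, fs), 3))
```

In a real case, the extracted trace would be aligned against logged grid frequency data to estimate when the recording was made.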

ENF, Photogrammetry, and Videogrammetry are just a few examples of how a skilled forensic analyst can extract new facts from existing evidence using a truly scientific method that follows an established, formulaic process. By contrast, the processes of enhancement and identification cannot produce a measurable scientific error rate, so oversight comes from the courts and other experts objectively reviewing the expert's methods, opinions, and qualifications.

As an audio video forensic analyst, you are tasked with using industry-accepted technology, understanding the limitations of that technology, using peer-reviewed methodology, and understanding the strength of any opinions that can be formed. No enhancement process can achieve the fantasy expectations depicted on television, which is why you always want to apply a soft and realistic hand when attempting to enhance a recording. For example, while attempting to improve the clarity of subtle motion, you should avoid excessive sharpening of high-energy details or adding excessive brightness, as you may actually destroy the details that you intended to improve. Even the simple task of opening a recorded file can alter its metadata, which is why you must always work from an exact copy.

Never forget that your opinions may deeply affect someone's life and your impartiality is critical. For this reason, you must avoid forming a bias. If you are enhancing an audio file, do not read the transcript or learn the expected wording until after your enhancement work is complete. If asked to clarify a face, use some other known object as your working reference. For example, when I was asked to enhance the head of George Zimmerman, I instead enhanced the badge of the nearby officer so as not to enhance to a preconception. Most importantly, if you are asked to support an indefensible position, consider walking away because your integrity is your most valuable asset and once it is gone, so is your credibility.

If you want to learn more about becoming a certified Audio Video Forensic Analyst, consider additional reading (here and here), on-line training, and the accredited certification (find a local testing site here). 

Copyright © Forensic Protection