Extracting GPS data from Viofo A119 and other Novatek powered cameras

The script.

Here it is: nvtk_mp42gpx.py

What does it do?

This script will attempt to extract GPS data from a Novatek MP4 file and output it in GPX format.

Usage: ./nvtk_mp42gpx.py -i<inputfile> -o<outfile> [-f]
        -i input file (will quit if does not exist)
        -o output file (will quit if exists unless overridden)
        -f force (optional, will overwrite output file)

In short: it takes a Novatek-encoded MP4 file (with embedded GPS data) and extracts the GPS data in GPX format (as a separate file). Note: it does not modify the original MP4 file.

In long:

What the? Where is the bloody GPS data?
Unlike competitors (Ambarella and such), Novatek embeds the GPS data in the MP4 container itself, specifically in free atoms/boxes in the midst of the stream chunks. This is a bit different from Ambarella’s embedding in the subtitle track (which is trivial to extract with open source tools).

The search for documentation of the Novatek data structure was a scavenger hunt in itself.

Although writing the rudimentary MP4 parser took the longest, figuring out the Novatek data structure was more complicated due to the lack of information.

MP4 container basics
Disclaimer: I am no expert, or even an enthusiast, when it comes to video containers, so the information below is not guaranteed to be correct. These are simply my findings and should be “taken with a grain of salt”.

In very simple terms, the MP4 container consists of atoms/boxes (the name depends on which documentation you read). Boxes can contain other boxes.

Each box starts with an 8 byte “header” (including the box at the very beginning of the file). The first 4 bytes are the size of the box (big-endian unsigned int); the next 4 bytes contain the 4 character name/type of the box. The size includes the header itself (so a valid size is >= 0x0008, except for the special large box type, which I will conveniently omit in this post ;)). Basically, the MP4 container can be treated as a rudimentary file system.

For example:

00 00 00 1c  66 74 79 70

translates to a box of 28 bytes (0x1c) of type “ftyp”, the first box in the file (0x66=f 0x74=t 0x79=y 0x70=p in ASCII). As a note: “ftyp” is basically the “file type” description box.

00 01 68 7d  6d 6f 6f 76

translates to a box of 92285 bytes (0x1687d) of type “moov”, a box of special importance.
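
As a minimal sketch of the header layout described above, the two example headers can be decoded with Python's struct module. The function name read_box_header is made up for this illustration and is not taken from the script.

```python
import struct

def read_box_header(data, offset=0):
    """Return (size, type) of the box whose header starts at `offset`.

    Hypothetical helper: 4 byte big-endian unsigned size followed by a
    4 character type. The special large box type is not handled.
    """
    size, box_type = struct.unpack_from('>I4s', data, offset)
    return size, box_type.decode('ascii')

# The two headers shown above
print(read_box_header(bytes.fromhex('0000001c66747970')))  # (28, 'ftyp')
print(read_box_header(bytes.fromhex('0001687d6d6f6f76')))  # (92285, 'moov')
```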

For the purpose of extracting the data I am interested in only the following box types: “moov”, “gps ” and indirectly “free”.

The “moov” box is a special type, a kind of index/metadata box (box of all boxes ;)). It contains the video/audio/other data chunk mapping, other boxes and, of special importance, a non-standard “gps ” box.
This non-standard “gps ” box contains the mapping of all GPS data boxes (I will cover this later).

I assume that the “moov” box is always at the top level (not a sub-box).

In my script I iterate through all top level boxes until I hit the “moov” box. Then I iterate through the sub-boxes inside the “moov” box until I hit the “gps ” box (this is where the fun begins).
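
The walk described above could look roughly like the sketch below. It is not the code from the script: it assumes the whole file is already in memory as bytes (a real parser would more likely seek within the file), and iter_boxes, find_gps_box and make_box are hypothetical names.

```python
import struct

def iter_boxes(data, start, end):
    """Yield (type, payload_start, box_end) for each box in data[start:end]."""
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from('>I4s', data, pos)
        if size < 8:
            # Special box sizes (large boxes etc.) are not handled in this sketch
            break
        yield box_type, pos + 8, pos + size
        pos += size

def find_gps_box(data):
    """Return (payload_start, box_end) of the 'gps ' box inside 'moov', or None."""
    for box_type, payload, box_end in iter_boxes(data, 0, len(data)):
        if box_type == b'moov':
            for sub_type, sub_payload, sub_end in iter_boxes(data, payload, box_end):
                if sub_type == b'gps ':
                    return sub_payload, sub_end
    return None

def make_box(box_type, payload=b''):
    """Build a box for the synthetic example below (not part of any real file)."""
    return struct.pack('>I4s', 8 + len(payload), box_type) + payload

# Tiny synthetic "file": ftyp, then moov containing mvhd and gps
sample = (make_box(b'ftyp', b'\x00' * 20)
          + make_box(b'moov', make_box(b'mvhd', b'\x00' * 4)
                              + make_box(b'gps ', b'\x01' * 16)))
print(find_gps_box(sample))  # (56, 72)
```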

Note: Here are the references I used to figure some of it out: http://l.web.umkc.edu/lizhu/teaching/2016sp.video-communication/ref/mp4.pdf and http://www.cmlab.csie.ntu.edu.tw/~cathyp/eBooks/14496_MPEG4/ISO_IEC_14496-14_2003-11-15.pdf

Novatek special “gps ” box
The “gps ” box is found inside the “moov” box.
The “gps ” box stores the file offset (in bytes) and size (in bytes) of each GPS data box.
The first 8 bytes of the “gps ” box contain the version and an encoded build date. I simply chose to ignore this data.
Each subsequent 8 bytes contain a 4 byte file offset (absolute) and a 4 byte size (offset and size are big-endian unsigned ints).

00 2b e9 50  00 00 10 00

Translates to a GPS data box at position 0x002be950 (2877776 bytes into the file) with a size of 4096 bytes (0x1000).
Following the offset, we find the GPS data box exactly where it is supposed to be:

002be950  00 00 10 00 66 72 65 65  47 50 53 20 4c 00 00 00  |....freeGPS L...|
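
A rough sketch of reading that mapping, assuming the payload of the “gps ” box (everything after its own 8 byte header) is already available as bytes. The names parse_gps_index and is_gps_data_box are made up for the illustration.

```python
import struct

def parse_gps_index(payload):
    """Return a list of (file_offset, size) pairs from a 'gps ' box payload."""
    entries = []
    # Skip the first 8 bytes (version and build date), then read 8 byte entries
    for pos in range(8, len(payload) - 7, 8):
        offset, size = struct.unpack_from('>II', payload, pos)
        entries.append((offset, size))
    return entries

def is_gps_data_box(data, offset):
    """Check for a 'free' box whose content starts with the 'GPS ' magic."""
    _size, box_type, magic = struct.unpack_from('>I4s4s', data, offset)
    return box_type == b'free' and magic == b'GPS '

# The index entry and the 16 bytes of the hexdump shown above
payload = b'\x00' * 8 + bytes.fromhex('002be95000001000')
print(parse_gps_index(payload))  # [(2877776, 4096)]
print(is_gps_data_box(bytes.fromhex('0000100066726565475053204c000000'), 0))  # True
```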

Novatek special “free” box beginning with “GPS ”
Not to be confused with the “gps ” box ;).
This data sits in a “free” box in the midst of the data chunks. The box can be identified by the magic “GPS ” string.
For some reason Novatek decided to store all the data in this box in little-endian format…

Here is the structure:

# Datetime data
hour: unsigned little-endian int (4 bytes)
minute: unsigned little-endian int (4 bytes)
second: unsigned little-endian int (4 bytes)
year: unsigned little-endian int (4 bytes)
month: unsigned little-endian int (4 bytes)
day: unsigned little-endian int (4 bytes)

# Coordinate data
active: string (1 byte) # satellite lock "A"=active, everything else (eg " ") means lost reception
latitude hemisphere: string (1 byte) # "N"=North or "S"=South
longitude hemisphere: string (1 byte) # "E"=East or "W"=West
unknown: string (1 byte) # No idea, always "0"? 
latitude: little-endian float (4 bytes) # unusual format of DDDmm.mmmm D=degrees m=minutes
longitude: little-endian float (4 bytes) # unusual format of DDDmm.mmmm D=degrees m=minutes
speed: little-endian float (4 bytes) # Knots (the nautical kind)
bearing: little-endian float (4 bytes) # degrees, not used in GPX.
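
The structure above maps onto a single struct format string. The sketch below is an illustration, not the script itself: parse_gps_record is a hypothetical name, and the offset of the record inside the box is left as a parameter because it depends on the firmware (the comments below mention both 16 and 48). The sample values are made up, and how the camera encodes the year is not covered here.

```python
import struct

# 6 unsigned ints, 4 single characters, 4 floats, all little-endian (44 bytes)
GPS_RECORD = struct.Struct('<IIIIIIssssffff')

def parse_gps_record(box, offset):
    """Unpack one GPS record from a 'free' GPS box, starting at `offset`."""
    (hour, minute, second, year, month, day,
     active, lat_hemi, lon_hemi, unknown,
     latitude, longitude, speed, bearing) = GPS_RECORD.unpack_from(box, offset)
    return {
        'time': (hour, minute, second),
        'date': (year, month, day),
        'active': active.decode('ascii', 'replace'),
        'lat_hemisphere': lat_hemi.decode('ascii', 'replace'),
        'lon_hemisphere': lon_hemi.decode('ascii', 'replace'),
        'latitude': latitude,    # still in DDDmm.mmmm format
        'longitude': longitude,  # still in DDDmm.mmmm format
        'speed_knots': speed,
        'bearing': bearing,
    }

# Made-up record placed 16 bytes into a fake box
record = GPS_RECORD.pack(23, 52, 51, 16, 7, 16, b'A', b'S', b'E', b'0',
                         3651.8203, 17445.9375, 45.65, 0.0)
fix = parse_gps_record(b'\x00' * 16 + record, 16)
print(fix['active'], fix['lat_hemisphere'], fix['longitude'])
```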

Disclaimer: this was a good hint in right direction: https://github.com/kbsriram/dcutils

Converting the odd DDDmm.mmmm coordinate format to a GPX compatible one
Here is the algorithm:

def fix_coordinates(hemisphere, coordinate):
    # Novatek stores coordinates in the odd DDDmm.mmmm format
    minutes = coordinate % 100.0    # the mm.mmmm part
    degrees = coordinate - minutes  # the DDD00 part
    coordinate = degrees / 100.0 + (minutes / 60.0)
    # South and West become negative decimal degrees
    if hemisphere == 'S' or hemisphere == 'W':
        return -1 * float(coordinate)
    else:
        return float(coordinate)
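
As a worked example of the same arithmetic: 3651.8203 means 36 degrees and 51.8203 minutes, so 36 + 51.8203 / 60 = 36.863672 degrees, negated for the southern hemisphere. The sketch below is a compact variant using divmod under a different, made-up name (dddmm_to_decimal); the input values are back-computed from the sample track points in this post, not read from a real file.

```python
def dddmm_to_decimal(hemisphere, coordinate):
    """Convert DDDmm.mmmm plus a hemisphere letter to signed decimal degrees."""
    degrees, minutes = divmod(coordinate, 100.0)
    decimal = degrees + minutes / 60.0
    return -decimal if hemisphere in ('S', 'W') else decimal

print('%.6f' % dddmm_to_decimal('S', 3651.8203))   # -36.863672
print('%.6f' % dddmm_to_decimal('E', 17445.9375))  # 174.765625
```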

Converting knots to m/s

speed * float(0.514444)

Putting together GPX format

For the GPX format to work one needs a header similar to this:

<?xml version="1.0" encoding="UTF-8"?>
<gpx version="1.0"
        creator="Sergei's Novatek MP4 GPS parser"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns="http://www.topografix.com/GPX/1/0"
        xsi:schemaLocation="http://www.topografix.com/GPX/1/0 http://www.topografix.com/GPX/1/0/gpx.xsd">
        <name>../Videos/ubertec moron/2016_0716_235252_140.MP4</name>
        <url>sergei.nz</url>
</gpx>

Specifically, it will not import without:

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://www.topografix.com/GPX/1/0"
xsi:schemaLocation="http://www.topografix.com/GPX/1/0 http://www.topografix.com/GPX/1/0/gpx.xsd"

The track data is put together in the following way:

<trk><name>test.gpx</name><trkseg>
                <trkpt lat="-36.863672" lon="174.765625"><time>2016-07-16T23:52:51Z</time><speed>23.484369</speed></trkpt>
                <trkpt lat="-36.863546" lon="174.765397"><time>2016-07-16T23:52:52Z</time><speed>23.479224</speed></trkpt>
</trkseg></trk>
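
A single track point line like the ones above could be produced along these lines. This is only a sketch: format_trkpt is a hypothetical helper, it takes an already converted latitude/longitude and a datetime object, and it applies the knots to m/s factor from the previous section.

```python
from datetime import datetime

KNOTS_TO_MPS = 0.514444  # conversion factor used in this post

def format_trkpt(lat, lon, when, speed_knots):
    """Format one GPX 1.0 track point; the speed is written in m/s."""
    return ('<trkpt lat="%f" lon="%f"><time>%s</time><speed>%f</speed></trkpt>'
            % (lat, lon, when.strftime('%Y-%m-%dT%H:%M:%SZ'),
               speed_knots * KNOTS_TO_MPS))

# 45.65 knots is back-computed from the first sample point above
line = format_trkpt(-36.863672, 174.765625, datetime(2016, 7, 16, 23, 52, 51), 45.65)
print(line)
```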

Testing GPX file

I found a utility called xmllint that can be used to validate GPX data (apt-get install libxml2-utils):

xmllint --noout --schema http://www.topografix.com/GPX/1/0/gpx.xsd test.gpx

8 thoughts on “Extracting GPS data from Viofo A119 and other Novatek powered cameras”

  1. marauder

    Thanks for the very helpful post Sergei! You mention that “Amberalla’s embedding in the subtitle track (which is trivial to extract with open source tools).” I have an Amberalla device and on inspecting the mp4 file, I can see that there exists a subtitle track named “Amberalla EXT”, but am struggling to extract this track to get to the GPS and g-sensor info. Can you suggest the tools which can extract these?

  2. Cip

    Hi,
    My dashcam is Viofo A119, with latest firmware (2017-something).
    I think there is a problem about GPS / satelite reception part of script (I get lots of Skipping: lost GPS satelite reception errors).
    The output file is just :

    <?xml version="1.0" encoding="UTF-8"?>
    <gpx version="1.0"
    	creator="Sergei's Novatek MP4 GPS parser"
    	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    	xmlns="http://www.topografix.com/GPX/1/0"
    	xsi:schemaLocation="http://www.topografix.com/GPX/1/0 http://www.topografix.com/GPX/1/0/gpx.xsd">
    	<name>mydata</name>
    	<url>sergei.nz</url>
    	<trk><name>mydata</name><trkseg>
    	</trkseg></trk>
    </gpx>
    

    NovaTrakt software works ok, but I’d rather use your script 🙂

    1. iamroot Post author

      They probably changed the data format again, got a sample I can use to reverse engineer?
      Otherwise I will update one of the cameras I have and test with that.

      Sergei.

      1. Cip

        Thank you. I think I found the solution (I dabble a little in python).
        I changed:

        hour,minute,second,year,month,day,active,latitude_b,longitude_b,unknown2,latitude,longitude,speed = struct.unpack_from('<IIIIIIssssfff',data, 16)
        

        to:

        hour,minute,second,year,month,day,active,latitude_b,longitude_b,unknown2,latitude,longitude,speed = struct.unpack_from('<IIIIIIssssfff',data, 48)
        

        I remembered about a discussion here: https://dashcamtalk.com/forum/threads/script-to-extract-gps-data-from-novatek-mp4.20808/page-4#post-303480 and tried my luck 🙂
        My firmware is v2.06 (Viofo A119).

        I uploaded a sample here: https://we.tl/aycFrt5RXL

        1. iamroot Post author

          I thought I updated it for new format….

          I did it now :).

          I moved old version to nvtk_mp42gpx_older.py.

          Thanks for letting me know :), it should work now (I tested the updated version with v2.0.x).

  3. Cip

    I really think you did update it: judging from the forum thread, the initial offset was 48, then you changed it to 16 (as suggested by user nuxator), and now it is back to 48.

    The script is perfect for my needs – thank you again.

    I modified the check_in_file function, because os.listdir returns the list of directory entries in arbitrary order, hence the gpx entries are not ordered chronologically. To solve this, I sorted the output of check_in_file.

    1. beachBoy

      For those who are not Python pros, the code is “sorted(os.listdir(f1))” instead of just “os.listdir(f1)”.
