The script.
nvtk_mp42gpx.py
Here it is: nvtk_mp42gpx.py
Alternative version: nvtk_mp42gpx_older.py
What does it do?
This script will attempt to extract GPS data from a Novatek MP4 file and output it in GPX format.
Usage: ./nvtk_mp42gpx.py -i<inputfile> -o<outfile> [-f]
-i input file (will quit if it does not exist)
-o output file (will quit if it exists unless overridden)
-f force (optional, will overwrite the output file)
In short: it takes a Novatek encoded MP4 file (with embedded GPS data) and extracts the GPS data in GPX format (as a separate file). Note: it does not modify the original MP4 file.
In long:
What the? Where is the bloody GPS data?
Unlike its competitors (Ambarella and such), Novatek actually embeds the GPS data in the MP4 itself, specifically in “free” atoms/boxes in the midst of the stream chunks. This is a bit different from Ambarella’s embedding in the subtitle track (which is trivial to extract with open source tools).
The search for documentation on the Novatek data structure was a scavenger hunt in itself.
Although writing the rudimentary MP4 parser took the longest time, figuring out the Novatek data structure was more complicated due to the lack of information.
MP4 container basics
Disclaimer: I am no expert or even at enthusiast level regarding the video containers, thus information below is not guaranteed to be correct. These are simply my findings and should be “taken with a grain of salt”.
In very simple terms the MP4 container consists of atoms/boxes (the name depends which documentation you read). The boxes can contain other boxes.
Each box starts with 8 byte “header” (including the beginning of the file). The first 4 bytes is the size of the box (big endian unsigned int), the second 4 bytes contains 4 character string name/type of the box. The size includes itself (so valid size is >= 0x0008 unless the special type of large box which I will conveniently omit in this post ;)). Basically MP4 container can be treated as some rudimentary file system.
For example:
00 00 00 1c 66 74 79 70
translates to a box size of 28 bytes (0x1c) and type “ftyp”, the first box in the file (0x66=f 0x74=t 0x79=y 0x70=p in ASCII). As a note: “ftyp” is basically the “file type” description box.
00 01 68 7d 6d 6f 6f 76
translates to box size of 92285 bytes and of type “moov”, the box of special importance.
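The two headers above can be decoded with a few lines of Python 3. This is only a minimal sketch; parse_box_header is a hypothetical helper name, not a function from the script.

```python
import struct

def parse_box_header(data, offset=0):
    """Return (size, box_type) for the 8-byte box header at the given offset."""
    # ">I4s": big-endian unsigned 32-bit size, followed by a 4-byte type tag.
    size, box_type = struct.unpack_from(">I4s", data, offset)
    return size, box_type.decode("ascii")

print(parse_box_header(bytes.fromhex("0000001c66747970")))  # (28, 'ftyp')
print(parse_box_header(bytes.fromhex("0001687d6d6f6f76")))  # (92285, 'moov')
```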
For purpose of extracting data I am interested in only following box types: “moov”, “gps ” and indirectly “free”.
The “moov” box is a special type, kind of index/metadata box (box of all boxes ;)). It contains video/audio/other data chunk mapping, other boxes and of a special importance, a non-standard “gps ” box.
This non-standard “gps ” box contains mapping of all GPS data boxes (will cover this later).
I assume that “moov” box should always be in top level (not a sub-box).
In my script I basically iterate through all top level boxes until I hit the “moov” box. Then I iterate through the sub-boxes inside the “moov” box until I hit the “gps ” box (this is where the fun begins).
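The walk described above could look roughly like the sketch below. It is not the code from the script: iter_boxes and find_gps_box are hypothetical names, the whole file is assumed to be already loaded into a bytes object, and the large-box case (size field below 8) is simply not handled.

```python
import struct

def iter_boxes(data, start, end):
    """Yield (box_type, payload_start, box_end) for each box in data[start:end]."""
    offset = start
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        if size < 8:
            # Special sizes (0 or 1) are not handled in this sketch.
            break
        yield box_type, offset + 8, offset + size
        offset += size

def find_gps_box(data):
    """Return the payload of the "gps " box inside "moov", or None if absent."""
    for box_type, payload_start, box_end in iter_boxes(data, 0, len(data)):
        if box_type != b"moov":
            continue
        # "moov" is a container: its payload is itself a sequence of boxes.
        for sub_type, sub_start, sub_end in iter_boxes(data, payload_start, box_end):
            if sub_type == b"gps ":
                return data[sub_start:sub_end]
    return None
```

For a multi-gigabyte recording it would be more sensible to seek through the file instead of reading it into memory; the in-memory version is only used here to keep the sketch short.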
Note: Here is the reference I used to figure some of it out: http://l.web.umkc.edu/lizhu/teaching/2016sp.video-communication/ref/mp4.pdf and http://www.cmlab.csie.ntu.edu.tw/~cathyp/eBooks/14496_MPEG4/ISO_IEC_14496-14_2003-11-15.pdf
Novatek special “gps ” box
The “gps ” box is found inside of “moov” box.
The “gps ” box stores the file offset (in bytes) and size (in bytes) for each GPS data box.
The first 8 bytes in the “gps ” box contain version and encoded build date. I simply chose to ignore this data.
The subsequent 8 bytes contain 4 byte file offset (absolute) and 4 byte size (offset and size are big-endian unsigned ints).
00 2b e9 50 00 00 10 00
Translates to a GPS data box at position 0x002be950 (at 2877776 bytes) of size of 4096 bytes (0x1000).
Following the offset we find the GPS data box exactly where it is supposed to be:
002be950 00 00 10 00 66 72 65 65 47 50 53 20 4c 00 00 00 |....freeGPS L...|
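A minimal sketch of reading that index, under one assumption worth stating: the 8 bytes of version/build date are taken to be the first 8 bytes of the box payload (after the 8-byte box header), and every following 8 bytes are taken to be one offset/size pair. parse_gps_index is a hypothetical name.

```python
import struct

def parse_gps_index(payload):
    """Return a list of (file_offset, size) pairs from a "gps " box payload."""
    entries = []
    # Skip the 8 bytes of version/build date, then read complete 8-byte pairs.
    for pos in range(8, len(payload) - 7, 8):
        # ">II": two big-endian unsigned 32-bit ints (absolute offset, size).
        entries.append(struct.unpack_from(">II", payload, pos))
    return entries

sample = b"\x00" * 8 + bytes.fromhex("002be95000001000")
print(parse_gps_index(sample))  # [(2877776, 4096)]
```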
Novatek special “free” box beginning “GPS “
Not to be confused with “gps ” box ;).
This box sits in “free” box in midst of data chunks. The box can be identified with a magic “GPS ” string.
For some reason Novatek decided to store all the data in this box in little-endian format…
Here is the structure:
# Datetime data
hour: unsigned little-endian int (4 bytes)
minute: unsigned little-endian int (4 bytes)
second: unsigned little-endian int (4 bytes)
year: unsigned little-endian int (4 bytes)
month: unsigned little-endian int (4 bytes)
day: unsigned little-endian int (4 bytes)
# Coordinate data
active: string (1 byte) # satellite lock "A"=active, everything else (eg " ") lost reception
latitude hemisphere: string (1 byte) # "N"=North or "S"=South
longitude hemisphere: string (1 byte) # "E"=East or "W"=West
unknown: string (1 byte) # No idea, always "0"?
latitude: little-endian float (4 bytes) # unusual format of DDDmm.mmmm D=degrees m=minutes
longitude: little-endian float (4 bytes) # unusual format of DDDmm.mmmm D=degrees m=minutes
speed: little-endian float (4 bytes) # Knots (the nautical kind)
bearing: little-endian float (4 bytes) # degrees, not used in GPX.
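The layout above maps directly onto a struct format string. The sketch below is illustrative only: decode_gps_record and GpsRecord are hypothetical names, and the default offset of 48 bytes from the start of the box is an assumption taken from the diff in the comments below (the comments also mention 16 for some firmware versions).

```python
import struct
from collections import namedtuple

GpsRecord = namedtuple(
    "GpsRecord",
    "hour minute second year month day "
    "active lat_hemi lon_hemi unknown "
    "latitude longitude speed bearing",
)

# "<": little-endian, no padding. 6 unsigned ints, 4 single bytes, 4 floats.
RECORD_FORMAT = "<IIIIIIssssffff"

def decode_gps_record(box, offset=48):
    """Decode one Novatek GPS record from a complete "free" box, or return None."""
    # Bytes 4..8 hold the box type, bytes 8..12 the "GPS " magic string.
    if box[4:8] != b"free" or box[8:12] != b"GPS ":
        return None
    # offset is an assumption and differs between firmware versions.
    return GpsRecord(*struct.unpack_from(RECORD_FORMAT, box, offset))
```

The single-byte fields come back as bytes objects in Python 3 (for example b"A"), and latitude/longitude are still in the raw DDDmm.mmmm form at this point.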
Disclaimer: this was a good hint in right direction: https://github.com/kbsriram/dcutils
Converting odd DDDmm.mmmm coordinate format to GPX compatible
Here is the algorithm:
def fix_coordinates(hemisphere,coordinate):
    # Novatek stores coordinates in odd DDDmm.mmmm format
    minutes = coordinate % 100.0
    degrees = coordinate - minutes
    coordinate = degrees / 100.0 + (minutes / 60.0)
    if hemisphere == 'S' or hemisphere == 'W':
        return -1*float(coordinate)
    else:
        return float(coordinate)
Converting knots to m/s
speed * float(0.514444)
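As a worked example of both conversions, here is a small self-contained sketch. The raw input values (3651.8203, 17445.9375 and 45.65 knots) are hypothetical; they were picked so that the result lands on the first sample track point shown further down.

```python
KNOTS_TO_MPS = 0.514444  # metres per second in one knot

def fix_coordinates(hemisphere, coordinate):
    """Convert DDDmm.mmmm to signed decimal degrees."""
    minutes = coordinate % 100.0                 # mm.mmmm part
    degrees = (coordinate - minutes) / 100.0     # DDD part
    value = degrees + minutes / 60.0
    # South and West are negative in GPX.
    return -value if hemisphere in ("S", "W") else value

lat = fix_coordinates("S", 3651.8203)    # 36 degrees, 51.8203 minutes South
lon = fix_coordinates("E", 17445.9375)   # 174 degrees, 45.9375 minutes East
speed = 45.65 * KNOTS_TO_MPS
print("%.6f %.6f %.6f" % (lat, lon, speed))  # prints: -36.863672 174.765625 23.484369
```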
Putting together GPX format
For GPX format to work one needs header similar to this:
<?xml version="1.0" encoding="UTF-8"?>
<gpx version="1.0"
  creator="Sergei's Novatek MP4 GPS parser"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns="http://www.topografix.com/GPX/1/0"
  xsi:schemaLocation="http://www.topografix.com/GPX/1/0 http://www.topografix.com/GPX/1/0/gpx.xsd">
<name>2016_0716_235252_140.MP4</name>
<url>sergei.nz</url>
</gpx>
Specifically it will not import without:
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://www.topografix.com/GPX/1/0"
xsi:schemaLocation="http://www.topografix.com/GPX/1/0 http://www.topografix.com/GPX/1/0/gpx.xsd"
The track data is put together in following way:
<trk><name>test.gpx</name><trkseg>
<trkpt lat="-36.863672" lon="174.765625"><time>2016-07-16T23:52:51Z</time><speed>23.484369</speed></trkpt>
<trkpt lat="-36.863546" lon="174.765397"><time>2016-07-16T23:52:52Z</time><speed>23.479224</speed></trkpt>
</trkseg></trk>
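Producing one such track point from already converted values is plain string formatting. A minimal sketch follows; format_time and format_trkpt are hypothetical helper names, and the year passed in is assumed to be the full four-digit year.

```python
def format_time(year, month, day, hour, minute, second):
    """Build the UTC timestamp string used inside <time>."""
    return "%04d-%02d-%02dT%02d:%02d:%02dZ" % (year, month, day, hour, minute, second)

def format_trkpt(lat, lon, time_utc, speed_mps):
    """Build a single GPX 1.0 <trkpt> element as a string."""
    return (
        '<trkpt lat="%f" lon="%f"><time>%s</time><speed>%f</speed></trkpt>'
        % (lat, lon, time_utc, speed_mps)
    )

print(format_trkpt(-36.863672, 174.765625,
                   format_time(2016, 7, 16, 23, 52, 51), 23.484369))
```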
Testing GPX file
I found a utility called xmllint that can be used to validate GPX data (apt-get install libxml2-utils):
xmllint --noout --schema http://www.topografix.com/GPX/1/0/gpx.xsd test.gpx
Thanks for the very helpful post Sergei! You mention that “Amberalla’s embedding in the subtitle track (which is trivial to extract with open source tools).” I have an Amberalla device and on inspecting the mp4 file, I can see that there exists a subtitle track named “Amberalla EXT”, but am struggling to extract this track to get to the GPS and g-sensor info. Can you suggest the tools which can extract these?
Have you tried ffmpeg?
Hi,
My dashcam is Viofo A119, with latest firmware (2017-something).
I think there is a problem with the GPS / satellite reception part of the script (I get lots of “Skipping: lost GPS satelite reception” errors).
The output file is just :
NovaTrakt software works ok, but I’d rather use your script :)
They probably changed the data format again, got a sample I can use to reverse engineer?
Otherwise I will update one of the cameras I have and test with that.
Sergei.
Thank you. I think I found the solution (I dabble a little in python).
I changed:
to:
I remembered a discussion here: https://dashcamtalk.com/forum/threads/script-to-extract-gps-data-from-novatek-mp4.20808/page-4#post-303480 and tried my luck :)
My firmware is v2.06 (Viofo A119).
I uploaded a sample here: https://we.tl/aycFrt5RXL
I thought I updated it for new format….
I did it now :).
I moved old version to nvtk_mp42gpx_older.py.
Thanks for letting me know :), it should work now (I tested the updated version with v2.0.x).
I really think you did update it: according to the forum thread the initial offset was 48, then you changed it to 16 (as suggested by user nuxator), and now it is back to 48.
The script is perfect for my needs – thank you again.
I modified the check_in_file function, because os.listdir returns the list of directory entries in arbitrary order, hence the gpx entries are not ordered chronologically. To solve this, I sorted the output of os.listdir.
For those who are not Python pros, the code is “sorted(os.listdir(f1))” instead of just “os.listdir(f1)”.
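A minimal sketch of that idea, separate from the script's own check_in_file (list_videos is a hypothetical helper name). Because the dashcam file names start with the recording date and time, sorting by name also sorts chronologically.

```python
import os

def list_videos(directory):
    """Return the MP4 files in a directory, sorted by file name."""
    # os.listdir makes no ordering guarantee, so sort explicitly.
    names = sorted(os.listdir(directory))
    return [os.path.join(directory, name)
            for name in names
            if name.lower().endswith(".mp4")]
```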
Thanks!
Sergei, is the Python code available? I wanted to see if I could use it to extract the GPS data from a Gitup Git3 camera.
Thanks
Mark
Hey, I fixed the links:
Here it is: nvtk_mp42gpx.py
Alternative version: nvtk_mp42gpx_older.py
Thanks, code works with Git3 mp4 file.
You are welcome :).
Hi, I installed Python 2.7 on Windows 7,
I tried both scripts but it results just in a 4KB .gpx file. Although I get
“Wiriting data to output file ‘$005.GPX
Success!”
Any idea what could have gone wrong? I also tried it with Solus OS (Linux); it’s the same problem there too.
Using Viofo A119 with Firmware 2.02
The Video files are recorded in 1440p 30fps
No such problems with NovaTrakt 3.06, but that one does not accept files >2GB
Is it possible to get a sample file?
There is a possibility the format changed (again)…
Hi, I sent you a sample file via the contact form.
The firmware I use is old, A119_170221_V2.02 maybe too old?
I just tested your sample with this script and it worked fine. From GPS coordinates your location is in a German city near a swimming pool.
The resulting file is 16KB and has 158 lines (147 of which are GPS points).
Hi,
I found the problem, it’s a PEBCAK :) I do not have a lot of experience with scripts/command line.
I had no luck with Windows, tried almost everything. Finally I could manage to get my .gpx with Solus OS (Linux Distribution)
To create the correct .gpx I had to drop down the file with its full path
this way it works. And also tells me how many data points were found, which it does not if the files are in home folder.
ikarus@ikarus ~ $ cd Videos/
ikarus@ikarus ~/Videos $ '/home/ikarus/Videos/nvtk_mp42gpx.py' -i $'/home/ikarus/Videos/2018_0415_113127_004.MP4' -o $004testvidfolder.GPX
Queueing file ‘/home/ikarus/Videos/2018_0415_113127_004.MP4’ for processing…
Processing file ‘/home/ikarus/Videos/2018_0415_113127_004.MP4’…
Found moov atom…
Found gps chunk descriptor atom…
Found 147 GPS data points…
Wiriting data to output file ‘bash04testvidfolder.GPX’…
Success!
—–
or alternative
ikarus@ikarus ~/Videos $ ./nvtk_mp42gpx.py -i $'2018_0415_114923_007.MP4' -o $'finally.GPX'
Queueing file ‘2018_0415_114923_007.MP4’ for processing…
Processing file ‘2018_0415_114923_007.MP4’…
Found moov atom…
Found gps chunk descriptor atom…
Found 600 GPS data points…
Wiriting data to output file ‘finally.GPX’…
Success!
—-
Thank you very much for this script, finally I could extract all GPS tracks, even from the large >2GB ones.
Kind regards,
KD
This script is a very good fit for my needs (creating a set of geo referenced images from a dash cam video for use in OpenStreetMap editing). But it would be a little better if the bearing/course the car was moving on was also recorded in the GPX file. So I changed the script as follows:
$ diff nvtk_mp42gpx.1.py nvtk_mp42gpx.py
98c98
< hour,minute,second,year,month,day,active,latitude_b,longitude_b,unknown2,latitude,longitude,speed = struct.unpack_from('<IIIIIIssssfff',data, 48)
---
> hour,minute,second,year,month,day,active,latitude_b,longitude_b,unknown2,latitude,longitude,speed,bearing = struct.unpack_from('<IIIIIIssssffff',data, 48)
113c113
< return (latitude,longitude,time,speed)
---
> return (latitude,longitude,time,speed,bearing)
127c127
< gpx += "\t\t%s%f\n” % l
—
> gpx += “\t\t%s%f%f\n” % l
Thanks!
Updated the file as per your diff.
Sergei.
Hi Sergei,
I bought an A119S recently and noticed there is no GPS player for Mac, so I tried to make a single-file JavaScript implementation of your script. Thanks a lot.
https://www.hiska.net/js-extract-gps-data-from-novatek-mp4
Awesome!
Thanks, I am still updating that, such as adding Google Maps instead of saving a GPX file, auto-play to the next file…
Hi Sergei,
I’m having trouble with the latitude and longitude part. Everything else matches my data. But the Latitude field in my data – if interpreted as DDDmm.mmmm has a value greater than 59 in the “minutes” part. My understanding is that there are 60 minutes in a degree, so this should be impossible. Even accounting for an overflow this yields a latitude that is way off (by a factor of ~9) and any way I look at it it doesn’t seem close to the actual value. Longitude doesn’t have this problem but also yields a value that is way off (by a factor of ~5).
Have you seen this by any chance?
Let me know the coordinates and I will try to decipher.
Do you have software supplied with camera that interprets the GPS coordinates correctly? I could try reverse engineer that…
I registered specially to say a big thanks for your script! :)
This script works fine for my Viofo A129 Duo and now I can make one GPX track from all the files on my memory card with just one command!
Python is a good choice for that and works on all known OSes.
I wish you luck with this and other things! :)
Thank you!
Hi Sergei. I’m running the latest developer firmware version T2.0 25.05.2020 -> https://www.viofo.com/community/index.php?threads/true-hdr-will-be-supported-on-v3.26921/post-39173
It seems the format changed. I can’t extract GPS data. I am just receiving lines like:
Skipping: lost GPS satelite reception. Time: 2000-00-00T1148917980:1036831949:1131234591Z.
Skipping: lost GPS satelite reception. Time: 2000-00-00T1148917983:1022739087:1131234591Z.
Found 180 GPS data points…
Wiriting data to output file ‘20200626131614_000011.MP4.gpx’…
Success!
Queueing file ‘20200626131913_000012.MP4’ for processing…
Processing file ‘20200626131913_000012.MP4’…
Found moov atom…
Found gps chunk descriptor atom…
Skipping: lost GPS satelite reception. Time: 2000-00-00T1148917988:1008981770:1131234591Z.
Skipping: lost GPS satelite reception. Time: 2000-00-00T1148917989:1008981770:1131234591Z.
Skipping: lost GPS satelite reception. Time: 2000-00-00T1148917989:1017370378:1131234591Z.
Would be great if you can take a look into this issue.
If you need a video made with the version 2.0, just let me know.
A video sample would be great to troubleshoot it…
Dear Sergei,
I have a Rexing V3; it requires the deobfuscate option, but initially it was not working because of a global/local scope issue with that variable. Please add “global deobfuscate” at the start of the get_args function to fix this. Thanks a lot for the script and info.
Thanks, fixed.
I’m impressed by your work and kindness.
Unfortunately the code did not work for V3 of the Viofo 119.
Using ‘exiftool’ version 12.08 (be sure to use this version, as the default Ubuntu version did not work) I was able to extract the GPS info. There is also a Python wrapper (https://smarnach.github.io/pyexiftool/)
I am sorry to hear that, I would like to know why it didn’t work (what was the output?).
Very clever, but seems to fail on A129 Pro video files from fw v2.1
It skips all blocks because no data was found in the expected format.
They seem to keep changing the data format. I don’t have access to A129 Pro, but I can fix it if I get a sample…
The old script works on it!
Seems like it kept getting tripped up in things not checked in old script. Graphs look good…
Hi Sergei,
I have just sent you a message in regards to VIOFO A119 V3 .TS files support, hopefully you could help. Thank you!
Hi Sergey,
It looks like the TS format is completely different to the MP4/MOV formats in regards that it does not have an index table to look up the boxes/atoms with GPS data, so the approach would be more of brute force search type (unless I find the way to parse the TS format smartly).
Slightly off-topic but I suspect they switched to the TS format to deal with file corruption in case of abrupt loss of power or camera crash (the MP4/MOV will be fully corrupt, while TS will be fine).
In any case this will require a bit more time (as I need to study the TS format and come up with an efficient algorithm, since there is no easy way to identify boxes).
I have updated the script with the TS support, and fixed the V3 bug (AFAIK).
https://sergei.nz/nvtk_mp42gpx-py-revisited-now-with-ts-support/
Hello!
In my case, when processing a whole directory, the result of your main loop (for input_files in…) was unsorted, so the files were processed in the order they were written to disk and the resulting GPX file contains points in the wrong order.
So I added sorted(in_files) to your main loop, which is enough for me.
Please add sorting to your script to avoid this problem.
With the best regards!
Hello,
It seems that your glob expansion does not produce alphabetically sorted output (my bash env expands * in a sorted fashion), hence you are having this issue.
Nevertheless the sort feature is crucial hence I have added it:
Now by default the coordinates will be sorted by the GPS date (this could be overridden by the -s [f|n] flag).
In python doc about glob.glob() I see that
“Whether or not the results are sorted depends on the file system.”
So, on my Fedora this does in fact return an unsorted list )
It is great that you add fixes to your script so fast!
Now I have checked default sorting and all is ok!
Thank you very much again for the work done!
Hello!
Could you add the following functionality to your utility: ignore points with erroneous coordinates (very different from their neighbors).
My GPS is sometimes guilty of this, and it turns out that in a second I have traveled thousands of kilometers and back :))
For example:
2021-06-06T11:27:45Z9.14167097.760002
2021-06-06T11:27:46Z9.79501497.320000
2021-06-06T11:27:47Z10.88049098.400002
If, of course, you want to implement it even in a very simplified form, that would be great!
I will see what I can do.
Hi,
I have added an `-e` flag that will remove the outliers, based on a speed calculation to the median coordinate of a given file.
I set the threshold to 1000m/s for those who might use the dashcams in their light aircraft.
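The exact calculation is not spelled out here, so the following is only one possible reading of it, not the code from the script: take the median latitude and longitude of the file, and drop every point that could not be reached from there within the duration of the file at the threshold speed. drop_far_points and haversine_m are hypothetical names, and dividing by the file duration is an assumption.

```python
import math
from statistics import median

EARTH_RADIUS_M = 6371000.0

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))

def drop_far_points(points, duration_s, max_speed_mps=1000.0):
    """Keep (lat, lon) points reachable from the median coordinate.

    points may contain None for records without a usable fix; those are dropped.
    """
    valid = [p for p in points if p is not None]
    if not valid:
        return []
    mid_lat = median(p[0] for p in valid)
    mid_lon = median(p[1] for p in valid)
    # Assumption: a point is an outlier if covering the distance from the
    # median coordinate within the file duration would exceed the threshold.
    limit_m = max_speed_mps * max(duration_s, 1.0)
    return [p for p in valid
            if haversine_m(mid_lat, mid_lon, p[0], p[1]) <= limit_m]
```

Using the median rather than the mean keeps the reference point on the real route even when a few points are thousands of kilometers off.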
I hope that works.
Big thanks!
Maybe I will test it somewhere in an aircraft :)
But I cannot see this -e option in the downloaded script, maybe you forgot to upload it.
It is there :) :
Now I can see it :)
Thanks, I will try not to drive at speeds over Mach 1 )))
This works fine and many wrong coordinates were removed.
But there is a bug in this code due to one more special case: empty coordinates :) So the input of the remove_outliers() function may be a non-empty gps_data array whose first elements are empty (None), e.g. about 30 items with None, while the remaining items are ok.
This generates the error:
line 499, in remove_outliers
lats.append(item[‘Loc’][‘Lat’][‘Float’])
TypeError: ‘NoneType’ object is not subscriptable
I guess this is because in my case the coordinates are not only erroneous but sometimes just empty! :)
Can you remove the empty ones along with the wrong coordinates?
I added an exception handling to that particular case.
Please try again.
Also please tell me what make/model of the camera you have so we all can avoid it ;).
I can send an example to you if you want, but as I understand it, the coordinates are empty when the GPS is not initialized yet (when the engine has just started).
Sorry, the site cut my gpx example, I’ll upload it here:
https://drive.google.com/file/d/1MALEZr8RRx3p5mR_KSyoB7J6kv498A7D/view?usp=sharing
I added an exception handling to that particular case.
Please try again.
Also please tell me what make/model of the camera you have so we all can avoid it ;).
—
I cannot see a reply button on your message, maybe the branch level is too deep ) Replying here.
Unfortunately, the error now appears on another line ), so the root of the problem may be a bit deeper. As far as I can see, the error appears where you refer to item[‘Loc’][‘Lat’][‘Float’].
Here is an example video with the errors, you may test the program with it.
[link redacted]
Maybe it is better to ignore such wrong coordinates at an earlier level.
Model of my cam is Viofo A129 Duo.
Hello,
I have sent you an email.
The Viofo A129 Duo is a good camera. Not sure why your camera is acting up (maybe faulty camera or jamming in the area).
I have put a fix for Null coordinates (I found that your GPS payload sometimes does not have expected data, I already account for inactive GPS, this is different).
Just added a Viofo A119 v3 to my new car (moved the old Viofo A119s v2 to my old car) and found my scripts, layered on top of your excellent nvtk_mp42gpx, were broken because the GPS information was not being properly extracted.
I looked around for a fix and found that you have already dealt with this. Your new version works great! Thank you so much for your work!
Thank you so much for this brilliant piece of code. You’ve literally saved me hours of work!
If you are present on github it would be a great idea to upload this code so more people can take advantage of your hard work.
Thank you!