Datimprint Overview
Jordial's Datimprint program helps you verify that all of your data is intact—every single bit—by generating snapshots at single points in time and later comparing them with current data. Datimprint produces files containing “fingerprints” of each file. These “data imprints” take up just a tiny fraction of the size of the data itself, but practically guarantee that any subsequent changes to the data can be detected.
In addition to per-file fingerprints, Datimprint generates a fingerprint for each directory. The directory fingerprint takes into consideration the fingerprints of all children, each of which may itself be a directory with children. Thus if the fingerprints of two directories match, you can be assured that the names, timestamps, and contents of the two subtrees match in their entirety.
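The recursive idea behind directory fingerprints can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Datimprint's actual algorithm: the hash function (SHA-256), the `fingerprint` helper, the type markers, and the way names, timestamps, and contents are combined are all hypothetical choices. For brevity the sketch also leaves directory timestamps out of the hash.

```python
import hashlib
import os


def fingerprint(path):
    """Return an illustrative fingerprint (bytes) for a file or directory tree."""
    digest = hashlib.sha256()
    if os.path.isdir(path):
        digest.update(b"D")  # marker so an empty directory never hashes like a file
        # Sort the children so the result does not depend on listing order.
        for name in sorted(os.listdir(path)):
            digest.update(os.fsencode(name))
            # A child's fingerprint already covers its own subtree.
            digest.update(fingerprint(os.path.join(path, name)))
    else:
        digest.update(b"F")
        # Mix in the modification time, then the file contents in chunks.
        digest.update(str(os.stat(path).st_mtime_ns).encode("ascii"))
        with open(path, "rb") as stream:
            for chunk in iter(lambda: stream.read(65536), b""):
                digest.update(chunk)
    return digest.digest()
```

Because each directory hash absorbs the hashes of its children, a single changed byte anywhere in the tree changes the fingerprint of every ancestor directory up to the root. The name of the root directory itself is not hashed in this sketch, so two trees stored in different locations can still compare equal.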
Datimprint CLI
The Datimprint Command-Line Interface (CLI) is the program that performs all Datimprint functionality. Datimprint CLI requires Java to be installed on your machine, with the JAVA_HOME environment variable set correctly. Download the latest Datimprint CLI version from Maven: select the download icon next to the latest version, and then choose either bin.tar.xz or bin.zip. The archive contains a self-contained executable script for Linux, an executable file for Windows, and an executable JAR for all other Java systems.
Read the complete installation instructions to ensure Datimprint CLI is installed correctly on your system. You can check which version you have installed using the --version switch:
datimprint --version
If you forget the available commands or can't remember which options are used, use the --help switch, which can be used alone or with a command:
datimprint --help
datimprint generate --help
datimprint check --help
Generate Data Imprint
The first step in verifying your data is generating an imprint, which will be stored in a datim file. This is done using the Datimprint generate command. Include the path to the directory tree containing the files and directories to include in the imprint. Finally, specify where you would like to store the imprint by using the --output or -o option.
Example
- You have data stored in the C:\data directory tree. You are getting ready to make a backup of C:\data, and you want to be able to subsequently verify that the backup is correct.
- You would like to periodically create snapshot imprints of your data directory C:\data and store those snapshots in the C:\imprints directory.
- You are worried that your data stored in C:\data might be deteriorating over time, perhaps because the drive uses flash memory or you have been seeing read errors. You decide to make an imprint of the data as it is now so that you can check a few months later that the data has not changed.
For all these scenarios, you would run the following command, substituting your preferred imprint filename for data-2022-11-12.datim:
datimprint generate C:\data --output C:\imprints\data-2022-11-12.datim
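For periodic snapshots it may help to derive the imprint filename from the date rather than typing it each time. The following Python sketch builds the same command line as above; the `build_generate_command` helper and the `data-<date>.datim` naming scheme are hypothetical, and only the generate command and --output option come from this documentation. Windows-style paths are assumed, to match the examples.

```python
from datetime import date
from pathlib import PureWindowsPath


def build_generate_command(data_dir, imprint_dir, day):
    """Return the argument list for a dated `datimprint generate` run."""
    # Hypothetical naming scheme, modeled on data-2022-11-12.datim.
    imprint_file = PureWindowsPath(imprint_dir) / f"data-{day.isoformat()}.datim"
    return ["datimprint", "generate", data_dir, "--output", str(imprint_file)]


command = build_generate_command(r"C:\data", r"C:\imprints", date(2022, 11, 12))
print(" ".join(command))
```

To actually run the snapshot, the list could be passed to `subprocess.run(command, check=True)` with `date.today()` as the day, assuming the datimprint executable is on the PATH.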
Check Data Imprint
After you have generated an imprint and stored it in a datim file, you can keep the imprint as long as you like for verifying data. You can check the current contents of any data tree against a datim file—the root path doesn't matter, as long as the relative filenames match. This allows you to check not only the original data against an imprint, but also a backup of the data stored in a completely separate tree.
Checking data is performed using the check command. Include the path to the directory tree containing the files and directories to check. Specify which imprint you would like to check the data against using the --imprint or -i option.
Examples
Earlier you created an imprint of the C:\data directory tree and stored it in C:\imprints\data-2022-11-12.datim. Then you backed up your data to the C:\backup\data directory. You want to be confident that the data in C:\backup\data is identical to your data in C:\data at the time the imprint was generated. You would issue the following command:
datimprint check C:\backup\data --imprint C:\imprints\data-2022-11-12.datim
Several months later you want to make sure that the original data in C:\data has not deteriorated. At any time you can check the current data against the imprint to see if anything has changed.
datimprint check C:\data --imprint C:\imprints\data-2022-11-12.datim
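The reason the root path does not matter can be illustrated by mapping an imprinted path onto a different tree through its relative path. This Python sketch is only a conceptual model, not how Datimprint is implemented; the `rebase` helper is hypothetical, and Windows-style paths are assumed to match the examples.

```python
from pathlib import PureWindowsPath


def rebase(imprinted_path, imprint_root, check_root):
    """Map a path recorded under imprint_root to its counterpart under check_root."""
    # Only the part of the path below the imprint root is carried over.
    relative = PureWindowsPath(imprinted_path).relative_to(imprint_root)
    return PureWindowsPath(check_root) / relative


counterpart = rebase(r"C:\data\foo\bar.txt", r"C:\data", r"C:\backup\data")
print(counterpart)
```

Here the file imprinted as C:\data\foo\bar.txt is matched with foo\bar.txt under C:\backup\data, which is why a backup in a completely separate tree can be checked against the original imprint.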
Datim Files
The “data imprint” file usually has a .datim extension and contains recursive fingerprints of a directory subtree. It has several features:
- Ability to archive imprints of multiple subtrees in a single file.
- An overall fingerprint of each tree.
- A separate fingerprint of the contents of each file or directory tree.
- A record of the modification timestamp of each path.
Datim File Format
A datim file contains tab-separated values, with the first row containing headers that identify each column. Conceptually the file contains values as in the following example table.
# | miniprint | path | content-modifiedAt | content-fingerprint | fingerprint |
---|---|---|---|---|---|
/ | | C:\data | | | |
1 | 1abc1abc | C:\data\foo\bar.txt | 2015-02-04T06:08:12.9876543Z | 1a1a1a1a1a1a… | 1abc1abc1abc… |
2 | 2bcd2bcd | C:\data\foo | 2017-04-03T12:34:56.3456789Z | 2b2b2b2b2b2b… | 2bcd2bcd2bcd… |
… | | | | | |
678 | 3cde3cde | C:\data | 2019-12-15T04:08:12.8765432Z | 3c3c3c3c3c3c… | 3cde3cde3cde… |
#
- Contains the running number of path imprints, or / to indicate the start of a new root for subsequent path imprints.
miniprint
- A shortened form of the fingerprint for quick visual scanning.
path
- The path the imprint is for.
content-modifiedAt
- The timestamp indicating when the path was last modified.
content-fingerprint
- The fingerprint of only the file contents. For directories this includes the contents of all child files and directories, recursively.
fingerprint
- The full fingerprint of the path, including contents, filename (case sensitive), and modification timestamp. For directories this includes the fingerprints of all child files and directories, recursively.
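Since a datim file is tab-separated with a header row, it can be read with ordinary TSV tooling. The Python sketch below is a minimal reader based only on the columns described above; the `read_imprints` helper is hypothetical, and details such as quoting rules are assumptions. The sample line reuses the truncated fingerprint values from the example table as they are.

```python
import csv


def read_imprints(lines):
    """Yield (root, record) pairs from the lines of a tab-separated datim file."""
    # Assumption: fields are plain tab-separated text with no quoting.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    root = None
    for row in reader:
        if row["#"] == "/":
            # A "/" row starts a new root for the path imprints that follow.
            root = row["path"]
            continue
        yield root, row


sample = [
    "#\tminiprint\tpath\tcontent-modifiedAt\tcontent-fingerprint\tfingerprint",
    "/\t\tC:\\data\t\t\t",
    "1\t1abc1abc\tC:\\data\\foo\\bar.txt\t2015-02-04T06:08:12.9876543Z\t1a1a1a1a1a1a…\t1abc1abc1abc…",
]
for root, record in read_imprints(sample):
    print(root, record["path"], record["miniprint"])
```

For a real file, the lines could come from an open file object instead of the in-memory list; the file's text encoding is not stated here and would need to be confirmed first.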