Answering my own question, which StackOverflow thinks is not a question.
Can a file change without its MFT record also changing?
Clarifications:
- Broadly files: A directory counts as a file. Any change on the volume is a change to some "file".
- Arbitrary composition: Any combination of changes may happen between the two points at which we compare.
- Eventual consistency: If a change is delayed but will eventually get written out, that is fine. We only care about missing a change completely.
- Brought this upon yourself: Editing the raw volume and bypassing NTFS doesn't count. Obscure IOCTLs, FSCTLs and settings count only insofar as they are realistic in normal use. We care about scenarios like "logging is disabled on the volume" or "a backup tool rolls dates back with SetFileTime", but not ones where you have deliberately tried to circumvent NTFS.
The answer
For all intents and purposes: no, as long as you handle a limited set of exceptions:
- Segments 0-35, and possibly everything up to 63.
- Everything under $Extend, System Volume Information and Boot.
- Multi-segment files: all of a file's segments have to be treated as one. If any segment is dirty, all clusters referenced by any of them are dirty.
- DUPLICATED_INFORMATION in non-resident $INDEX_ALLOCATION attributes may change without the directory's MFT record changing. Consider treating all of these as dirty.
Special files
Segments 0-15 are NTFS system files: $Bitmap, $LogFile, $BadClus, etc. The NTFS driver skips the normal accounting for them, so do not rely on their MFT records changing.
Segments 16 and up are less fixed, but 16-35 are still commonly reserved for or used by system files and also get special handling from the driver.
There are a number of other files that NTFS and related drivers access directly. The ones I have encountered are all in $Extend, System Volume Information and Boot (on the boot volume; do not confuse with $Boot). The latter two can have arbitrary segment numbers, so if you want to be thorough you have to find them by parsing the directory tree.
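A rough sketch of the "always dirty" rule for special files, assuming you already have your own directory-tree parser that can give a path for each segment. The function name `is_always_dirty`, the `boot_volume` flag and the 36-segment cutoff constant are my own hypothetical choices, not anything NTFS defines.

```python
# Hypothetical cutoff: segments below this are always treated as dirty.
# 36 covers 0-35; raise it to 64 to be more conservative.
FIXED_SYSTEM_LIMIT = 36

# Top-level directories whose contents drivers update directly (\Boot is
# handled separately because it only matters on the boot volume).
ALWAYS_DIRTY_DIRS = ("$extend", "system volume information")


def is_always_dirty(segment: int, path: str, boot_volume: bool) -> bool:
    """Return True if changes to this segment may not show up in its MFT record.

    `path` is assumed to come from your own directory-tree parser,
    e.g. "\\$Extend\\$UsnJrnl" or "\\Boot\\BCD".
    """
    if segment < FIXED_SYSTEM_LIMIT:
        return True
    # Split into components; NTFS names are case-insensitive for this purpose.
    parts = [p.lower() for p in path.replace("/", "\\").split("\\") if p]
    if not parts:
        return False
    top = parts[0]
    if top == "boot":
        # \Boot on the boot volume (not to be confused with $Boot, segment 7).
        return boot_volume
    return top in ALWAYS_DIRTY_DIRS
```

Anything this returns True for should simply be re-read on every pass instead of being trusted based on its MFT record.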
Multi-segment files
Files can span multiple segments. The base segment holds an $ATTRIBUTE_LIST attribute, and each secondary segment stores the base segment's number in its BaseSegmentNumber field. Changes to non-resident data in any of these segments may be reflected in the other segments, so all of a file's segments have to be treated as one.
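One way to apply the "treat all segments as one" rule, sketched under the assumption that you have already read raw records into a dict mapping segment number to bytes (`records`, `base_segment` and `propagate_dirty` are hypothetical names). The base reference lives at header offset 0x20; its low 48 bits are the segment number and it is zero for base records.

```python
import struct

SEGMENT_MASK = (1 << 48) - 1  # low 48 bits of a file reference = segment number


def base_segment(record: bytes, segment: int) -> int:
    """Return the base segment of a record (the record itself if it is a base)."""
    (base_ref,) = struct.unpack_from("<Q", record, 0x20)
    base = base_ref & SEGMENT_MASK
    return segment if base == 0 else base


def propagate_dirty(records: dict[int, bytes], dirty: set[int]) -> set[int]:
    """Expand a set of dirty segments so every segment of a touched file is dirty."""
    groups: dict[int, set[int]] = {}
    for seg, rec in records.items():
        groups.setdefault(base_segment(rec, seg), set()).add(seg)
    result = set(dirty)
    for members in groups.values():
        if members & dirty:  # any segment dirty -> the whole file is dirty
            result |= members
    return result
```

The clusters referenced by every segment in the resulting set are then the ones to re-read.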
Creation/deletion
Creation: At least one MFT segment is associated with the new file and its IN_USE flag is set (a change).
Deletion: MFT segment loses its IN_USE flag (a change).
Reuse: On deletion, the MFT segment's update counter (the sequence number in the segment header) is incremented by one. If a new file is stored in the same segment, even one identical in all other respects, its update counter will be different.
Segment addition/removal: If a segment is added to a multi-segment file, its IN_USE flag is set. If it is detached, its update counter is incremented and its IN_USE flag is cleared. Either way, the contents of $ATTRIBUTE_LIST in one of the remaining segments change.
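The creation/deletion/reuse rules above can be checked from the segment header alone. This is a minimal sketch assuming raw record bytes from two snapshots. The update counter is the 16-bit sequence number at offset 0x10, and IN_USE is bit 0x0001 of the flags at offset 0x16. `header_state`, `classify` and the result labels are hypothetical names.

```python
import struct

FILE_RECORD_IN_USE = 0x0001  # bit in the header flags word at offset 0x16


def header_state(record: bytes) -> tuple[bool, int]:
    """Return (in_use, sequence_number) from an MFT FILE record header."""
    if record[:4] != b"FILE":
        return (False, 0)  # never-initialized record
    (sequence,) = struct.unpack_from("<H", record, 0x10)
    (flags,) = struct.unpack_from("<H", record, 0x16)
    return (bool(flags & FILE_RECORD_IN_USE), sequence)


def classify(old: bytes, new: bytes) -> str:
    """Classify what happened to one segment between two snapshots."""
    old_used, old_seq = header_state(old)
    new_used, new_seq = header_state(new)
    if not old_used and new_used:
        return "created"
    if old_used and not new_used:
        return "deleted"
    if old_seq != new_seq:
        # Same IN_USE state but a different counter: the segment was freed
        # at least once in between (and reused, if it is in use now).
        return "reused" if new_used else "transient"
    return "same-file"  # same identity; the contents may still have changed
```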
Simple attributes
Changes to resident attributes ARE changes to the MFT segment, so they are covered automatically. Some attributes are always resident, so changes to those are always MFT-only.
Non-resident attributes are stored as a list of "first cluster + length" pairs. If clusters have to be allocated or released due to size changes, this changes the MFT. But even before that, the data size in bytes of each non-resident attribute is stored in the MFT, so if the data size changes, the MFT changes.
The only remaining complex case is "a non-resident attribute changes while keeping its data size". This is surprisingly common: all sorts of databases are updated this way, by writing to particular positions without changing the data size.
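To see where the data size lives, here is a sketch that walks the attributes of one record and reads the size fields of non-resident attributes. It assumes fixups have already been applied and uses the standard layout: the first attribute offset is at header 0x14, each attribute starts with type and length, the non-resident flag is at +0x08, and AllocatedSize/DataSize/InitializedSize are at +0x28/+0x30/+0x38. `nonresident_sizes` is a hypothetical name.

```python
import struct

ATTR_END = 0xFFFFFFFF  # attribute type that terminates the attribute list


def nonresident_sizes(record: bytes, attr_type: int = 0x80):
    """Yield (allocated, data, initialized) sizes of non-resident attributes
    of the given type (0x80 = $DATA) in one fixed-up MFT record."""
    (offset,) = struct.unpack_from("<H", record, 0x14)  # first attribute
    while offset + 8 <= len(record):
        a_type, a_len = struct.unpack_from("<II", record, offset)
        if a_type == ATTR_END or a_len == 0:
            break
        non_resident = record[offset + 8]
        if a_type == attr_type and non_resident:
            yield struct.unpack_from("<QQQ", record, offset + 0x28)
        offset += a_len
```

If any of these three values differs between snapshots, the record itself differs, which is the point made above.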
Non-resident attributes with no change to the data size
Here are the things that will change in the MFT entry:
- LastModificationTime: Changes every time you make changes to the file. Cannot easily be disabled, but you can roll it back with SetFileTime, and various copy/restoration tools do just that.
- LastChangeTime: Reflects any change to any attribute (data AND MFT), even if you later roll the change back. Cannot easily be disabled, and cannot itself be rolled back with SetFileTime, though I suppose there could be FSCTLs or IOCTLs that do it. Borders on "brought upon yourself".
- LSN (Log Sequence Number): A pointer into $LogFile. Increases every time there's a change to the MFT record, even if you later roll the change back. $LogFile itself is circular, but the sequence number includes a wrap counter that is incremented every time $LogFile wraps. $LogFile cannot be disabled.
- USN (Update Sequence Number): A pointer into $UsnJrnl that tracks higher-level changes to files. The journal itself can be disabled.
- Cached change times in $FILE_NAME: No guarantees, but also harder to roll back.
- MFT segment fixups: Incremented every time this segment or its neighbors are written. The counter wraps after 65536 writes, so it is only *somewhat* reliable long-term. Not very useful either, since writes to one segment also bump many neighboring ones, so when you're trying to detect changes you usually want to ignore fixup changes. Basically it's a worse version of LSN.
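Ignoring fixup changes means undoing the fixups and then masking the update sequence array before comparing. A sketch, assuming the usual layout (USA offset and count at header 0x04/0x06, first entry is the update sequence number, one entry per sector) and a 512-byte fixup stride; `apply_fixups` and `same_ignoring_fixups` are hypothetical names.

```python
import struct

SECTOR = 512  # fixup stride; assumed to be 512 bytes


def apply_fixups(record: bytes) -> bytearray:
    """Verify and undo the update-sequence fixups of one MFT record."""
    buf = bytearray(record)
    usa_off, usa_count = struct.unpack_from("<HH", buf, 0x04)
    usn = buf[usa_off:usa_off + 2]
    for i in range(1, usa_count):
        end = i * SECTOR - 2  # last two bytes of sector i-1
        if buf[end:end + 2] != usn:
            raise ValueError(f"torn write: fixup mismatch in sector {i - 1}")
        buf[end:end + 2] = buf[usa_off + 2 * i:usa_off + 2 * i + 2]
    return buf


def _mask_usa(buf: bytearray) -> bytearray:
    """Zero the update sequence array so its counter doesn't affect comparisons."""
    usa_off, usa_count = struct.unpack_from("<HH", buf, 0x04)
    buf[usa_off:usa_off + 2 * usa_count] = bytes(2 * usa_count)
    return buf


def same_ignoring_fixups(a: bytes, b: bytes) -> bool:
    """Compare two snapshots of one record, ignoring fixup counter changes."""
    return _mask_usa(apply_fixups(a)) == _mask_usa(apply_fixups(b))
```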
In conclusion:
Any change to the data updates LastModificationTime. Any change to anything (including the data AND LastModificationTime) updates LastChangeTime. Any change to the MFT record (including LastModificationTime and LastChangeTime) triggers an LSN increase, and the LSN is monotonic, cannot be disabled and cannot be rolled back.
To prevent this, you'd have to disable BOTH LastModificationTime and LastChangeTime updates and prevent changes to $FILE_NAME. There does not seem to be a way to do any of these. Each of them changes the LSN, which then stays permanently different.
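Following that conclusion, a per-segment fingerprint can be pulled from each snapshot and compared. This sketch assumes fixed-up records and the usual offsets: LSN at header 0x08, sequence number at 0x10, and a resident $STANDARD_INFORMATION (type 0x10) whose value holds LastModificationTime at +0x08, LastChangeTime at +0x10 and, on NTFS 3.0+, the USN at +0x40. `Fingerprint` and `fingerprint` are hypothetical names. By the reasoning above, the LSN alone should be enough; the other fields just tell you what kind of change it was.

```python
import struct
from typing import NamedTuple, Optional


class Fingerprint(NamedTuple):
    lsn: int                  # header offset 0x08
    sequence: int             # header offset 0x10 (the "update counter")
    last_modification: int    # $STANDARD_INFORMATION +0x08, raw FILETIME
    last_change: int          # $STANDARD_INFORMATION +0x10, raw FILETIME
    usn: Optional[int]        # $STANDARD_INFORMATION +0x40, NTFS 3.0+ only


def fingerprint(record: bytes) -> Fingerprint:
    """Extract change-detection fields from one fixed-up MFT record."""
    (lsn,) = struct.unpack_from("<Q", record, 0x08)
    (sequence,) = struct.unpack_from("<H", record, 0x10)
    (offset,) = struct.unpack_from("<H", record, 0x14)  # first attribute
    while offset + 8 <= len(record):
        a_type, a_len = struct.unpack_from("<II", record, offset)
        if a_type == 0xFFFFFFFF or a_len == 0:
            break
        if a_type == 0x10 and record[offset + 8] == 0:  # resident $STANDARD_INFORMATION
            v_len, v_off = struct.unpack_from("<IH", record, offset + 0x10)
            value = record[offset + v_off:offset + v_off + v_len]
            mod, chg = struct.unpack_from("<QQ", value, 0x08)
            usn = struct.unpack_from("<Q", value, 0x40)[0] if v_len >= 0x48 else None
            return Fingerprint(lsn, sequence, mod, chg, usn)
        offset += a_len
    raise ValueError("no $STANDARD_INFORMATION in this record")
```

Usage: `changed = fingerprint(old) != fingerprint(new)` for the base segment of each file.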
Delayed updates
NTFS delays some updates, by up to hours according to some sources. This is fine, as long as those cumulative updates eventually get written out.
Sparse files
Sparse files have some of the start:length pairs in the non-resident attribute run list marked as skips (not mapped to any real clusters). Changes to the total length or to the real parts are covered by the normal logic above.
Clusters turning from sparse to real and back will have to be reflected in the attribute run list (a change).
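To see sparse runs in a dump, the mapping-pairs bytes (found at the offset stored at +0x20 of the non-resident attribute header) can be decoded as below. This uses the standard encoding: a header byte whose low nibble is the size of the length field and whose high nibble is the size of a signed, relative LCN offset; an offset size of 0 marks a sparse run. `decode_runs` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def decode_runs(data: bytes) -> List[Tuple[Optional[int], int]]:
    """Decode an NTFS mapping-pairs array into (lcn, length) runs.

    A run with lcn None is sparse: it covers VCNs but no real clusters.
    """
    runs: List[Tuple[Optional[int], int]] = []
    pos, lcn = 0, 0
    while pos < len(data) and data[pos] != 0:  # a 0x00 header byte ends the list
        header = data[pos]
        len_size, off_size = header & 0x0F, header >> 4
        pos += 1
        length = int.from_bytes(data[pos:pos + len_size], "little")
        pos += len_size
        if off_size == 0:
            runs.append((None, length))  # sparse run: no offset field at all
        else:
            delta = int.from_bytes(data[pos:pos + off_size], "little", signed=True)
            lcn += delta  # offsets are relative to the previous real run's LCN
            runs.append((lcn, length))
        pos += off_size
    return runs
```

A sparse region being filled in shows up as a `(None, n)` entry being split or replaced, so `decode_runs(old) != decode_runs(new)`.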
Directories
A directory's contents are stored in its $INDEX_ROOT and $INDEX_ALLOCATION attributes, and changes to these normally trigger the usual change logic (LastChangeTime -> LSN).
Exception: the cached file times in the DUPLICATED_INFORMATION of index entries may get updated at any time, and if the entry lives in a non-resident $INDEX_ALLOCATION, the driver does not consider this "a change to the directory" and makes no MFT changes. If you want to catch these, treat all non-resident $INDEX_ALLOCATIONs as dirty.
So far I have only seen this happen to DUPLICATED_INFORMATION, which is basically unreliable cached information about the target file. It is reasonable that a change to the target file's details is not considered a change to the directory. If this behavior is limited to this case, then maybe these can be ignored. I have not thoroughly verified that changes to important properties, such as adding or deleting a file, always get reflected in the MFT, though I would expect them to.
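If you go the conservative route, each directory record can be checked for a non-resident $INDEX_ALLOCATION (type 0xA0). A sketch assuming one fixed-up record; for multi-segment directories the attribute may sit in an extension segment, so run it on every segment of the file. `has_nonresident_index_allocation` is a hypothetical name.

```python
import struct

INDEX_ALLOCATION = 0xA0
ATTR_END = 0xFFFFFFFF


def has_nonresident_index_allocation(record: bytes) -> bool:
    """True if this record carries a non-resident $INDEX_ALLOCATION attribute."""
    (offset,) = struct.unpack_from("<H", record, 0x14)  # first attribute
    while offset + 8 <= len(record):
        a_type, a_len = struct.unpack_from("<II", record, offset)
        if a_type == ATTR_END or a_len == 0:
            break
        if a_type == INDEX_ALLOCATION and record[offset + 8]:  # non-resident flag
            return True
        offset += a_len
    return False
```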
File name changes and hard links
For each hard link, a file receives another $FILE_NAME attribute, or two if short names are enabled. This changes both the directory (its index) and the file. Renaming the file changes the name stored in its MFT record.
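To watch hard links and renames in a dump, you can list the resident $FILE_NAME attributes (type 0x30) of a record. This sketch assumes one fixed-up record and the standard $FILE_NAME value layout: parent reference at +0x00, name length in characters at +0x40, namespace at +0x41, UTF-16LE name from +0x42. `file_names` is a hypothetical name.

```python
import struct

FILE_NAME = 0x30
NAMESPACES = {0: "POSIX", 1: "Win32", 2: "DOS", 3: "Win32&DOS"}
SEGMENT_MASK = (1 << 48) - 1


def file_names(record: bytes):
    """Return (parent_segment, namespace, name) for each resident $FILE_NAME."""
    names = []
    (offset,) = struct.unpack_from("<H", record, 0x14)  # first attribute
    while offset + 8 <= len(record):
        a_type, a_len = struct.unpack_from("<II", record, offset)
        if a_type == 0xFFFFFFFF or a_len == 0:
            break
        if a_type == FILE_NAME and record[offset + 8] == 0:  # resident only
            (v_off,) = struct.unpack_from("<H", record, offset + 0x14)
            v = offset + v_off
            (parent_ref,) = struct.unpack_from("<Q", record, v)
            name_len, namespace = record[v + 0x40], record[v + 0x41]
            name = record[v + 0x42:v + 0x42 + 2 * name_len].decode("utf-16-le")
            names.append((parent_ref & SEGMENT_MASK, NAMESPACES.get(namespace, "?"), name))
        offset += a_len
    return names
```

A new hard link adds an entry (two with short names: one Win32, one DOS), and a rename changes one.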
How to test this
Get a segment dumper/printer. I'm using my own, which I plan to share eventually.
1. Create a file and fill it with 4096 bytes, so that its data does not fit in the MFT record.
2. Get its file ID (which contains the segment number): fsutil file queryfileid test.txt
3. Dump its segment.
4. Use a script which reads file times, changes the file and writes file times back.
5. Dump the segment again.
6. Compare.
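A minimal Python version of the script in step 4, assuming `os.utime` as the way to write the times back. Note that `os.utime` only restores the access and modification times; it does not touch creation time or LastChangeTime, which is exactly the situation the analysis above is about. `modify_and_restore_times` is a hypothetical name.

```python
import os


def modify_and_restore_times(path: str, offset: int, payload: bytes) -> os.stat_result:
    """Overwrite bytes in place (data size unchanged), then roll the file times back.

    Only access and modification times are restored; LastChangeTime is not.
    Returns the stat result taken before the modification.
    """
    before = os.stat(path)
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(payload)  # in-place write: the data size stays the same
    os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))
    return before
```

Create test.txt with `open("test.txt", "wb").write(b"A" * 4096)` (step 1), dump the segment, call `modify_and_restore_times("test.txt", 100, b"B")`, then dump again and compare.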