There are two separate questions in this thread: (1) the de-duplication process, and (2) recognizing files for payment.
Files smaller than 1KB are currently stored in the DataMap itself (that is, alongside the directory metadata), and as such are not de-duplicated. Files larger than 1KB are split into a minimum of 3 chunks, where each chunk has a maximum size of 1MB. The encryption key for each chunk is derived from the hashes of the unencrypted contents of the prior two chunks (with modulus, so the first chunks wrap around and use the last ones).
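To make the chunking and key dependency concrete, here is a rough Python sketch. It is only an illustration of the description above, not the actual self-encryption code: `split`, `chunk_keys`, the even split, and the use of SHA-256 are all my assumptions, and no real encryption is performed.

```python
import hashlib

MAX_CHUNK = 1024 * 1024  # 1MB maximum chunk size, as described above
MIN_CHUNKS = 3           # files over 1KB always get at least 3 chunks


def split(data: bytes) -> list[bytes]:
    """Split data into at least 3 chunks, none larger than MAX_CHUNK.

    Assumes the file is larger than 1KB; the even split is an assumption.
    """
    n = max(MIN_CHUNKS, -(-len(data) // MAX_CHUNK))  # ceiling division
    size = -(-len(data) // n)
    return [data[i * size:(i + 1) * size] for i in range(n)]


def chunk_keys(chunks: list[bytes]) -> list[bytes]:
    """Key material for chunk i comes from the plaintext hashes of
    chunks i-1 and i-2, with indices wrapping around (the modulus)."""
    hashes = [hashlib.sha256(c).digest() for c in chunks]  # hash choice is an assumption
    n = len(chunks)
    return [hashes[(i - 1) % n] + hashes[(i - 2) % n] for i in range(n)]


data = bytes(range(256)) * 8  # a 2KB example file
chunks = split(data)
keys = chunk_keys(chunks)
print(len(chunks), [len(c) for c in chunks])
```

Note that chunk 0 has no "prior" chunks, so in this sketch its key wraps around to the last two chunks of the file.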
If a single bit is flipped in a file larger than 1KB, at least 3 chunks change because of the encryption process: the modified chunk itself, plus the two following chunks whose keys depend on its hash. So in a 100GB file, if one bit changes, about 3MB of data (3 chunks of up to 1MB each) is definitely uploaded, and the rest may be uploaded (when uploading occurs is a whole other discussion). The user will only need enough SafeCoin to store 3MB of data.
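A small sketch of why exactly three chunks change. This is a simplified model rather than the real implementation: `fingerprints` and `changed_chunks` are hypothetical helpers, SHA-256 is an assumption, and the "stored chunk" is stood in for by a hash of the chunk's own content plus the hashes of the two chunks before it.

```python
import hashlib


def fingerprints(chunks: list[bytes]) -> list[str]:
    """Stand-in for the identity of each stored (encrypted) chunk: it
    depends on the chunk's own content and on the plaintext hashes of
    the two preceding chunks, with wrap-around."""
    h = [hashlib.sha256(c).digest() for c in chunks]
    n = len(chunks)
    return [
        hashlib.sha256(h[i] + h[(i - 1) % n] + h[(i - 2) % n]).hexdigest()
        for i in range(n)
    ]


def changed_chunks(before: list[bytes], after: list[bytes]) -> list[int]:
    """Indices of chunks whose stored form differs between two versions."""
    pairs = zip(fingerprints(before), fingerprints(after))
    return [i for i, (a, b) in enumerate(pairs) if a != b]


# Ten tiny "chunks" so the example runs instantly.
original = [bytes([i]) * 16 for i in range(10)]

# Flip one bit in chunk 4 without changing its length.
modified = list(original)
modified[4] = bytes([modified[4][0] ^ 1]) + modified[4][1:]

print(changed_chunks(original, modified))  # [4, 5, 6]
```

Chunk 4 changes because its content changed; chunks 5 and 6 change because their keys are derived from chunk 4's hash. Everything else is untouched and de-duplicates against what is already stored.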
However, as @fergish mentioned, if bit(s) were removed from or added to the first chunk, then it will likely change all of the chunks, because the contents of every later chunk shift. This is not guaranteed though, because patterns in the data could leave some chunks identical.
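Here is a toy illustration of the shifting effect. It assumes fixed-size chunk boundaries and compares plaintext chunks only (in the real scheme a stored chunk also depends on its neighbours' hashes); `split_fixed` and `unchanged_positions` are made-up helper names.

```python
def split_fixed(data: bytes, size: int) -> list[bytes]:
    """Cut data into consecutive chunks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def unchanged_positions(before: bytes, after: bytes, size: int) -> int:
    """Count chunk positions whose plaintext is identical in both versions."""
    a, b = split_fixed(before, size), split_fixed(after, size)
    return sum(1 for x, y in zip(a, b) if x == y)


# Varied data: adding one byte at the front shifts every chunk.
varied = bytes(i % 251 for i in range(640))
print(unchanged_positions(varied, b"\xff" + varied, 64))  # 0

# Highly patterned data: the shift leaves the full-size chunks identical.
uniform = bytes(640)
print(unchanged_positions(uniform, b"\x00" + uniform, 64))  # 10
```

The second case is the extreme version of "patterns in data": real files will usually land much closer to the first case.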
As for payment for accessing files, I’m not sure about those details, or whether they’ve even been completely finalized. It’s likely this will happen at the chunk level because there are no references to files on the network (no i-nodes) - which is yet another long discussion. In other words, the algorithm would likely be “anytime these chunks are requested, notify X”. I think this is still open for discussion; I can’t think of anywhere it’s currently implemented.