Deduplication savings depend on the ratio of unique data blocks to redundant data blocks in a data set.
Without deduplication: Storage Space Consumed = Total Number of Blocks * Block Size
With deduplication: Storage Space Consumed = (Number of Unique Blocks * Block Size) + (Number of Redundant Blocks * Dedup Overhead)
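The two formulas above can be turned into a small calculation. This is a sketch with made-up inputs: the block counts, 4 KiB block size, and 64-byte per-block metadata overhead are illustrative assumptions, not measurements from any particular system.

```python
def dedup_savings(unique_blocks, redundant_blocks, block_size, dedup_overhead):
    """Compare storage consumed with and without deduplication.

    unique_blocks / redundant_blocks: block counts in the data set.
    block_size: bytes per block.
    dedup_overhead: metadata bytes stored per redundant block
    (a hypothetical value chosen for illustration).
    """
    total_blocks = unique_blocks + redundant_blocks
    without_dedup = total_blocks * block_size
    with_dedup = (unique_blocks * block_size
                  + redundant_blocks * dedup_overhead)
    savings = 1 - with_dedup / without_dedup  # fraction of space saved
    return without_dedup, with_dedup, savings

# 1,000 blocks of 4 KiB, 60% of them redundant, 64 B overhead each.
raw, deduped, saved = dedup_savings(400, 600, 4096, 64)
print(raw, deduped, round(saved, 4))  # → 4096000 1676800 0.5906
```

As the formula predicts, savings approach the redundancy ratio (60% here) only because the 64-byte overhead is tiny compared with the 4 KiB block size; raise the overhead toward the block size and the savings evaporate.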
To maximize deduplication savings, two conditions must hold: the data set must contain very few unique blocks, and the dedup overhead per redundant block must be significantly smaller than the block size.
Block size and the probability of finding a redundant block are inversely related: as block size shrinks, the probability that any given block is redundant rises, but the storage saved per redundant block falls. Block size must therefore be chosen carefully, since it plays a large role in storage efficiency (and transfer time); a block size that is too large or too small relative to the average file size reduces storage efficiency.
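The block-size effect can be demonstrated with fixed-size chunking over synthetic data. This is a sketch, not any real dedup engine: the data is an artificial stream of repeating 100-byte records, blocks are fingerprinted with SHA-256, and the block sizes were picked purely to expose the trend.

```python
import hashlib

def redundant_fraction(data: bytes, block_size: int) -> float:
    """Fraction of fixed-size blocks that are redundant (seen before)."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    unique = {hashlib.sha256(b).digest() for b in blocks}
    return 1 - len(unique) / len(blocks)

# Synthetic data: a 100-byte record repeated 1,000 times (100,000 bytes).
data = bytes(range(100)) * 1000

# Smaller blocks align with the repeating pattern and find near-total
# redundancy; 4 KiB blocks straddle record boundaries at shifting
# offsets, so almost every block looks unique.
for size in (10, 64, 4096):
    print(size, round(redundant_fraction(data, size), 3))
```

The fraction of redundant blocks drops as the block size grows, matching the text: smaller blocks are more likely to be redundant, at the cost of saving fewer bytes per match.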
In general, multimedia content is likely to observe minimal storage savings from deduplication while traditional content (text and documents) is likely to observe large savings.