l******e posts: 94 | 1 Phone screen question:
You have a billion URLs, each pointing to a huge page. How do you detect
duplicate documents?
I said hash the document contents, and the interviewer then asked whether I
know which hash function should be used. I have no clue what specific
function can hash a large file into a small key that takes relatively little
space.
Can anybody give me a hint? | D*******a posts: 3688 | 2 You can use any hash function, e.g. sum all characters mod 2^32-1.
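Something like the sketch below (Python; the file path and chunk size are placeholders I made up). The first function is the "sum all bytes mod 2^32-1" idea, reading the page in chunks so a huge document never has to sit in memory; the second streams the same file through MD5 from hashlib, which gives a 16-byte key with far fewer collisions for roughly the same space.

import hashlib

def simple_sum_hash(path, chunk_size=1 << 20):
    # Sum every byte of the file mod 2^32 - 1, reading in chunks
    # so the whole page never has to fit in memory.
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total = (total + sum(chunk)) % (2**32 - 1)
    return total

def md5_fingerprint(path, chunk_size=1 << 20):
    # Stronger alternative: stream the file through MD5 and keep
    # only the 16-byte digest as the key.
    h = hashlib.md5()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            h.update(chunk)
    return h.hexdigest()

Once every page is reduced to a small key like this, duplicates are just the pages that share a key; if you need certainty, byte-compare the pages within each key bucket to rule out collisions.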
[Quoting l******e's post] : Phone screen question: : You have a billion URLs, each pointing to a huge page. How do you detect duplicate documents? : I said hash the document contents, and the interviewer then asked whether I know which hash function should be used. : I have no clue what specific function can hash a large file into a small key that takes relatively little space. : Can anybody give me a hint?