Checking whether a line from one file is a duplicate in another file (Python)

You can't just do if line in seen: to search the whole seen file for the given line. On a file object, that test only scans forward from the current position, and since you're already at the end of the file, you'd be searching over nothing. And even if you solved that problem, it would still mean a linear search over the whole file for every line, which would be very slow.
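To make the "rest of the file" point concrete, here is a small sketch. It uses io.StringIO as a stand-in for an open text file (my assumption for the demo, so that nothing on disk is needed); a real file object opened in text mode should behave the same way, because the in test simply iterates from wherever the read position currently is.

```python
import io

# io.StringIO stands in for an open text file (an assumption for this demo).
f = io.StringIO("alpha\nbeta\ngamma\n")

# The scan starts at the top and stops as soon as it reaches the match.
first_check = "beta\n" in f

# The scan resumes after "beta", so "alpha" is never looked at again.
second_check = "alpha\n" in f

# Only after rewinding does the scan start from the top again.
f.seek(0)
third_check = "alpha\n" in f

print(first_check, second_check, third_check)  # True False True
```

Every one of those tests also reads line by line, which is where the linear cost per lookup comes from.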

The simplest thing to do is to keep track of all the lines seen, e.g., with a set:

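# Load every line already in filetwo.txt into a set for fast membership tests.
with open('filetwo.txt') as f:
    seen = set(f)

# Append each line of fileone.txt to filetwo.txt unless it is already there.
with open('fileone.txt') as fin, open('filetwo.txt', 'a') as fout:
    for line in fin:
        if line in seen:
            # Strip the trailing newline so the message stays on one line.
            print(line.rstrip('\n') + ' is a duplicate')
        else:
            fout.write(line)
            seen.add(line)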

Notice that I'm pre-filling seen with all of the lines in filetwo.txt before we start, and then adding each new line to seen as we go along. That avoids re-reading filetwo.txt over and over again: we know what we're writing to it, so we just remember it.
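If you want this as something reusable, one way to package the same idea is sketched below. The function name append_unique_lines and its return value are my own invention, not anything from the question. It also assumes you want lines compared without their trailing newline, so that a last line with no newline at the end still counts as a duplicate, and that a missing destination file should be treated as empty.

```python
import os


def append_unique_lines(src_path, dst_path):
    """Append lines from src_path that dst_path lacks; return the duplicates.

    Hypothetical helper: lines are compared without their trailing newline.
    """
    seen = set()
    ends_with_newline = True
    # Pre-fill the set from the destination file, if it exists.
    if os.path.exists(dst_path):
        with open(dst_path) as f:
            for raw in f:
                seen.add(raw.rstrip("\n"))
                ends_with_newline = raw.endswith("\n")

    duplicates = []
    with open(src_path) as fin, open(dst_path, "a") as fout:
        # Avoid gluing the first new line onto an unterminated last line.
        if not ends_with_newline:
            fout.write("\n")
        for raw in fin:
            line = raw.rstrip("\n")
            if line in seen:
                duplicates.append(line)
            else:
                fout.write(line + "\n")
                seen.add(line)  # remember what we just wrote
    return duplicates
```

With the files from the example above, append_unique_lines('fileone.txt', 'filetwo.txt') would return the list of duplicate lines rather than printing them, so the caller decides what to do with them.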





© Copyright 2018 w3hello.com Publishing Limited. All rights reserved.