How can I merge two csv files by a common column, in the case of unequal rows?

Read the shorter file into memory as a dictionary keyed on the LOGRECNO column (index 9 in the census file):

import csv

# Python 3: open CSV files in text mode with newline='' so the csv module handles line endings
with open('sample_state_census.csv', newline='') as census_file:
    reader = csv.reader(census_file, delimiter='\t')
    census_header = next(reader, None)  # store header
    census = {row[9]: row for row in reader}  # key each row on LOGRECNO

Then use this dictionary to match against the geo data and write out the matches:

with open('sample_state_geo.csv', newline='') as geo_file:
    with open('outputfile.csv', 'w', newline='') as outfile:
        reader = csv.reader(geo_file, delimiter='\t')
        geo_header = next(reader, None)  # grab header
        geo_header.pop(6)  # no need to list LOGRECNO header twice

        writer = csv.writer(outfile, delimiter='\t')
        writer.writerow(census_header + geo_header)

        for row in reader:
            if row[6] not in census:
                # no census data for this LOGRECNO entry
                continue
            # new row is all of the census data plus all of geo minus column 7
            newrow = census[row[6]] + row[:6] + row[7:]
            writer.writerow(newrow)
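
To see how this behaves when the two files have different numbers of rows, here is a minimal self-contained sketch, assuming Python 3. The helper name merge_on_key, its parameters, and the sample data are made up for illustration; the sample uses column 0 as the key in both inputs instead of columns 9 and 6 of the real files.

```python
import csv
import io


def merge_on_key(small_file, large_file, out_file, small_key, large_key, delimiter='\t'):
    """Inner-join two delimited files on a key column (hypothetical helper)."""
    small_reader = csv.reader(small_file, delimiter=delimiter)
    small_header = next(small_reader, None) or []
    # Index the smaller file by its key; a later duplicate key replaces an earlier one.
    lookup = {row[small_key]: row for row in small_reader if len(row) > small_key}

    large_reader = csv.reader(large_file, delimiter=delimiter)
    large_header = next(large_reader, None) or []

    writer = csv.writer(out_file, delimiter=delimiter)
    # Drop the key column from the second header so it is not listed twice.
    writer.writerow(small_header + large_header[:large_key] + large_header[large_key + 1:])

    for row in large_reader:
        if len(row) <= large_key:
            continue  # skip blank or truncated lines
        match = lookup.get(row[large_key])
        if match is None:
            continue  # no matching row in the smaller file
        writer.writerow(match + row[:large_key] + row[large_key + 1:])


# Made-up sample data: 2 census rows, 4 geo rows, sharing keys 1 and 3.
census = io.StringIO("LOGRECNO\tPOP\n1\t100\n3\t300\n")
geo = io.StringIO("LOGRECNO\tNAME\n1\tAlpha\n2\tBeta\n3\tGamma\n4\tDelta\n")
merged = io.StringIO()
merge_on_key(census, geo, merged, small_key=0, large_key=0)
print(merged.getvalue())
```

Only the geo rows whose key also appears in the census data are written, so rows 2 and 4 are dropped. If unmatched rows should be kept instead, the `continue` branch is the place to write the geo row with empty census fields.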

This all assumes the census file is small enough to fit in memory. If it is not, you'll have to use a database instead: read all the data into a SQLite database and match it against the geo data in the same vein.
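
The SQLite route could look roughly like the sketch below, assuming Python 3 and the standard sqlite3 module. The function name merge_via_sqlite, the two-column table layout, and the sample data are assumptions for illustration; the real files have many more columns, so the CREATE TABLE and INSERT statements would need to match them.

```python
import csv
import io
import sqlite3


def merge_via_sqlite(census_file, geo_file, db_path=':memory:'):
    """Join census and geo rows on LOGRECNO inside SQLite (illustrative sketch).

    Pass a file path as db_path to keep the data on disk rather than in memory.
    """
    conn = sqlite3.connect(db_path)
    try:
        # Simplified two-column schema, assumed for this example only.
        conn.execute('CREATE TABLE census (logrecno TEXT, pop TEXT)')
        conn.execute('CREATE TABLE geo (logrecno TEXT, name TEXT)')

        for table, handle in (('census', census_file), ('geo', geo_file)):
            reader = csv.reader(handle, delimiter='\t')
            next(reader, None)  # skip the header row
            # executemany consumes the reader row by row, so the file is streamed.
            conn.executemany('INSERT INTO {} VALUES (?, ?)'.format(table), reader)

        # The index lets SQLite look up census rows by key during the join.
        conn.execute('CREATE INDEX census_key ON census (logrecno)')
        conn.commit()

        query = (
            'SELECT census.logrecno, census.pop, geo.name '
            'FROM geo JOIN census ON census.logrecno = geo.logrecno '
            'ORDER BY geo.rowid'
        )
        # fetchall() is fine for a demo; iterate over the cursor for large results.
        return conn.execute(query).fetchall()
    finally:
        conn.close()


census = io.StringIO("LOGRECNO\tPOP\n1\t100\n3\t300\n")
geo = io.StringIO("LOGRECNO\tNAME\n1\tAlpha\n2\tBeta\n3\tGamma\n4\tDelta\n")
rows = merge_via_sqlite(census, geo)
print(rows)
```

The inner join gives the same result as the dictionary lookup: geo rows without a matching census row are left out.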





© Copyright 2018 w3hello.com Publishing Limited. All rights reserved.