Parse e-mail to automate your accounting process

Photo by krakenimages / Unsplash

If you operate your own business, one thing you'll end up doing is booking expense transactions into an accounting system. The inputs arrive in many different ways: you monitor bank withdrawals, you get credit card notifications, you receive purchase e-mails from your Amazon account, and more. Keeping track of all these sources and reconciling them is an important step in keeping your books up to date.

How much time do you spend on tracking your financial transactions, if at all? The success of your company depends on understanding your revenue, expenses and cash flow, yet many of us don't spend enough time on it. The work can seem boring, and the tasks are repetitive and menial, but this is exactly the kind of work that is best automated. Let me show you how you can do that with Python.

IMAP vs POP

First, pick the right protocol for talking to your inbox. If you know something about e-mail, you will have heard of IMAP, the Internet Message Access Protocol. This is a standard protocol that email clients use to access your mailbox while the messages stay on the server. There is also POP (Post Office Protocol), an older protocol that was designed to download all of your email to the local machine. Since we don't want to download and index all of our emails, we'll use IMAP.

Picking a Python Library to Use

With the choice of protocol out of the way, we can start looking for a Python library to communicate with our email provider. Possibly the first one you'll come across is imaplib, which ships with Python's standard library. While introductory articles on connecting to your mailbox and searching for e-mail with it exist, I would not recommend it as a way to get access to your mail, mainly because it is a very low-level way of communicating over the IMAP protocol. A lot of work is needed to parse e-mail headers, handle encodings and extract the e-mail body. You'll end up writing many helper functions to get the job done.
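To give a feel for that extra work, here is a minimal sketch of the kind of helpers you would end up writing around the raw bytes that imaplib hands back. It uses only the standard library's email package; the raw message, the addresses and the helper names are made up for illustration, and real mail will have more edge cases than this covers.

```python
import email
from email.header import decode_header, make_header

# A made-up raw message, similar in shape to what an IMAP FETCH returns.
RAW = (
    b"From: Shop <orders@example.com>\r\n"
    b"Subject: =?utf-8?q?Caf=C3=A9_receipt?=\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/alternative; boundary="b1"\r\n'
    b"\r\n"
    b"--b1\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"Total: =E2=82=AC12.50\r\n"
    b"--b1\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"\r\n"
    b"<p>Total: 12.50</p>\r\n"
    b"--b1--\r\n"
)


def decode_subject(msg):
    """Turn an RFC 2047 encoded Subject header into a readable string."""
    return str(make_header(decode_header(msg["Subject"])))


def plain_text_body(msg):
    """Return the first text/plain part, decoded with its declared charset."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)  # undo quoted-printable/base64
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return ""


msg = email.message_from_bytes(RAW)
subject = decode_subject(msg)
body = plain_text_body(msg)
print(subject)
print(body.strip())
```

Header decoding, multipart walking and charset handling are all on you here, which is the part a higher-level library takes off your hands.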

If you're thinking that there should be a library to help out with this, there certainly is. Enter imap-tools, which is much more functional compared to interacting with imaplib directly.

from imap_tools import AND, MailBox
import datetime

host = "<hostname of your mail provider>"
user = "<your_username>"
pw = "<login_credentials>"

from_ = "some-email@amazon.com"
with MailBox(host).login(user, pw) as mb:
    # Work in the folder where accounting mail is filed
    mb.folder.set("Finance")
    # Match mail from this sender dated on or after 2022-01-01
    criteria = AND(from_=from_, date_gte=datetime.date(2022, 1, 1))
    mails = mb.fetch(criteria=criteria, reverse=True, bulk=True)
    for mail in mails:
        print("parsing", mail.from_, mail.subject, mail.date, len(mail.text), mail.uid)

The code above fetches every e-mail from some-email@amazon.com dated on or after 2022-01-01. In the fetch call, reverse=True returns the newest messages first, and bulk=True retrieves all matching messages in one request, which is faster but uses more memory. For this example, accounting-related e-mails are stored in a Finance folder. Rules to automatically sort your mail into folders can generally be set at the mail provider level (e.g. in Gmail, Yahoo Mail, etc.), which helps with targeting specific mails. If you have no rules that move your emails to another location, you can set the target folder to your inbox instead.

Parsing your e-mail

Once the fetch is performed, it is simple to operate on each email via the mail object. Each email is identified by a uid that is unique within its folder. The uid can be used as a key to fetch a specific email, or as a marker to record that a specific mail has already been processed.
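One way to use the uid as a "processed" marker is to keep the handled uids in a small state file and skip them on the next run. The sketch below is only an illustration: the JSON file, the function names and the sample uids are assumptions, and since uids are scoped to a folder you would likely want one state file per folder.

```python
import json
import tempfile
from pathlib import Path


def load_processed(path):
    """Return the set of uids recorded in the state file (empty if missing)."""
    path = Path(path)
    if not path.exists():
        return set()
    return set(json.loads(path.read_text()))


def save_processed(path, uids):
    """Write the set of handled uids back to the state file."""
    Path(path).write_text(json.dumps(sorted(uids)))


def select_new(uids, processed):
    """Keep only the uids that have not been handled yet, preserving order."""
    return [uid for uid in uids if uid not in processed]


# Demo with made-up uids and a temporary state file.
with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "processed_uids.json"

    seen = load_processed(state_file)  # empty on the first run
    first_run = select_new(["101", "102"], seen)
    seen.update(first_run)
    save_processed(state_file, seen)

    seen = load_processed(state_file)  # simulates a later run
    second_run = select_new(["101", "102", "103"], seen)

print(first_run, second_run)
```

In the fetch loop you would pass in the mail.uid values and only parse the mails whose uid comes back from select_new.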

In the example above, the text component of the email is accessible via mail.text. This property returns the plain-text content of the mail as a simple string. You can process it with standard text processing techniques: parsing it line by line, working on the whole text, or searching for patterns with regular expressions.
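As a rough illustration of the regex approach, the sketch below pulls an order number and a total out of a block of text. The sample text, the field labels and the patterns are invented for this example; a real receipt will need patterns written against its actual wording.

```python
import re
from decimal import Decimal

# Made-up stand-in for the string you would get from mail.text.
SAMPLE_TEXT = """\
Hello,

Thank you for your order.
Order #: 112-1234567-7654321
Order Total: $1,234.56
"""

ORDER_RE = re.compile(r"Order #:\s*(?P<order>[\d-]+)")
TOTAL_RE = re.compile(r"Order Total:\s*\$(?P<total>[\d,]+\.\d{2})")


def parse_receipt(text):
    """Return the order id and total found in the text, or None if either is missing."""
    order = ORDER_RE.search(text)
    total = TOTAL_RE.search(text)
    if not (order and total):
        return None
    return {
        "order_id": order.group("order"),
        # Decimal avoids float rounding surprises with money amounts
        "total": Decimal(total.group("total").replace(",", "")),
    }


receipt = parse_receipt(SAMPLE_TEXT)
print(receipt)
```

Returning None when a field is missing makes it easy to flag mails that did not match, so they can be checked by hand instead of being booked with wrong values.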

One additional note: e-mails may also be delivered in HTML format, and that content is available through the mail.html property. It gives you the HTML of the e-mail as a string, which you can load into any Python HTML parsing library to help traverse the HTML tree and pick off the values you are looking for.
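A minimal sketch of that idea, using the standard library's html.parser so nothing extra needs installing, could look like the following. The sample HTML and the two-column label/value layout are assumptions; real receipt mails tend to be messier, and a dedicated parsing library may be more comfortable for them.

```python
from html.parser import HTMLParser

# Made-up stand-in for the string you would get from mail.html.
SAMPLE_HTML = """
<table>
  <tr><td>Item</td><td>USB cable</td></tr>
  <tr><td>Order Total</td><td>$12.50</td></tr>
</table>
"""


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and self._row:
            self._row[-1] += data.strip()


def table_to_dict(html):
    """Map the first cell of each two-cell row to the second cell."""
    parser = CellCollector()
    parser.feed(html)
    return {row[0]: row[1] for row in parser.rows if len(row) == 2}


fields = table_to_dict(SAMPLE_HTML)
print(fields)
```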

With both methods of parsing the content of your e-mail shown here, the rest is just exporting the pertinent data into some output format, storing it somewhere, or sending it to your booking system.
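As one possible output format, the sketch below writes parsed values to CSV, which most accounting tools can import. The column names and the sample row are assumptions chosen for illustration; match them to whatever your booking system expects.

```python
import csv
import io

# Hypothetical column layout; adjust to your booking system's import format.
FIELDS = ["date", "vendor", "order_id", "total"]


def write_expenses(rows, stream):
    """Write expense rows (dicts keyed by FIELDS) as CSV to a file-like object."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Made-up row standing in for values parsed out of an e-mail.
rows = [
    {
        "date": "2022-01-15",
        "vendor": "Amazon",
        "order_id": "112-1234567-7654321",
        "total": "12.50",
    }
]

buffer = io.StringIO()
write_expenses(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, pass a file opened with open("expenses.csv", "w", newline="") in place of the in-memory buffer.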