Find attachments when parsing emails in a robust way

While working on "[BUG] Email Attachments are not added to issue" (#3496), I found this neat little approach to identifying attachments in an e-mail [1]:

func isAttachment(part *multipart.Part) bool {
	return part.FileName() != "" || part.Header.Get("Content-Disposition") == strings.ToLower("attachment")
}

Why is this neat?

E-mail parsing is notoriously complicated. You are dealing with many weird ways to do one thing: add an attachment to an e-mail. This is done with MIME multipart [2], which looks like this in the raw content of an e-mail:

From: Sender <mail@example.com>
Content-Type: multipart/mixed;
	boundary="Apple-Mail=_2A60E813-185B-42CA-8B4B-1C4145D7134C"
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3774.500.171.1.1\))
Subject: Re: test issue (#1)
X-Universally-Unique-Identifier: 21614FBE-0379-47D2-8427-7D22D9D88642
Date: Sat, 27 Apr 2024 20:27:58 +0200
To: recipient@example.com

--Apple-Mail=_2A60E813-185B-42CA-8B4B-1C4145D7134C
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=us-ascii

Fifth try with attachment and footer

> On 27. Apr 2024, at 19:17, jwildeboer <noreply@forgejo.org> wrote:
>=20
>=20
> Fourth try with Attachment.
>=20
> ---
> View it on FOR TESTING ONLY, ALL DATA CAN BE WIPED OUT AT ANY TIME or =
reply to this email directly.

---=20
I am the footer of this e-mail


--Apple-Mail=_2A60E813-185B-42CA-8B4B-1C4145D7134C
Content-Disposition: inline;
	filename="Screenshot 2024-04-20 at 15.31.37.png"
Content-Type: image/png;
	name="Screenshot 2024-04-20 at 15.31.37.png"
Content-Transfer-Encoding: base64

iVBORw0KGgoAAAANSUhEUgAAAoAAAACYCAYAAAB011j8AABnR2lDQ1BJQ0MgUHJvZmlsZQAAeJyk
3HdYE9nDL/Cxt1VJofcqIpAEEBuQQhGlJHSkJRQREUhAwAIkATslE7ADIQF1rZBgb5Bgb5Cga92F
BPu6QgJ23XXuOfu77/Pc997nPvePax4/DJMzcyYz55zvOYoiSKl7Oo+XNxFBkLWZBUXRSxl2iSuS
7KbokHHg9e+v9My1PDqLFQ63/+vrf//1+dF/yj7wgOfaNMHj2JuZ80XbfP5Jf4q13f8/y/+3X1Oz
Vq7NBF9fgt8pmbyiYgQZRwPbrHXFPLgtBttEuhfDC2wfRJDi8Myc9CwEKTGA/e7p6bwNCFJqBbZn

[...]

--Apple-Mail=_2A60E813-185B-42CA-8B4B-1C4145D7134C--

This is a multipart message (the Content-Type: multipart/mixed; in the headers tells us so), and the parts are separated by --Apple-Mail=_2A60E813-185B-42CA-8B4B-1C4145D7134C, repeated here so you can see the pattern. (Did you notice the extra two -- at the end of the final line? That’s the MIME way of saying “no more parts coming, we’re done here” ;)
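To make that structure a bit more concrete, here is a minimal Go sketch that walks the parts of such a message with the standard library (net/mail, mime and mime/multipart). The names partFileNames and sampleMessage and the boundary example-boundary are made up for this illustration; the message is a shortened stand-in for the raw e-mail above, not the real thing.

```go
package main

import (
	"fmt"
	"io"
	"mime"
	"mime/multipart"
	"net/mail"
	"strings"
)

// sampleMessage is a made-up two-part message shaped like the example above.
const sampleMessage = "From: Sender <mail@example.com>\r\n" +
	"Content-Type: multipart/mixed; boundary=\"example-boundary\"\r\n" +
	"Mime-Version: 1.0\r\n" +
	"\r\n" +
	"--example-boundary\r\n" +
	"Content-Type: text/plain; charset=us-ascii\r\n" +
	"\r\n" +
	"Hello\r\n" +
	"--example-boundary\r\n" +
	"Content-Disposition: inline; filename=\"file.png\"\r\n" +
	"Content-Type: image/png; name=\"file.png\"\r\n" +
	"Content-Transfer-Encoding: base64\r\n" +
	"\r\n" +
	"iVBORw0KGgo=\r\n" +
	"--example-boundary--\r\n"

// partFileNames returns the file name of every part in a multipart
// message, using "" for parts that carry no file name.
func partFileNames(raw string) ([]string, error) {
	msg, err := mail.ReadMessage(strings.NewReader(raw))
	if err != nil {
		return nil, err
	}
	// The boundary string lives in the parameters of the Content-Type header.
	mediaType, params, err := mime.ParseMediaType(msg.Header.Get("Content-Type"))
	if err != nil {
		return nil, err
	}
	if !strings.HasPrefix(mediaType, "multipart/") {
		return nil, fmt.Errorf("not a multipart message: %s", mediaType)
	}
	reader := multipart.NewReader(msg.Body, params["boundary"])
	var names []string
	for {
		part, err := reader.NextPart()
		if err == io.EOF {
			// The closing "--boundary--" line was reached: no more parts.
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		names = append(names, part.FileName())
	}
}

func main() {
	names, err := partFileNames(sampleMessage)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", names)
}
```

The text part comes back with an empty file name, the image part with file.png, even though its disposition is inline.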

Now, the problem is that there are several ways to add file attachments to such a multipart message. The typical one uses Content-Disposition: attachment; filename="file.png", but mail clients are not required to use that form.

As you can see, the Apple Mail client uses Content-Disposition: inline; filename="file.png" which is also a perfectly valid way to do it.

A little bit of history: the original thought (back in the 1990s) was that Content-Disposition: inline; filename="file.png" should tell the mail client to display the attachment inline, as part of the message, while parts marked Content-Disposition: attachment; filename="file.png" should be shown in a list of attached files at the end of the mail. Many moons later, the same header was also used on the web. In web browsers, the attachment option should lead to a download dialogue, while inline content would be displayed in the page.

The problem

So how do you find and extract file attachments when you receive an e-mail, for example when you are a Forgejo instance? By looking for Content-Disposition? This is what Forgejo currently (as of version 7.0.1) does. It extracts the file only when Content-Disposition: attachment; is used. But Apple Mail uses inline, so Forgejo doesn’t “see” the attachments. (A pull request to fix this is at "WIP: Add inline attachments to comments", #3504.)

A solution

And that is why this code snippet is a possible solution. It does a smart thing. To see if a multipart part is an attachment, it checks if there is a filename present OR if Content-Disposition: attachment is set. This also elegantly catches the inline parts, as long as they have a filename. A very robust and reliable solution, IMHO!
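One detail worth knowing: the check quoted at the top of this post compares the whole Content-Disposition value with the string "attachment", so that half of the condition only matches a bare attachment without any parameters. Parts with a filename are caught by the first half anyway. If you want the second half to be independent of parameters and letter case, a possible variant parses the header first. This is only a sketch under my own assumptions, not the code from the linked repository, and the name isAttachmentDisposition is made up. You would call it with part.Header.Get("Content-Disposition").

```go
package main

import (
	"errors"
	"fmt"
	"mime"
)

// isAttachmentDisposition reports whether a Content-Disposition header value
// describes a file attachment: either it carries a filename parameter, or its
// disposition type is "attachment".
func isAttachmentDisposition(value string) bool {
	// ParseMediaType lowercases the disposition type and splits off the
	// parameters. If only the parameters are malformed, it still returns the
	// type together with ErrInvalidMediaParameter, so that case is tolerated.
	disposition, params, err := mime.ParseMediaType(value)
	if err != nil && !errors.Is(err, mime.ErrInvalidMediaParameter) {
		// Missing or unparseable header: not treated as an attachment.
		return false
	}
	return params["filename"] != "" || disposition == "attachment"
}

func main() {
	for _, value := range []string{
		`attachment; filename="file.png"`,
		`inline; filename="file.png"`,
		`inline`,
	} {
		fmt.Printf("%q -> %v\n", value, isAttachmentDisposition(value))
	}
}
```

The design choice is the same as in the original snippet: a filename wins regardless of the disposition type, and a plain inline part without a filename is left alone.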

(Multipart parts could also have a Content-ID (cid) instead of a filename, which means the part is an embedded file that is only used inside the (HTML) mail, but that’s a different can of worms.)

  1. https://github.com/OfimaticSRL/parsemail/blob/master/parsemail.go#L383 

  2. https://en.wikipedia.org/wiki/MIME 
