Find attachments when parsing emails in a robust way
While working on [BUG] Email Attachments are not added to issue #3496, I found this neat little approach to identify attachments in an e-mail:
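func isAttachment(part *multipart.Part) bool {
    // Lower-case the header value (not the literal) and match the prefix, since parameters may follow "attachment".
    disposition := strings.ToLower(part.Header.Get("Content-Disposition"))
    return part.FileName() != "" || strings.HasPrefix(disposition, "attachment")
}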
Why is this neat?
E-mail parsing is notoriously complicated. You are dealing with many weird ways to do one thing: add an attachment to an e-mail. This is done by using MIME Multipart, which looks like this in the raw content of an e-mail:
From: Sender <mail@example.com>
Content-Type: multipart/mixed;
    boundary="Apple-Mail=_2A60E813-185B-42CA-8B4B-1C4145D7134C"
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3774.500.171.1.1\))
Subject: Re: test issue (#1)
X-Universally-Unique-Identifier: 21614FBE-0379-47D2-8427-7D22D9D88642
Date: Sat, 27 Apr 2024 20:27:58 +0200
To: recipient@example.com

--Apple-Mail=_2A60E813-185B-42CA-8B4B-1C4145D7134C
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
    charset=us-ascii

Fifth try with attachment and footer
> On 27. Apr 2024, at 19:17, jwildeboer <noreply@forgejo.org> wrote:
>=20
>=20
> Fourth try with Attachment.
>=20
> ---
> View it on FOR TESTING ONLY, ALL DATA CAN BE WIPED OUT AT ANY TIME or =
reply to this email directly.
---=20
I am the footer of this e-mail

--Apple-Mail=_2A60E813-185B-42CA-8B4B-1C4145D7134C
Content-Disposition: inline;
    filename="Screenshot 2024-04-20 at 15.31.37.png"
Content-Type: image/png;
    name="Screenshot 2024-04-20 at 15.31.37.png"
Content-Transfer-Encoding: base64

iVBORw0KGgoAAAANSUhEUgAAAoAAAACYCAYAAAB011j8AABnR2lDQ1BJQ0MgUHJvZmlsZQAAeJyk
3HdYE9nDL/Cxt1VJofcqIpAEEBuQQhGlJHSkJRQREUhAwAIkATslE7ADIQF1rZBgb5Bgb5Cga92F
BPu6QgJ23XXuOfu77/Pc997nPvePax4/DJMzcyYz55zvOYoiSKl7Oo+XNxFBkLWZBUXRSxl2iSuS
7KbokHHg9e+v9My1PDqLFQ63/+vrf//1+dF/yj7wgOfaNMHj2JuZ80XbfP5Jf4q13f8/y/+3X1Oz
Vq7NBF9fgt8pmbyiYgQZRwPbrHXFPLgtBttEuhfDC2wfRJDi8Myc9CwEKTGA/e7p6bwNCFJqBbZn
[...]
--Apple-Mail=_2A60E813-185B-42CA-8B4B-1C4145D7134C--
This is a multipart message (the Content-Type: multipart/mixed; in the headers tells us), and the parts are separated by --Apple-Mail=_2A60E813-185B-42CA-8B4B-1C4145D7134C, just to make sure you see the pattern. (Did you notice the two extra -- at the end of the final line? That’s the MIME way of saying “no more parts coming, we’re done here” ;)
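To make the structure tangible, here is a minimal Go sketch that walks the parts of such a message using only the standard library. The message in sampleMessage is a shortened, made-up stand-in for the e-mail above (the boundary BOUNDARY and the helper name partContentTypes are assumptions for illustration), not the original data.

```go
package main

import (
	"fmt"
	"io"
	"mime"
	"mime/multipart"
	"net/mail"
	"strings"
)

// sampleMessage is a shortened, hypothetical stand-in for the raw e-mail above.
const sampleMessage = "From: Sender <mail@example.com>\r\n" +
	"Content-Type: multipart/mixed;\r\n" +
	"\tboundary=\"BOUNDARY\"\r\n" +
	"Mime-Version: 1.0\r\n" +
	"\r\n" +
	"--BOUNDARY\r\n" +
	"Content-Type: text/plain; charset=us-ascii\r\n" +
	"\r\n" +
	"Fifth try with attachment and footer\r\n" +
	"--BOUNDARY\r\n" +
	"Content-Disposition: inline;\r\n" +
	"\tfilename=\"screenshot.png\"\r\n" +
	"Content-Type: image/png;\r\n" +
	"\tname=\"screenshot.png\"\r\n" +
	"Content-Transfer-Encoding: base64\r\n" +
	"\r\n" +
	"iVBORw0KGgo=\r\n" +
	"--BOUNDARY--\r\n"

// partContentTypes returns the Content-Type header of every part
// in a multipart message.
func partContentTypes(raw string) ([]string, error) {
	msg, err := mail.ReadMessage(strings.NewReader(raw))
	if err != nil {
		return nil, err
	}
	// The boundary is a parameter of the top-level Content-Type header.
	mediaType, params, err := mime.ParseMediaType(msg.Header.Get("Content-Type"))
	if err != nil {
		return nil, err
	}
	if !strings.HasPrefix(mediaType, "multipart/") {
		return nil, fmt.Errorf("not a multipart message: %s", mediaType)
	}
	reader := multipart.NewReader(msg.Body, params["boundary"])
	var types []string
	for {
		part, err := reader.NextPart()
		if err == io.EOF {
			// The closing "--BOUNDARY--" line was reached.
			return types, nil
		}
		if err != nil {
			return nil, err
		}
		types = append(types, part.Header.Get("Content-Type"))
	}
}

func main() {
	types, err := partContentTypes(sampleMessage)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for i, t := range types {
		fmt.Printf("part %d: %s\n", i+1, t)
	}
}
```

The reader stops with io.EOF once it sees the closing boundary with the two trailing dashes, so the loop needs no part count up front.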
Now, the problem is that there are several ways to add file attachments to such a multipart message. The typical one uses Content-Disposition: attachment; filename="file.png", but that is not the only form the standard allows.
As you can see, the Apple Mail client uses Content-Disposition: inline; filename="file.png", which is also a perfectly valid way to do it.
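Both forms share the same shape: a disposition type followed by parameters. As a small sketch, Go's mime.ParseMediaType can split such a header value into the two pieces (the helper name dispositionOf is made up for this example):

```go
package main

import (
	"fmt"
	"mime"
)

// dispositionOf splits a Content-Disposition header value into its
// disposition type and its filename parameter (empty if absent).
func dispositionOf(header string) (string, string, error) {
	kind, params, err := mime.ParseMediaType(header)
	if err != nil {
		return "", "", err
	}
	return kind, params["filename"], nil
}

func main() {
	for _, h := range []string{
		`attachment; filename="file.png"`,
		`inline; filename="file.png"`,
	} {
		kind, name, err := dispositionOf(h)
		if err != nil {
			fmt.Println("cannot parse:", err)
			continue
		}
		fmt.Printf("%-10s %s\n", kind, name)
	}
}
```

In both cases the filename is available, only the disposition type differs.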
A little bit of history: the original thought (back in the 1990s) was that Content-Disposition: inline; filename="file.png" should tell the mail client to display the attachment inline, as part of the message, while Content-Disposition: attachment; filename="file.png" should make it appear in a list of attached files at the end of the mail. Many moons later MIME was also used for web pages. In web browsers the attachment option should lead to a download dialogue, while inline content would be displayed as a page.
The problem
So how do you find and extract file attachments when you receive an e-mail, for example when you are a Forgejo instance? By looking for Content-Disposition? This is what Forgejo currently (as of version 7.0.1) does. It extracts the file only when Content-Disposition: attachment; is used. (A pull request to fix this is WIP: Add inline attachments to comments #3504.) But Apple Mail uses inline, so Forgejo doesn’t “see” the attachments.
A solution
And that is why this code snippet is a possible solution. It does a smart thing: to decide whether a multipart part is an attachment, it checks if a filename is present OR if Content-Disposition: attachment; is set. This also elegantly catches the inline parts, as long as they have a filename. A very robust and reliable solution, IMHO!
(Multipart parts could also have a cid instead of a filename, which marks an embedded file that is only used inside the (HTML) mail, but that’s a different can of worms.)
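To see the difference between the two strategies side by side, here is a small Go sketch that runs a strict check and the filename-or-attachment check over the same message body. Everything in it is made up for illustration: the body, the boundary B, and the names strictCheck, relaxedCheck and fileNames. In particular, strictCheck is not Forgejo's actual code, just an approximation of "only accept Content-Disposition: attachment".

```go
package main

import (
	"fmt"
	"io"
	"mime/multipart"
	"strings"
)

// body is a hypothetical multipart body with one text part and two files.
const body = "--B\r\n" +
	"Content-Type: text/plain\r\n" +
	"\r\n" +
	"Hello\r\n" +
	"--B\r\n" +
	"Content-Disposition: attachment; filename=\"report.pdf\"\r\n" +
	"\r\n" +
	"pdf bytes\r\n" +
	"--B\r\n" +
	"Content-Disposition: inline; filename=\"screenshot.png\"\r\n" +
	"\r\n" +
	"png bytes\r\n" +
	"--B--\r\n"

// strictCheck only accepts parts whose disposition type is "attachment".
func strictCheck(part *multipart.Part) bool {
	d := strings.ToLower(part.Header.Get("Content-Disposition"))
	return strings.HasPrefix(d, "attachment")
}

// relaxedCheck also accepts any part that carries a filename,
// regardless of its disposition type.
func relaxedCheck(part *multipart.Part) bool {
	return part.FileName() != "" || strictCheck(part)
}

// fileNames lists the filenames of all parts accepted by check.
func fileNames(check func(*multipart.Part) bool) ([]string, error) {
	reader := multipart.NewReader(strings.NewReader(body), "B")
	var names []string
	for {
		part, err := reader.NextPart()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		if check(part) {
			names = append(names, part.FileName())
		}
	}
}

func main() {
	strict, err := fileNames(strictCheck)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	relaxed, err := fileNames(relaxedCheck)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("strict: ", strict)
	fmt.Println("relaxed:", relaxed)
}
```

The strict check only reports report.pdf, while the relaxed check also picks up screenshot.png, the part that uses inline the way Apple Mail does. The plain text part has neither a filename nor an attachment disposition, so both checks skip it.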