Ruby - Tallying UNIX MBOX Headers!
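This is another post about UNIX MBOX mail headers in Ruby. I had been thinking it was about time to move on from the headers and start examining the message bodies, but I thought of one more thing I wanted to check about the headers first.
So far I have only examined the major fields, but I realized I also need to know which fields exist and how many of each there are.
I simply read every message in my UNIX MBOX mail data and counted the fields in the header sections.
For reference, here are the tally results for my 61,665 messages. (If you ask "So what?", all I can say is that this is a personal record for something I am planning in the near future. Sorry about that.)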
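Regular expression used
This is the regular expression I used in Ruby to classify each line of the mail data. (It matches lines that have one or more ASCII alphanumeric characters, "-", or "_" to the left of a colon.)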
/^([\w\-_]+?):/
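The post does not show the counting script itself, so the following is only a minimal sketch of how such a tally could be written with the regular expression above. The method name `count_header_fields` and the sample addresses are hypothetical. It assumes that a line starting with "From " begins a new message, that the first blank line ends the header section, and that names are normalized with `String#capitalize` (a guess based on how the names appear in the table below). Note that `\w` already covers "_", so the explicit "_" in the character class is redundant but harmless.

```ruby
# Minimal sketch (assumptions: a "From " line starts a message,
# the first blank line ends its header section).
def count_header_fields(lines, pattern = /^([\w\-_]+?):/)
  counts = Hash.new(0)   # field name => number of occurrences
  in_header = false
  lines.each do |line|
    line = line.chomp
    if line.start_with?("From ")
      in_header = true    # mbox separator line: a new header section begins
    elsif line.empty?
      in_header = false   # blank line: the body begins
    elsif in_header && (m = pattern.match(line))
      # Folded continuation lines start with whitespace, so they do not match.
      counts[m[1].capitalize] += 1   # assumed normalization, e.g. "Content-type"
    end
  end
  counts
end

# Tiny made-up mailbox used only for illustration.
sample = <<~MBOX
  From alice@example.com Mon Jan  1 00:00:00 2024
  Received: from a.example.com
  Received: from b.example.com
  Content-Type: text/plain
  Subject: hello

  Body: this line is in the body and is not counted

  From bob@example.com Tue Jan  2 00:00:00 2024
  X-Mailer: demo
  Subject: hi

  second body
MBOX

counts = count_header_fields(sample.each_line)
counts.sort.each { |name, n| puts "| #{name} | #{n} |" }
```

For a real mailbox, something like `File.open(path, "rb") { |f| count_header_fields(f.each_line) }` should work, where `path` is a placeholder. Binary mode is used here because old mail often mixes character encodings.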
UNIX MBOX header field tally results
| Field name | Count |
|---|---|
| Authentication-results | 10,787 |
| Auto-submitted | 7,517 |
| Bcc | 14 |
| Bounces-to | 600 |
| Cc | 889 |
| Comment | 170 |
| Comments | 709 |
| Content-class | 306 |
| Content-disposition | 562 |
| Content-language | 25 |
| Content-return | 13 |
| Content-transfer-encoding | 37,981 |
| Content-transfer-encording | 77 |
| Content-type | 59,241 |
| Date | 61,662 |
| Delivered-to | 41,373 |
| Dkim-signature | 1,872 |
| Domainkey-signature | 2,052 |
| Envelope-from | 15 |
| Envelope-id | 130 |
| Error-to | 349 |
| Errors-to | 8,445 |
| From | 61,661 |
| From-key | 1 |
| Importance | 334 |
| In-reply-to | 2,988 |
| Keywords | 6,771 |
| Lines | 1,274 |
| List-archive | 130 |
| List-help | 1,425 |
| List-id | 1,406 |
| List-owner | 1,295 |
| List-post | 1,425 |
| List-software | 1,295 |
| List-subscribe | 132 |
| List-unsubscribe | 6,437 |
| Mail-id | 29 |
| Mailer | 3 |
| Mailing-list | 4,287 |
| Message-id | 61,616 |
| Mime-version | 50,360 |
| Openpgp | 2 |
| Organization | 38 |
| Posted | 1,071 |
| Precedence | 7,643 |
| Priority | 355 |
| Received | 226,506 |
| Received-spf | 3,162 |
| Recieved | 2 |
| References | 1,870 |
| Reply-to | 26,407 |
| Resent-bcc | 1 |
| Resent-date | 19 |
| Resent-from | 19 |
| Resent-message-id | 19 |
| Resent-sender | 5 |
| Resent-to | 19 |
| Return-path | 57,040 |
| Returnpath | 1 |
| Sender | 10,195 |
| Spml-clientid | 12 |
| Status | 23,160 |
| Subject | 61,616 |
| Template | 1 |
| Thread-index | 322 |
| Thread-topic | 295 |
| To | 61,637 |
| User-agent | 5,500 |
| User-info | 1 |
| Useragent | 17 |
| X-2ndmail-uid | 1 |
| X-accept-language | 155 |
| X-account-key | 31,007 |
| X-adtoone-url | 96 |
| X-amavis-alert | 260 |
| X-amavis-modified | 22 |
| X-amazon-category | 31 |
| X-amazon-client-host | 76 |
| X-amazon-client-sendtime | 76 |
| X-amazon-corporate-relay | 3 |
| X-amazon-mail-relay-type | 224 |
| X-amazon-rte-version | 418 |
| X-amazon-track | 98 |
| X-anti-virus | 28 |
| X-antiabuse | 605 |
| X-antispam | 13 |
| X-antivirus | 11,064 |
| X-antivirus-scanner | 1 |
| X-antivirus-status | 8,334 |
| X-apparently-from | 76 |
| X-apparently-to | 8,367 |
| X-auditid | 7 |
| X-authentication | 4 |
| X-authentication-warning | 427 |
| X-authuser | 1 |
| X-avg | 115 |
| X-avg-id | 1,703 |
| X-base64-encode | 5 |
| X-beatsmtptime | 2 |
| X-beenthere | 132 |
| X-bfi | 4 |
| X-biglobe-sender | 146 |
| X-biglobe-viruscheck | 102 |
| X-bkaspil-bayesresult | 1,103 |
| X-bkaspil-blip | 4 |
| X-bkaspil-comment | 8 |
| X-bkaspil-greyip | 1 |
| X-bkaspil-learn | 9 |
| X-bkaspil-result | 20,011 |
| X-bkrandomsig-folder | 1 |
| X-bloptin-id | 2 |
| X-bp-to | 4 |
| X-brightmail-tracker | 6 |
| X-campaignid | 22 |
| X-cid | 6 |
| X-cite-me | 2 |
| X-client-ip | 5 |
| X-cr-hashedpuzzle | 1 |
| X-cr-puzzleid | 1 |
| X-cron-env | 40,164 |
| X-customer | 2 |
| X-custommail | 1 |
| X-cwatml | 1 |
| X-delivered-to | 1 |
| X-deliveryid | 388 |
| X-demail-no | 1,304 |
| X-destination-id | 21 |
| X-dispatcher | 2 |
| X-dkim | 103 |
| X-domainkeys | 103 |
| X-dreammail-newlottery | 2 |
| X-durian-ck | 28 |
| X-durian-control | 28 |
| X-durian-db | 28 |
| X-durian-dl | 28 |
| X-durian-id | 28 |
| X-egroups-announce | 3 |
| X-egroups-application | 499 |
| X-egroups-from | 5 |
| X-egroups-return | 4,297 |
| X-eid | 2 |
| X-ema-cid | 650 |
| X-ema-lid | 650 |
| X-ema-pc | 650 |
| X-email | 388 |
| X-ems | 12 |
| X-emsysactivitycode | 1 |
| X-emsysto | 1 |
| X-encrypted | 1 |
| X-enigmail-version | 41 |
| X-env-sender | 4 |
| X-envelope-from | 2 |
| X-face | 2 |
| X-filterd-recvd-size | 1 |
| X-fmftcr | 1 |
| X-google-ads-sender | 3 |
| X-google-adsense-creation-method | 1 |
| X-google-adsense-message | 1 |
| X-google-sender-auth | 105 |
| X-goomoji-source | 4 |
| X-greylist | 26 |
| X-haid | 1 |
| X-hcid | 1 |
| X-hid | 45 |
| X-hmid | 1 |
| X-http-user-agent | 4 |
| X-http_referer | 4 |
| X-id | 1 |
| X-imss-scan-details | 3 |
| X-ip | 7 |
| X-iregisurl | 1 |
| X-ironport-av | 4 |
| X-job | 380 |
| X-k-date | 2 |
| X-k-time | 1 |
| X-library | 1 |
| X-list-administrivia | 2 |
| X-listmember | 2 |
| X-loop | 4 |
| X-lt-bounce-hook | 1 |
| X-lyris-message-id | 94 |
| X-macky-id | 1 |
| X-mag2catcode | 3,886 |
| X-mag2id | 6,445 |
| X-mag2media | 3,886 |
| X-mag2newcatcode | 3,799 |
| X-magazineid | 7 |
| X-magclick-relay | 1,077 |
| X-magclick-status | 1,077 |
| X-magclick-store | 1,077 |
| X-magsta-no | 157 |
| X-magtype | 15 |
| X-mail-agent | 307 |
| X-mail-count | 1,277 |
| X-mail_id | 2 |
| X-mail_send_id | 2 |
| X-mailcheck | 1 |
| X-mailer | 11,231 |
| X-mailer-plugin | 186 |
| X-mailingid | 21 |
| X-mailman-approved-at | 130 |
| X-mailman-version | 132 |
| X-mailno | 2 |
| X-mailocx | 2 |
| X-map-mixer-originators | 130 |
| X-matter-id | 58 |
| X-message-code | 247 |
| X-message-id | 589 |
| X-message-info | 1 |
| X-messageid | 2 |
| X-mimeole | 609 |
| X-mimetrack | 63 |
| X-ml-address | 3 |
| X-ml-info | 1,297 |
| X-ml-name | 1,277 |
| X-ml-recipient | 10 |
| X-mlserver | 1,297 |
| X-mozilla-keys | 32,582 |
| X-mozilla-status | 48,335 |
| X-mozilla-status2 | 48,334 |
| X-ms-has-attach | 1 |
| X-ms-tnef-correlator | 1 |
| X-msfbl | 4 |
| X-msg-ref | 4 |
| X-msmail-priority | 353 |
| X-mta | 1 |
| X-mxl-hash | 4 |
| X-nat-received | 16 |
| X-nifty-srcip | 16,911 |
| X-no-archive | 1 |
| X-okwave | 1 |
| X-org-to | 2 |
| X-original-mailer | 10 |
| X-original-to | 35,975 |
| X-originalarrivaltime | 496 |
| X-originalip | 4 |
| X-originating-email | 5 |
| X-originating-ip | 6,871 |
| X-outgoing | 15 |
| X-panda | 1 |
| X-pgp-fingerprint | 7 |
| X-pgp-key | 2 |
| X-pgp-public-key | 5 |
| X-php-originating-script | 70 |
| X-pid | 2 |
| X-pmx-version | 1 |
| X-priority | 4,115 |
| X-processed | 4 |
| X-proofpoint-spam-details | 17 |
| X-proofpoint-virus-version | 1 |
| X-proreg-templatenumber | 1 |
| X-quickml | 3 |
| X-r-mail_shop | 5,721 |
| X-r-news | 339 |
| X-r-url | 5,721 |
| X-rams-id | 9 |
| X-rams-source | 7 |
| X-rcpt-to | 58 |
| X-recipientsmember | 1 |
| X-remote-addr | 4 |
| X-remote-host | 6 |
| X-report-abuse | 21 |
| X-resent-from | 30 |
| X-sbrs | 1 |
| X-se-envelope-from | 1 |
| X-se-envelope-to | 1 |
| X-se-fromip | 1 |
| X-se-hid | 262 |
| X-se-tc | 1 |
| X-sender | 3,814 |
| X-senderid | 1 |
| X-sequence | 402 |
| X-session-marker | 1 |
| X-sf-loop | 1 |
| X-sgxh1 | 6 |
| X-silkfrom | 5 |
| X-silkfromname | 5 |
| X-silkid | 5 |
| X-sm-id | 403 |
| X-sm-sendconf | 56 |
| X-smfbl | 21 |
| X-smheadermap | 21 |
| X-smtp-result | 4 |
| X-source | 33 |
| X-source-args | 33 |
| X-source-dir | 33 |
| X-spam-checker-version | 9,602 |
| X-spam-flag | 2,578 |
| X-spam-level | 15,139 |
| X-spam-prev-subject | 52 |
| X-spam-report | 52 |
| X-spam-score | 5,540 |
| X-spam-status | 15,141 |
| X-starscan-version | 4 |
| X-subscription | 4 |
| X-sympl-dummy-id | 19 |
| X-sympl-id | 19 |
| X-sympl-knrno | 19 |
| X-sympl-member-id | 19 |
| X-sympl-retry | 19 |
| X-template | 10 |
| X-terrace-spammark | 2 |
| X-tid | 1 |
| X-tis-spam | 55 |
| X-tm-as-product-ver | 8 |
| X-tm-as-result | 3 |
| X-tm-as-user-approved-sender | 5 |
| X-tr2-no | 5 |
| X-tr3-no | 52 |
| X-trak-extra-language | 3 |
| X-twittercreatedat | 18 |
| X-twitteremailtype | 18 |
| X-twitterimpressionid | 73 |
| X-twitterrecipientid | 18 |
| X-twitterrecipientname | 18 |
| X-twitterrecipientscreenname | 18 |
| X-twittersenderid | 18 |
| X-twittersendername | 18 |
| X-twittersenderscreenname | 18 |
| X-uid | 52 |
| X-uidl | 54,167 |
| X-url | 5,132 |
| X-utopiareport | 1 |
| X-valueof-contact_id | 6 |
| X-vector-swreg-template | 9 |
| X-vector-swreg-version | 9 |
| X-verification | 2 |
| X-virtualserver | 21 |
| X-virtualservergroup | 21 |
| X-virus-checked | 1 |
| X-virus-scanned | 28,799 |
| X-virus-status | 197 |
| X-virusscan | 3 |
| X-vpass-template | 1 |
| X-vss-header | 416 |
| X-x | 1 |
| X-yahoo-bounces | 76 |
| X-yahoo-dmid | 412 |
| X-yahoo-id | 412 |
| X-yahoo-mmid | 411 |
| X-yahoo-mmo | 412 |
| X-yahoo-newman-expires | 261 |
| X-yahoo-newman-id | 876 |
| X-yahoo-newman-property | 683 |
| X-yahoo-nmail | 76 |
| X-yahoo-profile | 3,784 |
| X-yahoo-return-path | 5 |
| X-yahoofilteredbulk | 6 |
| X-yjpvirusscan | 1 |
| X-ymail-osg | 34 |
| X_id | 4 |
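There were 351 distinct fields in total. Of those, 280 were custom fields starting with "X-" that are specific to individual mail servers. Many of the remaining fields were also unfamiliar to me.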
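Totals like these can be derived from the tally hash itself. The sketch below is illustrative only: `summarize_fields` is a hypothetical helper, and the sample hash holds just a handful of rows copied from the table above rather than the full data set.

```ruby
# Summarize a { field name => count } hash: how many distinct fields
# there are and how many of them start with "X-".
def summarize_fields(counts)
  x_fields = counts.keys.select { |name| name.start_with?("X-") }
  {
    total: counts.size,
    x_prefixed: x_fields.size,
    other: counts.size - x_fields.size
  }
end

# A few rows taken from the table; "X_id" does not start with "X-".
sample_counts = {
  "Received" => 226_506,
  "Subject"  => 61_616,
  "X-mailer" => 11_231,
  "X-uidl"   => 54_167,
  "X_id"     => 4
}

summary = summarize_fields(sample_counts)
puts "#{summary[:total]} fields, #{summary[:x_prefixed]} starting with X-"
```

Matching on the literal prefix "X-" means that a field such as "X_id" is counted with the other fields, which is consistent with the figures quoted above.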
I now have a rough idea of which header fields appear in my own mail and how often.
That's all.