How are robots getting past FormMail's Captcha?

12-17-2009, 02:33 PM
We've got a couple of forms that get machine-submitted (I'm assuming) invalid requests from time to time. Not a big deal, but enough to have motivated me to install the Captcha-enabled version of FormMail that Westhost offers.

That didn't solve the problem, but I can't see how it is that the robot is getting past the Captcha. *I* can't get by it -- I have to give exactly the right 4 characters and ONLY those 4, or I get the by-design rejection. Yet I get submissions with, for example,
verifytext: Fx7S8

Since the robots like to fill everything in, I thought of something new to try, and added a "leave this field blank" field, and perl code to reject the submission if that instruction is not followed.
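A minimal sketch of that kind of bait-field check, for illustration only. The field name robotbait is the one mentioned later in this thread; the sub name looks_like_robot and the plain hash of form fields are assumptions, standing in for however the script actually reads its parameters.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when the "leave this field blank"
# field contains anything other than whitespace, 0 otherwise.
sub looks_like_robot {
    my (%field) = @_;
    my $bait = defined $field{robotbait} ? $field{robotbait} : '';
    return $bait =~ /\S/ ? 1 : 0;
}

# A person leaves the bait empty; a form-filling robot usually does not.
my %human = ( name => 'Tom',  robotbait => '' );
my %robot = ( name => 'xkqz', robotbait => 'http://example.invalid/' );

print looks_like_robot(%human) ? "reject\n" : "accept\n";
print looks_like_robot(%robot) ? "reject\n" : "accept\n";
```

A check like this only helps if every path that sends mail goes through it; a request posted straight to the CGI script still reaches the same code, so the test belongs in the script rather than in the page.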

Today I got a form submission that had more than 4 chars in the verifytext AND something in the robotbait. Somehow they worked around BOTH filters, and I can't see how. It occurred to me that maybe they were just scarfing the destination email address and creating an email with all the fields one would expect from a valid submission, and sending it themselves... but if so, they're doing a really fine job of spoofing all the headers and the message, comparing a valid and invalid message. Seems unlikely.

The other oddity is that the bogus submission doesn't seem to have any useful payload. There are some workable URLs, but pretty much all the text and hostnames look like garbage. (Haven't followed any to see what the sites might look like! Virus deployment means, perhaps? That would explain the useless appearance of the hostnames, and the motivation for these form submissions.)

Before diving more seriously into the perl code, I went to see if I had the latest version. I don't. MSA has a v1.93 dated this year, but not including the Captcha business. There's also the nms version that MSA promotes (http://www.scriptarchive.com/nms.html) (vintage 2004), and the TFMail variant on the NMS SourceForge page (http://nms-cgi.sourceforge.net/scripts.shtml) (vintage 2006) that was suggested in another thread about FormMail. I'm assuming I'll start with one of those latter two, patch on the Captcha if need be, and so on.

Are others seeing this sort of thing? Any suggestions or ideas on a more reliable form-to-mail CGI script?

12-18-2009, 07:25 AM
I've used the TFMail variation in the past with the captcha from WestHost tacked on. It worked pretty well. I just don't use Perl enough to explain how they are spoofing their way past the verification.

12-19-2009, 11:56 AM
Thanks, Shawn. I've installed TFmail and got it working with a basic form and customization to suit. I'm at the point of "tacking on the captcha" and it's not blindingly obvious where best to start tacking.

Care to share any or all of your work for that merger?

12-20-2009, 07:49 AM
I knew you were going to ask me that. :) It has been quite some time since I integrated this, so I had to go back and review my files. It is possible that the scripts have changed some since then, but this should get you going.

First off you need to have the verifytext field and the call to the captcha.cgi script as the source of an image just like in formmail in your form.

<input type='text' name='verifytext'> <img src='/cgi-bin/captcha.cgi'>
The next step is modifying the TFmail.pl file. BTW, I don't know if this really makes a difference, but I've actually renamed that file to something unique for my site. True, a hacker can still figure out what that file is now, but as a rule they are lazy, and if they are trying to access the script directly they would be trying the common name for the file. I also renamed TFmail_config.pl.

Within sub main of TFmail.pl around line 140 right after the call to check_session I added this.

if (!CheckCaptcha($treq)) {
    print "Content-type: text/html\n\n";
    print "<html><body><center><h2>Incorrect Verification Code</h2><br /><div style='width: 80%'><p>The verification code didn't match what was expected. Please <a href='javascript:history.go(-1)'>go back</a> and double-check it.</p></div></center></body></html>";
}

It is not very pretty if they fail the check but you could change that if you wanted to. The main thing to note is that you are checking to see if CheckCaptcha returns true and if it does not then you "do" something. Looking at the script now I could have perhaps used a call to html_page to bring up a template but of course then I would have to create the template. :)

The last thing you have to do is "copy" the CheckCaptcha and CleanUpOldFiles subs from check-captcha.cgi to the bottom of the TFmail.pl file. I did this because the code needs to be changed some from the original. I am going to post them here and let you compare to what you have now.

sub CheckCaptcha {
    my $tempdir  = "/tmp/captcha";
    my $nofile   = 0;
    my $cookieip = '';

    my ( $treq ) = @_;

    # open the temp datafile for current user based on ip
    my $tempfile = "$tempdir/$ENV{'REMOTE_ADDR'}";
    open (TMPFILE, "<$tempfile")|| ($nofile = 1);
    (my @checkimage) = <TMPFILE>;
    close TMPFILE;

    # if no matching ip file check for a cookie match
    # this will compensate for AOL proxy servers accessing images
    if ($nofile == 1) {
        # accept only plain filename characters from the cookie, so the
        # value cannot contain a "/" and point outside $tempdir
        if (defined $ENV{HTTP_COOKIE} && $ENV{HTTP_COOKIE} =~ /checkme=([\w.:]+)/) {
            $cookieip = $1;
        }
        if ($cookieip ne "") {
            # no "my" here: reuse the outer $tempfile and @checkimage so the
            # comparison and unlink below see the cookie-based file
            $tempfile = "$tempdir/$cookieip";
            open (TMPFILE, "<$tempdir/$cookieip")|| &nofile;
            @checkimage = <TMPFILE>;
            close TMPFILE;
        }
    }

    # no stored text means there is nothing to match against, so fail
    my $imagetext = defined $checkimage[0] ? $checkimage[0] : '';
    chomp $imagetext;
    return 0 if $imagetext eq '';

    # set the form input to lower case
    my $a = lc($treq->param('verifytext'));

    # compare the form input with the file text
    if ($a ne "$imagetext") {
        # Don't clean up yet - the user will likely be returning soon
        return 0;
    }

    # now delete the temp file so it cannot be used again by the same user
    $tempfile =~ /(.*)/;
    $tempfile = $1;
    unlink "$tempfile";

    # if no error continue with the program
    return 1;
}

sub CleanUpOldFiles {
    my $tempdir = "/tmp/captcha";

    # remove all old temp files
    # this keeps the directory clean without setting up a cron job
    opendir TMPDIR, "$tempdir";
    my @alltmpfiles = readdir TMPDIR;
    closedir TMPDIR;

    foreach my $oldtemp (@alltmpfiles) {
        # skip the "." and ".." entries that readdir returns
        next if $oldtemp =~ /^\.\.?$/;
        my $age = 0;
        # element 9 of stat is the last-modified time
        $age = (stat("$tempdir/$oldtemp"))[9];
        # if age is more than 300 seconds or 5 minutes
        if ((time - $age) > 300){
            $oldtemp =~ /(.*)/;
            $oldtemp = $1;
            unlink "$tempdir/$oldtemp";
        }
    }
}

I think that is it. Let me know if you have questions or if you think I left something out.
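The core of the captcha check is that a stored code can be matched only once: a mismatch leaves the file in place for another attempt, and a match deletes it. Here is a small standalone sketch of just that behaviour. The sub name check_once, the temporary directory and the sample code are assumptions for illustration, not part of check-captcha.cgi, and it assumes the stored text is already lower case.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical stand-in for the compare-and-delete step: returns 1 and
# removes the file on a match, returns 0 and keeps the file otherwise.
sub check_once {
    my ($file, $entered) = @_;
    open(my $fh, '<', $file) or return 0;    # no stored text: fail
    my $stored = <$fh>;
    close $fh;
    return 0 unless defined $stored;
    chomp $stored;
    return 0 if $stored eq '' || lc($entered) ne $stored;
    unlink $file;                             # a match can be used only once
    return 1;
}

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/192.0.2.1";                  # named after the client address
open(my $out, '>', $file) or die "cannot write $file: $!";
print {$out} "fx7s\n";
close $out;

print check_once($file, 'wrong'), "\n";       # mismatch, file kept
print check_once($file, 'FX7S'),  "\n";       # match, file removed
print check_once($file, 'FX7S'),  "\n";       # file gone, so this fails
```

Treating a missing or empty stored code as a failure matters here: otherwise a request that never loaded the image could pass by sending an empty verification field.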

12-20-2009, 10:15 AM
Bees knees, Shawn. Your recipe is spot on.

Now to sit back and wait for the robots to try again, and see if they are getting by the captcha the way they have been with formmail.

I like the code and the enhancements in TFmail, especially the moving of the recipient addresses into a config file accessed by the code, where they're not prone to harvesting. With the inline templating for the email (et al.), all of the particulars for a form can be corralled in the one .trc, and kept out of the web tree.

12-20-2009, 12:42 PM
I agree, Tom, that TFmail is pretty sweet. Glad you were able to get the captcha integrated with it. I was a bit worried I had forgotten something. :) With this setup I don't ever get any spam via my contact page... knock on wood.... so I hope it works as well for you.

12-20-2009, 10:45 PM
Well, I did find one problem. In your CheckCaptcha subroutine, you have the line

open (TMPFILE, "<$tempdir/$cookieip")|| &nofile;
but that "nofile" subroutine doesn't exist here. (You probably wrote it though?)

I ran into this with back-and-forth testing, when my form/config had a capitalization mismatch between a field name and what was listed as a required field... fixed that, and wouldn't run into the branch under "normal" circumstances.... but the OR does point off into space.

12-21-2009, 06:33 AM
Good catch. Looks like the original subroutine in check-captcha.cgi was:

sub nofile {

    print "Content-type: text/html\n\n";
    print "No file found for verification.";
}

Adding that to the TFmail.pl file should fix the error. Is that what you did or did you work something else out?

12-21-2009, 10:41 AM
I'd fixed my (unrelated) problem that was causing me to hit that error branch, and waited to hear from you. :-) Cleaning up the old scripts, I see this (and the other two) are out of check-captcha.cgi, which helps me understand what's going on.

Thanks again for the help.

04-09-2012, 11:31 PM
Years later, back at this, and found a behavior I didn't like: the error trap for a missing required field suggests you use your browser's BACK function, which will show the partially-filled form and original Captcha... that can't be satisfied, since the first time it was matched, the file is removed. Flow as originally implemented was: (1) check Captcha (and remove file on success); (2) check for required fields.

Makes more sense to me to check that required fields are provided, and only if they are to check the Captcha. Accomplished that by moving the "if (!CheckCaptcha($treq)) { }" block into the "if ( check_required_fields($treq) ) { }" branch, rather than before it. The CheckCaptcha() function provided for retaining the image file if the match failed... so now either error and BACK will keep the form (and Captcha) functional.
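The reordered flow can be sketched roughly as follows. The stubs required_fields_ok and captcha_ok and the %state hash are hypothetical stand-ins for check_required_fields($treq) and CheckCaptcha($treq), which take the real request object; only the order of the two checks is the point.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: each stub simply reports a canned result.
my %state = ( fields_ok => 0, captcha_ok => 1, captcha_used => 0 );

sub required_fields_ok { return $state{fields_ok} }

sub captcha_ok {
    return 0 unless $state{captcha_ok};
    $state{captcha_used} = 1;    # a successful match consumes the captcha
    return 1;
}

# Required fields first; the captcha is only consumed once they pass.
sub handle_submission {
    return 'missing fields' unless required_fields_ok();
    return 'bad captcha'    unless captcha_ok();
    return 'sent';
}

print handle_submission(), "\n";              # fields missing
print "captcha still usable\n" unless $state{captcha_used};
$state{fields_ok} = 1;                        # user goes BACK and completes the form
print handle_submission(), "\n";
```

With the checks in the opposite order, the first call would already have consumed the captcha, and the corrected resubmission could not match it.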

04-10-2012, 05:14 AM
Wow, I have not thought of this in forever! I use other programs now for contact pages, so I kind of just left this in the dust. :) Nice! What you say makes sense to me.