Stop the bots with Google reCAPTCHA

(Image credit: Future)

Keeping bots out is always a numbers game. Sadly, the availability of easy-to-use machine learning libraries has made cracking many classic captcha types simple. Google stands at the forefront of the botting storm: after all, something only exists if it can be found (prominently) in Big G's indices.

Because of this, Google engineers devote significant effort to designing anti-bot systems and solutions. These are made available to third parties via a product called reCAPTCHA, which we will go over together in the following steps.

Before getting started, however, a few basic things must be cleared up. First of all: keeping bots out is always a server-side game. Checking the result of your anti-bot measure only on the client is pointless, as an attacker can analyse the source code or simply patch out the validation of the return value.

Secondly, keep in mind that not all bots are created equal. Locking out Googlebot, for example, means your website will no longer be indexed. Similar problems can occur with other industry-specific bots, which often do more good than harm. Finally, bots might be a lesser evil in some cases: when traffic is all you need, a bot click turns out to be just a click.

01. Sign-up a go-go!


Use your Google account to sign in to reCAPTCHA

(Image credit: Tam Hanna)

Google keeps a close eye on reCAPTCHA users, so the first step is registering your site. Head to the reCAPTCHA admin console and sign in with your Google account. Under Domains, add "localhost" in addition to your favourite domain. Pick the "I'm not a robot" checkbox type, as it is the best-known anti-bot measure.

02. Store site and server keys

Persistence in working through the setup process is rewarded with two keys: a site key and a secret key. While the site key can be shared with third parties (it is embedded in your pages), ensure that the secret key never leaves the confines of your server environment.
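One way to keep the secret key out of your source code (and out of screenshots) is to read it from an environment variable when the server starts. The following is a minimal sketch, assuming a variable named RECAPTCHA_SECRET; that name is our own choice, not something Google prescribes.

```javascript
// Read the reCAPTCHA secret key from the environment instead of hard-coding it.
// RECAPTCHA_SECRET is a hypothetical variable name chosen for this example.
function loadRecaptchaSecret(env) {
  const secret = env.RECAPTCHA_SECRET;
  // Fail fast at start-up rather than sending an empty secret to Google later
  if (typeof secret !== "string" || secret.trim() === "") {
    throw new Error("RECAPTCHA_SECRET is not set");
  }
  return secret.trim();
}

// In the server: var RECAPTCHA_SECRET = loadRecaptchaSecret(process.env);
```

The server would then be started with something like RECAPTCHA_SECRET=your_secret_key node index.js, so the key never has to appear in a file that might end up in version control.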

03. Understand site verification

Google uses a variation of the challenge-response process to ensure result integrity. Once solved, the captcha widget hands the browser a response token, which your server then "turns in" to Google's siteverify endpoint together with your secret key. Google answers with a small JSON document whose success field tells your back-end logic whether the token is valid.
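To make the last part concrete, here is a sketch of how a server might judge an already-parsed siteverify reply. The field names success, hostname and error-codes follow Google's documented reply format; the helper name checkVerification and the optional hostname comparison are our own additions.

```javascript
// Decide whether a parsed siteverify reply should be accepted.
// Returns { ok, reason } so callers can log why a token was rejected.
function checkVerification(reply, expectedHostname) {
  if (reply === null || typeof reply !== "object") {
    return { ok: false, reason: "malformed reply" };
  }
  // Only an explicit true counts; a missing field is treated as failure
  if (reply.success !== true) {
    const codes = reply["error-codes"] || [];
    return { ok: false, reason: codes.length ? codes.join(", ") : "verification failed" };
  }
  // Optionally make sure the token was issued for our own site
  if (expectedHostname && reply.hostname !== expectedHostname) {
    return { ok: false, reason: "unexpected hostname " + reply.hostname };
  }
  return { ok: true, reason: null };
}
```

Treating a missing success field as a failure is the safer default: anything Google did not explicitly approve is rejected.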

04. Set up Express.JS

We want to demonstrate a turn-key solution covering the entire verification flow. For that, we need a "server" of sorts: Express.JS has been a frequent topic recently, so install it into a newly created project skeleton.

tamhan@tamhan-thinkpad:~/nodespace/nodeverify$ npm init -y
Wrote to /home/tamhan/nodespace/nodeverify/package.json:
  . . .
tamhan@tamhan-thinkpad:~/nodespace/nodeverify$ npm install body-parser express request --save
. . .
+ body-parser@1.18.3
+ request@2.88.0
+ express@4.16.4

05. Prepare to serve

Loading the well-known "I'm not a robot" checkbox requires an HTML file. Given that this is a sample demonstrating the interaction flow, start out with a static document containing the markup accompanying this step.

<html>
	<head>
		<title>reCAPTCHA demo: Simple page</title>
		<!-- The path is case-sensitive: /recaptcha/api.js -->
		<script src="https://www.google.com/recaptcha/api.js" async defer></script>
	</head>
	<body>
		<form action="?" method="POST">
			<!-- api.js looks for the lower-case class name g-recaptcha -->
			<div class="g-recaptcha" data-sitekey="your_site_key"></div>
			<br/>
			<input type="submit" value="Submit">
		</form>
	</body>
</html>

06. Understand and test


Run the code to see these results

(Image credit: Tam Hanna)

Google provides an API file containing the captcha logic. When api.js loads, it scans the DOM and replaces every <div> element carrying the g-recaptcha class with the checkbox widget. The submit button (an <input type="submit"> element) is left alone as of this writing. Either way, load the page in a browser of choice to see the results shown above.

07. Load some elements

Wiring up request handling in Express.JS needs a little help. One really useful helper is body-parser: once registered as middleware, it parses the bodies of incoming requests, so the form fields submitted by the browser can be accessed as plain properties of request.body. This greatly simplifies handling them.

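var Express = require("express");
var BodyParser = require("body-parser");
var Request = require("request");

var myApp = Express();

// Parse JSON bodies as well as classic HTML form posts
// (application/x-www-form-urlencoded) into request.body
myApp.use(BodyParser.json());
myApp.use(BodyParser.urlencoded({ extended: true }));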
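To see roughly what the urlencoded parser does for us, the sketch below turns a raw form body into an object using Node's built-in URLSearchParams. This is only an illustration: body-parser's extended mode relies on the qs library instead, which additionally understands nested keys. The token value is made up.

```javascript
// Turn a raw "name=value&name=value" form body into a plain object,
// roughly what urlencoded body parsing makes available as request.body.
function parseFormBody(rawBody) {
  const fields = {};
  for (const [name, value] of new URLSearchParams(rawBody)) {
    fields[name] = value; // in this sketch, a repeated name keeps its last value
  }
  return fields;
}

// A made-up body, as a browser might send it after the captcha was solved
const example = parseFormBody("g-recaptcha-response=03AGdBq2x%2Fabc&comment=hello+world");
console.log(example["g-recaptcha-response"]); // prints 03AGdBq2x/abc
console.log(example.comment); // prints hello world
```

Note how percent-encoding and plus signs are decoded along the way, which is why the server can pass the token on without further processing.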

08. Prepare to verify...

Google's checkbox does its magic in the background. Once the user passes, the widget places the response token in a form field named g-recaptcha-response, which is submitted along with the rest of the form. Our server must forward this token to Google's servers for verification. Replace the long secret string in the code accompanying this step with the secret key Google assigned to you.

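myApp.post("/tamstest", function(request, response) {
	// Build the siteverify URL from the secret key, the token and the client's address
	var recaptcha_url = "https://www.google.com/recaptcha/api/siteverify?";
	recaptcha_url += "secret=6LewMZgUAAAAAIRSB2gv5akKx2cWyFUlKzRmd7ws&";
	// Encode user-supplied values so they cannot break the query string
	recaptcha_url += "response=" + encodeURIComponent(request.body["g-recaptcha-response"] || "") + "&";
	recaptcha_url += "remoteip=" + encodeURIComponent(request.connection.remoteAddress);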

09. ...and enquire at Google's servers

The next act involves firing off the URL to Google. This is accomplished via a plain HTTP request to the address created in the previous step; Google answers with JSON. The result is then analysed: if the request fails or Google reports an unsuccessful verification, an error message is returned to the client-side application.

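	Request(recaptcha_url, function(error, resp, body) {
		// Could not reach Google: report a failure instead of crashing
		if (error) {
			return response.status(500).send({ "message": "Captcha verification unavailable" });
		}
		body = JSON.parse(body);
		// Only an explicit success counts; anything else is rejected
		if (!body.success) {
			return response.send({ "message": "Captcha validation failed" });
		}
		response.header("Content-Type", "application/json").send(body);
	});
});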
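The request package used above has been deprecated since 2020. On Node.js 18 or later, which ships a global fetch, the same check can be written without extra dependencies. The sketch below sends the parameters as a form-encoded POST body, which is how Google's documentation describes siteverify, and keeps the secret out of URLs and request logs. The function name verifyRecaptcha and the injectable fetchImpl parameter are our own choices, the latter only there to make the function easy to test.

```javascript
// Alternative sketch of steps 08 and 09 for Node.js 18+ (global fetch).
const SITEVERIFY_URL = "https://www.google.com/recaptcha/api/siteverify";

async function verifyRecaptcha(secret, token, remoteIp, fetchImpl = fetch) {
  // Form-encode the parameters instead of appending them to the URL
  const params = new URLSearchParams({ secret: secret, response: token || "" });
  if (remoteIp) {
    params.append("remoteip", remoteIp); // optional parameter
  }
  const res = await fetchImpl(SITEVERIFY_URL, {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: params.toString(),
  });
  if (!res.ok) {
    throw new Error("siteverify returned HTTP " + res.status);
  }
  const reply = await res.json();
  // Only an explicit success counts
  return reply.success === true;
}
```

Inside the route handler, you could call verifyRecaptcha(secret, request.body["g-recaptcha-response"], request.ip) and branch on the resulting boolean, answering with an error when the promise rejects.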

10. Set it loose

Express.JS's HTTP server now needs to be started. This is best accomplished with the app.listen() method. Keep in mind that on Unix-like operating systems, only the root user may bind ports below 1024, so 3000 is a safe bet.

var server = myApp.listen(3000, function() {
	console.log("Listening on " + server.address().port);
});
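If the port should be configurable, one common convention is to read it from a PORT environment variable. The sketch below accepts only unprivileged ports (1024 to 65535) and otherwise falls back to 3000 as above; the helper name choosePort is hypothetical and Express itself does not require any of this.

```javascript
// Pick a listening port: use PORT from the environment if it names an
// unprivileged port, otherwise fall back to the article's default of 3000.
function choosePort(env, fallback = 3000) {
  const port = Number(env.PORT);
  if (Number.isInteger(port) && port >= 1024 && port <= 65535) {
    return port;
  }
  return fallback;
}

// Usage: var server = myApp.listen(choosePort(process.env), function() { ... });
```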

11. Adjust the form

Finally, our form needs to talk to the local server running inside Node.JS. This is done by adjusting the form's action attribute; be sure to point it at a valid address, especially if you don't use a local Express.JS server.

<body>
		<form action="http://localhost:3000/tamstest" method="POST">
			<div class="g-recaptcha" data-sitekey="6LewMZgUAAAAAEPSwPVP6HLulnjxa_scWtwhyHge"></div>
			<br/>

12. Bring it up!


Use Python's quick HTTP server to test the client-server system

(Image credit: Tam Hanna)

Testing our client-server system requires two servers: because Google checks the origin of the page hosting the widget, the HTML file must be served over HTTP from a registered domain such as localhost rather than opened from disk. Fortunately, Python provides a quick HTTP server. Use it to serve index.htm, while Express.JS is enlisted to verify the returned response.

tamhan@tamhan-thinkpad:~/nodespace/nodeverify$ node index.js
Listening on 3000
tamhan@tamhan-thinkpad:~/nodespace/nodeverify$ python3 -m http.server
Serving HTTP on 0.0.0.0 port 8000 ...

13. Perform a dry run


Give your Captcha a test run

(Image credit: Tam Hanna)

Once both servers are running, tick the checkbox and complete any challenges Big G might throw at you. Google usually does not demand much in terms of verification, after which the Express.JS server responds with the contents shown above.

14. Disable the submit button


Disabling the submit button at startup is good for usability

(Image credit: Tam Hanna)

Google's examples leave the submit button untouched, as an attacker can always re-enable it with a bit of JavaScript. While tying the submit button to the captcha does not improve security, it does improve usability. Let us start out by disabling the button at start-up.

<form action="http://localhost:3000/tamstest" method="POST">
	<div class="g-recaptcha" data-sitekey="6LewMZgUAAAAAEPSwPVP6HLulnjxa_scWtwhyHge"></div>
	<br/>
	<input type="submit" value="Submit" disabled>
</form>

15. Handle CAPTCHA events


Now add an event handler

(Image credit: Tam Hanna)

Next up, an event handler must be added. Google's CAPTCHA API invokes it whenever the user passes a verification attempt, as far as the client side can tell, and hands it the response token as its argument.

<html>
	<head> . . .
		<script>
			function onDCFired(value){
				console.log(value);
			}
		</script>
	</head>
	<body>
		<form action="http://localhost:3000/tamstest" method="POST">
			<div class="g-recaptcha" data-sitekey="6LewMZgUAAAAAEPSwPVP6HLulnjxa_scWtwhyHge" data-callback="onDCFired"></div>

16. Excursion: dynamic CAPTCHA rendering

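Google does not limit developers to spawning reCAPTCHA elements during page load. The grecaptcha.render() method turns a container element of your choice into a reCAPTCHA widget on demand and returns a widget ID; call it once for each <div> you want to transform. For this explicit mode, api.js is usually loaded with the onload and render=explicit query parameters.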

var myCallback = function(val) { console.log(val); };
// mySiteKey must hold your site key; "my-id" is the ID of the target <div>
grecaptcha.render(
	document.getElementById("my-id"),
	{
		callback: myCallback,
		sitekey: mySiteKey
	});
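When a page holds several widgets, it helps to remember the widget IDs that render() returns, for example to read a specific token later with grecaptcha.getResponse(widgetId). The sketch below renders one widget per container ID; render() accepts either the element or its ID string. The helper name renderCaptchas is hypothetical, and grecaptcha is passed in as a parameter (in the browser you would hand it window.grecaptcha) so the logic can be exercised with a stand-in object.

```javascript
// Render one checkbox widget per container ID and collect the widget IDs.
// onToken(containerId, token) is called when the matching widget is solved.
function renderCaptchas(grecaptcha, containerIds, siteKey, onToken) {
  const widgetIds = {};
  for (const id of containerIds) {
    widgetIds[id] = grecaptcha.render(id, {
      sitekey: siteKey,
      callback: function (token) { onToken(id, token); },
    });
  }
  return widgetIds;
}
```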

17. Enable button if needed...

With that, one main issue remains: the submit button must be re-enabled once reCAPTCHA invokes our success callback. Loading jQuery for such a simple example is unnecessary, so we fall back to plain JavaScript instead.

	<script>
		function onDCFired(value){
			document.getElementById("Button").disabled = false;
		}
	</script>
		<input type="submit" id="Button" value="Submit" disabled>

18. ...and clean up after us

Google does not keep challenge responses around forever: a response token expires after about two minutes, and by default our page does not notice. Fortunately, the data-expired-callback attribute names a function that reCAPTCHA calls when this happens, which we use to disable the button again.

		function onDCExpired(){
			// The token timed out: block submission until the captcha is solved again
			document.getElementById("Button").disabled = true;
		}
	<div class="g-recaptcha" data-sitekey="6LewMZgUAAAAAEPSwPVP6HLulnjxa_scWtwhyHge" data-callback="onDCFired" data-expired-callback="onDCExpired"></div>
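The success and expiry handlers can also be produced by one small factory, which keeps the button logic in one place. reCAPTCHA v2 additionally offers a data-error-callback attribute for cases such as network trouble, which we treat like an expiry here. The factory name makeCaptchaHandlers is hypothetical; doc is normally window.document, and "Button" is the ID used in the markup above.

```javascript
// Keep the submit button in sync with the widget's state.
function makeCaptchaHandlers(doc, buttonId) {
  function setEnabled(enabled) {
    const button = doc.getElementById(buttonId);
    if (button) {
      button.disabled = !enabled;
    }
  }
  return {
    onSuccess: function (token) { setEnabled(Boolean(token)); }, // data-callback
    onExpired: function () { setEnabled(false); },               // data-expired-callback
    onError: function () { setEnabled(false); },                 // data-error-callback
  };
}

// In the page, expose them under the names the data-* attributes use:
// var handlers = makeCaptchaHandlers(document, "Button");
// window.onDCFired = handlers.onSuccess;
// window.onDCExpired = handlers.onExpired;
```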

19. Use additional attributes


There are plenty of useful properties to play with

(Image credit: Tam Hanna)

The reCAPTCHA developer documentation lists the further properties supported by the JavaScript API. They let you adjust various behaviours; for example, the reCAPTCHA widget can also be rendered in a dark, night-friendly colour scheme!
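As an illustration, the sketch below builds a parameter object for grecaptcha.render(). theme ("light" or "dark"), size ("normal" or "compact") and tabindex are documented widget parameters; the helper name widgetOptions and its defaults are our own. The same settings can be given declaratively, for instance as data-theme="dark" on the <div>.

```javascript
// Build render() parameters for a v2 checkbox widget.
function widgetOptions(siteKey, { dark = false, compact = false, tabindex = 0 } = {}) {
  return {
    sitekey: siteKey,
    theme: dark ? "dark" : "light",
    size: compact ? "compact" : "normal",
    tabindex: tabindex,
  };
}

// e.g. grecaptcha.render("my-id", widgetOptions(mySiteKey, { dark: true }));
```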

20. Work transparently…

Google recently introduced a third version of the reCAPTCHA API which does not require explicit user interaction. Instead, you simply load it during page initialisation, and the code monitors how the user behaves on the website; rather than showing a challenge, it lets your server obtain a score for each interaction.

<script src="https://www.google.com/recaptcha/api.js?render=recaptcha_site_key"></script>
	<script>
	grecaptcha.ready(function() {
		grecaptcha.execute('recaptcha_site_key', {action: 'homepage'}).then(function(token) {
			// ...
		});
	});
	</script>
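Since the token arrives asynchronously, it can be convenient to wrap grecaptcha.ready() and grecaptcha.execute() in a Promise and await the token before sending it to your server, for instance as a hidden form field. The helper name getV3Token is hypothetical; grecaptcha is passed in (window.grecaptcha in the page), and Promise.resolve() is used so that any promise-like value returned by execute() is handled.

```javascript
// Resolve with a v3 token for the given action, or reject on failure.
function getV3Token(grecaptcha, siteKey, action) {
  return new Promise(function (resolve, reject) {
    grecaptcha.ready(function () {
      try {
        // execute() yields a promise-like object that resolves to the token
        Promise.resolve(grecaptcha.execute(siteKey, { action: action })).then(resolve, reject);
      } catch (err) {
        reject(err);
      }
    });
  });
}

// e.g. getV3Token(window.grecaptcha, "recaptcha_site_key", "homepage").then(sendToServer);
// where sendToServer is your own (hypothetical) function.
```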

21. ...and provide additional information

reCAPTCHA v3 profits from further data about the action being performed on the website. The snippet accompanying this step tells Google that the user is currently visiting your homepage. More information on the API can be found in the documentation.

<script>
	grecaptcha.ready(function() {
		grecaptcha.execute('recaptcha_site_key', {action: 'homepage'});
	});
</script>
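On the server, a v3 token is verified through the same siteverify endpoint, but the reply additionally carries a score between 0.0 (likely a bot) and 1.0 (likely a human) and the action name passed to execute(). The sketch below shows one possible acceptance rule. Google suggests 0.5 as a starting threshold that you tune against your own traffic; the helper name acceptV3 is our own.

```javascript
// Decide whether a parsed v3 siteverify reply should be accepted.
function acceptV3(reply, expectedAction, minScore = 0.5) {
  if (!reply || reply.success !== true) return false;
  // Reject tokens minted for a different action than the one we expect
  if (reply.action !== expectedAction) return false;
  return typeof reply.score === "number" && reply.score >= minScore;
}
```

Checking the action guards against a token obtained on one page being replayed against a more sensitive endpoint.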


This article was originally published in issue 287 of creative web design magazine Web Designer.
