Cache Stamp for Spider Spotting

Spider Spotting

Spider spotting tactics include using a time stamp to know precisely when Google, the Wayback Machine, or any other robot visited your page. It works as long as there's a public cache copy you can access. I use the local system time, easily available through the DATE_LOCAL environment variable. This makes basic forensics relatively easy, such as spotting copyright infringement, watching key pages like AMP (AMPHTML) versions, or anything else where you want to know when the page was requested.

Ideas

Spider spotting
IP address collection
User-agent, referrer, other env vars

There are several ways to implement a time stamp from environment variables. I'm using server side includes (SSI) here, but you can use PHP just as easily. You'll want to echo the DATE_LOCAL environment variable and configure the time format to your liking. Includes are interpolated (embedded) into pages using HTML comment syntax.

          <!--#function attribute=value -->

The config directive's timefmt attribute sets how the time will look.

          config timefmt="%a %b %d %y, %I:%M %p %Z"

The first part (before the comma) formats the way the day, month, and year display. The second part formats the way the time appears. At run time, SSI directives that come after your time configuration statement, such as an echo of the DATE_LOCAL variable, return your custom-formatted local server time. Be aware that your machine's local time is that of wherever you host. The cache stamps here are in Eastern Standard Time. Google uses GMT, so you may need to account for the difference, depending on what information you want.

          <!--#config timefmt="%a %b %d %y, %I:%M %p %Z" -->
          <!--#echo var="DATE_LOCAL" -->
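
If you'd rather not convert from local time by hand, one option is to stamp GMT directly. This is a minimal sketch, assuming your server runs Apache's mod_include, which provides a DATE_GMT variable alongside DATE_LOCAL. The ISO-style format string is just one choice that sorts and compares easily.

          <!--#config timefmt="%Y-%m-%d %H:%M:%S %Z" -->
          <!--#echo var="DATE_GMT" -->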

While the cache stamps in our footer make it easy to see our version, you certainly don't have to publish anything visible to your users. Simply nest the includes inside an HTML comment in your source and nothing will display on the page. Try it on a page or two, and then take a look at the source of a cache copy. You can use this for more than time. We also stamp the requesting machine's IP address into cache copies. It's about forensics.
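
As a rough sketch of a hidden stamp, the directives below sit inside an ordinary HTML comment, so their output only shows up in the page source. It assumes Apache's mod_include, where standard CGI variables such as REMOTE_ADDR, HTTP_USER_AGENT, and HTTP_REFERER can be echoed the same way as DATE_LOCAL. The labels are only for illustration.

          <!--
          requested: <!--#echo var="DATE_LOCAL" -->
          from: <!--#echo var="REMOTE_ADDR" -->
          agent: <!--#echo var="HTTP_USER_AGENT" -->
          referrer: <!--#echo var="HTTP_REFERER" -->
          -->

If a variable is not set, such as the referrer on a direct request, mod_include prints "(none)" in its place by default.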

Apache Time Format Documentation