A couple of weeks ago I set about implementing Pingback 1.0 for this website. The first logical step was to see whether there was an existing open-source Java implementation of Pingback, or any easily reusable examples to borrow from the intarwebz.
My searches came up dry, so I rolled my own. And now I have something to write about too! Here's how I did it.
The first thing I looked at was identifying links to other pages within the articles of my site. There are loads of regular expressions claiming some degree of effectiveness at identifying URLs; I chose a regex based on this one.
public static final String URL_REGEX = "((\":)|href=\")((http(s?)\\:\\/\\/|~/|/)?((\\w+:\\w+@)?(([-\\w]+\\.)+(com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum|travel|[a-z]{2}))|localhost)(:[\\d]{1,5})?(((/([-\\w~!$+|.,=]|%[a-f\\d]{2})+)+|/)+|\\?|#)?((\\?([-\\w~!$+|.,*:]|%[a-f\\d]{2})+=([-\\w~!$+|.,*:=]|%[a-f\\d]{2})*)(&([-\\w~!$+|.,*:]|%[a-f\\d]{2})+=([-\\w~!$+|.,*:=]|%[a-f\\d]{2})*)*)*(#([-\\w~!$+|.,*:=]|%[a-f\\d]{2})*)?)";
Regular expressions are incomprehensible when they get that big! This one is also probably overkill and too specific (e.g. it will ignore any new top-level domains that are invented).
Anyway, my regex matches any URL that is either the value of an href attribute in HTML (e.g. href="some_url"...) or part of Textile markup ("Some label":some_url). I check for both because Textile allows HTML to be embedded, so just parsing Textile links won't do. It also means I can use the same regex to check remote pages for links to my own site (for Pingback server compliance) as I use to check my own Textile articles for outgoing links.
The codebase for the API has turned out to be quite tiny. I wouldn't say that it is innovative, but it is certainly reusable and should simplify the task for anyone trying to do the same as I have done. The source is available here. It's in the form of a Maven project, so if you want to build it, just type mvn install.
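To make the two link styles concrete, here is a minimal sketch of the extraction step. It deliberately uses a much simpler pattern than the regex above (absolute http/https URLs only), and the class name LinkExtractor and its method are just illustrative names, not the library's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: a simplified stand-in for the big regex above.
class LinkExtractor {

    // Matches an absolute http(s) URL preceded by either href=" (HTML)
    // or ": (the tail end of a Textile "label":url link).
    private static final Pattern LINK = Pattern.compile(
            "(?:href=\"|\":)(https?://[^\\s\"<>]+)",
            Pattern.CASE_INSENSITIVE);

    // Returns each distinct URL once, in the order it first appears.
    static List<String> findLinkAddresses(String text) {
        List<String> addresses = new ArrayList<>();
        Matcher matcher = LINK.matcher(text);
        while (matcher.find()) {
            String url = matcher.group(1);
            if (!addresses.contains(url)) {
                addresses.add(url);
            }
        }
        return addresses;
    }

    public static void main(String[] args) {
        String body = "See \"the spec\":http://example.org/spec and "
                + "<a href=\"https://example.com/post?id=1\">this post</a>.";
        for (String address : findLinkAddresses(body)) {
            System.out.println(address);
        }
    }
}
```

The simple pattern has an obvious weakness: a Textile link followed directly by punctuation (say, a full stop) will swallow that character, which is one reason the stricter regex spells out the allowed URL characters.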
Pingback Client
It is for implementing Pingback clients that this API provides the most value, because I found it very difficult to decouple the Pingback server functionality from the specific implementation of my website. That's the subject for another post, however.
Here's how the code in my website uses the API.
package com.malethan.blog.app;
import com.malethan.blog.RequestUtil;
import com.malethan.blog.models.BlogPost;
import com.malethan.pingback.Link;
import com.malethan.pingback.PingbackClient;
import com.malethan.pingback.LinkLoader;
import com.malethan.pingback.PingbackException;
import com.malethan.seemorej.AfterFilter;
import static com.malethan.seemorej.SeemoreJ.*;
import static com.malethan.seemorej.Flash.*;
import com.malethan.seemorej.hibernate.crud.CrudControllerHibernate;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.springframework.core.io.ClassPathResource;
import org.springframework.beans.factory.xml.XmlBeanFactory;
import java.util.List;
import java.util.ArrayList;
public class BlogPostController extends CrudControllerHibernate<BlogPost, Long> {

    private static final Log log = LogFactory.getLog(BlogPostController.class);

    XmlBeanFactory beanFactory;
    PingbackClient pingbackClient;
    LinkLoader linkLoader;
    List<String> failedPings;

    public BlogPostController() {
        super(BlogPost.class, Long.class);
        beanFactory = new XmlBeanFactory(new ClassPathResource("/applicationContext.xml", getClass()));
    }

    @AfterFilter(include = {"create", "update"})
    public void createOutgoingLinks() throws Exception {
        BlogPost blogPost = (BlogPost) request().getAttribute(modelNameLower);
        if (blogPost.isPublished()) {
            initialiseClientAndLinkLoader();
            failedPings = new ArrayList<String>();
            for (String linkAddress : linkLoader.findLinkAddresses(blogPost.getBody())) {
                if (!blogPost.hasOutgoingLinkToUrl(linkAddress)) {
                    loadRemotePageAndSendPingbacks(blogPost, linkAddress);
                }
            }
            dao.saveOrUpdate(blogPost);
            if (failedPings.size() > 0) {
                notifyUserOfBadPingbacks(failedPings);
            }
        }
    }

    private void initialiseClientAndLinkLoader() {
        pingbackClient = (PingbackClient) beanFactory.getBean("pingBackClient");
        linkLoader = (LinkLoader) beanFactory.getBean("pingbackLinkLoader");
    }

    private void loadRemotePageAndSendPingbacks(BlogPost blogPost, String linkAddress) {
        Link link = linkLoader.loadLink(linkAddress);
        if (link.isSuccess()) {
            if (link.isPingbackEnabled()) {
                sendPingback(blogPost, link);
            } else {
                blogPost.addOutGoingLink(link.getTitle(), link.getUrl());
            }
        }
    }

    private void sendPingback(BlogPost blogPost, Link link) {
        try {
            String permaLink = RequestUtil.getAppURL(request()) + "/article/" + blogPost.getSlug() + ".html";
            pingbackClient.sendPingback(permaLink, link);
            blogPost.addOutGoingLink(link.getTitle(), link.getUrl());
        } catch (PingbackException e) {
            log.error("Pingback to '" + link.getUrl() + "' failed", e);
            failedPings.add("Pingback to " + link.getUrl() + " failed because of " + e.getMessage() + " ");
            if (PingbackClient.PINGBACK_ALREADY_REGISTERED == e.getFaultCode()) {
                blogPost.addOutGoingLink(link.getTitle(), link.getUrl());
            }
        }
    }

    private void notifyUserOfBadPingbacks(List<String> failedPings) {
        String errMsg = "";
        for (String failedPing : failedPings) {
            errMsg += failedPing;
        }
        flash(NOTICE, errMsg);
    }
}
The annotation @AfterFilter(include = {"create", "update"}) causes the method createOutgoingLinks() to be invoked after any action named create or update. Those two actions are part of the SeemoreJ CRUD framework (that's another post as well, if I ever get around to it). Hopefully, it's easy to make out what's going on. Essentially, if a post is published, the method looks for all links with fully qualified URLs in the article and, if the remote resources support it, attempts to send them a pingback. Any failures are displayed using a Rails-style flash system.
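For anyone wondering what "support it" means in practice: under Pingback 1.0 a page advertises its XML-RPC endpoint either in an X-Pingback response header or in a link element with rel="pingback", and the client then calls pingback.ping with the source and target URIs. The sketch below shows those two steps as plain string handling. It is not the code from my library; the class and method names are made up for illustration, and fetching the page and POSTing the request are left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch of Pingback 1.0 discovery and request building.
// Not the library's code: the class and method names are assumptions.
class PingbackSketch {

    // The exact form of link element that the Pingback 1.0 spec asks pages to use.
    private static final Pattern LINK_ELEMENT = Pattern.compile(
            "<link rel=\"pingback\" href=\"([^\"]+)\" ?/?>");

    // Returns the XML-RPC endpoint, or null if the page does not advertise one.
    // xPingbackHeader is the value of the X-Pingback response header (may be null).
    static String discoverEndpoint(String xPingbackHeader, String html) {
        if (xPingbackHeader != null && !xPingbackHeader.trim().isEmpty()) {
            return xPingbackHeader.trim(); // the header is checked first
        }
        if (html == null) {
            return null;
        }
        Matcher matcher = LINK_ELEMENT.matcher(html);
        return matcher.find() ? unescape(matcher.group(1)) : null;
    }

    // Builds the XML-RPC request body for pingback.ping(sourceURI, targetURI).
    static String buildPingRequest(String sourceUri, String targetUri) {
        return "<?xml version=\"1.0\"?>"
                + "<methodCall><methodName>pingback.ping</methodName><params>"
                + "<param><value><string>" + escape(sourceUri) + "</string></value></param>"
                + "<param><value><string>" + escape(targetUri) + "</string></value></param>"
                + "</params></methodCall>";
    }

    // Escapes the characters that are not allowed raw inside XML text.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Reverses the entity escaping used inside the href attribute.
    private static String unescape(String s) {
        return s.replace("&lt;", "<").replace("&gt;", ">")
                .replace("&quot;", "\"").replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String html = "<head><link rel=\"pingback\" href=\"http://example.org/xmlrpc\" /></head>";
        String endpoint = discoverEndpoint(null, html);
        System.out.println("Endpoint: " + endpoint);
        System.out.println(buildPingRequest(
                "http://me.example/article/my-post.html", "http://example.org/their-post"));
    }
}
```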
I've noticed that this chugs a little if there are a few links in a page and/or certain resources are slow to load. Still, it's not publicly visible, so I'll live with it for the moment.
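If the slowness ever becomes a problem, one option would be to load the remote pages in parallel with an overall time limit instead of one after another. This is only a sketch of that idea using java.util.concurrent, not something the site does today; the loadAll helper and the pool size of 4 are my own inventions, and the loader function stands in for a call such as linkLoader.loadLink(address).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Illustrative sketch only: a hypothetical helper, not part of the library.
class ParallelLinkCheck {

    // Applies 'loader' to every address on a small thread pool and returns the
    // results in input order. An entry is null if its task failed or was
    // cancelled by the timeout. If the calling thread is interrupted, the
    // returned list may be shorter than the input.
    static <T> List<T> loadAll(List<String> addresses, Function<String, T> loader,
                               long timeoutSeconds) {
        List<T> results = new ArrayList<>();
        if (addresses.isEmpty()) {
            return results;
        }
        ExecutorService pool = Executors.newFixedThreadPool(Math.min(4, addresses.size()));
        try {
            List<Callable<T>> tasks = new ArrayList<>();
            for (String address : addresses) {
                tasks.add(() -> loader.apply(address));
            }
            // invokeAll cancels any task still running when the timeout expires.
            for (Future<T> future : pool.invokeAll(tasks, timeoutSeconds, TimeUnit.SECONDS)) {
                try {
                    results.add(future.get());
                } catch (CancellationException | ExecutionException e) {
                    results.add(null);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        } finally {
            pool.shutdownNow();
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> addresses = Arrays.asList("http://a.example/", "http://bb.example/");
        // String::length is a placeholder for a slow network call.
        System.out.println(loadAll(addresses, String::length, 10));
    }
}
```

Note that in the controller above the blog post is modified as each link is processed, so the results would still need to be applied to the post on the request thread after loadAll returns.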
Also, this example uses Spring to load the default implementations of LinkLoader and PingbackClient, both of which are interfaces defined in the library. It would work just as well with concrete instantiations, though it would be more difficult to test :)
It was certainly fun to write, and I hope somebody finds it useful :)
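To show why coding against the interfaces helps with testing, here is a small self-contained sketch. The three nested interfaces are hypothetical, trimmed-down stand-ins whose signatures I have guessed from the controller above (the real ones have more methods and a checked PingbackException), and pingAll is a made-up method playing the role of the controller logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StubbedPingbackDemo {

    // Hypothetical, trimmed-down stand-ins for the library's types. The
    // signatures are guessed from the controller above and may not match.
    interface Link {
        String getUrl();
        boolean isPingbackEnabled();
    }

    interface LinkLoader {
        List<String> findLinkAddresses(String text);
        Link loadLink(String address);
    }

    interface PingbackClient {
        void sendPingback(String sourceUrl, Link target);
    }

    // The code under test sees only the interfaces, never a concrete class.
    static int pingAll(LinkLoader loader, PingbackClient client,
                       String permalink, String body) {
        int sent = 0;
        for (String address : loader.findLinkAddresses(body)) {
            Link link = loader.loadLink(address);
            if (link.isPingbackEnabled()) {
                client.sendPingback(permalink, link);
                sent++;
            }
        }
        return sent;
    }

    // Stub loader: no network access. It ignores the text, returns the given
    // addresses, and pretends that only https addresses support pingback.
    static LinkLoader stubLoader(final String... addresses) {
        return new LinkLoader() {
            public List<String> findLinkAddresses(String text) {
                return Arrays.asList(addresses);
            }

            public Link loadLink(final String address) {
                return new Link() {
                    public String getUrl() { return address; }
                    public boolean isPingbackEnabled() { return address.startsWith("https"); }
                };
            }
        };
    }

    public static void main(String[] args) {
        final List<String> pinged = new ArrayList<>();
        // This PingbackClient has a single method, so a lambda works as a recording stub.
        PingbackClient recorder = (source, target) -> pinged.add(target.getUrl());
        int sent = pingAll(stubLoader("https://a.example/", "http://b.example/"),
                recorder, "http://me.example/article/x.html", "ignored by the stub");
        System.out.println(sent + " pingback(s) sent to " + pinged);
    }
}
```

With Spring, the same swap happens in applicationContext.xml rather than in code, which is why the controller only ever asks the bean factory for the interface types.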