Roberto Cortez Java Blog

REST API Evolution

posted by Décio Sousa on March 3, 2015

In one way or another, every developer has come in touch with an API. Either integrating a major system for a big corporation, producing some fancy charts with the latest graph library, or simply by interacting with his favorite programming language. The truth is that APIs are everywhere! They actually represent a fundamental building block of the nowadays Internet, playing a fundamental role in the data exchange process that takes place between different systems and devices. From the simple weather widget on your mobile phone to a credit card payment you perform on an online shop, all of these wouldn’t be possible if those systems wouldn’t communicate with each other by calling one another’s APIs.

So with the ever growing eco-system of heterogeneous devices connected to the internet, APIs are put a new set of demanding challenges. While they must continue to perform in a reliable and secure manner, they must also be compatible with all these devices that can range from a wristwatch to the most advanced server in a data-center.

REST to the rescue

One of the most widely used technologies for building such APIs are the so called REST APIs. These APIs aim to provide a generic and standardize way of communication between heterogeneous systems. Because they heavily rely on standard communication protocols and data representation – like HTTP, XML or JSON – it’s quite easy to provide client side implementations on most programming languages, thus making them compatible with the vast majority of systems and devices.

So while these REST APIs can be compatible with most devices and technologies out there, they also must evolve. And the problem with evolution is that you sometimes have to maintain retro-compatibility with old client versions.

Let’s build up an example.

Let’s imagine an appointment system where you have an API to create and retrieve appointments. To simplify things let’s imagine our appointment object with a date and a guest name. Something like this:

public class AppointmentDTO {
    public Long id;
    public Date date;
    public String guestName;
}

public class AppointmentDTO {

public Long id;

public Date date;

public String guestName;

}

A very simple REST API would look like this:

@Path("/api/appointments")
public class AppointmentsAPI {

    @GET
    @Path("/{id}")
    public AppointmentDTO getAppointment(@PathParam("id") String id) { ... }

    @POST
    public void createAppointment(AppointmentDTO appointment) { ... }

}

@Path("/api/appointments")

public class AppointmentsAPI {

@GET

@Path("/{id}")

public AppointmentDTO getAppointment(@PathParam("id") String id) { ... }

@POST

public void createAppointment(AppointmentDTO appointment) { ... }

}

Let’s assume this plain simple API works and is being used on mobile phones, tablets and various websites that allow for booking and displaying appointments. So far so good.

At some point, you decide it would be very interesting to start gathering some statistics about your appointment system. To keep things simple you just want to know who’s the person who booked most times. For this you would need to correlate guest between themselves and decide you need to add an unique identifier to each guest. Let’s use Email. So now your object model would look like something like this:

public class AppointmentDTO {
    public Long id;
    public Date date;
    public GuestDTO guest;
}

public class GuestDTO {
    public String email;
    public String name;
}

public class AppointmentDTO {

public Long id;

public Date date;

public GuestDTO guest;

}

public class GuestDTO {

public String email;

public String name;

}

So our object model changed slightly which means we will have to adapt the business logic on our api.

The Problem

(R)evolution!

While adapting the API to store and retrieve the new object types should be a no brainer, the problem is that all your current clients are using the old model and will continue to do so until they update. One can argue that you shouldn’t have to worry about this, and that customers should update to the newer version, but the truth is that you can’t really force an update from night to day. There will always be a time window where you have to keep both models running, which means your api must be retro-compatible.

This is where your problems start.

So back to our example, in this case it means that our API will have to handle both object models and be able to store and retrieve those models depending on the client. So let’s add back the guestName to our object to maintain compatibility with the old clients:

public class AppointmentDTO {
    public Long id;
    public Date date;

    @Deprecated //For retro compatibility purposes
    public String guestName;

    public GuestDTO guest;
}

public class AppointmentDTO {

public Long id;

public Date date;

@Deprecated //For retro compatibility purposes

public String guestName;

public GuestDTO guest;

}

Remember a good thumb rule on API objects is that you should never delete fields. Adding new ones usually won’t break any client implementations (assuming they follow a good thumb rule of ignoring new fields), but removing fields is usually a road to nightmares.

Now for maintaining the API compatible, there are a few different options. Let’s look at some of the alternatives:

Duplication: pure and simple. Create a new method for the new clients and have the old ones using the same one.
Query parameters: introduce a flag to control the behavior. Something like useGuests=true.
API Versioning: Introduce a version in your URL path to control which method version to call.

So all these alternatives have their pros and cons. While duplication can be plain simple, it can easily turn your API classes into a bowl of duplicated code.

Query parameters can (and should) be used for behavior control (for example to add pagination to a listing) but we should avoid using them for actual API evolutions, since these are usually of a permanent kind and therefore you don’t want to make it optional for the consumer.

Versioning seems like a good idea. It allows for a clean way to evolve the API, it keeps old clients separated from new ones and provides a generic base from all kinds of changes that will occur during your API lifespan. On the other hand it also introduces a bit of complexity, specially if you will have different calls at different versions. Your clients would end up having to manage your API evolution themselves by upgrading a call, instead of the API. It’s like instead of upgrading a library to the next version, you would upgrade only a certain class of that library. This can easily turn into a version nightmare…

To overcome this we must ensure that our versions cover the whole API. This means that I should be able to call every available method on /v1 using /v2. Of course that if a newer version on a given method exists on v2 it should be run on the /v2 call. However, if a given method hasn’t changed in v2, I expect that the v1 version would seamlessly be called.

Inheritance based API Versioning

In order to achieve this we can take advantage of Java objects polymorphic capabilities. We can build up API versions in a hierarchical way so that older version methods can be overridden by newer, and calls to a newer version of an unchanged method can be seamlessly fallen back to it’s earlier version.

So back to our example we could build up a new version of the create method so that the API would look like this:

@Path("/api/v1/appointments")    //We add a version to our base path
public class AppointmentsAPIv1 { //We add the version to our API classes

    @GET
    @Path("/{id}")
    public AppointmentDTO getAppointment(@PathParam("id") String id) { ... }

    @POST
    public void createAppointment(AppointmentDTO appointment) { 
        //Your old way of creating Appointments only with names
    }
}

//New API class that extends the previous version
@Path("/api/v2/appointments")                      
public class AppointmentsAPIv2 extends AppointmentsAPIv1 {

    @POST
    @Override
    public void createAppointment(AppointmentDTO appointment) { 
        //Your new way of creating appointments with guests
    }
}

@Path("/api/v1/appointments") //We add a version to our base path

public class AppointmentsAPIv1 { //We add the version to our API classes

@GET

@Path("/{id}")

public AppointmentDTO getAppointment(@PathParam("id") String id) { ... }

@POST

public void createAppointment(AppointmentDTO appointment) {

//Your old way of creating Appointments only with names

}

//New API class that extends the previous version

@Path("/api/v2/appointments")

public class AppointmentsAPIv2 extends AppointmentsAPIv1 {

@POST

@Override

public void createAppointment(AppointmentDTO appointment) {

//Your new way of creating appointments with guests

}

So now we have 2 working versions of our API. While all the old clients that didn’t yet upgrade to the new version will continue to use v1 – and will see no changes – all your new consumers can now use the latest v2. Note that all these calls are valid:

Call	Result
GET /api/v1/appointments/123	Will run getAppointment on the v1 class
GET /api/v2/appointments/123	Will run getAppointment on the v1 class
POST /api/v1/appointments	Will run createAppointment on the v1 class
POST /api/v2/appointments	Will run createAppointment on the v2 class

This way any consumers that want to start using the latest version will only have to update their base URLs to the corresponding version, and all of the API will seamlessly shift to the most recent implementations, while keeping the old unchanged ones.

Caveat

For the keen eye there is an immediate caveat with this approach. If your API consists of tenths of different classes, a newer version would imply duplicating them all to an upper version even for those where you don’t actually have any changes. It’s a bit of boiler plate code that can be mostly auto-generated. Still annoying though.

Although there is no quick way to overcome this, the use of interfaces could help. Instead of creating a new implementation class you could simply create a new Path annotated interface and have it implemented in your current implementing class. Although you would sill have to create one interface per API class, it is a bit cleaner. It helps a little bit, but it’s still a caveat.

Final thoughts

API versioning seems to be a current hot topic. Lot of different angles and opinions exists but there seems to be a lack of standard best practices. While this post doesn’t aim to provide such I hope that it helps to achieve a better API structure and contribute to it’s maintainability.

A final word goes to Roberto Cortez for encouraging and allowing this post on his blog. This is actually my first blog post so load the cannons and fire at will. 😉

Introducing Décio Sousa

posted by Roberto Cortez on February 25, 2015

0 Comments

tags: blog

Hi everyone! Until now, this blog has been written entirely by me, but today we have a new author: Décio Sousa. We have been working together for several years and I consider him one of my closest friends. For some time, I have been trying to involve other people in writing blog posts. Décio accepted the challenge and here is the result: REST API Evolution. An interesting post on how to evolve REST API’s, based on his current project.

I would like to welcome Décio and to thank him for taking the time to write the post. I’m sure we are going to see more posts from him in the future.

I moved to Git SVN

posted by Roberto Cortez on February 24, 2015

0 Comments

tags: git, subversion

For the last couple of years, I’ve been working with a company that uses Subversion. Yes, you are probably pointing now that we should have migrated to Git by now. Let’s not discuss that. Until now, Subversion didn’t get much in my way while performing my development tasks, but that slowly started to change. Probably a lot of people went through this as well.

My SVN Problems

My main problems with Subversion at the moment are:

Working with multiple Subversion branches at the same time. I usually checkout each branch that I need in a separate folder. Yes, I could use svn switch, but it’s harder to isolate new feature development from bug fixing with only one source.
Our Subversion repository is quite big now. You feel the pain (slowness) when comparing files in the history or annotating a file.
Multiple commits relating to the same feature. I like to commit the several stages of a feature development. This helps to establish baselines on what’s working and what’s not in a timeline. But it’s a pain to look into the multiple commits of that feature when you are searching the history.

Solution

I’m also a Git user, and I know that Git solves these problems for me. Branches are cheap to create and you can stash your changes for easy switching between branches. Since you have a local copy of the repository, everything is faster. Using squash will combine all your feature development commits into a single commit for a cleaner history.

Instead of nagging the organization to move to Git (I’m doing that anyway), you can jump right away to Git, by using git svn. It ships with the standard Git installation and allows you to have a bidirectional connection between Git and Subversion.

This isn’t something new. It’s been around for a few years, but this is my first time using it. I have to say that it was not easy to setup thing the way I wanted. So I’m writing this post to remind myself of the steps I have followed in case I need them in the future. Of course, I also hope that these could help other people experiencing the same problems.

Setup

You’re mostly going to use the command:

git svn

Assuming the following information:

SVN Repository URL	http://svn-repo/myProject
SVN Trunk URL	http://svn-repo/myProject/trunk
SVN Branch URL	http://svn-repo/myProject/branches

Init the Repository

Let’s start by creating our local repository. Type the following command:

git svn init --trunk=trunk http://svn-repo/myProject myProject

This will initialize a local Git repository. This command will not checkout anything yet. My Subversion repository follows a standard directory layout and I could use the -s argument instead of manually indicating the trunk. I choose to manually indicate the trunk with --trunk argument because I only want to check out the trunk. If you use the -s everything sitting on branches and tags will be checked out. In my case I want to control exactly which branches , or tags I’m checking out.

If we want to include branches we can type:

git svn init --trunk=trunk --branches=branches http://svn-repo/myProject myProject

Better yet, if we only want the branch B1 and the brach B2 we can do it like this:

git svn init --trunk=trunk --branches={B1,B2} http://svn-repo/myProject myProject

Now look into the .git folder and file config. It should look like this:

[core]
	repositoryformatversion = 0
	filemode = false
	bare = false
	logallrefupdates = true
	symlinks = false
	ignorecase = true
	hideDotFiles = dotGitOnly
[svn-remote "svn"]
	url = http://svn-repo/
	fetch = myProject/trunk:refs/remotes/trunk
	branches = myProject/branches/{B1,B2}/*:refs/remotes/*

[core]

repositoryformatversion = 0

filemode = false

bare = false

logallrefupdates = true

symlinks = false

ignorecase = true

hideDotFiles = dotGitOnly

[svn-remote "svn"]

url = http://svn-repo/

fetch = myProject/trunk:refs/remotes/trunk

branches = myProject/branches/{B1,B2}/*:refs/remotes/*

You can always change the settings by manually editing this file.

When fetching specific branches there is a trick here. Notice that the generated file has a * after the branches definition of {B1,B2}. This means, that git svn will fetch all subdirectories of the branches folder B1 and B2 and track each individual subdirectory as a remote branch. It’s a bit weird. Maybe I’m doing the original command wrong, but I had to manually remove the * to track the branches properly. Make sure that you have this in the file:

	branches = myProject/branches/{B1,B2}/:refs/remotes/*

1	branches = myProject/branches/{B1,B2}/:refs/remotes/*

This Stackoverflow question: How do I tell git-svn about a remote branch created after I fetched the repo? explains in a more detailed manner how to deal with branches.

Fetch the Code

To actually fetch the code from the Subversion repository type:

git svn fetch --no-follow-parent

In same cases, Git can create additional branches with the format {B1}@-{0-9}. You usually don’t need this and you can prevent their creating by adding the --no-follow-parent parameter. For a full explanation please check this Stackoverflow question: git-svn clone | spurious branches.

By the way this operation can be VERY VERY SLOW, depending on the size of your repository, it can take several hours to complete. If you are short on time, execute the commands before you go to sleep and you should have them ready when you wake up. To speed it up, you can use the parameter --revision and specify a Subversion revision number. The fetch will only be performed from that point forward. The downside is that you don’t have the historical data before the specified revision.

Update the Code

When you need to update your Git repository with the Subversion one, execute:

git svn rebase

Push the Code

You don’t use push to send your local changes to the Subversion repository, instead use:

git svn dcommit

Final Thoughts

You can now enjoy all the benefits of using Git even if you are stuck with a Subversion repository. There are also a few limitations, but you can work around them. You cannot directly map multiple Subversion repositories to a single Git repository. This may be relevant depending on your Subversion structure. Also, committing your changes to Subversion may be a bit slow.

Anyway, I’m happy with the change. I feel that it increased my productivity, but that’s something you have to figure out by yourself. Just give it a try and see if it works for you. If not, you can always use your old Subversion repository.

Jfokus 2015 and Voxxed Days Vienna

posted by Roberto Cortez on February 13, 2015

2 Comments

tags: conference, jfokus, speaker, voxxed days

I kicked off my conference year by attending during the last week Jfokus 2015 and the first edition ever of Voxxed Days held in Vienna. I was scheduled as a speaker for both conferences with my sessions about Java EE 7 Batch Processing in the Real World and The 5 People in your Organization that grow Legacy Code.

Jfokus

This was Jfokus 9th edition, so we can expect a great celebration next year for the 10th anniversary edition. We are still one year away, but I already scheduled it in my calendar! The conference was 3 days long, with the first day dedicated to Tutorials and the next couple of days to conference sessions. The numbers are impressive. It had over 1700 attendees, making it one of the largest Java conferences in Europe.

Sessions

Docker had been a hot topic over the last year. So, naturally we had a few sessions dedicated to Docker. I do recommend checking out the Docker Workshop from Ken Sipe. You can find it here. It also includes how to scale Docker using Apache Mesos. Apache Mesos is a distributed system kernel that abstracts CPU, memory, storage, and other resources away from machines so you can program against your datacenter like it’s a single pool of resources. Unfortunately, I think this tutorial session was not recorded. Only a few rooms had their sessions recorded.

These are my top 3 sessions (from the ones I have attended):

My Session

I had my session about Java EE Batch Processing in the Real World as a Lightning Talk. It was a bit hard to do it in only 15 minutes, since the session was originally planed for a full conference session of 50 minutes. Anyway, I was able to demo everything I wanted.

I also had the privilege to be a guest to the Live Nighthacking stream with Stephen Chin, where I talk about Java EE Batch in much more detail. Check it out:

And here are some slides. I didn’t use them all the Lighting session, since they are from my full session.

Voxxed Days Vienna

A new brand of smaller conferences was launched as Voxxed Days. These are one day tech events organised by local community groups and supported by the Voxxed and Devoxx team. I was happy enough to be part of the first edition ever and to be a speaker of course! We got the usual cinema like venue, which is to be expected from the Devoxx brand. There were probably around 200 attendees, or maybe a little more.

Sessions

Since this was a one day only conference, there were not many sessions, but we had 4 full tracks worth of content to choose from. I recommend to check out Monadic Java by Mario Fusco with a very good way to explain Monads in Java to newbies. Also, Coding Culture by Sven Peters is a must. You will hear real life stories about how Atlassian evolved as a company and how they create awesome stuff. These sessions were recorded and should be available on Parleys very soon.

My Session

I had the session about The 5 People in your Organization that grow Legacy Code. I have presented the same session for the first time in Java One and the recording was released a few days before I delivered the session in Vienna. This was great, since it gave the change to check some of the mistakes I made and correct a few things. This one was also recorded. Let’s see if I improved a little bit. Thank you to Sven Peters, which provided me with a few pointers to improve my presentation stance.

Anyway, here is the full video (from JavaOne):

Final Words

It was an awesome week, but also very consuming. I was very tired at the end, but it was great to hook up with old friends. Thanks to Mattias Karlsson and Grzegorz Duda for having me in Jfokus 2015 and Voxxed Days Vienna.

I als have to mention Paulo Grácio. We worked together for 6 years and he was a mentor to me in the early stages of my career. Paulo is now working in Stockholm and we are far from each other, but I’m looking forward to work together again. Thanks for the hospitality!

Development Horror Story – Release Nightmare

posted by Roberto Cortez on January 22, 2015

4 Comments

tags: horror story

Everyone has good stories about releases that went wrong, right? I’m no exception and I have a few good ones under my development career. These are usually very stressful at the time, but now me and my teammates can’t talk about these stories without laughing.

History

I think this happened around 2009. Me and my team had to maintain a medium to large legacy web application with around 500 k lines of code. This application was developed by another company, so we didn’t have the code. Since we were in charge now and needed the code to maintain it, they handed us the code in a zip file (first pointer that something was wrong)!

Their release process was peculiar to say the least. I’m pretty sure there are worst release procedures out there. This one consisted in copying the changed files (*.class, *.jsp, *.html, etc) to an exploded war folder on a Tomcat server. We also had three environments (QA, PRE, PROD) with different application versions and no idea which files were deployed on each. They also had a ticket management application with attached compiled files, ready to be deployed and no idea of the original sources. What could possibly go wrong here?

The Problem

Our team was able to make changes required by the customer and push them to PROD servers. We have done it a few times successfully, even with all the handicaps. Everything was looking good until we got another request for additional changes. These changes were only a few improvements in the log messages of a batch process. The batch purpose was to copy files sent to the application with financial data input to insert into a database. I guess that I don’t have to state the obvious: this data was critical to calculate financial movements with direct impact on the amounts paid by the application users.

After our team made the changes and perform the release, all hell went loose. Files were not being copied to the correct locations. Several data duplicated in the database and the file system. Financial transactions with incorrect amounts. You name it. A complete nightmare. But why? The only change was a few improvements in the log messages.

The Cause

The problem was not exactly related with the changed code. Look at the following files:

public class BatchConfiguration {
    public static final String OS = "Windows";
}

public class BatchConfiguration {

public static final String OS = "Windows";

}

And:

public class BatchProcess {
    public void copyFile() {
        if (BatchConfiguration.OS.equals("Windows")) {
            System.out.println("Windows");
        } else if (BatchConfiguration.OS.equals("Unix")) {
            System.out.println("Unix");
        }
    }

    public static void main(String[] args) {
        new BatchProcess().copyFile();
    }
}

public class BatchProcess {

public void copyFile() {

if (BatchConfiguration.OS.equals("Windows")) {

System.out.println("Windows");

} else if (BatchConfiguration.OS.equals("Unix")) {

System.out.println("Unix");

}

public static void main(String[] args) {

new BatchProcess().copyFile();

}

This is not the real code, but for the problem purposes it was laid out like this. Don’t ask me about the why it was like this. We got it in the zip file, remember?

So we have here a variable which sets the expected Operating System and then the logic to copy the file is dependant on this. The server was running on a Unix box so the variable value was Unix. Unfortunately, all the developers were working on Windows boxes. I said unfortunately, because if the developer that implemented the changes was using Unix, everything would be fine.

Anyway, the developer changed the variable to Windows so he could proceed with some tests. Everything was fine, so he performs the release. He copied the resulting BatchProcess.class into the server. He didn’t bother about the BatchConfiguration, since the one on the server was configured to Unix right?

Maybe you already spotted the problem. If you haven’t, try the following:

Copy and build the code.
Execute it. Check the output, you should get Windows.
Copy the resulting BatchProcess.class to an empty directory.
Execute this one again. Use command line java BatchProcess

What happened? You got the output Windows, right?. Wait! We didn’t have the BatchConfiguration.class file in the executing directory. How is that possible? Shouldn’t we need this file there? Shouldn’t we get an error?

When you build the code, the java compiler will inline the BatchConfiguration.OS variable. This means that the compiler will replace the variable expression in the if statement with the actual variable value. It’s like having if ("Windows".equals("Windows"))

Try executing javap -c BatchProcess. This will show you a bytecode representation of the class file:

public void copyFile();
    Code:
       0: ldc           #3                  // String Windows
       2: ldc           #3                  // String Windows
       4: invokevirtual #4                  // Method java/lang/String.equals:(Ljava/lang/Object;)Z
       7: ifeq          21
      10: getstatic     #5                  // Field java/lang/System.out:Ljava/io/PrintStream;
      13: ldc           #3                  // String Windows
      15: invokevirtual #6                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      18: goto          39
      21: ldc           #3                  // String Windows
      23: ldc           #7                  // String Unix
      25: invokevirtual #4                  // Method java/lang/String.equals:(Ljava/lang/Object;)Z
      28: ifeq          39
      31: getstatic     #5                  // Field java/lang/System.out:Ljava/io/PrintStream;
      34: ldc           #7                  // String Unix
      36: invokevirtual #6                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      39: return

public void copyFile();

Code:

0: ldc #3 // String Windows

2: ldc #3 // String Windows

4: invokevirtual #4 // Method java/lang/String.equals:(Ljava/lang/Object;)Z

7: ifeq 21

10: getstatic #5 // Field java/lang/System.out:Ljava/io/PrintStream;

13: ldc #3 // String Windows

15: invokevirtual #6 // Method java/io/PrintStream.println:(Ljava/lang/String;)V

18: goto 39

21: ldc #3 // String Windows

23: ldc #7 // String Unix

25: invokevirtual #4 // Method java/lang/String.equals:(Ljava/lang/Object;)Z

28: ifeq 39

31: getstatic #5 // Field java/lang/System.out:Ljava/io/PrintStream;

34: ldc #7 // String Unix

36: invokevirtual #6 // Method java/io/PrintStream.println:(Ljava/lang/String;)V

39: return

You can confirm that all the variables are replaced with their constant values.

Now, returning to our problem. The .class file that was copied to the PROD servers had the Windows value set in. This messed everything in the execution runtime that handled the input files with the financial data. This was the cause of the problems I’ve described earlier.

Aftermath

Fixing the original problem was easy. Fixing the problems caused by the release was painful. It involved many people, many hours, pizza, loads of SQL queries, shell scripts and so on. Even our CEO came to help us. We called this the mUtils problem, since it was the original java class name with the code.

Yes, we migrated the code to something manageable. It’s now on a VCS with a tag for every release and version.