Yearly Archives: 2014

Java EE 7 Batch Processing and World of Warcraft – Part 1

posted by Roberto Cortez

This was one of my sessions at the last JavaOne. This post is going to expand on that subject and look into a real application using the Batch JSR-352 API. The application integrates with the MMORPG World of Warcraft.

Since JSR-352 is a new specification in the Java EE world, I think that many people don’t know how to use it properly. It may also be a challenge to identify the use cases to which this specification applies. Hopefully this example can help you better understand those use cases.


The Game

World of Warcraft is a game played by more than 8 million players worldwide. The service is offered by region: United States (US), Europe (EU), China and Korea. Each region has a set of servers, called Realms, that you connect to in order to play the game. For this example, we are only looking into the US and EU regions.

World of Warcraft Horde Auction House

One of the most interesting features of the game is that it allows you to buy and sell in-game goods, called Items, using an Auction House. Each Realm has two Auction Houses. On average, each Realm trades around 70,000 Items. Let’s crunch some numbers:

  • 512 Realm’s (US and EU)
  • 70 K Item’s per Realm
  • More than 35 M Item’s overall

The Data

Another cool thing about World of Warcraft is that the developers provide a REST API to access most of the in-game information, including the Auction House data. Check the complete API here.

The Auction House data is obtained in two steps. First we need to query the corresponding Auction House Realm REST endpoint to get a reference to a JSON file. Next we need to access this URL and download the file with all the Auction House Item information. Here is an example:
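A sketch of that first request, with an illustrative Realm slug and file URL (the endpoint and payload shapes follow the old battle.net community API, so treat the details as an approximation):

    GET http://eu.battle.net/api/wow/auction/data/aegwynn

    {
        "files": [
            {
                "url": "http://eu.battle.net/auction-data/aegwynn/auctions.json",
                "lastModified": 1408823124000
            }
        ]
    }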

The Application

Our objective here is to build an application that downloads the Auction House data, processes it and extracts metrics. These metrics are going to build a history of the Items’ price evolution through time. Who knows? Maybe with this information we can predict price fluctuations and buy or sell Items at the best times.

The Setup

For the setup, we’re going to use a few extra things on top of Java EE 7.


The main work is going to be performed by Batch JSR-352 Jobs. A Job is an entity that encapsulates an entire batch process. A Job is wired together via a Job Specification Language. With JSR-352, a Job is simply a container for steps. It combines multiple steps that logically belong together in a flow.

We’re going to split the business logic into three jobs:

  • Prepare – Creates all the supporting data needed. Lists Realms and creates folders to copy files into.
  • Files – Queries Realms to check for new files to process.
  • Process – Downloads the file, processes the data and extracts metrics.

The Code

Back-end – Java EE 7 with Java 8

Most of the code is going to be in the back-end. We need Batch JSR-352, but we are also going to use a lot of other Java EE technologies, like JPA, JAX-RS, CDI and JSON-P.

Since the Prepare Job only initializes application resources for the processing, I’m skipping it and diving into the most interesting parts.

Files Job

The Files Job is an implementation of AbstractBatchlet. A Batchlet is the simplest processing style available in the Batch specification. It’s a task-oriented step where the task is invoked once, executes, and returns an exit status. This type is most useful for performing a variety of tasks that are not item-oriented, such as executing a command or doing a file transfer. In this case, our Batchlet is going to iterate over every Realm, make a REST request to each one, and retrieve a URL with the file containing the data that we want to process. Here is the code:
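The original listing is not reproduced here, so this is a minimal sketch of what such a Batchlet could look like. The WoWBusiness bean and the Realm entity are assumptions for illustration; the batch property names match the partition plan described below:

    import java.util.List;
    import javax.batch.api.AbstractBatchlet;
    import javax.batch.api.BatchProperty;
    import javax.inject.Inject;
    import javax.inject.Named;

    @Named
    public class LoadAuctionFilesBatchlet extends AbstractBatchlet {

        // Hypothetical business bean that knows how to list Realms
        // and call the battle.net REST endpoint for each one.
        @Inject
        private WoWBusiness woWBusiness;

        // Late-bound from the partition plan in files-job.xml.
        @Inject @BatchProperty(name = "region")
        private String region;

        @Inject @BatchProperty(name = "target")
        private String target;

        @Override
        public String process() throws Exception {
            List<Realm> realms = woWBusiness.findRealmsByRegion(region);
            // parallelStream() fires the slow REST requests concurrently.
            realms.parallelStream().forEach(realm ->
                    woWBusiness.findAuctionFileUrl(realm, target));
            return "COMPLETED";
        }
    }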

A cool thing about this is the use of Java 8. With parallelStream(), invoking multiple REST requests at once is easy as pie! You can really notice the difference. If you want to try it out, just run the sample, replace parallelStream() with stream() and check the difference. On my machine, using parallelStream() makes the task execute around 5 or 6 times faster.

Usually, I would not use this approach. I’ve done it here because part of the logic involves invoking slow REST requests, and parallel streams really shine in that situation. Doing this with batch partitions is possible, but hard to implement. We also need to poll the servers for new data every time, so it’s not terrible if we skip a file or two. Keep in mind that if you don’t want to miss a single record, a Chunk processing style is more suitable. Thank you to Simon Martinelli for bringing this to my attention.

Since the US and EU Realms require different REST endpoints, they are perfect for partitioning. Partitioning means that the task is going to run in multiple threads, one thread per partition. In this case we have two partitions.

To complete the job definition we need to provide a Job XML file. This needs to be placed in the META-INF/batch-jobs directory. Here is the files-job.xml for this job:
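A reconstruction of what the descriptor could look like; the target property values are illustrative:

    <job id="files-job" xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="1.0">
        <step id="loadAuctionFiles">
            <batchlet ref="loadAuctionFilesBatchlet">
                <properties>
                    <property name="region" value="#{partitionPlan['region']}"/>
                    <property name="target" value="#{partitionPlan['target']}"/>
                </properties>
            </batchlet>
            <partition>
                <plan partitions="2">
                    <properties partition="0">
                        <property name="region" value="US"/>
                        <property name="target" value="http://us.battle.net/api/wow/auction/data/"/>
                    </properties>
                    <properties partition="1">
                        <property name="region" value="EU"/>
                        <property name="target" value="http://eu.battle.net/api/wow/auction/data/"/>
                    </properties>
                </plan>
            </partition>
        </step>
    </job>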

In files-job.xml we need to define our Batchlet in the batchlet element. For the partitions, just define the partition element and assign different properties to each plan. These properties can then be late-bound into the LoadAuctionFilesBatchlet with the expressions #{partitionPlan['region']} and #{partitionPlan['target']}. This is a very simple expression binding mechanism and only works for simple properties and Strings.

Process Job

Now we want to process the Realm Auction Data file. Using the information from the previous job, we can download the file and do something with the data. The JSON file has the following structure:
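A trimmed-down sketch of that shape (field names follow the old battle.net community API; the values are illustrative):

    {
        "realm": { "name": "Aegwynn", "slug": "aegwynn" },
        "alliance": {
            "auctions": [
                { "auc": 1823920, "item": 72092, "owner": "Somebody",
                  "bid": 12000, "buyout": 15000, "quantity": 5, "timeLeft": "LONG" }
            ]
        },
        "horde": {
            "auctions": []
        }
    }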

The file has a list of the Auctions from the Realm it was downloaded from. In each record we can check the item for sale, prices, seller and time left until the end of the auction. Auctions are also aggregated by Auction House type: Alliance and Horde.

For the process-job we want to read the JSON file, transform the data and save it to a database. This can be achieved by Chunk Processing. A Chunk is an ETL (Extract – Transform – Load) style of processing which is suitable for handling large amounts of data. A Chunk reads the data one item at a time, and creates chunks that will be written out, within a transaction. One item is read in from an ItemReader, handed to an ItemProcessor, and aggregated. Once the number of items read equals the commit interval, the entire chunk is written out via the ItemWriter, and then the transaction is committed.


The real files are so big that they cannot be loaded entirely into memory, or you may end up running out of it. Instead, we use the JSON-P API to parse the data in a streaming way.

To open a JSON parse stream we use Json.createParser and pass it a reference to an InputStream. To read elements we just need to call the hasNext() and next() methods. next() returns a JsonParser.Event that allows us to check the position of the parser in the stream. Elements are read and returned in the readItem() method from the Batch API ItemReader. When no more elements are available to read, return null to finish the processing. Note that we also implement the open and close methods from ItemReader. These are used to initialize and clean up resources, and they only execute once.
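A minimal reader sketch, assuming a flat stream of auction objects; navigating down to the auctions array and obtaining the real file URL are elided to keep it short:

    import java.io.InputStream;
    import java.io.Serializable;
    import java.net.URL;
    import java.util.HashMap;
    import java.util.Map;
    import javax.batch.api.chunk.AbstractItemReader;
    import javax.inject.Named;
    import javax.json.Json;
    import javax.json.stream.JsonParser;

    @Named
    public class AuctionDataItemReader extends AbstractItemReader {

        private InputStream inputStream;
        private JsonParser parser;

        @Override
        public void open(Serializable checkpoint) throws Exception {
            // In the real application the URL comes from the Files Job;
            // it is hardcoded here just to keep the sketch self-contained.
            inputStream = new URL("http://example.com/auctions.json").openStream();
            parser = Json.createParser(inputStream);
        }

        @Override
        public Object readItem() throws Exception {
            while (parser.hasNext()) {
                if (parser.next() == JsonParser.Event.START_OBJECT) {
                    return readAuction();
                }
            }
            return null; // No more elements: tells the runtime the step is done.
        }

        // Collects the key/value pairs of one flat auction record.
        private Map<String, Object> readAuction() {
            Map<String, Object> auction = new HashMap<>();
            String key = null;
            while (parser.hasNext()) {
                JsonParser.Event event = parser.next();
                if (event == JsonParser.Event.END_OBJECT) {
                    break;
                } else if (event == JsonParser.Event.KEY_NAME) {
                    key = parser.getString();
                } else if (event == JsonParser.Event.VALUE_STRING) {
                    auction.put(key, parser.getString());
                } else if (event == JsonParser.Event.VALUE_NUMBER) {
                    auction.put(key, parser.getLong());
                }
            }
            return auction;
        }

        @Override
        public void close() throws Exception {
            parser.close();
            inputStream.close();
        }
    }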


The ItemProcessor is optional. It’s used to transform the data that was read. In this case we need to add additional information to the Auction.
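A sketch of a processor that turns the raw record from the reader into an Auction entity and enriches it; the Auction entity and the AuctionContext bean are assumptions for illustration:

    import java.util.Map;
    import javax.batch.api.chunk.ItemProcessor;
    import javax.inject.Inject;
    import javax.inject.Named;

    @Named
    public class AuctionDataItemProcessor implements ItemProcessor {

        // Hypothetical bean holding the Realm and Auction House being processed.
        @Inject
        private AuctionContext auctionContext;

        @Override
        public Object processItem(Object item) throws Exception {
            @SuppressWarnings("unchecked")
            Map<String, Object> record = (Map<String, Object>) item;

            Auction auction = new Auction();
            auction.setItemId((Long) record.get("item"));
            auction.setBid((Long) record.get("bid"));
            auction.setBuyout((Long) record.get("buyout"));
            // Enrich with information the file itself doesn't carry.
            auction.setRealm(auctionContext.getRealm());
            return auction;
        }
    }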


Finally we just need to write the data down to a database:
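A writer along these lines, persisting each item of the chunk with an injected EntityManager:

    import java.util.List;
    import javax.batch.api.chunk.AbstractItemWriter;
    import javax.inject.Named;
    import javax.persistence.EntityManager;
    import javax.persistence.PersistenceContext;

    @Named
    public class AuctionDataItemWriter extends AbstractItemWriter {

        // Injected directly instead of going through an EJB; see the note
        // below about the performance difference.
        @PersistenceContext
        private EntityManager em;

        @Override
        public void writeItems(List<Object> items) throws Exception {
            for (Object item : items) {
                em.persist(item);
            }
        }
    }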

The entire process for a file with 70 k records takes around 20 seconds on my machine. I did notice something very interesting. Before this code, I was using an injected EJB that called a method with the persist operation. This was taking 30 seconds in total, so injecting the EntityManager and performing the persist directly saved me a third of the processing time. I can only speculate that the delay is due to an increase in the call stack, with EJB interceptors in the middle. This was happening in Wildfly. I will investigate this further.

To define the chunk we need to add it to a process-job.xml file:
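A reconstruction along the same lines, wiring the reader, processor and writer into a chunk:

    <job id="process-job" xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="1.0">
        <step id="processFile">
            <chunk item-count="100">
                <reader ref="auctionDataItemReader"/>
                <processor ref="auctionDataItemProcessor"/>
                <writer ref="auctionDataItemWriter"/>
            </chunk>
        </step>
    </job>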

In the item-count property we define how many elements fit into each chunk of processing. This means that the transaction is committed for every 100 items. This is useful to keep the transaction size low and to checkpoint the data. If we need to stop and then restart the operation, we can do it without having to process every item again. We have to code that logic ourselves, though. This is not included in the sample, but I will do it in the future.


To run a job we need to get a reference to a JobOperator. The JobOperator provides an interface to manage all aspects of job processing, including operational commands, such as start, restart, and stop, as well as job repository related commands, such as retrieval of job and step executions.

To run the previous files-job.xml Job we execute:
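Something along these lines, from any code running in the container (the wrapper class is just for completeness):

    import java.util.Properties;
    import javax.batch.operations.JobOperator;
    import javax.batch.runtime.BatchRuntime;

    public class JobTrigger {
        public void startFilesJob() {
            JobOperator jobOperator = BatchRuntime.getJobOperator();
            // "files-job" is the job XML file name, minus the .xml extension.
            long executionId = jobOperator.start("files-job", new Properties());
        }
    }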

Note that we pass the name of the job XML file, without the extension, to the JobOperator.

Next Steps

We still need to aggregate the data to extract metrics and display it on a web page. This post is already long, so I will describe the following steps in a future post. Anyway, the code for that part is already in the Github repo. Check the Resources section.


Resources

You can clone a full working copy from my Github repository and deploy it to Wildfly. You can find instructions there to deploy it.

World of Warcraft Auctions

Also check the Java EE samples project, which has a lot of fully documented batch examples.

Java One 2014 – Create the Future

I spent the last week in San Francisco attending JavaOne 2014. This was my third time attending JavaOne, so I was already familiar with the conference. Anyway, this year was different, since I was going as a speaker for the first time.

Create the Future

“Create the Future” was the theme of JavaOne this year. The last few years have been very exciting for the Java community. After many years without evolution, we now see Java 8 with lambdas and streams, Java EE 7 with new specifications and simplifications, and a huge effort to unify and support Java for embedded devices. Java 9 is already in the pipeline, promising modular Java (Project Jigsaw). Java EE 8 is going to improve a lot of specifications and bring new ones like MVC, JSON-B and the much awaited JCache. Now is the time to contribute by Adopting a JSR.

During the last few years we heard a lot of voices claiming that Java is dead. Looking at what’s happening now, it doesn’t seem that way. The platform is evolving, a lot of new developers are joining the JVM ecosystem, and the conference was vibrating with energy. By the way, Java turns 20 in 2015. Let’s see what’s going to happen 20 years from now. Let’s hope that this blog is still around!


The Keynotes

The opening Keynote was a recap of what’s happened over the last few years. You can find all the videos here. Just a few notes:

  • Coimbra JUG shows up on the map of new JUGs:

    JavaOne - Coimbra JUG

  • The technical Keynote was interrupted because of lack of time. This also happened to me in one of my sessions. I understand that there is a time frame, but this was not the best way to kick off the conference. I’m pretty sure that most attendees would prefer to shorten the Strategy Keynote in favor of the Technical one.
  • I was referenced in the Community Keynote because of my work at the Java EE 7 Hackergarten. Thank you, Heather VanCura. Count me in for future contributions!


The Venue

The event was split between the Moscone Center, the Hilton Hotel and the Parc 55 Hotel. I’m not from the time when JavaOne was held entirely in the Moscone Center, so I can’t compare. Because of the layout of the hotels, you sometimes need to run from session to session, and the corridors are not the best place to have groups of people chatting. A few of the rooms also have columns in the middle, which makes it difficult for the attendees and the speaker to be aware of everything.

For my session Development Horror Stories [BOF4223] I had to run with Simon Maple to get there on time. The problem was that the sessions in the previous slot were held at the Hilton, and we then had to move to the Moscone, which is a 15-minute walk. By the way, no taxi wanted to take us because it was too close.


The Food

Not even going to comment on it. Yeah, the lunch sucked, and yeah, I’m weird with food.


The Sessions

There is so much going on that it’s impossible to attend every session you want to. I probably only attended half of the sessions I had signed up for. I had to split my time between the sessions, the Demogrounds, the Hackergarten and also a bit of personal time for the last details of my own sessions. Not all sessions were video recorded, but all of them should have audio and be available via Parleys.

These are my top 3 sessions (from the ones I have attended):

My Sessions

I’m relatively happy with my performance delivering the sessions, but I can improve much more. I do have to say that I didn’t feel nervous. I guess I’m feeling more comfortable with public speaking, and preparing everything a few weeks in advance also helped. Moving forward!

Development Horror Stories [BOF4223]

with Simon Maple
We had around 150+ people signed up, but only 50 or so showed up. I think this was related to the venue switch problem I described earlier. At the same time there was also an Oracle Tech Party with food, drinks and music. I guess that didn’t help either.

Anyway, Simon and I kicked off the BOF with a few of our own stories where things went terribly wrong. The crowd was really into it, so our plan to ask people from the audience to share their own stories worked perfectly. We probably had around 10+ people stepping up on stage. In the end we had a giveaway of a Java 8 in Action book, signed by the author, for the best story as voted by the audience. The winning story belonged to Jan, who wrote a few scripts to clear and insert test data into a database. Unfortunately, he executed them in a production environment by accident!

Development Horror Stories BOF

I think people enjoyed the BOF, and this format can work pretty much anywhere. I’ll submit it to other conferences in the future. BOFs don’t really need slides, but we did some anyway:

Java EE 7 Batch Processing in the Real World [CON2818]

with Ivan Ivanov
This session was the first one of the day, at 8.30 in the morning, and it was packed with people. It was surprising to see so many attendees so early. Ivan and I started the session with an introduction to Batch: origins, applications and so on. Next we went through the JSR-352 API to prepare for our demo at the end. The demo is based around World of Warcraft, and we used the Batch API to download, process and extract metrics from the game’s Auction Houses (they are like an in-game eBay). Stay tuned for a future post describing the entire sample.

Batch Processing Real World Session

Unfortunately we ran out of time and we couldn’t show everything that we wanted, or at least go into more detail about the demo. We allowed people to ask questions at any time, and we had a lot of them. I’m not complaining; I prefer doing it this way, since it makes the session more interactive. On the other hand, you end up using more time and it’s not very predictable. We will reorganize the session to perform the demo in the middle, and everything should be fine like that.

And check the session code here.

The 5 people in your organization that grow legacy code [CON4255]

I’m pretty happy with how this session went. Considering that it was the last day of the conference and also one of the last sessions of the day, I had probably around 80+ people. I’m also happy because it was video recorded, so I can check it properly later.

Legacy Code Session

I’m not going to spoil the content, but I think the attendees really enjoyed the session and had plenty of moments to laugh at the content. I’ll just leave you with the slides:

Final Words

The event was huge, so I’m probably writing another post about it, since I don’t want to write a very long, boring post. The next one is going to focus a little more on other sessions, activities and the community!

I would like to thank everyone that attended my sessions and send a few special thanks: to Reza Rahman for helping me in the submission process, to Heather VanCura for the Hackergarten invite, and to my co-speakers Ivan Ivanov and Simon Maple. Thanks everyone!

Maven Common Problems and Pitfalls

posted by Roberto Cortez

Love it or hate it (and a lot of people seem to hate it), Maven is a widely used tool: 64% of Java developers use it (source – Java Tools and Technologies Landscape for 2014).

Most experienced developers have already had their share of Maven headaches, usually the hard way, banging their heads against a brick wall. Unfortunately, I feel that new developers are going through the same hard learning process.

Looking at the main Java conferences around the world, you cannot find any Maven-related sessions that guide you through the fundamentals. Maybe the community assumes that you should already know them, like the Java language itself. Still, recycling this knowledge could be a win-win situation for everyone. How much time do you or your teammates waste by not knowing how to deal with Maven’s particularities?

If you are reading this, I’m also going to assume that you grasp Maven basics. If not, have a look at the following articles:

There are a lot of other articles. I see no value in adding my own and repeating the same stuff, but if I feel the need I may write one. Let me know if you support it!

Anyway, I think I can add some value by pointing out the main issues that teams come across when using Maven, explaining them and showing how to fix them.

Why is this jar in my build?

Due to Maven’s transitive dependency mechanism, the graph of included libraries can quickly grow quite large.

If you see something in your classpath and you didn’t put it there, it’s most likely because of a transitive dependency. You might need it, or maybe not. Maybe the part of the library you’re using does not require all those extra jars. It feels like a gamble here, but you can get a rough idea if you use mvn dependency:analyze. This command will tell you which dependencies are actually in use by your project.

I mostly do trial and error here: exclude what I think I don’t need and run the code to see if everything is OK. Unfortunately, this command doesn’t go so far as to tell you whether the transitive dependencies are really needed by the dependencies you are using. Hey, if someone knows a better way, let me know!

I can’t see my changes!

This can happen for multiple reasons. Let’s look at the most common:

Dependencies are not built in the local repository

You may have Module A and Module B, where Module B has a dependency on Module A. The changes you made to Module A are not visible in Module B.

This happens because Maven looks into its own local jar repository for artifacts to include in the classpath. If you make any changes, you need to place a copy of the new jar into the local repository. You do that by running mvn install in the changed project.

Dependency version is not correct

This can be as simple as changing the version of the dependency you are using, or a real pain to figure out. When Maven performs the dependency lookup, it uses the Nearest Definition First rule. This means that the version used will be the one closest to your project in the tree of dependencies. Confused? So was I. Let’s try an example.

You want to use dependency Dv1 in your project A, but you’re getting Dv2, and you have the following dependency tree:

A -> B -> C -> Dv1

A -> E -> Dv2

Which version of D is included, Dv1 or Dv2? In this case Dv2, because of the Nearest Definition First rule: Dv2 is only two levels away from A, while Dv1 is three. If two dependency versions are at the same depth in the dependency tree, the order of declaration is what counts.

To fix this problem you could explicitly add a dependency on Dv1 in A to force the use of Dv1, or just exclude Dv2, as shown below.
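A sketch of the exclusion approach, with placeholder coordinates for E and D:

    <dependency>
        <groupId>com.example</groupId>
        <artifactId>E</artifactId>
        <version>1.0</version>
        <exclusions>
            <!-- Keep E, but drop the Dv2 it pulls in transitively. -->
            <exclusion>
                <groupId>com.example</groupId>
                <artifactId>D</artifactId>
            </exclusion>
        </exclusions>
    </dependency>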

If you use the command mvn dependency:tree, it will output a tree with all the dependencies and versions for the project. This is very helpful for debugging these kinds of problems.

Remote repository has overwritten your changes

It’s usual for companies to have an internal Maven repository to cache artifacts, store releases or serve the latest changes of the project you are working on. This works great most of the time, but when you’re working with SNAPSHOT versions, Maven is always trying to pick up the latest changes to that dependency.

Now, you are happily working on your Project B changes, which Project A depends on. You build everything locally and proceed to integrate the changes in Project A. Then someone, or something, uploads a new SNAPSHOT version of Project B. Remember, your changes are not visible yet, since you have everything locally and did not commit to VCS. The next build you make of Project A is going to pick up Project B from the company repository and not the one in your local repository.

The jar is not included in the distribution!

To add a little more confusion, let’s talk about scopes. Maven has four main scopes: compile, provided, runtime and test. Each dependency has a scope, and the scope defines a different classpath for your application.

If you are missing something, and assuming that you have the dependency defined correctly, the problem is most likely in the scope. Use the compile scope (the default) to be on the safe side. The commands mvn dependency:analyze and mvn dependency:tree can also help you here.
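As a quick illustration (the coordinates are real artifacts, the rest is a sketch): a compile dependency ends up packaged in your distribution, while a provided one is supplied by the container and left out:

    <!-- Compile scope (the default): on the classpath and packaged. -->
    <dependency>
        <groupId>com.google.guava</groupId>
        <artifactId>guava</artifactId>
        <version>18.0</version>
        <scope>compile</scope>
    </dependency>

    <!-- Provided scope: available at compile time, but the container
         supplies it at runtime, so it is not packaged. -->
    <dependency>
        <groupId>javax</groupId>
        <artifactId>javaee-api</artifactId>
        <version>7.0</version>
        <scope>provided</scope>
    </dependency>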

The artifact was not found!

Ahh, the dreaded “Could not resolve dependencies … Could not find artifact”. This is like the Java NPE! There are many reasons why this happens, a few more evident than others, but a pain to debug anyway. I usually follow this checklist to try to fix the problem:

  • Check that the dependency is defined correctly
  • Check if you are pointing to the correct remote repositories that store the dependency
  • Check if the remote repository actually holds the dependency!
  • Check if you have the most recent pom.xml files
  • Check if the jar is corrupted
  • Check if the company repository is caching the internet repositories and didn’t issue a request to get the new libraries
  • Check if the dependency definition is being overridden by something. Use mvn help:effective-pom to see the actual Maven settings used to build the project
  • Don’t use -o (Maven’s offline mode)


Maven is not a perfect tool, but if you learn a few of its tricks, it will help you and save you time debugging build problems. There are other tools that fix a few of these problems, but I don’t have enough knowledge about them to be able to voice my opinion.

Anyway, a big chunk of projects use Maven as their build tool, and I believe that developers should know their build tool well to be able to perform better in their everyday work. Hopefully this post can be useful to you.

Feel free to post any other problem not covered here. Unfortunately, Maven sometimes seems like a box full of surprises.

One last piece of advice: never trust the IDE! If it works on the command line, then it’s an IDE problem!

Five Ways to Not Suck at Being a Java Freelancer at Geecon – Kraków 2014

posted by Roberto Cortez

As promised, here is the video of my session at Geecon – Kraków 2014, Five Ways to Not Suck at Being a Java Freelancer.

I think you can tell that I’m very nervous during the first few minutes, but I was able to calm down a bit afterwards. I do hate hearing myself speak, since the voice I hear does not sound like mine. In fact, no one sounds like they think they do, and here is why. Anyway, I should stop with so many “eeehhhmm”s and “so”s. I have to improve that.

Here are the slides as well:

And here is my latest article about freelancing: FAQ for Freelancers

How to deal with Developers that grow your legacy code

posted by Roberto Cortez

Today, I’ve published the follow-up article Back to the future (again): How to reduce legacy code threats before they happen to The 5 people in your organization that grow legacy code on RebelLabs.

After the success of the original post and people’s requests for techniques to deal with the profiles, RebelLabs and I decided to go for the follow-up. I hope the newest article can answer all of your questions! If not, you know how to reach me. Feel free to contact me with your questions.

A special thanks to RebelLabs for letting me publish my work there, and of course to Oliver White for all the support in writing and reviewing the article. You rock!