Results for tag "batch"

Java EE 7 Batch Processing and World of Warcraft – Part 2

posted by Roberto Cortez

Today, I bring you the second part to my previous post about Java EE 7 Batch Processing and World of Warcraft – Part 1. In this post, we are going to see how to aggregate and extract metrics from the data that we obtained in Part 1.

World of Warcraft Horde Auction House

Recap

The purpose of the batch is to download the World of Warcraft Auction House’s data, process the auctions and extract metrics. These metrics are going to build a history of how the Auction Item prices evolve over time. In Part 1, we already downloaded and inserted the data into a database.

The Application

Process Job

After adding the raw data into the database, we are going to add another step with Chunk-style processing. In the chunk, we are going to read the aggregated data and then insert it into another table in the database for easy access. This is done in the process-job.xml:

A Chunk reads the data one item at a time, and creates chunks that will be written out, within a transaction. One item is read in from an ItemReader, handed to an ItemProcessor, and aggregated. Once the number of items read equals the commit interval, the entire chunk is written out via the ItemWriter, and then the transaction is committed.
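
To make the read, process and write cycle more concrete, here is a minimal plain-Java sketch of the loop described above. It is not the JSR-352 runtime and not the code from the repository: the class ChunkLoopSketch, its run method and the sample data are hypothetical names chosen for illustration, and the "commit" is only simulated by collecting each finished chunk into a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Plain-Java illustration of chunk processing; NOT the JSR-352 runtime.
final class ChunkLoopSketch {

    // Reads items one at a time, processes each one, and "commits" a chunk
    // whenever commitInterval processed items have been buffered.
    static <I, O> List<List<O>> run(Iterator<I> reader,
                                    Function<I, O> processor,
                                    int commitInterval) {
        List<List<O>> committedChunks = new ArrayList<>();
        List<O> buffer = new ArrayList<>();
        while (reader.hasNext()) {
            buffer.add(processor.apply(reader.next()));
            if (buffer.size() == commitInterval) {
                // Stands in for the writer call followed by the commit.
                committedChunks.add(buffer);
                buffer = new ArrayList<>();
            }
        }
        if (!buffer.isEmpty()) {
            // The last chunk may hold fewer items than the commit interval.
            committedChunks.add(buffer);
        }
        return committedChunks;
    }

    public static void main(String[] args) {
        List<Integer> items = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        List<List<Integer>> chunks = run(items.iterator(), i -> i * 10, 3);
        System.out.println(chunks); // [[10, 20, 30], [40, 50, 60], [70]]
    }
}
```

With seven items and a commit interval of three, the sketch produces two full chunks and one final partial chunk. In a real job the container drives this loop and wraps each chunk in a transaction.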

ProcessedAuctionsReader

In the reader, we are going to select and aggregate metrics using database functions.

For this example, we get the best performance results by using plain JDBC with a simple scrollable result set. In this way, only one query is executed and results are pulled as needed in readItem. You might want to explore other alternatives.

Plain JPA doesn’t have a scrollable result set in the standard, so you need to paginate the results. This leads to multiple queries, which slows down the reading. Another option is to use the new Java 8 Streams API to perform the aggregation operations. The operations are quick, but you need to select the entire dataset from the database into the streams. Ultimately, this will kill your performance.
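
For reference, the Streams alternative could look roughly like the sketch below. It is only an illustration under assumptions: the Auction class, its fields and the choice of metrics (count, minimum, maximum and average bid per item) are hypothetical and not taken from the sample. It also shows the drawback mentioned above, since the whole list of auctions has to be in memory before the aggregation starts.

```java
import java.util.Arrays;
import java.util.List;
import java.util.LongSummaryStatistics;
import java.util.Map;
import java.util.stream.Collectors;

final class StreamAggregationSketch {

    // Hypothetical, simplified auction record (not the entity from the sample).
    static final class Auction {
        final int itemId;
        final long bid;

        Auction(int itemId, long bid) {
            this.itemId = itemId;
            this.bid = bid;
        }

        int getItemId() { return itemId; }
        long getBid() { return bid; }
    }

    // Groups the auctions by item and computes count/min/max/average of the bids.
    static Map<Integer, LongSummaryStatistics> bidStatsByItem(List<Auction> auctions) {
        return auctions.stream()
                .collect(Collectors.groupingBy(Auction::getItemId,
                        Collectors.summarizingLong(Auction::getBid)));
    }

    public static void main(String[] args) {
        // Sample values are made up; the whole dataset must already be loaded here.
        List<Auction> auctions = Arrays.asList(
                new Auction(72092, 100L),
                new Auction(72092, 300L),
                new Auction(52185, 50L));
        bidStatsByItem(auctions).forEach((itemId, stats) ->
                System.out.println(itemId + " -> " + stats));
    }
}
```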

I did try both approaches and got the best results by using the database aggregation capabilities. I’m not saying that this is always the best option, but in this particular case it was the best option.

During the implementation, I’ve also found a bug in Batch. You can check it here. An exception is thrown when setting parameters in the PreparedStatement. The workaround was to inject the parameters directly into the query SQL. Ugly, I know…

ProcessedAuctionsProcessor

In the processor, we store all the aggregated values in a holder object that will be saved to the database.

Since the metrics record an exact snapshot of the data in time, the calculation only needs to be done once. That’s why we are saving the aggregated metrics. They are never going to change and we can easily check the history.

If you know that your source data is immutable and you need to perform operations on it, I recommend that you persist the result somewhere. This is going to save you time. Of course, you need to weigh whether this data is going to be accessed many times in the future. If not, maybe you don’t need to go through the trouble of persisting the data.

ProcessedAuctionsWriter

Finally we just need to write the data down to a database:

Metrics

Now, to do something useful with the data we are going to expose a REST endpoint to perform queries on the calculated metrics. Here is how:

If you remember a few details from the Part 1 post, World of Warcraft servers are called Realms. These realms can be linked with each other and share the same Auction House. To that end, we also have information on how the realms connect with each other. This is important, because we can search for an Auction Item in all the realms that are connected. The rest of the logic is just simple queries to get the data out.

During development, I’ve also found a bug with EclipseLink (if you run in GlassFish) and Java 8. Apparently the underlying Collection returned by EclipseLink has the element count set to 0. This doesn’t work well with Streams if you try to inline the query call plus a Stream operation. The Stream will think that it’s empty and no results are returned. You can read a little more about this here.

Interface

I’ve also developed a small interface using Angular and Google Charts to display the metrics. Have a look:

WoW Auctions Search

Here, I’m searching in the Realm named “Aggra (Português)” for the Auction Item id 72092, which corresponds to Ghost Iron Ore. As you can see, we can check the quantity for sale, bid and buyout values and price fluctuation through time. Neat? I may write another post about building the Web Interface in the future.

Resources

You can clone a full working copy from my GitHub repository and deploy it to WildFly or GlassFish. You can find deployment instructions there:

World of Warcraft Auctions

Check also the Java EE samples project, with a lot of batch examples, fully documented.

Sixth Coimbra JUG Meeting – Batch Processing in the Real World

posted by Roberto Cortez

Last Thursday, 30 October 2014, the sixth meeting of Coimbra JUG was held at the Department of Informatics Engineering of the University of Coimbra, in Portugal. The attendance was good: we had around 25 people listening to my talk about Java EE Batch. This is the same session I presented at JavaOne.

Coimbra JUG Meeting 6 Audience

No one in the audience was using Java EE Batch, and only a couple were using Spring Batch. By coincidence, these were a few old colleagues of mine, working on a project where I introduced the technology. The attendees seemed very curious and interested to learn about it and the questions were great. A lot of interaction and discussion was generated during the session. A funny thing happened toward the end: there was a power failure and we had to finish the session in the dark! We were lucky, since the session was almost done. Discussions about the topic (and others) carried over to the dinner. We had around ten enthusiasts, for our biggest dinner ever!

As always, we had surprises for the attendees: beer and chocolates, if you participated in the discussion. IntelliJ sponsored our event, by offering a free license to raffle among the attendees. Congratulations to Décio Sousa for winning the license. Develop with pleasure!

Here are the materials for the session:

Enjoy!

A few additional notes:

I would like to welcome Bruno Baptista to the Coimbra JUG Organization. He is going to help me run the JUG.

Coimbra JUG is almost 1 year old! Let’s see if we can pull off something interesting to commemorate!

Java EE 7 Batch Processing and World of Warcraft – Part 1

posted by Roberto Cortez

This was one of my sessions at the last JavaOne. This post is going to expand the subject and look into a real application using the Batch JSR-352 API. This application integrates with the MMORPG World of Warcraft.

Since JSR-352 is a new specification in the Java EE world, I think that many people don’t know how to use it properly. It may also be a challenge to identify the use cases to which this specification applies. Hopefully this example can help you understand the use cases better.

Abstract

World of Warcraft is a game played by more than 8 million players worldwide. The service is offered by region: United States (US), Europe (EU), China and Korea. Each region has a set of servers, called Realms, that you connect to in order to play the game. For this example, we are only looking into the US and EU regions.

World of Warcraft Horde Auction House

One of the most interesting features of the game is that it allows you to buy and sell in-game goods, called Items, using an Auction House. Each Realm has two Auction Houses. On average, each Realm trades around 70,000 Items. Let’s crunch some numbers:

  • 512 Realms (US and EU)
  • 70 K Items per Realm
  • More than 35 M Items overall

The Data

Another cool thing about World of Warcraft is that the developers provide a REST API to access most of the in-game information, including the Auction House’s data. Check here the complete API.

The Auction House’s data is obtained in two steps. First, we need to query the corresponding Auction House Realm REST endpoint to get a reference to a JSON file. Next, we need to access this URL and download the file with all the Auction House Item information. Here is an example:

http://eu.battle.net/api/wow/auction/data/aggra-portugues

The Application

Our objective here is to build an application that downloads the Auction House data, processes it and extracts metrics. These metrics are going to build a history of how the Item prices evolve over time. Who knows? Maybe with this information we can predict price fluctuation and buy or sell Items at the best times.

The Setup

For the setup, we’re going to use a few extra things on top of Java EE 7.

Jobs

The main work is going to be performed by Batch JSR-352 Jobs. A Job is an entity that encapsulates an entire batch process. A Job is wired together via a Job Specification Language. With JSR-352, a Job is simply a container for the steps. It combines multiple steps that belong logically together in a flow.

We’re going to split the business logic into three jobs:

  • Prepare – Creates all the supporting data needed. List Realms, create folders to copy files.
  • Files – Query realms to check for new files to process.
  • Process – Downloads the file, processes the data, extracts metrics.

The Code

Back-end – Java EE 7 with Java 8

Most of the code is going to be in the back-end. We need Batch JSR-352, but we are also going to use a lot of other technologies from Java EE: like JPA, JAX-RS, CDI and JSON-P.

Since the Prepare Job only initializes application resources for the processing, I’m skipping it and diving into the most interesting parts.

Files Job

The Files Job is an implementation of AbstractBatchlet. A Batchlet is the simplest processing style available in the Batch specification. It’s a task-oriented step where the task is invoked once, executes, and returns an exit status. This type is most useful for performing a variety of tasks that are not item-oriented, such as executing a command or doing a file transfer. In this case, our Batchlet is going to iterate over every Realm, make a REST request to each one and retrieve a URL to the file containing the data that we want to process. Here is the code:

A cool thing about this is the use of Java 8. With parallelStream(), invoking multiple REST requests at once is as easy as pie! You can really notice the difference. If you want to try it out, just run the sample, replace parallelStream() with stream() and check it out. On my machine, using parallelStream() makes the task execute around 5 or 6 times faster.
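
If you want to see the effect without running the whole sample, the comparison can be sketched in plain Java. This is an illustration under assumptions, not the Batchlet from the repository: fetchAuctionFileUrl only simulates a slow REST call with Thread.sleep, the returned URL and the realm names are placeholders, and the speed-up you measure depends on your machine and on the size of the common fork-join pool.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ParallelFetchSketch {

    // Stands in for a slow REST request; the sleep simulates network latency.
    static String fetchAuctionFileUrl(String realm) {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // Placeholder URL, not a real endpoint.
        return "http://example.invalid/auctions/" + realm + ".json";
    }

    // Fetches one URL per realm, either sequentially or in parallel.
    static List<String> fetchAll(List<String> realms, boolean parallel) {
        Stream<String> stream = parallel ? realms.parallelStream() : realms.stream();
        return stream.map(ParallelFetchSketch::fetchAuctionFileUrl)
                .collect(Collectors.toList()); // keeps the encounter order in both modes
    }

    public static void main(String[] args) {
        List<String> realms = Arrays.asList("realm-1", "realm-2", "realm-3", "realm-4",
                "realm-5", "realm-6", "realm-7", "realm-8");

        long start = System.nanoTime();
        fetchAll(realms, false);
        System.out.println("stream():         " + (System.nanoTime() - start) / 1_000_000 + " ms");

        start = System.nanoTime();
        fetchAll(realms, true);
        System.out.println("parallelStream(): " + (System.nanoTime() - start) / 1_000_000 + " ms");
    }
}
```

Both variants return the same list in the same order. Only the elapsed time differs, because the simulated requests overlap in the parallel version.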

Update
Usually, I would not use this approach. I’ve done it because part of the logic involves invoking slow REST requests, and parallel streams really shine here. Doing this using batch partitions is possible, but hard to implement. We also need to poll the servers for new data every time, so it’s not terrible if we skip a file or two. Keep in mind that if you don’t want to miss a single record, a Chunk processing style is more suitable. Thank you to Simon Martinelli for bringing this to my attention.

Since the Realms of US and EU require different REST endpoints to invoke, these are perfect to be partitioned. Partitioning means that the task is going to run in multiple threads, one thread per partition. In this case we have two partitions.
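
As a rough mental model of "one thread per partition", here is a plain-Java analogy using an ExecutorService. It is only a sketch under assumptions: in a real job the batch container creates and manages the partition threads from the job XML, and the names PartitionSketch, loadRegion and runPartitions are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Analogy only: the batch runtime, not application code, manages partition threads.
final class PartitionSketch {

    // Stands in for the work done by one partition (for example, one region).
    static String loadRegion(String region) {
        return region + " handled by " + Thread.currentThread().getName();
    }

    // Runs one task per region, each on its own thread, and collects the results in order.
    static List<String> runPartitions(List<String> regions) {
        ExecutorService pool = Executors.newFixedThreadPool(regions.size());
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String region : regions) {
                Callable<String> task = () -> loadRegion(region);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // waits for that partition to finish
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a partition", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A partition failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        runPartitions(Arrays.asList("US", "EU")).forEach(System.out::println);
    }
}
```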

To complete the job definition we need to provide a Job XML file. This needs to be placed in the META-INF/batch-jobs directory. Here is the files-job.xml for this job:

In the files-job.xml we need to define our Batchlet in the batchlet element. For the partitions, just define the partition element and assign different properties to each plan. These properties can then be used to late bind the value into the LoadAuctionFilesBatchlet with the expressions #{partitionPlan['region']} and #{partitionPlan['target']}. This is a very simple expression binding mechanism and only works for simple properties and Strings.

Process Job

Now we want to process the Realm Auction Data file. Using the information from the previous job, we can now download the file and do something with the data. The JSON file has the following structure:

The file has a list of the Auctions from the Realm it was downloaded from. In each record we can check the item for sale, prices, seller and time left until the end of the auction. Auctions are also aggregated by Auction House type: Alliance and Horde.

For the process-job we want to read the JSON file, transform the data and save it to a database. This can be achieved by Chunk Processing. A Chunk is an ETL (Extract – Transform – Load) style of processing which is suitable for handling large amounts of data. A Chunk reads the data one item at a time, and creates chunks that will be written out, within a transaction. One item is read in from an ItemReader, handed to an ItemProcessor, and aggregated. Once the number of items read equals the commit interval, the entire chunk is written out via the ItemWriter, and then the transaction is committed.

ItemReader

The real files are so big that they cannot be loaded entirely into memory, or you may end up running out of it. Instead, we use the JSON-P API to parse the data in a streaming way.

To open a JSON parser stream we need to call Json.createParser and pass a reference to an InputStream. To read elements we just need to call the hasNext() and next() methods. The latter returns a JsonParser.Event that allows us to check the position of the parser in the stream. Elements are read and returned in the readItem() method from the Batch API ItemReader. When no more elements are available to read, return null to finish the processing. Note that we also implement the open and close methods from ItemReader. These are used to initialize and clean up resources. They only execute once.

ItemProcessor

The ItemProcessor is optional. It’s used to transform the data that was read. In this case we need to add additional information to the Auction.

ItemWriter

Finally we just need to write the data down to a database:

The entire process with a file of 70 K records takes around 20 seconds on my machine. I did notice something very interesting. Before this code, I was using an injected EJB that called a method with the persist operation. This was taking 30 seconds in total, so injecting the EntityManager and performing the persist directly saved me a third of the processing time. I can only speculate that the delay is due to a deeper call stack, with EJB interceptors in the middle. This was happening in WildFly. I will investigate this further.

To define the chunk we need to add it to a process-job.xml file:

In the item-count property we define how many elements fit into each chunk of processing. This means that the transaction is committed every 100 elements. This is useful to keep the transaction size low and to checkpoint the data. If we need to stop and then restart the operation, we can do it without having to process every item again. We have to code that logic ourselves. This is not included in the sample, but I will do it in the future.
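
To give an idea of what that restart logic could look like, here is a minimal plain-Java sketch. It mirrors the open(Serializable checkpoint), readItem() and checkpointInfo() method names of the JSR-352 ItemReader, but it deliberately does not implement the interface so that it compiles with the JDK alone. The class RestartableReaderSketch and the in-memory list it reads from are assumptions made for illustration, not part of the sample.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

// Mirrors the ItemReader method names; NOT wired into a real batch runtime.
final class RestartableReaderSketch {

    private final List<String> source;
    private int position;

    RestartableReaderSketch(List<String> source) {
        this.source = source;
    }

    // checkpoint is null on a fresh start, or the last saved position on a restart.
    void open(Serializable checkpoint) {
        position = (checkpoint == null) ? 0 : (Integer) checkpoint;
    }

    // Returns the next item, or null when there is nothing left to read.
    String readItem() {
        return position < source.size() ? source.get(position++) : null;
    }

    // Saved at each chunk commit; here it is simply the index of the next item.
    Serializable checkpointInfo() {
        return Integer.valueOf(position);
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("a", "b", "c", "d");

        RestartableReaderSketch first = new RestartableReaderSketch(data);
        first.open(null);
        first.readItem();
        first.readItem();
        Serializable checkpoint = first.checkpointInfo(); // two items already read

        RestartableReaderSketch restarted = new RestartableReaderSketch(data);
        restarted.open(checkpoint);
        System.out.println(restarted.readItem()); // c
    }
}
```

The restarted reader skips the two items that were read before the checkpoint and continues with the third one. For a streamed file, the checkpoint would have to describe a position in the stream instead of a list index.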

Running

To run a job we need to get a reference to a JobOperator. The JobOperator provides an interface to manage all aspects of job processing, including operational commands, such as start, restart, and stop, as well as job repository related commands, such as retrieval of job and step executions.

To run the previous files-job.xml Job we execute:

Note that we pass the name of the job XML file, without the extension, to the JobOperator.

Next Steps

We still need to aggregate the data to extract metrics and display them on a web page. This post is already long, so I will describe the following steps in a future post. Anyway, the code for that part is already in the GitHub repo. Check the Resources section.

Resources

You can clone a full working copy from my GitHub repository and deploy it to WildFly. You can find deployment instructions there.

World of Warcraft Auctions

Check also the Java EE samples project, with a lot of batch examples, fully documented.

Java One 2014 – Create the Future


JavaOne
I spent the last week in San Francisco attending JavaOne 2014. This was my third time attending JavaOne, so I was already familiar with the conference. Anyway, this year was different since I was going as a speaker for the first time.

Create the Future

“Create the Future” was the theme of JavaOne this year. The last few years have been very exciting for the Java community. After many years without evolution, we now see Java 8 with lambdas and streams, Java EE 7 with new specifications and simplifications, and a huge effort to unify and support Java for embedded devices. Java 9 is already in the pipeline, promising modular Java (Project Jigsaw). Java EE 8 is going to improve a lot of specifications and bring new ones like MVC, JSON-B and the much awaited JCache. Now it’s the time to contribute by Adopting a JSR.

During the last few years we heard a lot of voices claiming that Java is dead. Looking at what’s happening now, it doesn’t seem that way. The platform is evolving, a lot of new developers are joining the JVM ecosystem, and the conference was buzzing with energy. By the way, Java is turning 20 in 2015. Let’s see what is going to happen 20 years from now. Let’s hope that this blog is still around!

Keynote

The opening Keynote was a recap of what has been happening in the last few years. You can find all the videos here. Just a few notes:

  • Coimbra JUG shows up on the map of the new JUGs:

    JavaOne - Coimbra JUG

  • The technical Keynote was interrupted because of lack of time. This also happened to me in one of my sessions. I understand that there is a time frame, but this was not the best way to kick off the conference. I’m pretty sure that most attendees would prefer to shorten the Strategy Keynote in favor of the Technical one.
  • I was referenced in the Community Keynote, because of my work at the Java EE 7 Hackergarten. Thank you Heather VanCura. Count me in with future contributions!

Venue

The event was split between the Moscone Center, the Hilton Hotel and the Parc 55 Hotel. I wasn’t around when JavaOne was held entirely in the Moscone Center, so I can’t compare. Because of the layout of the hotels, you sometimes need to run from session to session, and the corridors are not the best place for groups of people to chat. A few of the rooms also have columns in the middle, which makes it difficult for the attendees and the speaker to be aware of everything.

In my session Development Horror Stories [BOF4223], I had to run with Simon Maple to get there on time. The problem was that the sessions in the previous slot were held at the Hilton and then we had to move to the Moscone, which is a 15-minute walk. By the way, no taxi wanted to take us because it was too close.

Food

Not even going to comment about it. Yeah the lunch sucked, and yeah I’m weird with the food.

Sessions

There is so much going on that it’s impossible to attend every session that you want to go to. I probably only attended half of the sessions that I signed up for. I had to split my time between the sessions, the Demogrounds, the Hackergarten and also a bit of personal time for the last details of my sessions. Not all sessions had video recording, but all of them should have audio and be available via Parleys.

These are my top 3 sessions (from the ones I have attended):

My Sessions

I’m relatively happy with my performance delivering the sessions, but I can improve much more. I do have to say that I didn’t feel any nervousness. I guess that I’m feeling more comfortable with public speaking, and preparing everything a few weeks in advance also helped. Moving forward!

Development Horror Stories [BOF4223]

with Simon Maple
We had 150+ people signed up, but only 50 or so showed up. I think this was related to the venue switch problem I described earlier. At the same time there was also an Oracle Tech Party with food, drinks and music. I guess that didn’t help either.

Anyway, Simon and I kicked off the BOF with a few of our own stories where things went terribly wrong. The crowd was really into it, so our plan to ask people from the audience to share their own stories worked perfectly. We probably had 10+ people stepping up on stage. In the end we gave away a copy of the Java 8 In Action book, signed by the author, for the best story voted by the audience. The winning story belonged to Jan, who wrote a few scripts to clear and insert data into a database for tests. Unfortunately, he executed them in a production environment by accident!

Development Horror Stories BOF

I think people enjoyed the BOF and this can work pretty much anywhere. I’ll submit it to other conferences in the future. BOFs don’t really need slides, but we did some anyway:

Java EE 7 Batch Processing in the Real World [CON2818]

with Ivan Ivanov
This session was the first one of the day, at 8:30 in the morning, and was packed with people. It was surprising to see so many so early. Ivan and I started the session with an introduction on Batch, origins, applications and so on. Next we went through the JSR-352 API to prepare for our demo at the end. The demo is based around World of Warcraft and we used the Batch API to download, process and extract metrics from the game’s Auction Houses (they are like eBay in the game). Stay tuned for a future post describing the entire sample.

Batch Processing Real World Session

Unfortunately we ran out of time and we couldn’t show everything that we wanted, or at least go into more detail about the demo. We allowed people to ask questions anytime, and we had a lot of them. I’m not complaining about it. I prefer doing it this way, since it makes the session more interactive. On the other hand, you end up using more time and it is not very predictable. We will reorganize the session to perform the demo in the middle and everything should be fine like that.

And check the session code here.

CON4255 – The 5 people in your organization that grow legacy code

I’m pretty happy with how this session went. Considering that it was the last day of the conference and also one of the last sessions of the day, I probably had 80+ people. I’m also happy because it was video recorded, so I can check it properly later.

Legacy Code Session

I’m not going to spoil the content, but I think the attendees really enjoyed the session and had many moments to laugh about the content. I’ll just leave you with the slides:

Final Words

The event was huge, so I’m probably writing another post about it, since I don’t want to write a very long boring post. Next one is going to focus a little more on other sessions, activities and community!

I would like to thank everyone that attended my sessions and send a few special thanks: to Reza Rahman for helping me in the submission process, to Heather VanCura for the Hackergarten invite and to my co-speakers Ivan Ivanov and Simon Maple. Thanks everyone!

My Sessions at JavaOne 2014


Last week, JavaOne 2014 published the session schedules plus the Schedule Builder for attendees to enrol in the sessions. I’m going to be speaking in the following sessions:

If you’re going, please sign up for these sessions. I’m going to do my best to make sure that your time is well spent there. Check my previous post with some additional information about the sessions: Speaking at JavaOne 2014.