Java EE 7 Batch Processing and World of Warcraft – Part 2

posted by Roberto Cortez

Today, I bring you the second part of my previous post, Java EE 7 Batch Processing and World of Warcraft – Part 1. In this post, we are going to see how to aggregate and extract metrics from the data that we obtained in Part 1.

World of Warcraft Horde Auction House

Recap

The purpose of the batch is to download the World of Warcraft Auction House's data, process the auctions and extract metrics. These metrics are going to build a history of the Auction Items' price evolution through time. In Part 1, we already downloaded and inserted the data into a database.

The Application

Process Job

After adding the raw data into the database, we are going to add another step with Chunk-style processing. In the chunk, we are going to read the aggregated data and then insert it into another table in the database for easy access. This is done in the process-job.xml:
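A sketch of what such a chunk step in process-job.xml might look like (the step id and item-count here are illustrative, not necessarily the exact values in the repository):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<job id="process-job" xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="1.0">
    <step id="processAuctions">
        <!-- item-count is the commit interval: items read per transaction -->
        <chunk item-count="100">
            <reader ref="processedAuctionsReader"/>
            <processor ref="processedAuctionsProcessor"/>
            <writer ref="processedAuctionsWriter"/>
        </chunk>
    </step>
</job>
```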

A Chunk reads the data one item at a time, and creates chunks that will be written out, within a transaction. One item is read in from an ItemReader, handed to an ItemProcessor, and aggregated. Once the number of items read equals the commit interval, the entire chunk is written out via the ItemWriter, and then the transaction is committed.

ProcessedAuctionsReader

In the reader, we are going to select and aggregate metrics using database functions.

For this example, we get the best performance results by using plain JDBC with a simple scrollable result set. In this way, only one query is executed and results are pulled as needed in readItem. You might want to explore other alternatives.
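A minimal sketch of such a reader, assuming an injected DataSource (the JNDI name and the aggregation SQL below are illustrative, not the exact ones from the repository):

```java
import java.io.Serializable;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

import javax.annotation.Resource;
import javax.batch.api.chunk.AbstractItemReader;
import javax.inject.Named;
import javax.sql.DataSource;

@Named
public class ProcessedAuctionsReader extends AbstractItemReader {
    @Resource(name = "java:/datasources/WowAuctionsDS") // hypothetical JNDI name
    private DataSource dataSource;

    private Connection connection;
    private PreparedStatement preparedStatement;
    private ResultSet resultSet;

    @Override
    public void open(Serializable checkpoint) throws Exception {
        connection = dataSource.getConnection();
        // Scrollable, held-open cursor so rows can be pulled across chunk commits.
        preparedStatement = connection.prepareStatement(
                "SELECT item_id, AVG(buyout), MIN(buyout), MAX(buyout), SUM(quantity) "
                + "FROM auction GROUP BY item_id", // illustrative aggregation query
                ResultSet.TYPE_FORWARD_ONLY,
                ResultSet.CONCUR_READ_ONLY,
                ResultSet.HOLD_CURSORS_OVER_COMMIT);
        resultSet = preparedStatement.executeQuery();
    }

    @Override
    public Object readItem() throws Exception {
        // Returning null signals the end of the data.
        return resultSet.next() ? resultSet : null;
    }

    @Override
    public void close() throws Exception {
        if (resultSet != null) resultSet.close();
        if (preparedStatement != null) preparedStatement.close();
        if (connection != null) connection.close();
    }
}
```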

Plain JPA doesn't have a scrollable result set in the standard, so you need to paginate the results. This leads to multiple queries, which slow down the reading. Another option is to use the new Java 8 Streams API to perform the aggregation operations. The operations themselves are quick, but you need to pull the entire dataset from the database into the streams. Ultimately, this will kill your performance.

I did try both approaches and got the best results by using the database aggregation capabilities. I’m not saying that this is always the best option, but in this particular case it was the best option.

During the implementation, I’ve also found a bug in Batch. You can check it here. An exception is thrown when setting parameters in the PreparedStatement. The workaround was to inject the parameters directly into the query SQL. Ugly, I know…

ProcessedAuctionsProcessor

In the processor, we store all the aggregated values in a holder object that will later be persisted to the database.
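A sketch of such a processor, using a hypothetical AuctionItemStatistics holder entity (the class and its field names are illustrative):

```java
import java.sql.ResultSet;

import javax.batch.api.chunk.ItemProcessor;
import javax.inject.Named;

@Named
public class ProcessedAuctionsProcessor implements ItemProcessor {
    @Override
    public Object processItem(Object item) throws Exception {
        ResultSet resultSet = (ResultSet) item;
        // Copy the aggregated columns into a holder entity to persist later.
        AuctionItemStatistics statistics = new AuctionItemStatistics(); // hypothetical entity
        statistics.setItemId(resultSet.getInt(1));
        statistics.setAvgBuyout(resultSet.getLong(2));
        statistics.setMinBuyout(resultSet.getLong(3));
        statistics.setMaxBuyout(resultSet.getLong(4));
        statistics.setQuantity(resultSet.getLong(5));
        return statistics;
    }
}
```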

Since the metrics record an exact snapshot of the data in time, the calculation only needs to be done once. That’s why we are saving the aggregated metrics. They are never going to change and we can easily check the history.

If you know that your source data is immutable and you need to perform operations on it, I recommend that you persist the result somewhere. This is going to save you time. Of course, you need to balance if this data is going to be accessed many times in the future. If not, maybe you don’t need to go through the trouble of persisting the data.

ProcessedAuctionsWriter

Finally, we just need to write the data down to the database:
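A minimal sketch of the writer, assuming the holder objects are JPA entities persisted within the chunk's transaction:

```java
import java.util.List;

import javax.batch.api.chunk.AbstractItemWriter;
import javax.inject.Named;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

@Named
public class ProcessedAuctionsWriter extends AbstractItemWriter {
    @PersistenceContext
    private EntityManager entityManager;

    @Override
    public void writeItems(List<Object> items) throws Exception {
        // Persist the whole chunk; the surrounding JTA transaction commits it.
        for (Object item : items) {
            entityManager.persist(item);
        }
    }
}
```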

Metrics

Now, to do something useful with the data, we are going to expose a REST endpoint to perform queries on the calculated metrics. Here is how:
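A sketch of what such a JAX-RS endpoint could look like (the path, named query, and AuctionItemStatistics entity are hypothetical, not the repository's actual API):

```java
import java.util.List;

import javax.ejb.Stateless;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

@Stateless
@Path("metrics")
public class AuctionsResource {
    @PersistenceContext
    private EntityManager entityManager;

    // Hypothetical endpoint: metrics for an item across a realm's Auction House.
    @GET
    @Path("{realm}/{itemId}")
    @Produces(MediaType.APPLICATION_JSON)
    public List<AuctionItemStatistics> statistics(@PathParam("realm") String realm,
                                                  @PathParam("itemId") Integer itemId) {
        return entityManager
                .createNamedQuery("AuctionItemStatistics.findByRealmAndItem",
                                  AuctionItemStatistics.class) // hypothetical named query
                .setParameter("realm", realm)
                .setParameter("itemId", itemId)
                .getResultList();
    }
}
```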

If you remember a few details from the Part 1 post, World of Warcraft servers are called Realms. These realms can be linked with each other and share the same Auction House. To that end, we also have information on how the realms connect with each other. This is important, because we can search for an Auction Item in all the realms that are connected. The rest of the logic is just simple queries to get the data out.

During development, I've also found a bug with EclipseLink (if you run on GlassFish) and Java 8. Apparently, the underlying Collection returned by EclipseLink has its element count set to 0. This doesn't play well with Streams if you try to inline the query call plus a Stream operation: the Stream will think that it's empty and no results are returned. You can read a little more about this here.

Interface

I’ve also developed a small interface using Angular and Google Charts to display the metrics. Have a look:

WoW Auctions Search

Here, I'm searching in the Realm named "Aggra (Português)" for the Auction Item with id 72092, which corresponds to Ghost Iron Ore. As you can see, we can check the quantity for sale, the bid and buyout values, and the price fluctuation through time. Neat? I may write another post about building the web interface in the future.

Resources

You can clone a full working copy from my GitHub repository and deploy it to WildFly or GlassFish. You can find instructions there to deploy it:

World of Warcraft Auctions

Also check out the Java EE samples project, which has a lot of fully documented batch examples.


Comments (12)

  1. Kyle

    Thanks for posting. Is this ItemReader running fine on GlassFish without any special implementation or configuration? On WildFly, I can't simply implement JDBC ItemReaders this way, using cursors across multiple transactions. In my understanding, it requires adding the property "jberet.local-tx=true" to the job properties, and the batch developer should handle the transaction manually. There were some pointers: a discussion and a WildFly JIRA about it.

    • Roberto Cortez

      Hi Kyle,

      Thank you for your comment.

      I didn't have any problems running this example on WildFly or GlassFish. It works on both of them. Are you using HOLD_CURSORS_OVER_COMMIT on the PreparedStatement? By the way, JBeret has a JDBC Reader implementation, and they're doing pretty much the same thing. Look here: JdbcItemReader.java

      • Kyle

        Yes, I used HOLD_CURSORS_OVER_COMMIT, and many ARJUNA016087 warning messages appeared in the log. Have you ever seen that?

        Thanks, I didn't know that example. Also, I found a guide about it.

        • Roberto Cortez

          Hi Kyle,

          Strange. I never had that kind of problem with Batch. Did you try it with the JdbcItemReader? Same problem?

          • Kyle

            Hello Roberto, I didn't try the JdbcItemReader yet, but I deployed your wow-auctions on my environment, did some investigation, and finally found out why.

            It seems to be a problem dependent on the WildFly version. My problem occurred on an older WildFly (8.1.0.CR1), but I didn't see any warnings on the latest 8.2.0.Final.

            On the first try with 8.1.0.CR1, as you said, there was no ARJUNA016087 in the log, but the Connection acquired in ProcessedAuctionsReader#open() was left unclosed. So I added DbUtils.closeQuietly(connection) to ProcessedAuctionsReader#close(), and then ARJUNA016087 appeared.

            But I still don't know why that warning disappeared on the latest WildFly, because the batch spec says each open, read, write and close needs to be in its own transaction (this discussion is a good place to learn more).

          • Roberto Cortez

            Hi Kyle,

            Yes, I forgot to close the Connection, so maybe that's the reason why I couldn't see the problem. But this doesn't happen anymore on the latest version, right?

            I think that since the query only reads data and you don't try to commit or roll back the transaction, it won't show the warning, but this is only speculation. I'll try to have a better look in the next few days.

            Thank you for the PR :)

          • Kyle

            Exactly. I also think the chapter named "11.6 Regular Chunk Processing" in the JSR 352 spec would be a good reference. Thanks again :)

  2. Kyle

    Hi Roberto, I recently got the correct solution to the issue we discussed here, about JDBC ItemReaders that use a cursor across multiple transactions in an EE environment, through this discussion: https://developer.jboss.org/message/916629 , so let me share it.

    In such cases, we can simply use a non-JTA datasource instead of a JTA one to avoid the issue. For example, in WildFly, it can be defined with the option "–jta==false". It will never join JTA transactions, so no commits will happen on it. Additionally, we don't even need to use HOLD_CURSORS_OVER_COMMIT.

    P.S. Congrats on joining Tomitribe :)

    • Roberto Cortez

      Hi Kyle,

      Thank you very much for sharing your findings here in my blog. Really appreciate that :)

      And thanks for the Tomitribe wishes :)

  3. Kyle

    Sorry, the option is “–jta=false”, not “–jta==false” 😉

  4. shan

    Hi,

    While running the above program, I encountered an exception:

    ResultSet closed?
    at the readItem() method…

    Can you help me?

    • Roberto Cortez

      Hi Shan,

      Sorry for the late reply.

      Are you able to post me a full stacktrace?

      Cheers,
      Roberto
