Friday, December 19, 2008

The Minion Search and Project Aura

There are two very interesting open source Java projects from Sun which are worth looking at for anybody who is keen on information retrieval and artificial intelligence. The first project is a full-text search engine called Minion and the second one is a recommendation engine called Project Aura.
Both projects were presented at JavaOne 2008. You can find slides, video and audio materials here:
  • The Minion Search Engine - TS-5027, this presentation contains a nice high-level comparison to Lucene, which is a well-established open source Java search engine.
  • The Project Aura - TS-5841, an alternative to the Taste project, which has been merged into the Mahout project.
For more information you can check the blog of Steve Green, one of the authors of Minion and Aura.

Saturday, November 22, 2008

Google Developer Day 2008 - Video Presentations Available

Video presentations from Google Developer Day 2008 (held in Prague) are now available at the following URL:

Saturday, November 8, 2008

Future of Shopping

What is the future of shopping going to look like in the next five years? That is the question I am asking myself a lot now. Personally, I do believe that some kind of social shopping will dominate in a few years across all kinds of communication platforms and channels. Two speakers, Robert Garf (Retail Strategies, AMR Research) and Fred Balboni (IBM Global Business Services), share some thoughts about the future of shopping in the IBM and the Future of Shopping podcast.

Their vision is that web shops, as we know them these days, will become more like service centers:
... the major growth in retail we see happening is the rise of services. The services around your core retail experience. ... So as we see services increasing in retail, the store also becomes a place where you organize your services -- a service center, if you will.
And the physical shops will become more like a depot:
Getting the product in your hands to get it home, the store might shift to become possibly more of a depot where you go collect what it is that you've already browsed and selected. I think that's a real possibility in the future.
As for the technology that will make this possible, Balboni says:
...the technology dimension of this really isn't the hard part.
and he follows:
The challenging thing for retailers is the business model to operationalize. How do I...if I understand that somebody in a certain market wants a certain product, how do I, in a cost-effective manner, get that product to them? The ability to manage from these large quantities of customer information I think is going to change retail over the next five years.
And I agree with this point. But even if the technology part is not that hard, the question is whether it would be effective for each retailer to invest in the development, implementation and maintenance of this layer (I mean a distributed social shop service center). And I think the answer is no. Moreover, from the customer's point of view, people might prefer to stick to an independent service which gives them more freedom and product choice, as opposed to a retailer-specific implementation. This leads me to the conclusion that there is room for a new business model, something like a new Google or maybe a Facebook platform for retailers: a platform simple enough that individual retailers could easily feed their data in and compete for customers more effectively without having to worry about the technology side of things...

Thursday, October 30, 2008

New Business Product Idea for Sun

As a Java developer I really like what Sun is doing for me (and the list would be very long). I am always sorry to read stories about Sun's poor stock market results. One example of such news can be found here: Sun reports $1.7 billion loss and falling sales.

I can hardly come up with a perfect business plan for them but I do have one idea. Why don't they build a public Hadoop cluster for rent? As far as I know Sun engineers are interested in Hadoop technology, and cloud computing is the new mantra of many web services. If I were to build a new scalable service then I would probably end up using Hadoop at some point as well. The problem is that there is a shortage of public Hadoop-compatible clouds (Amazon is probably the only option as of now; Google App Engine does not provide integration with Hadoop now) and Sun has everything that is needed.

Wednesday, October 29, 2008

Google Developer Day 2008 - lessons learned

Google Developer Day (GDD) 2008 in Prague was very interesting. I focused on scalability and cloud computing sessions and I also managed to visit some Google Map and geo web sessions.

I had a chance to talk to Peter Kukol after his high-level presentation. I was really interested in whether Google uses MapReduce (MR) for all of its massive data processing calculations and it seems that the answer is yes (more specifically, the answer is that it is used a lot more often than before and the number of MR jobs is still growing). The interesting point is that there isn't any specific algorithm for which they would consider using a different (non-MR) architecture. As far as I understand, it can be challenging to run calculations on top of some specific data structures like graphs or trees in MR, but it seems that Google has enough resources and can spend some extra cycles of its data centers on non-optimized calculations. When it comes to control and optimization of MR jobs, the developers can use a lot of visual tools (it would be nice to know more about this).

Also it seems that Google App Engine does not support long-running background MR processes now, and if it does in the future then there is little chance that it would directly integrate with the Hadoop implementation. This is a pity because I can imagine that this would allow developers to switch between App Engine and Amazon or another cloud provider very easily.

Anyway, the service in the Clarion hotel was excellent and the meal was delicious! Thanks Google, I hope to be there next year. All the presentations were recorded on video and should be available on YouTube in the future.

Some photos:

Opening speech by Jan Šedivý

All the speakers

Relax room exhibition by 3rojka

After party "U Vejvodů"

Thursday, October 23, 2008

Google Developer Day 2008

I am going to attend Google Developer Day 2008 in Prague tomorrow (feel free to drop me an email [lukas dot vlcek at gmail dot com] in case you would like to meet me). The agenda is full of interesting stuff: Maps and Earth API, App Engine, Open Social Web, GWT, Gadgets... etc. It is clear that I won't be able to follow all sessions but I am specifically looking forward to Peter Kukol's session about large-scale computing architecture including Hadoop and HDFS.

Tuesday, October 21, 2008

Rich Client Programming - NetBeans Platform

I like the NetBeans platform and I have been using it for rich client application development for more than a year now. When it comes to learning the NetBeans platform API I can recommend the book called Rich Client Programming: Plugging into the NetBeans Platform. I had a chance to study this book shortly after it was released and I also had the pleasure of attending an excellent NetBeans training course held in Prague on 6-7 March. Quite recently I also won this book in the NetBeans Puzzler. Folks at Sun are not only giving away a lot of high quality software for free but they are also doing an outstanding job when it comes to education. Thanks!

I hope to post more about my experience with NetBeans platform development in the future.

Other useful NetBeans platform resources:
Geertjan's blog:

Tuesday, October 14, 2008

Hadoop User Group UK Meeting

A Hadoop User Group (HUG for short) meeting was held on August 19th, 2008 in the UK. You can find short video presentations and slides here:

Found via blog entry.

Wednesday, October 8, 2008

Maven 2 - Easy Dependency Management for Production

In this post I will cover basic examples of the following Maven 2 plugins:
  • maven-dependency-plugin
  • maven-jar-plugin
  • maven-assembly-plugin.
I have been using Maven 1.x for a long time. Now I am working more with MVN (aka Maven 2) and I found that with MVN it is very easy to prepare a distribution package including all dependencies. More specifically, I found two nice approaches to handling CLASSPATH dependencies during the mvn package phase.

The first approach:
maven-dependency-plugin and maven-jar-plugin

As a result of using these two plugins you can have your application classes packed into a jar file and a lib folder next to it which contains all the needed jar libraries. Moreover, the application jar will contain references to all the jar libraries in its manifest. This means that you don't have to put the libraries on the CLASSPATH explicitly as long as they are kept in the lib folder.
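To make the manifest part concrete, here is a minimal sketch in plain Java (no Maven involved) of the kind of manifest such a jar ends up with. The class name ManifestClassPathSketch, the main class com.example.App and the jar names are hypothetical; the only thing taken from the post is the idea that the manifest references the libraries in the lib folder through its Class-Path attribute.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

final class ManifestClassPathSketch {

    // Builds a manifest whose Class-Path points at jars kept in a lib/ folder
    // next to the application jar (names here are illustrative assumptions).
    static Manifest buildManifest(String mainClass, List<String> libraryJars) {
        Manifest manifest = new Manifest();
        Attributes attributes = manifest.getMainAttributes();
        // Manifest-Version must be present, otherwise nothing is written out.
        attributes.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attributes.put(Attributes.Name.MAIN_CLASS, mainClass);

        List<String> entries = new ArrayList<>();
        for (String jar : libraryJars) {
            entries.add("lib/" + jar);
        }
        // Class-Path entries are relative URLs separated by single spaces.
        attributes.put(Attributes.Name.CLASS_PATH, String.join(" ", entries));
        return manifest;
    }

    // Renders the manifest as the text that would be stored in META-INF/MANIFEST.MF.
    static String render(Manifest manifest) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            manifest.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Manifest manifest = buildManifest("com.example.App",
                Arrays.asList("commons-logging.jar", "hsqldb.jar"));
        System.out.print(render(manifest));
    }
}
```

With a manifest like this, running java -jar on the application jar is enough, because the JVM resolves the Class-Path entries relative to the jar's own location.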


The second approach:
maven-assembly-plugin

This plugin allows you to include any kind of resource in the application jar. In the case of third-party libraries it means that it unzips all dependent jar archives and copies the compiled classes next to your application classes. Then it zips everything into a single jar file.
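The unzip-and-rezip idea can be sketched with nothing but java.util.zip. This is only an illustration of the principle, not how the plugin itself is implemented; the class FatJarSketch, its methods and the "first entry wins" rule for duplicate names (for example two META-INF/MANIFEST.MF files) are my assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class FatJarSketch {

    // Copies the entries of all archives into one archive.
    // Assumption: when two archives contain the same entry name, the first one wins.
    static byte[] merge(List<byte[]> archives) {
        Set<String> seen = new HashSet<>();
        ByteArrayOutputStream merged = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(merged)) {
            for (byte[] archive : archives) {
                try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(archive))) {
                    ZipEntry entry;
                    while ((entry = in.getNextEntry()) != null) {
                        if (!seen.add(entry.getName())) {
                            continue; // duplicate name, keep the earlier copy
                        }
                        out.putNextEntry(new ZipEntry(entry.getName()));
                        byte[] buffer = new byte[8192];
                        int read;
                        while ((read = in.read(buffer)) != -1) {
                            out.write(buffer, 0, read);
                        }
                        out.closeEntry();
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return merged.toByteArray();
    }

    // Lists entry names in archive order.
    static List<String> entryNames(byte[] archive) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    // Helper for experiments: builds a small in-memory archive with dummy content.
    static byte[] archiveOf(String... names) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            for (String name : names) {
                out.putNextEntry(new ZipEntry(name));
                out.write(name.getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Merging an application archive with a library archive this way yields a single archive holding the classes of both, which is the effect described above. The price is that same-named resources from different libraries can collide.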

Related resources:
Original mail thread from maven mail archive: here

Sunday, September 28, 2008

How to deal with money these days?

Financial institutions in the USA are struggling now and many people are starting to look for a better understanding of the financial world. So am I. Which information resources should novices learn from when this topic is barely taught in schools?

One of the names I came across while searching the internet is Robert Kiyosaki. It is said [1] that he is an investor and businessman. Well, I had a hard time finding any specific details about Robert's financial performance and results but one thing is clear: he is a charismatic speaker and a successful author. For example, in one video [2] Robert claims that over 26 million copies of his book Rich Dad, Poor Dad [3] have been sold. He also designed a board game called CASHFLOW [4]. On the other hand, his books are criticised by some people for a lack of useful advice. For example, John T. Reed [5] put together an exhaustive critical review of Robert's book [6] (Robert's response to this article can be found here [7]).

In spite of the fact that Robert knows how to talk to the masses, I prefer a different style. I specifically like the book called Investiční strategie pro třetí tisíciletí by Pavel Kohout [8]. This book has been reworked many times. The fifth revision was published recently and I found it very interesting (it is published in Czech and I am not aware of an English version). If you know any better book for beginners (it does not have to be in Czech), please let me know...

I am not going to say "don't waste your time on Kiyosaki's stuff", because at least his books contain a lot of motivation, but I would recommend reading Kohout's book first. If you read Kiyosaki's books first then it could be too late when you get to Kohout's book. I don't think getting rich can be as easy as Robert says it is...

Monday, September 15, 2008

Jára Cimrman on LinkedIn - finally

Finally, Jára Cimrman has set up a profile at LinkedIn:

I guess this is going to be one of the hottest profiles on this site. It can definitely cast a shadow on the profiles of Bill Gates or Barack Obama very soon. Anyway, this may be a great way for him to be finally recognized by exclusive head hunters and to get a job at Google or the CIA...

Friday, September 5, 2008

Your music tastes can tell a lot about you

It's Friday so let's have some fun...
Professor Adrian North [2] is conducting a study [1] in which he links music tastes to personality. After reading about this study on the BBC portal [3] I gave it a try myself (don't expect to receive any evaluation after finishing the survey, it is just saved in a database for later processing). You can find some general results of this study in a short press release [4].

Well, this makes me think that a public music taste profile (like the one you can have at an online radio) can be easily (mis)used by an employer or an insurance agent.

Thursday, August 28, 2008

MapReduce Integrated into Relational Database

There are two different approaches when it comes to processing very large data sets in a parallel and distributed fashion. One, the traditional one, prefers the relational database while the other, strengthened by the Web 2.0 hype, prefers an environment with more relaxed constraints and data structures, represented for example by the MapReduce approach.

Apache's Hadoop project is the only available production-ready implementation of MapReduce, it has captured a lot of attention and its popularity is growing fast. No wonder that these two worlds run into heated and rather unproductive discussions about the "which one is better?" question.
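For readers who have not met the MapReduce model yet, here is a tiny single-process sketch of its three steps (map, group by key, reduce) using word counting. It deliberately uses plain Java collections and not the Hadoop API; the class MapReduceSketch and its method names are hypothetical and nothing here is distributed or parallel.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

final class MapReduceSketch {

    // Generic in-memory pipeline: map every input to (key, value) pairs,
    // group the values by key (the "shuffle"), then reduce each group.
    static <I, K, V, R> Map<K, R> run(List<I> inputs,
                                      Function<I, List<Map.Entry<K, V>>> mapper,
                                      BiFunction<K, List<V>, R> reducer) {
        Map<K, List<V>> grouped = new LinkedHashMap<>();
        for (I input : inputs) {
            for (Map.Entry<K, V> pair : mapper.apply(input)) {
                grouped.computeIfAbsent(pair.getKey(), key -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<K, R> result = new LinkedHashMap<>();
        for (Map.Entry<K, List<V>> group : grouped.entrySet()) {
            result.put(group.getKey(), reducer.apply(group.getKey(), group.getValue()));
        }
        return result;
    }

    // Word count: the mapper emits (word, 1), the reducer sums the ones.
    static Map<String, Integer> wordCount(List<String> lines) {
        return MapReduceSketch.<String, String, Integer, Integer>run(
                lines,
                line -> {
                    List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
                    for (String word : line.toLowerCase().split("\\W+")) {
                        if (!word.isEmpty()) {
                            pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
                        }
                    }
                    return pairs;
                },
                (word, ones) -> ones.stream().mapToInt(Integer::intValue).sum());
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("the quick fox", "the lazy dog, the end")));
    }
}
```

The point of the model is that mapper and reducer have no shared state, so a framework is free to run them on many machines. A relational engine would express the same calculation as a GROUP BY query.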

Well, the interesting news is that two relational database engines now provide an integrated solution combining both worlds:
News found via

I am particularly curious how soon Hadoop will be integrated into the Oracle or MySQL databases.

Monday, August 25, 2008

Donate to help Andrii Nikitin's son Ivan

I have just started looking at the MySQL database a little closer and I found the following urgent message on the news page. One of the engineers involved in the MySQL project is seeking help as his son needs an expensive operation: Donate to help Andrii Nikitin's son Ivan

I am spreading the fire... if you find yourself willing to help, don't hesitate to send even a small amount.

Writing scalable applications in Java

A lot of articles dealing with scalability and Java can be found on the web. Two nice, high-level and informative articles on the topic for less experienced developers can be found on the TSS server:

These articles do not discuss all possible solutions to this complex topic but they highlight some good open source alternatives. The discussion under each article is worth reading as well!

Another excellent (and more comprehensive) source of information is the High Scalability blog.

Wednesday, July 16, 2008

Sunflowers - acrylic painting

My new acrylic painting called Sunflowers is finished.

First of all I made a photo reference and manipulated it in Photoshop. Then I quickly sketched it on the canvas.

The basic sketch was done using charcoal and sienna and burnt umber brown acrylic colors.

From that point I slowly started adding more colors and details.

Almost done...

And here is the final framed painting. I will post an update once it is hung on the wall.

I hope this will make a nice birthday present for my wife :-)

Tuesday, July 1, 2008

Powerset acquired by Microsoft

Powerset has been acquired by Microsoft. Besides the official announcement on the Powerset blog you can find a lot of comments on news sites. But as far as I know very little is known about what this acquisition means for Powerset's home-grown search technology and culture. It is known that Powerset heavily uses open source tools including Hadoop and, more importantly, Powerset engineers are the leading developers of HBase. I am really curious about Microsoft's plan for Java-based open source and for the Powerset engineers building it every day. Some of the comments below the official Microsoft announcement can be read for fun...

Another blog post at Jeff's Search Engine Caffè seems to be sharing similar kind of fear about HBase.

Saturday, June 28, 2008

Foxmarks - an excellent Google Browser Sync alternative

I have been using Google Browser Sync for a long time but it is no longer supported by Google. I was looking for a good alternative and I have chosen Foxmarks. The reason why I picked Foxmarks over other products (like Google Bookmarks) is that it allows very smooth integration with Sage (which is my favorite feed reader). And the reason why I prefer Sage over Google Reader is that it supports folders...

Sunday, June 22, 2008

Another *original* logo design found!

Today, I have added another piece into my "ON"-like logo collection. Taken from
They say "Original programming for Original people™" and all I can add is that... it is hard to create an original logo.

Friday, June 20, 2008

TSSJS Prague 2008

I have just learned that I missed the TSS Java Symposium in Prague this year!

The agenda looked pretty interesting. It seems that cloud, grid and concurrent computing are going to be the next buzzwords which every developer will soon have to add to their CV next to the Spring, JPA, Hibernate, Grails and Groovy stack (there were tracks about Scala, MapReduce, DataGrids ...).

Well, I hope that I won't miss it next time and that at least the slides will be available soon. As of now I was able to find Shay Banon's Beyond the DataGrid (it specifically mentions Hadoop as well).

Wednesday, June 18, 2008

module (0.2) for NetBeans released

New version of module for NetBeans is out!

Main changes:
  • Redesigned GUI - now more stateful
  • Switched to trunk version of ascrblr library
  • Fixed some internal issues with threads
Release notes for new version can be found here:

Distribution file can be downloaded here:


Note: the distribution package was built using Java 1.6 and NetBeans 6.1. If you need to use Java 1.5 or NetBeans 6.0 then you have to build the module from source code.

Tuesday, June 17, 2008

HSQLDB - Closing Database Connection

If you are using HSQLDB and you are getting exceptions like:

Could not get JDBC Connection; nested exception is java.sql.SQLException: The database is already in use by another process ...


java.lang.Exception: The process cannot access the file because another process has locked a portion of the file ...

then you probably need to close the database connection correctly.

Call the following method:

org.hsqldb.DatabaseManager.closeDatabases(0);

For example, if you are using HSQLDB in your unit tests then it is necessary to call org.hsqldb.DatabaseManager.closeDatabases(0); in the tearDown() method of every unit test.
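Below is a small helper sketch around that call. The class HsqldbCleanup is hypothetical, and reflection is used only so that the snippet compiles and runs even without HSQLDB or JUnit on the classpath; in a real test you would simply call org.hsqldb.DatabaseManager.closeDatabases(0) directly from tearDown().

```java
import java.lang.reflect.Method;

final class HsqldbCleanup {

    // Invokes org.hsqldb.DatabaseManager.closeDatabases(0) if HSQLDB is available.
    // Returns true when the call was made, false when HSQLDB is not on the classpath.
    static boolean closeAllDatabases() {
        try {
            Class<?> manager = Class.forName("org.hsqldb.DatabaseManager");
            Method closeDatabases = manager.getMethod("closeDatabases", int.class);
            closeDatabases.invoke(null, 0); // static method, so no target instance
            return true;
        } catch (ClassNotFoundException e) {
            return false; // nothing to close, HSQLDB is not present
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not close HSQLDB databases", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("HSQLDB databases closed: " + closeAllDatabases());
    }
}
```

In a JUnit 3 style test case the tearDown() method would then contain a single line, HsqldbCleanup.closeAllDatabases(); or the direct call.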

[1] Unit Testing of Database Applications
[2] How to close all connections in HSQLDB

Thursday, May 29, 2008

Learning Hadoop

I am learning Hadoop now and I found two nice tutorials describing how to set up and run Hadoop on Ubuntu:
Found via: blog post

Wednesday, May 28, 2008

Mahout Logo

Mahout is the name of an open source project (Apache licence). The goal of this project is to build scalable machine learning libraries. It runs on top of Hadoop. While this project is quite new and started just a few months ago, there are already several interesting algorithms committed in the SVN repo (you can also check Grant Ingersoll's blog post for high-level Mahout news).

Now, I am really happy to announce that the Mahout development community accepted my proposal for the project logo and that it made it to the official web page today. I am pretty sure that Mahout (and thus the logo as well) will be around for a long time and that it is going to be part of critical business intelligence processes of both commercial and research projects around the world in the future.

You can find more Mahout logo resources in Mahout-57 (both PNG and SVG formats are available).

Mahout logo design was done using Inkscape.

Check mahout pronunciation:

Tuesday, May 20, 2008

XSLT Questions and Answers

I found a useful resource about XSLT. The page called XSLT Questions and Answers contains many documents in Q&A style. For example, I was looking for some examples of XSLT variable definitions and usage and this page was really worth reading.

It seems that this comprehensive knowledge base is maintained by Dave Pawson (did you notice the bookcase on the right hand side of it - it's full of XSLT related books).

Tuesday, May 13, 2008

Thursday, April 24, 2008

The Future is for Elephants!

Distributed computing has been challenging the smartest brains for decades. Until recently, it has been locked away from ordinary IT engineers in geek labs. However, I believe that frameworks like Hadoop will be used a lot more in the near future.

I have found a few useful presentations about MapReduce (or Hadoop), Google File System (or HDFS), BigTable (or HBase) and more:

Wednesday, April 23, 2008

Minion - new Java search engine from Sun labs

It seems that a new open source (GPLv2) search engine for Java will be revealed soon. It is called Minion and Stephen Green (aka The Search Guy) has just started blogging about it. He has already published a few posts and it seems that Minion has some interesting concepts:
  • For example it can store a Date as a Date (as opposed to Lucene, which can store a Date only as a String).
  • Also it seems to be convenient for stream indexing (the indexer is stateful).
  • Every document in the index has a unique key which never changes - updates of documents are then handled automatically under the hood (as far as I know Lucene cannot offer such luxury).
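The first point can be illustrated without either library. When a date is kept as a string, comparisons and range queries work on character order, so the chosen string format has to sort the same way the dates do. The sketch below uses only java.time; the class DateAsStringSketch and the two formats are my own examples, not something taken from Minion or Lucene.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

final class DateAsStringSketch {

    // A format whose lexicographic order equals chronological order.
    static final DateTimeFormatter SORTABLE = DateTimeFormatter.ofPattern("yyyyMMdd");
    // A human-friendly format that does not sort chronologically.
    static final DateTimeFormatter DISPLAY = DateTimeFormatter.ofPattern("dd/MM/yyyy");

    // True when comparing the formatted strings gives the same answer
    // as comparing the dates themselves.
    static boolean stringOrderMatchesDateOrder(LocalDate a, LocalDate b, DateTimeFormatter format) {
        int byDate = Integer.signum(a.compareTo(b));
        int byString = Integer.signum(format.format(a).compareTo(format.format(b)));
        return byDate == byString;
    }

    public static void main(String[] args) {
        LocalDate april = LocalDate.of(2008, 4, 23);
        LocalDate december = LocalDate.of(2008, 12, 19);
        System.out.println("yyyyMMdd keeps order:   "
                + stringOrderMatchesDateOrder(april, december, SORTABLE));
        System.out.println("dd/MM/yyyy keeps order: "
                + stringOrderMatchesDateOrder(april, december, DISPLAY));
    }
}
```

An engine that understands dates natively spares the application from having to pick and enforce such a format.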

Monday, April 21, 2008

module for NetBeans

I released the first alpha version of module for the NetBeans platform a few hours ago. You can find more information here:
I know the code is not perfect yet - feedback is welcome!

How to install the module into NetBeans.


More about how I did this module later... (time for bed now!)

Tuesday, April 15, 2008

JFreeChart Examples

Looking for JFreeChart examples? I found the following link extremely useful:
Dozens of charts along with source code and preview picture.

Thursday, April 3, 2008

It is hard to create good and unique logo design

While I was searching for some stuff on the internet I came across the following ad from Adobe:

It seems to be an Adobe ON AIR Tour Europe ad. Normally ads do not catch my attention but this one is an exception because the visual style is so close to the logo design of ON Semiconductor Corp (my employer):
Another, horizontal Adobe campaign logo also seems to be interestingly close to the ON Semiconductor design:

ON Semiconductor logos can be found here.

Bohemian spring

Originally uploaded by Lukáš Vlček
A photo I took at the beginning of this spring.

Sunday, February 24, 2008

We are moving to a new flat

Finally, we are moving to our new flat. After several months of demolition, reconstruction and renovation we are looking forward to a bigger flat with a better view. I have also moved my paintings and easel today. I am sure the neighbours cannot wait to hear me practicing my trombone...

Tuesday, February 19, 2008

Hadoop - Large Yahoo! cluster

Yahoo!'s bet on the Hadoop framework is proving to be a good choice. They have started using Hadoop in their production environment and every web search query now uses data which is processed on a very large Hadoop cluster. You can find more details on Yahoo! Developer Network: Yahoo! Launches World's Largest Hadoop Production Application.

Some highlights:
  • Number of links between pages in the index: roughly 1 trillion links
  • Size of output: over 300 TB, compressed
  • Number of cores used to run a single Map-Reduce job: over 10,000
  • Raw disk used in the production cluster: over 5 Petabytes
  • Hadoop has allowed us to run the identical processing we ran pre-Hadoop on the same cluster in 66% of the time our previous system took. It does that while simplifying administration.
I think this is quite impressive. Considering that Hadoop is open source software written in Java and in an early stage of development, could this be the real reason why Microsoft wants to buy Yahoo!? :-)

Update - added a few more links:
Jeremy Zawodny blog (Yahoo!)
Interview with Doug Cutting (InfoQ)

Wednesday, February 13, 2008

Is Linux too expensive for corporate desktop?

Normally I don't care about regular Windows patches and updates. It is not that I don't care about security but I enjoy the luxury of our corporate IT team installing those updates for us (most of the time this activity starts while I am most busy and need to finish some important task). Today I read that Microsoft is going to release a huge patch for Windows. Especially the security issue with WebDAV Mini-Redirector really got me...

I still wonder why Linux is not considered an alternative desktop system for our corporate ecosystem.

Saturday, February 9, 2008

Google takes Czech internet market seriously

According to this post Google is going to take the Czech internet market even more seriously very soon.

Sunday, January 20, 2008

Arturo Sandoval - Free MP3 Song

Get a free MP3 song, I Remember Maynard, composed and performed by the fabulous Arturo Sandoval.

See Arturo Sandoval and Maynard Ferguson on Wikipedia for more info.

Monday, January 7, 2008

Wikia Search Engine launched today

Wikia Search was launched today:

It uses Lucene and Nutch [1] (and other technologies as well). While Hadoop is not directly mentioned, it is used by Nutch by default. It is good to hear that Apache technologies are spreading around the world... Congratulations!


Other links:
Un-official wikia search blog

Sunday, January 6, 2008

GIT - Fast Version Control System

If you use CVS or SVN then you are stupid and ugly! At least that is what Linus Torvalds says in the Linus Torvalds on git video. Linus started GIT in 2005 when Linux kernel developers lost the ability to use BitKeeper. It seems that since then GIT has evolved into one of the best SCMs around.

So far I have been using CVS and SVN but Linus definitely hit the nail on the head. GIT is not an improved SVN. GIT is built on top of a very different mental model. It is a distributed system and there is no central repository. It was designed to handle very large sets of files and complex merges (quickly and efficiently) and it is really robust. If you want to understand what this means then see Linus' video (I don't want to rephrase his excellent speech).

Another very interesting video about GIT is by Randal Schwartz. This one is more technically oriented: internal data structures, command examples, update protocols, a built-in CVS server for smooth integration, etc. (Randal is a very fast speaker).

Although GIT was designed especially for large projects (like the Linux kernel), I don't see any reason why it shouldn't be used for smaller projects as well, even for a project with only a single developer.