Thursday, July 30, 2009

Kindle 2 - Fast Introduction

Want to know more about the Kindle 2? Check out this 6-minute introduction video, or the shortened version (just 50 seconds!). I like the latter one.

Friday, July 24, 2009

How to locate a file inside a JAR?

If your Log4j configuration file is inside a JAR file, you can tell the Log4j library how to find it:

java -Dlog4j.configuration=jar:file:[path_to_jar_file]!/[path_to_properties_file_inside_jar] -jar myapp.jar


java -Dlog4j.configuration=jar:file:/home/application/lib/logic.jar!/ -jar myapp.jar
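The jar: URL syntax used above is not specific to Log4j; the JDK itself can open such URLs. The following is a minimal sketch, assuming a throwaway JAR with a hypothetical entry named conf/log4j.properties (both the class name JarUrlDemo and the entry name are made up for illustration). It builds a URL of the form jar:file:[path_to_jar_file]!/[entry] and reads the properties file through it:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

class JarUrlDemo {

    // Builds a URL of the form jar:file:<path_to_jar>!/<entry_inside_jar>.
    static String jarUrl(File jar, String entry) {
        return "jar:" + jar.toURI() + "!/" + entry;
    }

    // Reads a properties file through a jar: URL using only the JDK.
    static Properties load(String url) throws IOException {
        Properties props = new Properties();
        try (InputStream in = new URL(url).openStream()) {
            props.load(in);
        }
        return props;
    }

    // Creates a temporary JAR with a single properties entry (demo data only).
    static File createDemoJar() throws IOException {
        File jar = File.createTempFile("demo", ".jar");
        jar.deleteOnExit();
        try (JarOutputStream out = new JarOutputStream(new FileOutputStream(jar))) {
            out.putNextEntry(new JarEntry("conf/log4j.properties"));
            out.write("log4j.rootLogger=INFO, stdout\n".getBytes(StandardCharsets.ISO_8859_1));
            out.closeEntry();
        }
        return jar;
    }

    public static void main(String[] args) throws IOException {
        File jar = createDemoJar();
        String url = jarUrl(jar, "conf/log4j.properties");
        System.out.println(url);
        System.out.println(load(url).getProperty("log4j.rootLogger"));
    }
}
```

If the URL opens here, the same string should be usable as the value of -Dlog4j.configuration, provided the entry path matches what is really inside your JAR.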

Wednesday, July 22, 2009

HadoopDB released

An interesting new project was announced: HadoopDB. Check this blog post for more details or visit the project home page.

In short, the project is described as:
  1. A hybrid of DBMS and MapReduce technologies that targets analytical workloads.
  2. Designed to run on a shared-nothing cluster of commodity machines, or in the cloud.
  3. An attempt to fill the gap in the market for a free and open source parallel DBMS.
  4. Much more scalable than currently available parallel database systems and DBMS/MapReduce hybrid systems.
  5. As scalable as Hadoop, while achieving superior performance on structured data analysis workloads.
Found via Ivan de Prado's delicious feed.

Monday, July 20, 2009

Testing opened socket in Vista, weird issue!

How do you test in Java whether a service is listening on a given port? How do you find out that a given port is free so that you can start your own service there?

It seems that the Java Socket class can be used to answer such questions. The simplest example is to create a new instance using new Socket("localhost",port_number). If the socket can be instantiated and connected to the given port, then the port is used by an existing service; if not, you get an exception and you know that the port is probably free. A similar approach is not only recommended in many articles on the web, it is also used in Apache Commons Net (see here).
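A minimal sketch of the connect-based check described above, written as a reusable helper. The class name PortProbe, the method name isListening, and the timeout parameter are assumptions added for illustration; they are not taken from Apache Commons Net or from any article mentioned here:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortProbe {

    // Returns true if something accepts a TCP connection on host:port
    // within timeoutMillis, false otherwise.
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout, or unresolved host:
            // treat all of these as "nothing is listening".
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("port 4444 in use? " + isListening("localhost", 4444, 1000));
    }
}
```

Using an unconnected Socket plus connect with a timeout, instead of new Socket(host, port), keeps the check from blocking for long when the host does not answer. The result is still only a hint: a false answer means the connection attempt failed, not that the port is guaranteed to be free.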

Now I have realized that this approach does not work on my Windows Vista Business SP1 (32-bit). It seems that under my Vista, Java thinks that every port is already taken and that a connection to it can be opened... wow!

The following is simple Java code which fails to connect on Ubuntu and Windows 98 but succeeds on my Windows Vista (it should fail!), running under Java 1.5:

package somePackage;

import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.UnknownHostException;

public class SocketWorksOnVista {

    public static void main(String[] args) {

        Socket client = null;
        ServerSocket server = null;
        try {

            // first let's test that we can open a socket at given port
            server = new ServerSocket(4444);
            System.out.println("server bound? " + server.isBound());
            // close it for now
            server.close();
            System.out.println("server closed? " + server.isClosed());
            // now, there is nothing running on port 4444

            // this is expected to throw, because nothing listens on 4444 any more
            client = new Socket("localhost", 4444);

        } catch (UnknownHostException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {

            if (client != null) {
                System.out.println("Is connected? " + client.isConnected());
                try {
                    client.close();
                    System.out.println("connection closed");
                } catch (IOException e) {
                    e.printStackTrace();
                }
            } else {
                System.out.println("connection not established!");
            }
        }
    }
}

Does anybody know what is wrong with this code? Why does the line client = new Socket("localhost",4444) not fail on Vista? Is there a bug in Java?

Note: I checked port availability using netstat -an | grep 4444 (in Cygwin) before and after the test, and I am sure there is no service running on port 4444 on my computer. If a service were already running there, the code would fail at the very beginning when trying to create a new ServerSocket instance. I also tested different port numbers.
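The note above relies on the fact that creating a ServerSocket fails when the port is already taken. That observation can be turned into a check of its own. The following is a minimal sketch, with the hypothetical names PortCheck and isFree, that tests a port by trying to bind to it instead of trying to connect to it:

```java
import java.io.IOException;
import java.net.ServerSocket;

class PortCheck {

    // Returns true if a ServerSocket could be bound to the port,
    // which means no other service was holding it at that moment.
    static boolean isFree(int port) {
        try (ServerSocket server = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            // Bind failed: the port is in use or not permitted.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("port 4444 free? " + isFree(4444));
    }
}
```

The probe socket is closed again before the method returns, so another process could still grab the port before your own service starts. I have not verified how this variant behaves on Vista.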

Tuesday, July 14, 2009

Scaling Hadoop, Webinar by Sun

Sun engineers released an extended version of their Hadoop Summit '09 presentation:

Hadoop is typically scaled on a large pool of commodity system nodes. However, by using multicore, multithreaded processors, you can achieve the same scale with fewer machines. In this Webinar, we will discuss how Sun's chip multithreading (CMT) technology-based UltraSPARC T2 Plus processor can process up to 256 tasks in parallel within a single node.

We will also share with you how we evaluated CPU and I/O throughput, memory size, and task counts to extract maximal parallelism per single node, as well as an evaluation of the performance.

You can download the webinar talk here (sign-in required, but it is free).

Saturday, July 4, 2009