I'm not that small
A blog by an overly energetic software developer who has a knack for talking too much.
Sunday 2 October 2016
Tricking Ubuntu to boot on a Sony Vaio Pro 13
This is just a reminder to myself for the next time I install or update Ubuntu on my laptop that I need to do the following after installation, otherwise Ubuntu won't boot. It assumes the machine boots via UEFI and that Secure Boot is disabled.
- Install Ubuntu
- Use the boot disk to load a live session (Try Ubuntu)
- Mount the EFI partition
sudo mount /dev/sda1 /mnt
cd /mnt/EFI
- Create a Microsoft boot directory
sudo mkdir -p Microsoft/Boot
- Copy the Ubuntu boot files into the Microsoft boot directory
sudo cp -r ubuntu/* Microsoft/Boot/
- Create the boot file Microsoft wants (assumes not using Secure Boot)
sudo cp Microsoft/Boot/grubx64.efi Microsoft/Boot/bootmgfw.efi
- Unmount the partition
cd
sudo umount /mnt
Friday 3 June 2016
Don't just assume arrays will be faster: how we implemented object pooling
Update (29/06/2016): As Mike Barker kindly pointed out, the array backed object pool was not pre-populating the array with objects so the benchmark was going through some extra logic. I've since updated the results and graphs with the array pre-populated, which hasn't greatly changed the results.
To ensure customers receive the best service possible at TransFICC, we need to make sure that our systems have robust latency and throughput figures. A key part of avoiding latency spikes (in Java, anyway) is keeping the Garbage Collector at bay, and one way we attempt to do that is with the object pooling pattern, which allows us to re-use objects.
We weren't sure which data structure would best back our object pool implementation, so we decided to write a few benchmarks covering several candidates: a stack, a deque-backed stack, a deque-backed queue, and an array. The goal of these benchmarks was to see whether any of the data structures allocate (e.g. with a LinkedList, a node would be created every time you add an object to the pool), and to see what the latency and throughput values would be.
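To make the candidates concrete, here is a minimal sketch of the deque-backed stack variant (a hypothetical illustration, not the actual benchmarked code; class and method names are mine):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Illustrative single-threaded, deque-backed stack pool.
// Pre-populated up front so steady-state use within capacity never allocates.
final class StackObjectPool<T>
{
    private final Deque<T> pool = new ArrayDeque<>();
    private final Supplier<T> factory;

    StackObjectPool(final Supplier<T> factory, final int size)
    {
        this.factory = factory;
        for (int i = 0; i < size; i++)
        {
            pool.addFirst(factory.get());
        }
    }

    T remove()
    {
        // LIFO: reuse the most recently returned object.
        final T pooled = pool.pollFirst();
        return pooled != null ? pooled : factory.get();
    }

    void add(final T object)
    {
        pool.addFirst(object);
    }
}
```

A queue-flavoured variant would simply take from the opposite end of the deque (pollLast), cycling through every pooled object instead of reusing the one on top.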
The benchmark results for measuring throughput show that both of the deque-backed object pools perform at a similar level in both of the scenarios we tested (removing and adding a single object, and removing and adding several objects), with the array-backed implementation having similar throughput in the remove-and-add-several scenario once the error is taken into account.
What is more interesting is the graphing of the latency figures.
Both object pools that are backed by a deque have very similar latency profiles, which surprised me: I expected the stack implementation to outperform on the remove-one-add-one scenario, as it would only ever be using a single object (the queue implementation will always take a different object from the front of the queue). The stack object pool does perform slightly better in the remove-several-add-several scenario at one data point, but the difference isn't significant compared to the queue implementation.
What I found most surprising is the high variance in the array-backed object pool. I assumed using a simple array would yield the best throughput and latency numbers, but this is not the case. I'm not sure whether it's down to my implementation of the array object pool, so I'll try another version if I have time.
In the end, based on our findings, we opted for an object pool using a stack backed by a deque. I should point out that all our object pools are accessed from a single thread, so they are not required to be thread safe.
The benchmarks were run on a Fedora 23 box with 6 core Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz and 32GB RAM.
Benchmark code can be found here.
Throughput
Benchmark  Mode  Cnt  Score  Error  Units
StackObjectPoolUsingDequeBenchmark.removeAndAddOne  thrpt  10  96.785 ± 0.076  ops/us
StackObjectPoolUsingDequeBenchmark.removeAndAddOne:·gc.alloc.rate  thrpt  10  0.002 ± 0.006  MB/sec
StackObjectPoolUsingDequeBenchmark.removeAndAddOne:·gc.alloc.rate.norm  thrpt  10  ≈ 10⁻⁵  B/op
StackObjectPoolUsingDequeBenchmark.removeAndAddOne:·gc.count  thrpt  10  ≈ 0  counts
QueueObjectPoolUsingDequeBenchmark.removeAndAddOne  thrpt  10  96.558 ± 0.614  ops/us
QueueObjectPoolUsingDequeBenchmark.removeAndAddOne:·gc.alloc.rate  thrpt  10  0.002 ± 0.006  MB/sec
QueueObjectPoolUsingDequeBenchmark.removeAndAddOne:·gc.alloc.rate.norm  thrpt  10  ≈ 10⁻⁵  B/op
QueueObjectPoolUsingDequeBenchmark.removeAndAddOne:·gc.count  thrpt  10  ≈ 0  counts
StackObjectPoolUsingArrayBenchmark.removeAndAddOne  thrpt  10  88.930 ± 0.065  ops/us
StackObjectPoolUsingArrayBenchmark.removeAndAddOne:·gc.alloc.rate  thrpt  10  0.002 ± 0.006  MB/sec
StackObjectPoolUsingArrayBenchmark.removeAndAddOne:·gc.alloc.rate.norm  thrpt  10  ≈ 10⁻⁵  B/op
StackObjectPoolUsingArrayBenchmark.removeAndAddOne:·gc.count  thrpt  10  ≈ 0  counts
StackObjectPoolBenchmark.removeAndAddOne  thrpt  10  16.276 ± 0.004  ops/us
StackObjectPoolBenchmark.removeAndAddOne:·gc.alloc.rate  thrpt  10  0.002 ± 0.006  MB/sec
StackObjectPoolBenchmark.removeAndAddOne:·gc.alloc.rate.norm  thrpt  10  ≈ 10⁻⁴  B/op
StackObjectPoolBenchmark.removeAndAddOne:·gc.count  thrpt  10  ≈ 0  counts
StackObjectPoolUsingDequeBenchmark.removeAndAddSeveral  thrpt  10  38.524 ± 0.012  ops/us
StackObjectPoolUsingDequeBenchmark.removeAndAddSeveral:·gc.alloc.rate  thrpt  10  0.002 ± 0.006  MB/sec
StackObjectPoolUsingDequeBenchmark.removeAndAddSeveral:·gc.alloc.rate.norm  thrpt  10  ≈ 10⁻⁴  B/op
StackObjectPoolUsingDequeBenchmark.removeAndAddSeveral:·gc.count  thrpt  10  ≈ 0  counts
QueueObjectPoolUsingDequeBenchmark.removeAndAddSeveral  thrpt  10  38.108 ± 0.255  ops/us
QueueObjectPoolUsingDequeBenchmark.removeAndAddSeveral:·gc.alloc.rate  thrpt  10  0.002 ± 0.006  MB/sec
QueueObjectPoolUsingDequeBenchmark.removeAndAddSeveral:·gc.alloc.rate.norm  thrpt  10  ≈ 10⁻⁴  B/op
QueueObjectPoolUsingDequeBenchmark.removeAndAddSeveral:·gc.count  thrpt  10  ≈ 0  counts
StackObjectPoolUsingArrayBenchmark.removeAndAddSeveral  thrpt  10  39.241 ± 1.179  ops/us
StackObjectPoolUsingArrayBenchmark.removeAndAddSeveral:·gc.alloc.rate  thrpt  10  0.002 ± 0.006  MB/sec
StackObjectPoolUsingArrayBenchmark.removeAndAddSeveral:·gc.alloc.rate.norm  thrpt  10  ≈ 10⁻⁴  B/op
StackObjectPoolUsingArrayBenchmark.removeAndAddSeveral:·gc.count  thrpt  10  ≈ 0  counts
StackObjectPoolBenchmark.removeAndAddSeveral  thrpt  10  5.479 ± 0.036  ops/us
StackObjectPoolBenchmark.removeAndAddSeveral:·gc.alloc.rate  thrpt  10  0.002 ± 0.006  MB/sec
StackObjectPoolBenchmark.removeAndAddSeveral:·gc.alloc.rate.norm  thrpt  10  ≈ 10⁻³  B/op
StackObjectPoolBenchmark.removeAndAddSeveral:·gc.count  thrpt  10  ≈ 0  counts
Latency
Remove One Add One Scenario
Percentile  StackObjectPoolUsingDequeBenchmark  QueueObjectPoolUsingDequeBenchmark  StackObjectPoolUsingArrayBenchmark  StackObjectPoolBenchmark
0  0.558  0.558  0.558  0.558
50  0.559  0.559  0.559  0.629
90  0.629  0.629  0.629  0.629
95  0.629  0.629  0.629  0.698
99  0.629  0.629  0.629  0.699
99.9  1.118  1.118  1.118  1.118
99.99  10.88  6.842  10.752  10.964
99.999  11.872  11.792  11.872  12.057
99.9999  17.991  16.591  51.3  43.027
100  18.144  17.92  58.56  44.416
Mean  0.588  0.59  0.586  0.628
Error (us/op)  0.001  0.001  0.001  0.001
Remove Several Add Several Scenario
Percentile  StackObjectPoolUsingDequeBenchmark  QueueObjectPoolUsingDequeBenchmark  StackObjectPoolUsingArrayBenchmark  StackObjectPoolBenchmark
0  0.558  0.558  0.558  0.698
50  0.628  0.628  0.628  0.768
90  0.629  0.629  0.629  0.769
95  0.629  0.629  0.629  0.769
99  0.629  0.629  0.629  0.769
99.9  1.118  1.118  1.118  1.258
99.99  10.816  6.6  10.816  11.232
99.999  12.006  11.872  11.99  12.312
99.9999  14.389  19.716  29.362  67.966
100  14.512  22.4  35.712  69.376
Mean  0.603  0.605  0.604  0.746
Error (us/op)  0.001  0.001  0.001  0.001
Friday 15 April 2016
So.....Gradle
My previous post mentioned that we were using Maven as our build system but looking at moving across to Buck. Well, there has been a slight change of plan and we've moved to using Gradle. The main reason we opted for Gradle over Buck is its integration with IntelliJ, which was somewhat of a pain point when I was working at LMAX.
Migrating from Maven to Gradle was super easy, and has allowed us to simplify the various tests we had defined; Maven becomes very complicated if you try to do anything beyond its predefined build tasks, which was very annoying for us as we have multiple test definitions: unit tests plus several flavours of integration tests (dao, acceptance, benchmark, perf, compatibility...). This had resulted in some serious hackery (note: completely my fault).
As an example, this is how we ran DAO tests in the Maven world:
And this is how we run them with Gradle:
Wrapper script:
build.gradle:
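As a rough sketch of the sort of thing this looks like (the source-set and task names are illustrative, using the Gradle 2.x-era API, not our actual build file):

```groovy
// Illustrative build.gradle fragment: a dedicated source set and task for DAO tests.
sourceSets {
    daoTest {
        java.srcDir 'src/daoTest/java'
        compileClasspath += sourceSets.main.output + sourceSets.test.output
        runtimeClasspath += sourceSets.main.output + sourceSets.test.output
    }
}

configurations {
    daoTestCompile.extendsFrom testCompile
    daoTestRuntime.extendsFrom testRuntime
}

task daoTest(type: Test) {
    testClassesDir = sourceSets.daoTest.output.classesDir
    classpath = sourceSets.daoTest.runtimeClasspath
}
```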
Nice and simple!
Thursday 14 April 2016
Keep it Simple - Using a bootstrap script to wrap your build system
Do you know what's annoying when you start at a new company (or do a clean checkout of some code)? Getting your development environment set up; especially when it comes to making sure you have the right software in place to be able to build said code!
I'm sure there are lots of ways to solve these problems, but a simple solution we use at TransFICC is a bootstrap script that wraps our build system. This is called every time you want to run something via the build system (in our case Maven [we haven't buck'd up]), and ensures all build dependencies are present. I should note that we only use this for build dependencies, not run time dependencies.
bootstrap.sh
Our build dependencies are Java, Node (well, npm), Buck, and Maven. This script assumes the dependencies live at http://thearts.internal.transficc.com/artifacts/ (e.g. the jdk is located at http://thearts.internal.transficc.com/artifacts/jdk/jdk1.8.0_74.tar.gz and the directory contained inside the tar.gz is jdk1.8.0_74). The checksums check for a change of dependencies rather than ensuring they are downloaded correctly.
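A hypothetical sketch of what such a bootstrap script can look like (function names, directories, and checksum handling are illustrative; only the artifact URL comes from the description above):

```shell
#!/bin/bash
# bootstrap.sh -- illustrative sketch, not the real TransFICC script.
ARTIFACT_REPO="http://thearts.internal.transficc.com/artifacts"
TOOLS_DIR="${TOOLS_DIR:-$HOME/.build-tools}"

# A dependency needs (re)installing when its recorded checksum is missing
# or differs from the expected one, i.e. the dependency has changed.
needs_install() {
    local name="$1" expected="$2"
    local marker="$TOOLS_DIR/$name.checksum"
    [ ! -f "$marker" ] || [ "$(cat "$marker")" != "$expected" ]
}

# Download and unpack an artifact, then record its checksum.
install_dependency() {
    local name="$1" archive="$2" expected="$3"
    if needs_install "$name" "$expected"; then
        mkdir -p "$TOOLS_DIR"
        curl -sf "$ARTIFACT_REPO/$name/$archive" | tar xz -C "$TOOLS_DIR"
        echo "$expected" > "$TOOLS_DIR/$name.checksum"
    fi
}
```

A wrapper script would then call install_dependency once per tool (jdk, node, buck, maven) before handing the arguments it was given over to Maven.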
This script could be used as follows.
transficc.sh
./transficc.sh compile
Tuesday 21 January 2014
Why I use integration/acceptance tests
I recently posted about my annoyance at setting up integration tests in a separate directory structure from normal unit tests when using the Maven dependency management system. I briefly posted how I resolved that problem, but did not mention why I was writing integration tests in the first place.
I am relatively new to Javascript development, having done some web development in PHP many moons ago (professionally I am a server-side developer, working in a team that utilises TDD, continuous integration, etc). I was finding that as I developed the client side functionality, I was constantly breaking existing functionality as I couldn't be bothered going through and clicking on every interactive part of every webpage to ensure that everything still worked. This was becoming a real annoyance, with refactoring of the client side code becoming more risky as I didn't have any tests to ensure that functionality remained unchanged.
Changes to my APIs defined by my Spring MVC controllers were breaking the interaction between client and server without me even realising. I knew that the functionality on the client and server sides worked independently, but I was not testing their interaction. The same could be said of my database calls. I knew the individual components of the system were working as expected, but their interactions were not!
All of this was enough for me to start writing integration (acceptance) tests. To do this I used Selenium for testing the interaction on the actual website (UI tests), and wrote an API so that I could interact with the server side. The API part of the testing actually allowed me to do two things:
- I was able to ensure that pure API calls worked correctly (with verification and validation happening not just via the web page!);
- I could set up the website environment using API calls when testing different parts of the UI with Selenium (every test had to register a new user, and I didn't want to exercise the user registration page in every test, as that would make all my tests slower and add more scope than each test requires).
By writing a testing language that wrapped my API and Selenium UI calls, I was then able to define my business requirements in my acceptance tests. I find this a far better way of defining behaviour than comments, as comments can often become out of date without developers realising.
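To give a flavour of that testing language, here is a hypothetical sketch (the driver interfaces and step names are mine, not the actual DSL from the project):

```java
// Hypothetical drivers: one wraps direct HTTP calls to the server-side API,
// the other wraps Selenium WebDriver interactions with the page.
interface ApiDriver
{
    String createUser(String name);
}

interface UiDriver
{
    void loginAs(String userId);

    void assertPageContains(String text);
}

// The "testing language": business-level steps that hide whether a step
// goes through the API or the browser.
final class TestDsl
{
    private final ApiDriver api;
    private final UiDriver ui;

    TestDsl(final ApiDriver api, final UiDriver ui)
    {
        this.api = api;
        this.ui = ui;
    }

    // Fast setup via the API: no browser involved.
    String registerUser(final String name)
    {
        return api.createUser(name);
    }

    // Business-level assertion driven through the UI.
    void userSeesWelcomeMessage(final String userId)
    {
        ui.loginAs(userId);
        ui.assertPageContains("Welcome");
    }
}
```

An acceptance test then reads as requirements ("register a user, check they see the welcome message") rather than as a sequence of HTTP calls and element locators.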
Integration tests in a separate folder structure from unit tests in Maven
I have recently been working on introducing integration/acceptance tests into a standalone web application. It was around this time that I really started to regret using Maven as my dependency management tool. It had been fine up until the point where I wanted to introduce integration tests (note: the concept of acceptance tests does not exist in Maven) in a different folder structure from normal unit tests.
Maven instantly became very angry at me for wanting to do this, insisting that if I wanted to do such a thing I should do it in a separate module, since integration tests shouldn't be specific to one module. This makes sense if you have a multi-module project, as these tests should be about end-to-end testing. However, many projects are never going to be multi-module, and creating a separate module just for integration testing feels like over-engineering. What I wanted to do simply goes against the Maven model.
After some serious faffing (and adding several new dependencies) I was able to get a solution where my tests run in Maven's integration-test phase, with Jetty starting and stopping in the pre-integration-test and post-integration-test phases.
pom.xml
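A hypothetical fragment showing the shape of such a configuration (the plugin choices here, build-helper to register the extra source root, failsafe for the integration-test phase, and the Jetty plugin for start/stop, are my reconstruction, not the original file):

```xml
<build>
  <plugins>
    <!-- Make src/integration/java visible as an extra test-source root. -->
    <plugin>
      <groupId>org.codehaus.mojo</groupId>
      <artifactId>build-helper-maven-plugin</artifactId>
      <executions>
        <execution>
          <id>add-integration-test-sources</id>
          <phase>generate-test-sources</phase>
          <goals><goal>add-test-source</goal></goals>
          <configuration>
            <sources><source>src/integration/java</source></sources>
          </configuration>
        </execution>
      </executions>
    </plugin>
    <!-- Run integration tests in the integration-test phase. -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-failsafe-plugin</artifactId>
      <executions>
        <execution>
          <goals><goal>integration-test</goal><goal>verify</goal></goals>
        </execution>
      </executions>
    </plugin>
    <!-- Start and stop Jetty around the integration-test phase. -->
    <plugin>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>jetty-maven-plugin</artifactId>
      <executions>
        <execution>
          <id>start-jetty</id>
          <phase>pre-integration-test</phase>
          <goals><goal>start</goal></goals>
        </execution>
        <execution>
          <id>stop-jetty</id>
          <phase>post-integration-test</phase>
          <goals><goal>stop</goal></goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```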
This allows me to have the following project structure.
/src/test/java/
/src/integration/java/
/src/main/java/
I really wanted to avoid using a separate plugin for integration tests but could not get any other solution to work. I hope this helps.
Tuesday 17 September 2013
Multiple term search algorithm (naive)
The other week I was working on a search algorithm for a website I have been developing, which utilises Java 7, Spring MVC 3, Jetty, Bootstrap, jquery et al. The algorithm has to search certain database tables and return a list of results in a semi-useful order.
What is semi-useful to the customer? For me, just being returned a list of results in alphabetical order isn't what I would deem as "semi-useful". There is also the separate issue of how to deal with the search terms: take the entire contents of the search box as an entire search term, split on spaces/punctuation (e.g. commas), try and form multi-word search terms that would make sense etc. etc.
For the search term issue, I cheated slightly: due to the nature of the website's data, I was able to state that search terms should be comma separated. I know this isn't necessarily the best solution, but it provides slightly more flexibility than treating everything in the search box as one search term.
Now the search term conundrum is out of the way, how should I go about creating a suitable search algorithm? Initially I took the approach of just producing something naive that iterates over the given search terms, and goes poking the database for results.
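That first attempt looked something along these lines (a reconstruction with placeholder types; the DAO and class names are mine):

```java
import java.util.LinkedList;
import java.util.List;

// Placeholder domain types standing in for the real DAO and result record.
interface ResultDao
{
    List<ResultData> findMatching(String term);
}

final class ResultData
{
    final long id;
    final String title;

    ResultData(final long id, final String title)
    {
        this.id = id;
        this.title = title;
    }
}

// Naive first cut: one database query per search term, results concatenated.
// A record that matches several terms appears several times in the output.
final class NaiveSearch
{
    private final ResultDao dao;

    NaiveSearch(final ResultDao dao)
    {
        this.dao = dao;
    }

    List<ResultData> search(final List<String> searchTerms)
    {
        final List<ResultData> results = new LinkedList<>();
        for (final String term : searchTerms)
        {
            results.addAll(dao.findMatching(term));
        }
        return results;
    }
}
```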
Now this was a little too naive, as duplicate objects were present when a record was returned for multiple terms. Changing the LinkedList to a Set resolved this (overriding both hashCode and equals to ensure each ResultData object is unique, and that the hash code is deterministic).
Now for the more interesting part: returning the results in a semi-useful order. Giving each record a weighting based on how many search terms returned it seemed pretty suitable (the higher the weighting, the more likely the record is the right one for the customer).
Simply use a Map<ResultData, Integer>, incrementing the integer value for a ResultData key if the key already exists, or putting a new ResultData into the map with an integer value of one.
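A sketch of that weighting pass (again with placeholder types; ResultData overrides equals and hashCode on its unique id so it can act as a map key):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

interface ResultDao
{
    List<ResultData> findMatching(String term);
}

// equals/hashCode are based on the unique id, so the same database record
// always maps to the same key.
final class ResultData
{
    final long id;
    final String title;

    ResultData(final long id, final String title)
    {
        this.id = id;
        this.title = title;
    }

    @Override
    public boolean equals(final Object other)
    {
        return other instanceof ResultData && ((ResultData) other).id == id;
    }

    @Override
    public int hashCode()
    {
        return Long.hashCode(id);
    }
}

// Weighting pass: the value counts how many search terms returned the record.
final class WeightedSearch
{
    private final ResultDao dao;

    WeightedSearch(final ResultDao dao)
    {
        this.dao = dao;
    }

    Map<ResultData, Integer> search(final List<String> searchTerms)
    {
        final Map<ResultData, Integer> results = new HashMap<>();
        for (final String term : searchTerms)
        {
            for (final ResultData record : dao.findMatching(term))
            {
                final Integer weighting = results.get(record);
                results.put(record, weighting == null ? 1 : weighting + 1);
            }
        }
        return results;
    }
}
```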
There are a few things to note in the above implementation. First, results has changed from a Collection to a Map (Maps aren't part of the Java Collection hierarchy). Secondly, a nested for-loop now iterates over the database results, checking whether each ResultData record has been returned previously (results.get(record) returns null if the record has not been seen before), and adding the record to the map with an updated weighting. And finally, the search method now returns null. Well, that doesn't sound right. A sorted map is sorted by key (a HashMap is not suitable here, as the ordering of its entries is non-deterministic), whilst we want ordering by descending value.
The next step would be to create another map with a bespoke comparator for ResultData that takes in the initial map of results; the comparator could look up the value for each key to provide the ordering of the second map. This sounds quite horrible, and I really want to go have some tea, so I am vetoing this solution and instead going hunting for some other data types that may be useful to me.
After some quick searching, I decided that using Google Collections (now known as Guava) would be the way forward (and fun to try something new). Reviewing the API led me to discover TreeMultimap, and to rub my hands together rather happily. Its Javadoc describes it as:
"Implementation of Multimap whose keys and values are ordered by their natural ordering or by supplied comparators. In all cases, this implementation uses Comparable.compareTo(T) or Comparator.compare(T, T) instead of Object.equals(java.lang.Object) to determine equivalence of instances."
Time to create a comparator it seems for ResultData, and start playing with TreeMultiMap.
The ResultDataComparator simply orders ResultData alphabetically by comparing titles (and, if they're equal, unique IDs).
The new search algorithm is more interesting. It iterates over each record returned by the database and checks whether the record already exists on the value side of the map (note that it is now Multimap<Integer, ResultData> and not Map<ResultData, Integer>). If the record has already been added, we find its key (i.e. its weighting) by iterating over the key set, and re-add the record under a key one higher. If the record does not already exist in the map, we add it with a weighting of 1. Every entry in the map is ordered by key (in descending order) and then by the ResultDataComparator, so by calling sortedResults.values() you receive the search results in descending weighted order (and, within each weighting, alphabetical order).
The way Multimap works is that when you add a value to a key that is already present in the map, it appends that value to the values that already exist for that key. This allowed me to create weighting "buckets", where records are put into buckets based on how frequently they are returned by the database.
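The post uses Guava's TreeMultimap; here is an equivalent sketch of the same bucketing idea using only the JDK, a TreeMap of descending weightings to alphabetically sorted buckets (types and names are placeholders):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

interface ResultDao
{
    List<ResultData> findMatching(String term);
}

final class ResultData
{
    final long id;
    final String title;

    ResultData(final long id, final String title)
    {
        this.id = id;
        this.title = title;
    }
}

// Keys are weightings in descending order; each value is an alphabetically
// sorted bucket of the records sharing that weighting.
final class BucketedSearch
{
    private final ResultDao dao;
    private final Comparator<ResultData> byTitleThenId =
        Comparator.comparing((ResultData r) -> r.title).thenComparingLong(r -> r.id);

    BucketedSearch(final ResultDao dao)
    {
        this.dao = dao;
    }

    List<ResultData> search(final List<String> searchTerms)
    {
        final TreeMap<Integer, TreeSet<ResultData>> buckets =
            new TreeMap<>(Collections.reverseOrder());
        for (final String term : searchTerms)
        {
            for (final ResultData record : dao.findMatching(term))
            {
                // Promote the record from its current bucket (if any) to the next one up.
                final int weighting = removeExisting(buckets, record) + 1;
                buckets.computeIfAbsent(weighting, k -> new TreeSet<>(byTitleThenId)).add(record);
            }
        }
        final List<ResultData> ordered = new ArrayList<>();
        for (final TreeSet<ResultData> bucket : buckets.values())
        {
            ordered.addAll(bucket);
        }
        return ordered;
    }

    // Finds the record's current bucket, removes it, and returns its weighting (0 if absent).
    private int removeExisting(final TreeMap<Integer, TreeSet<ResultData>> buckets,
                               final ResultData record)
    {
        for (final Map.Entry<Integer, TreeSet<ResultData>> entry : buckets.entrySet())
        {
            if (entry.getValue().remove(record))
            {
                return entry.getKey();
            }
        }
        return 0;
    }
}
```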
I haven't performance tested this solution yet, but I suspect it will suffer the same pitfalls as many queueing strategies, where the majority of the elements will sit in the keys 1 and 2, making the checks on the value sets of these keys rather expensive.
This is only an initial implementation of a searching algorithm that provides results in a semi-useful order, and can no doubt use some more work!!