User:Andy Maloney/Notebook/Lab Notebook of Andy Maloney/2011/04/21/Learning how to sort in Java

From OpenWetWare
Jump to navigationJump to search

Sorting

Sorting with names

So I had a difficult time understanding how to sort things yesterday so I decided to run a few simple examples. The first code I'm going to work on is sorting names. I will use the names of people in the lab and sort them. <source lang="Java"> import java.util.*;

public class Sorting { public static void main(String[] args) {

String[] A = {"Andy", "Steve", "Anthony", "Nadia", "Pranav", "Brian"};

System.out.println("A: " + Arrays.toString(A)); Arrays.sort(A); System.out.println("A sorted: " + Arrays.toString(A)); } } </source> The output of this gives

A: [Andy, Steve, Anthony, Nadia, Pranav, Brian]

A sorted: [Andy, Anthony, Brian, Nadia, Pranav, Steve]

So as expected, the list is in alphabetical order. Now, I want to include all names with out capitalization to see what happens. <source lang="Java"> import java.util.*;

public class Sorting { public static void main(String[] args) { String[] A = {"Andy", "andy", "Steve", "steve", "Anthony", "anthony", "Nadia", "nadia", "Pranav", "pranav", "Brian", "brian"};

System.out.println("A: " + Arrays.toString(A); Arrays.sort(A) System.out.println("A sorted: " + Arrays.toString(A); } } </source> As expected, this does not put the names in what a human would call alphabetical order. It does put the names in ASCII order.

A: [Andy, andy, Steve, steve, Anthony, anthony, Nadia, nadia, Pranav, pranav, Brian, brian]

A sorted: [Andy, Anthony, Brian, Nadia, Pranav, Steve, andy, anthony, brian, nadia, pranav, steve]

While this is fine and dandy for a computer, it does me absolutely no good. This is one of those unfortunate circumstances that when people where first making ASCII codes for an alphabet, they didn't put them in a sortable order that a human would have done, i.e. Steve should come after andy. This is now a legacy issue because I'm sure no one will want to change the ASCII dictionary of letters such that they correspond to a human's way of sorting and I'm sure it would be a nightmare to do so. Oh well, I guess it's up to people who program to make algorithms to sort things the way you were taught in grade school.

Thankfully, there are some workarounds to the above sorting issue. I'm even more thankful that there is a simple one line workaround. To order names with capital and non capital letters, one just adds java.text.Collator.getInstance() to the Arrays.sort function. <source lang="Java"> import java.util.*; import java.text.*;

public class Sorting { public static void main(String[] args) { String[] A = {"Andy", "andy", "Steve", "steve", "Anthony", "anthony", "Nadia", "nadia", "Pranav", "pranav", "Brian", "brian"};

System.out.println("A: " + Arrays.toString(A); Arrays.sort(A, Collator.getInstance()); System.out.println("A sorted:" + Arrays.toString(A); } } </source> The output of this is as expected

A: [Andy, andy, Steve, steve, Anthony, anthony, Nadia, nadia, Pranav, pranav, Brian, brian]

A sorted: [andy, Andy, anthony, Anthony, brian, Brian, nadia, Nadia, pranav, Pranav, steve, Steve]

Sorting with numbers

The next step is to try and see how sorting works with numbers. I just added a list of numbers in the above code. <source lang="Java"> import java.util.*; import java.text.*;

public class Sorting { public static void main(String[] args) { String[] A = {"Andy", "andy", "Steve", "steve", "Anthony", "anthony", "Nadia", "nadia", "Pranav", "pranav", "Brian", "brian"}; Integer[] B = {0, 48, 400, 55, 557, 16, 1677, 1, 10, 66, 87123, 6458795};

System.out.println("A: " + Arrays.toString(A)); Arrays.sort(A, Collator.getInstance()); System.out.println("A sorted: " + Arrays.toString(A));

System.out.println(); System.out.println("B: " + Arrays.toString(B)); Arrays.sort(B); System.out.println("B sorted: " + Arrays.toString(B));

} } </source> The output is the following.

A: [Andy, andy, Steve, steve, Anthony, anthony, Nadia, nadia, Pranav, pranav, Brian, brian]

A sorted: [andy, Andy, anthony, Anthony, brian, Brian, nadia, Nadia, pranav, Pranav, steve, Steve]


B: [0, 48, 400, 55, 557, 16, 1677, 1, 10, 66, 87123, 6458795]

B sorted: [0, 1, 10, 16, 48, 55, 66, 400, 557, 1677, 87123, 6458795]

This is good. It would appear that Arrays.sort will sort numbers as one would expect them to be sorted even though the ASCII sorting would put 400 next to 48. The next step is to see if I can make a string of numbers and sort them, (i.e. 01 and 0001). <source lang="Java"> import java.util.*; import java.text.*;

public class Sorting { public static void main(String[] args) { String[] A = {"Andy", "andy", "Steve", "steve", "Anthony", "anthony", "Nadia", "nadia", "Pranav", "pranav", "Brian", "brian"}; Integer[] B = {0, 48, 400, 55, 557, 16, 1677, 1, 10, 66, 87123, 6458795}; String[] C = {"0001", "0100", "1000", "2010", "1234", "0004", "23956", "2395"};

System.out.println("A: " + Arrays.toString(A)); Arrays.sort(A, Collator.getInstance()); System.out.println("A sorted: " + Arrays.toString(A));

System.out.println(); System.out.println("B: " + Arrays.toString(B)); Arrays.sort(B); System.out.println("B sorted: " + Arrays.toString(B));

System.out.println(); System.out.println("C: " + Arrays.toString(C)); Arrays.sort(C); System.out.println("C sorted: " + Arrays.toString(C));

} } </source> This is nice, it would appear that Arrays.sort does sort things as I would expect them to be sorted even with numbers as strings.

A: [Andy, andy, Steve, steve, Anthony, anthony, Nadia, nadia, Pranav, pranav, Brian, brian]

A sorted: [andy, Andy, anthony, Anthony, brian, Brian, nadia, Nadia, pranav, Pranav, steve, Steve]


B: [0, 48, 400, 55, 557, 16, 1677, 1, 10, 66, 87123, 6458795]

B sorted: [0, 1, 10, 16, 48, 55, 66, 400, 557, 1677, 87123, 6458795]


C: [0001, 0100, 1000, 2010, 1234, 0004]

C sorted: [0001, 0004, 0100, 1000, 1234, 2010, 2395, 23956]

Sorting file names

Okay, it would appear that the sorting is working out just fine. The problem is that I have a bunch of file names that I want to sort. The file names look like Cam6-0001.jpg, Cam6-0002.jpg,...Cam6-42300.jpg. Let's see if Arrays.sort will work with sorting these types of strings. <source lang="Java"> import java.util.*; import java.text.*;

public class Sorting { public static void main(String[] args) { String[] A = {"Andy", "andy", "Steve", "steve", "Anthony", "anthony", "Nadia", "nadia", "Pranav", "pranav", "Brian", "brian"}; Integer[] B = {0, 48, 400, 55, 557, 16, 1677, 1, 10, 66, 87123, 6458795}; String[] C = {"0001", "0100", "1000", "2010", "1234", "0004", "23956", "2395"}; String[] D = {"Cam6-0001.jpg", "Cam6-10000.jpg", "Cam6-346722.jpg", "Cam6-0042.jpg"};

System.out.println("A: " + Arrays.toString(A)); Arrays.sort(A, Collator.getInstance()); System.out.println("A sorted: " + Arrays.toString(A));

System.out.println(); System.out.println("B: " + Arrays.toString(B)); Arrays.sort(B); System.out.println("B sorted: " + Arrays.toString(B));

System.out.println(); System.out.println("C: " + Arrays.toString(C)); Arrays.sort(C); System.out.println("C sorted: " + Arrays.toString(C));

System.out.println(); System.out.println("D: " + Arrays.toString(D)); Arrays.sort(D); System.out.println("D sorted: " + Arrays.toString(D)); } } </source> Okay, the output is exactly what I want.

A: [Andy, andy, Steve, steve, Anthony, anthony, Nadia, nadia, Pranav, pranav, Brian, brian]

A sorted: [andy, Andy, anthony, Anthony, brian, Brian, nadia, Nadia, pranav, Pranav, steve, Steve]


B: [0, 48, 400, 55, 557, 16, 1677, 1, 10, 66, 87123, 6458795]

B sorted: [0, 1, 10, 16, 48, 55, 66, 400, 557, 1677, 87123, 6458795]


C: [0001, 0100, 1000, 2010, 1234, 0004, 23956, 2395]

C sorted: [0001, 0004, 0100, 1000, 1234, 2010, 2395, 23956]


D: [Cam6-0001.jpg, Cam6-10000.jpg, Cam6-346722.jpg, Cam6-0042.jpg]

D sorted: [Cam6-0001.jpg, Cam6-0042.jpg, Cam6-10000.jpg, Cam6-346722.jpg]

But this is not what happened yesterday when I was trying to order the file names.

Reexamining yesterday's code

<source lang="Java"> import java.io.*; import java.util.*;

public class SortListFiles { public static void main(String[] args) { String path = "/home/andy/Documents/TobaccoSeeds/Cam6/"; File tempFile = new File(path); File[] listOfFileNames = tempFile.listFiles(); int arrayLength = listOfFileNames.length; String[] fileNameArray = new String[arrayLength];

for(int i = 0; i < arrayLength; i++) { if(listOfFileNames[i].isFile()) { fileNameArray[i] = listOfFileNames[i].getName(); } } Arrays.sort(fileNameArray); System.out.println(Arrays.toString(fileNameArray)); } } </source> So, I am not getting the files in lexicographical order with the above code. I have a list that goes from Cam6-1000.jpg to Cam6-10000.jpg. Changing the Arrays.sort to include the Collator.getInstance does not help either.

Okay, so from what I understand, I now need to a function that compares the list values and arranges them in order dependent on their comparisons.

Hmm, I can't seem to get it...

I also looked at the sorting code found at the top of the page and changed it some what. <source lang="Java"> import java.util.*; import java.text.*;

public class Sorting { public static void main(String[] args) { String[] A = {"Andy", "andy", "Steve", "steve", "Anthony", "anthony", "Nadia", "nadia", "Pranav", "pranav", "Brian", "brian"}; Integer[] B = {0, 48, 400, 55, 557, 16, 1677, 1, 10, 66, 87123, 6458795}; String[] C = {"0001", "0100", "1000", "2010", "1234", "0004", "23956", "2395"}; String[] D = {"Cam6-0001.jpg", "Cam6-10000.jpg", "Cam6-346722.jpg", "Cam6-0042.jpg", "Cam6-1006", "Cam6-10064"};

System.out.println("A: " + Arrays.toString(A)); Arrays.sort(A, Collator.getInstance()); System.out.println("A sorted: " + Arrays.toString(A));

System.out.println(); System.out.println("B: " + Arrays.toString(B)); Arrays.sort(B); System.out.println("B sorted: " + Arrays.toString(B));

System.out.println(); System.out.println("C: " + Arrays.toString(C)); Arrays.sort(C); System.out.println("C sorted: " + Arrays.toString(C));

System.out.println(); System.out.println("D: " + Arrays.toString(D)); Arrays.sort(D); System.out.println("D sorted: " + Arrays.toString(D)); } } </source> The output shows that this sorting isn't working.

A: [Andy, andy, Steve, steve, Anthony, anthony, Nadia, nadia, Pranav, pranav, Brian, brian]

A sorted: [andy, Andy, anthony, Anthony, brian, Brian, nadia, Nadia, pranav, Pranav, steve, Steve]


B: [0, 48, 400, 55, 557, 16, 1677, 1, 10, 66, 87123, 6458795]

B sorted: [0, 1, 10, 16, 48, 55, 66, 400, 557, 1677, 87123, 6458795]


C: [0001, 0100, 1000, 2010, 1234, 0004, 23956, 2395]

C sorted: [0001, 0004, 0100, 1000, 1234, 2010, 2395, 23956]


D: [Cam6-0001.jpg, Cam6-10000.jpg, Cam6-346722.jpg, Cam6-0042.jpg, Cam6-1006, Cam6-10064]

D sorted: [Cam6-0001.jpg, Cam6-0042.jpg, Cam6-10000.jpg, Cam6-1006, Cam6-10064, Cam6-346722.jpg]

Now I'm really confused.

Andy Maloney 16:49, 21 April 2011 (EDT): I got it. Use someone else's code to do it. <source lang="Java"> public class AlphanumComparator implements Comparator<String> { private final boolean isDigit(char ch) { return ch >= 48 && ch <= 57; } private final String getChunk(String s, int slength, int marker) { StringBuilder chunk = new StringBuilder(); char c = s.charAt(marker); chunk.append(c); marker++; if (isDigit(c)) { while (marker < slength) { c = s.charAt(marker); if (!isDigit(c)) break; chunk.append(c); marker++; } } else { while (marker < slength) { c = s.charAt(marker); if (isDigit(c)) break; chunk.append(c); marker++; } } return chunk.toString(); } public int compare(String s1, String s2) { int thisMarker = 0; int thatMarker = 0; int s1Length = s1.length(); int s2Length = s2.length();

while (thisMarker < s1Length && thatMarker < s2Length) { String thisChunk = getChunk(s1, s1Length, thisMarker); thisMarker += thisChunk.length();

String thatChunk = getChunk(s2, s2Length, thatMarker); thatMarker += thatChunk.length();

int result = 0; if (isDigit(thisChunk.charAt(0)) && isDigit(thatChunk.charAt(0))) { int thisChunkLength = thisChunk.length(); result = thisChunkLength - thatChunk.length();

if (result == 0) { for (int i = 0; i < thisChunkLength; i++) { result = thisChunk.charAt(i) - thatChunk.charAt(i); if (result != 0) { return result; } } } } else { result = thisChunk.compareTo(thatChunk); } if (result != 0) return result; } return s1Length - s2Length; } </source> Found from Dave Koelle's blog. While this does great for inputing a list in the code, I can't get it to work when it reads file names. So, once again I've hit a road block.

Well, I couldn't get Koelle's code to work. But I did find more from Tim Fennell's Blog and I implemented it below. <source lang="Java"> import java.util.Arrays; import java.util.Collections; import java.util.Comparator; import java.util.List; import java.util.regex.Pattern; import java.util.regex.Matcher;

public class Sorting implements Comparator<String>{

   Pattern SEGMENT_PATTERN = Pattern.compile("(\\d+(\\.\\d+)?|\\D+)");
   public static final Sorting DEFAULT = new Sorting();
   public int compare(String lhs, String rhs) {
       if (lhs == null && rhs == null) return 0;
       else if (lhs == null) return -1;
       else if (rhs == null) return 1;
               
       Matcher lhsMatcher = SEGMENT_PATTERN.matcher(lhs);
       Matcher rhsMatcher = SEGMENT_PATTERN.matcher(rhs);
       
       int result = 0;
       while (result == 0) {
           boolean lhsFound = lhsMatcher.find();
           boolean rhsFound = rhsMatcher.find();
           
           if (!lhsFound && !rhsFound) {
               return lhs.compareTo(rhs);
           }
           else if (!lhsFound) result = -1;
           else if (!rhsFound) result = 1;
           else {
               String lhsSegment = lhsMatcher.group();
               String rhsSegment = rhsMatcher.group();
           
               if (Character.isDigit(lhsSegment.toCharArray()[0])) {
                   result = compareNumberSegments(lhsSegment, rhsSegment);
               }
               else {
                   result = compareStringSegments(lhsSegment, rhsSegment);
               }
           }
       }
               
       return result;
   }
   protected int compareNumberSegments(String lhs, String rhs) {
       return new Double(lhs).compareTo(new Double(rhs));
   }
   protected int compareStringSegments(String lhs, String rhs) {
       return lhs.compareToIgnoreCase(rhs);
   }

public static void main(String[] args) { String[] A = {"Andy", "andy", "Steve", "steve", "Anthony", "anthony", "Nadia", "nadia", "Pranav", "pranav", "Brian", "brian"}; Integer[] B = {0, 48, 400, 55, 557, 16, 1677, 1, 10, 66, 87123, 6458795}; String[] C = {"0001", "0100", "1000", "2010", "1234", "0004", "23956", "2395"}; String[] D = {"Cam6-0001.jpg", "Cam6-10000.jpg", "Cam6-346722.jpg", "Cam6-0042.jpg", "Cam6-1006", "Cam6-10064"}; List<String> L = Arrays.asList(D);

System.out.println("A: " + Arrays.toString(A)); Arrays.sort(A, new Sorting()); System.out.println("A sorted: " + Arrays.toString(A));

System.out.println(); System.out.println("B: " + Arrays.toString(B)); Arrays.sort(B); System.out.println("B sorted: " + Arrays.toString(B));

System.out.println(); System.out.println("C: " + Arrays.toString(C)); Arrays.sort(C); System.out.println("C sorted: " + Arrays.toString(C));

System.out.println(); System.out.println("D: " + Arrays.toString(D)); Collections.sort(L, new Sorting()); System.out.println("D sorted: " + Arrays.toString(D)); } } </source> And this actually works with the simple example I use in array D. The problem is that I can't seem to get it to work with the file names. This is curious.