Wednesday, December 22, 2021

Merge Tasks Without Overlapping

December 22, 2021 java, spring-boot No comments

Issue

I have a use case where I have different sets of tasks that must be completed in a specific order based on the situation. Let's say I have three cases to run. Instead of repeating the same work for each case, I should be able to integrate the common tasks across all of them and perform them only once. The merge should take place without overlapping jobs that have already been merged. My use case and intended output are depicted in the diagram below.

Input

Expected Output

Task A, Task B, and Task C are common to all cases in this diagram. Task B, on the other hand, will not be merged because it will overlap with Task A, which is already merged. So, could someone suggest an algorithm to implement this in a Java project ?

Solution

this will be a long post, I will try to make it as easy to understand as possible. Please let me know in the comment if you want me to fix or clarify something.

Source Code : https://github.com/asinkxcoswt/stackoverflow-merge-task-lcs

Problem Analysis

According to your explanation, we have 3 sequences of non-repeating, unordered tasks. The sequences can be "merged" so that some common tasks can be executed together to reduce time and resources. But usually, there are many ways to merge. You are finding the way that gives the best in terms of

Maximum possible merge points

Maximum possible parallel execution

Least number of steps to be executed altogether

This can be viewed as an optimization problem in which you are trying to minimize the cost of resources and duration.

resources (steps): you want to have the least number of steps and this has the same meaning as the "maximum possible merge points" (because the more tasks you merge, the fewer steps you have. This can be the number of the square box in the picture when each task use resources = 1.
duration: you want the "maximum parallel execution", but I still find it a bit uncleared about how to measure. So I guess you actually want the tasks to be done as fast as possible, which means you want to minimize the total duration, which can be the number of horizontal rows in the picture when each row represents duration = 1.

You can observe that the state of the input (before merging) is where you already have the minimum duration (maximum parallel execution) but maximum resources. This means our merging algorithm can only do 2 things.

(Bad) Make the duration increase or stay the same.
(Good) Make the resources decrease or stay the same.

In other words, we are going to reduce the resources at the cost of making the tasks take a longer time to complete. We cannot determine which solution is the best until we define the final cost in terms of these 2 factors. For example, one may prefer to have as minimum resources as possible regardless of the duration, others may prefer differently.

cost: for the sake of simplicity, I will use, as an example here, the definition cost = resources * duration, which can be thought of as the money you pay for resources number of workers by the duration they work.
goal: So the goal of this optimization is to find the way to merge 3 task sequences that give the minimum cost = resources * duration

Solution Overview

The nature of this problem is very similar to the well-known Longest Common Subsequence (LCS). We can say that each possible merging result is a one-to-one correspondence to a common subsequence of the 3 task lists. But unlike the LCS problem, we cannot say the longest one is the best for us. Instead, we have to find all possible common subsequences and calculate their corresponding cost, then choose the one with the least cost.

Implemetation

Code Infrastructure

Before talking in detail about LCS, let first translate the problem into code.

Task

A task is a unit that holds the duration and resources we are going to optimize. It usually has a run() method to define what it is supposed to do. The id is here because we have to identify and equalize them. Here if 2 tasks have the same id, I will treat them as the same task.

abstract class Task {
    public abstract String id();
    public abstract void run();
    public abstract int duration();
    public abstract int resources();
    public int cost() {
        return this.duration() * this.resources();
    };

     /**
     * Just a convenient method to create a task with a one-liner
     */
    static Task of(String id, int duration, int resource, Runnable execution) {
        return new Task() {
            @Override
            public void run() {
                execution.run();
            }

            @Override
            public int duration() {
                return duration;
            }

            @Override
            public int resources() {
                return resource;
            }

            @Override
            public String id() {
                return id;
            }
        };
    }
}

This is how the tasks are prepared. Please notice that every task is created with resources=1 and duration=1, but you can change them as your desire.

    public static void main(String[] args) {
        Task A = Task.of("A",1, 1, () -> System.out.println("I'm task A"));
        Task B = Task.of("B",1, 1, () -> System.out.println("I'm task B"));
        Task C = Task.of("C",1, 1, () -> System.out.println("I'm task C"));
        Task D = Task.of("D",1, 1, () -> System.out.println("I'm task D"));
        Task E = Task.of("E",1, 1, () -> System.out.println("I'm task E"));
        Task U = Task.of("U",1, 1, () -> System.out.println("I'm task U"));
        Task W = Task.of("W",1, 1, () -> System.out.println("I'm task W"));
        Task X = Task.of("X",1, 1, () -> System.out.println("I'm task X"));
        Task Y = Task.of("Y",1, 1, () -> System.out.println("I'm task Y"));
        Task Z = Task.of("Z",1, 1, () -> System.out.println("I'm task Z"));

        Task[] case1 = new Task[] {A,B,C,D,E};
        Task[] case2 = new Task[] {W,B,A,U,C,E};
        Task[] case3 = new Task[] {B,X,Y,Z,E,A,C};
    }

ParallelTask

A ParallelTask takes arbitrary number task lists. Its run() method executes all task lists in parallel. But each task in a specific list still is executed in sequence. It also aggregates the duration and resources of its underlying tasks. The duration of a parallel task will equal the duration of the longest list. While the resources is just the simple sum of every task's resources.

class ParallelTask extends Task {
    private final String id;
    private final Task[][] taskSequences;
    ParallelTask(String id, Task[] ... taskSequences) {
        this.id = id;
        this.taskSequences = taskSequences;
    }

    @Override
    public void run() {
        Arrays.stream(this.taskSequences).parallel().forEach(taskSeq -> {
            Arrays.stream(taskSeq).forEach(Task::run);
        });
    }

    @Override
    public int duration() {
        IntStream eachSequenceTotalDuration = Arrays.stream(this.taskSequences).mapToInt(
                taskSeq -> Arrays.stream(taskSeq).mapToInt(Task::duration).sum()
        );

        return eachSequenceTotalDuration.max().orElse(0);
    }

    @Override
    public int resources() {
        IntStream eachSequenceTotalResources = Arrays.stream(this.taskSequences).mapToInt(
                taskSeq -> Arrays.stream(taskSeq).mapToInt(Task::resources).sum()
        );

        return eachSequenceTotalResources.sum();
    }

    @Override
    public String id() {
        return this.id;
    }
}

Initial Cost

You can run follow code and see that the initial cost (before merging) is 126.

public static void main(String[] args) {

    ... init tasks ...

    Task[] case1 = new Task[] {A,B,C,D,E};
    Task[] case2 = new Task[] {W,B,A,U,C,E};
    Task[] case3 = new Task[] {B,X,Y,Z,E,A,C};

    Task p = new ParallelTask("INITIAL_SOLUTION", case1, case2, case3);
    // p.run(); // <-- try this if you want
    System.out.println(p.cost()); // <-- 126
}

Full Report

For convenience, let's add a method to print the full report to see the cost, duration, resources, and how to tasks are organized.

class ParallelTask extends Task {
    ... other methods ...

    public void report(boolean showSummary) {
        if (showSummary) {
            System.out.println("------------------------");
            System.out.println("TASK " + this.id() + " (COST " + this.cost() + ", DURATION " + this.duration() + ", RESOURCES " + this.resources() + ")");
            System.out.println("------------------------");
        }

        int l = Arrays.stream(this.taskSequences).mapToInt(taskSeq -> taskSeq.length).max().orElse(0);
        for (int j = 0; j < l; j ++) {
            for (int i = 0; i < this.taskSequences.length; i++) {
                if (this.taskSequences[i].length - 1 < j) {
                    System.out.print("  ");
                    continue;
                }
                Task task = this.taskSequences[i][j];
                if (task instanceof ParallelTask p) {
                    p.report(false);
                } else if (task instanceof MergedTask m) {
                    System.out.println(task.id());
                } else {
                    System.out.print(task.id() + " ");
                }
            }
            System.out.println("");
        }
    }
}

Running p.report(true) will print something like this.

------------------------
TASK INITIAL_SOLUTION (COST 126, DURATION 7, RESOURCES 18)
------------------------
A W B 
B B X 
C A Y 
D U Z 
E C E 
  E A 
    C

MergedTask

Let's define another special task. A MergedTask wraps a normal task and reduces its's resources by 3 times (because it merges 3 boxes into 1).

class MergedTask extends Task {

    private final Task originalTask;
    MergedTask(Task originalTask) {
        this.originalTask = originalTask;
    }

    @Override
    public void run() {
        originalTask.run();
    }

    @Override
    public int duration() {
        return originalTask.duration();
    }

    @Override
    public int resources() {
        return originalTask.resources()/3;
    }

    @Override
    public String id() {
        return "MERGED(" + originalTask.id() + ")";
    }
}

Manual Solution

Let's see how the code will look like if we try to solve the problem by hand (without using any algorithm yet)


public static void main(String[] args) {
    Task p1 = new ParallelTask(
            "p1",
            new Task[] {A},
            new Task[] {W}
    );
    Task m1 = new MergedTask(B);
    Task p2 = new ParallelTask(
            "p2",
            new Task[] {C, D},
            new Task[] {A, U, C},
            new Task[] {X, Y, Z}
    );
    Task m2 = new MergedTask(E);
    Task p3 = new ParallelTask(
            "p3",
            new Task[] {A, C}
    );

    new ParallelTask("MANUAL_MERGED_TASK", new Task[] {p1, m1, p2, m2, p3}).report(true);
}

Here I manually construct the case when we merge at B and E. As you can see from the following report, its cost is 96, even though the duration = 8 is worse, the cost is clearly better than the initial cost. But is it the best? We must find all possible solutions to tell.

------------------------
TASK MANUAL_MERGED_TASK (COST 96, DURATION 8, RESOURCES 12)
------------------------
A W 

MERGED(B)

C A X 
D U Y 
  C Z 

MERGED(E)

A 
C

LCS

We will modify the LCS algorithm to find all possible common subsequences of the 3 task lists case1, case2, and case3. I will omit the full implementation of this modified LCS because it will be too much for this post, you can find the source code on my Github.

Set<List<Cell>> findAllCommonSubsequences(List<Task> case1, List<Task> case2, List<Task> case3, String axisOrder) {
    ... see that full implemation on my Github ...
}

This function will give you a Set of possible common subsequences. List<Cell> is a subsequence, sorry it is not an intuitive notation, hopefully, you can understand from its example value below.

[
  [Cell(xyz=136,ref=025,common=true,tasks=A)], 
  [Cell(xyz=221,ref=110,common=true,tasks=B), Cell(xyz=357,ref=246,common=true,tasks=C)], 
  [Cell(xyz=221,ref=110,common=true,tasks=B), Cell(xyz=565,ref=454,common=true,tasks=E)], 
  [Cell(xyz=136,ref=025,common=true,tasks=A), Cell(xyz=357,ref=246,common=true,tasks=C)], 
  [Cell(xyz=221,ref=110,common=true,tasks=B)]
]

These are all possible common subsequences of the following input task lists.

Task[] case1 = new Task[] {A,B,C,D,E};
Task[] case2 = new Task[] {W,B,A,U,C,E};
Task[] case3 = new Task[] {B,X,Y,Z,E,A,C};

Consider an example sequence [Cell(xyz=136,ref=025,common=true,tasks=A)]. This sequence has only 1 common element (task A). The number xyz=136 indicates the position of A in each input list case1, case2 and case3 respectively. The ref=025 is a part of LCS implementation details, you know what it means if you know LCS, hopefully. In summary, here all common subsequences are A, B. BC, BE, and AC.

Merge The Tasks

You may have noticed from the Manual Solution section that the final result we actually want will always be of the form

new ParallelTask("SOLUTION", new Task[] {p1, m1, p2, m2, p3, m4, p4, ...})

In other words, we are going to use a common subsequence, such as B E, to divide the 3 input task lists into n ParallelTasks (p1, p2, p3, ..., pn) and n-1 MergedTasks (m1, m2, m3, ..., m{n-1}), and then put them into a single ParallelTask. Each MergedTask corresponds to each element in the common subsequence.

Here is a method that can do so. It is not elegant but hopefully, you got the big picture.

ParallelTask merge(List<Cell> commonSubsequence, List<Task> case1, List<Task> case2, List<Task> case3) {

        // pad cell index to prevent confusion, because the original cells will have indices = taskList's index - 1 due to LCS algorithm
        List<Cell> commonSubsequencePadded = commonSubsequence.stream()
                .map(cell -> new Cell(cell.value, cell.x - 1, cell.y - 1, cell.z -1, 0, 0, 0, true, cell.taskX, cell.taskY, cell.taskZ))
                .toList();
        System.out.println(commonSubsequencePadded);

        Cell prevCell = null;
        int taskNameIndex = 1;
        StringBuilder finalResultTaskName = new StringBuilder("MERGED_TASK(");
        List<Task> merged = new ArrayList<>();
        for (Cell cell : commonSubsequencePadded) {
            int startL1 = prevCell == null ? 0 : prevCell.x + 1;
            int startL2 = prevCell == null ? 0 : prevCell.y + 1;
            int startL3 = prevCell == null ? 0 : prevCell.z + 1;
            Task[] l1 = cell.x == startL1 ? new Task[0] : case1.subList(startL1, cell.x).toArray(new Task[0]);
            Task[] l2 = cell.y == startL2 ? new Task[0] : case2.subList(startL2, cell.y).toArray(new Task[0]);
            Task[] l3 = cell.z == startL3 ? new Task[0] : case3.subList(startL3, cell.z).toArray(new Task[0]);
            Task p = new ParallelTask("p" + taskNameIndex, l1, l2, l3);
            Task m = new MergedTask(cell.taskX);
            merged.add(p);
            merged.add(m);

            prevCell = cell;
            taskNameIndex++;
            finalResultTaskName.append(cell.taskX.id());
        }

        // if there are some remaining tasks after the last merge point
        if (prevCell != null && (prevCell.x + 1 < case1.size() || prevCell.y + 1 < case2.size() || prevCell.z + 1 < case3.size())) {
            Task[] l1 = prevCell.x + 1 < case1.size() ? case1.subList(prevCell.x + 1, case1.size()).toArray(new Task[0]) : new Task[0];
            Task[] l2 = prevCell.y + 1 < case2.size() ? case2.subList(prevCell.y + 1, case2.size()).toArray(new Task[0]) : new Task[0];
            Task[] l3 = prevCell.z + 1 < case3.size() ? case3.subList(prevCell.z + 1, case3.size()).toArray(new Task[0]) : new Task[0];
            Task p = new ParallelTask("p" + taskNameIndex, l1, l2, l3);
            merged.add(p);
        }

        return new ParallelTask(finalResultTaskName.append(")").toString(), merged.toArray(new Task[0]));
    }

Let's See The Result

Wiring them all together.

Set<List<Cell>> allCommonSubsequenceList = findAllCommonSubsequences(List.of(case1), List.of(case2), List.of(case3), "XYZ");
        List<ParallelTask> allPossibleMergeList = allCommonSubsequenceList.stream().map(commonSubsequence -> merge(commonSubsequence, List.of(case1), List.of(case2), List.of(case3))).toList();

        allPossibleMergeList.forEach(t -> t.report(true));

Here are the results for the 5 possible common subsequences A, B, BC, BE, and AC. And the most optimal solution is BE at cost 96.

------------------------
TASK MERGED_TASK(A) (COST 150, DURATION 10, RESOURCES 15)
------------------------
  W B 
  B X 
    Y 
    Z 
    E 

MERGED(A)

B U C 
C C   
D E   
E     

------------------------
TASK MERGED_TASK(BC) (COST 120, DURATION 10, RESOURCES 12)
------------------------
A W   

MERGED(B)

  A X 
  U Y 
    Z 
    E 
    A 

MERGED(C)

D E   
E     

------------------------
TASK MERGED_TASK(BE) (COST 96, DURATION 8, RESOURCES 12)
------------------------
A W   

MERGED(B)

C A X 
D U Y 
  C Z 

MERGED(E)

    A 
    C 

------------------------
TASK MERGED_TASK(AC) (COST 120, DURATION 10, RESOURCES 12)
------------------------
  W B 
  B X 
    Y 
    Z 
    E 

MERGED(A)

B U   

MERGED(C)

D E   
E     

------------------------
TASK MERGED_TASK(B) (COST 120, DURATION 8, RESOURCES 15)
------------------------
A W   

MERGED(B)

C A X 
D U Y 
E C Z 
  E E 
    A 
    C

Improvement

This section is about the possibility to extend to solution to support N number of lists instead of just 3 lists.

In the above implementation, I modify LCS algorithm, which originally support 2 lists, to support the case of 3 lists by adding another for-loop to the code. So it will be another level difficult to write code that can handle N list.

But I think maybe we can use another way. We can first find all common subsequences of the first two lists. And then combine the result with the rest N-2 lists and find their common subsequences recursively.

I have added an example code for this idea in the /improved folder of the original code on my Github.

I created a class CommonSubsequenceFinder, like this

public class CommonSubsequenceFinder {
    public <T> Set<List<T>> findAllIn2Lists(List<T> l1, List<T> l2) {
        ...
    }

    public <T> Set<List<T>> findAllInNList(List<T> ... listArray) {
        if (listArray.length <= 1) {
            return new HashSet<>();
        }

        List<List<T>> lists = Arrays.asList(listArray);

        List<T> firstList = lists.get(0);
        List<T> secondList = lists.get(1);

        Set<List<T>> commonSubsequences = findAllIn2Lists(firstList, secondList);

        if (commonSubsequences.isEmpty()) {
            return commonSubsequences;
        }

        if (lists.size() == 2) {
            return commonSubsequences;
        }

        System.out.println(commonSubsequences);

        Set<List<T>> commonSubsequencesEx = new HashSet<>();
        List<List<T>> theRest = lists.subList(2, lists.size());
        for (List<T> subSeq : commonSubsequences) {
            List<List<T>> l = new ArrayList<>();
            l.add(subSeq);
            l.addAll(theRest);
            commonSubsequencesEx.addAll(findAllInNList(l.toArray(new List[l.size()])));
        }

        return commonSubsequencesEx;
    }
}

In the class, you will find two public methods findAllIn2Lists(...) and findAllInNList(...).

The findAllIn2Lists(...) can find all common subsequences of 2 lists. I make it to support a generic type, you can put the list of anything not limited to just a Task, but with a note that the class must implement method equals(...) and hashCode(...) properly. For example, I have added the methods in Task like below, which then allow us to compare the equality of 2 tasks such as t1.equals(t2) and it will return true if t1 and t2 have the same id.

abstract class Task {
    public abstract String id();
    public abstract void run();
    public abstract int duration();
    public abstract int resources();
    public int cost() {
        return this.duration() * this.resources();
    };
    
    @Override
    public String toString() {
        return this.id();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null) return false;
        if (o instanceof Task t) {
            return id().equals(t.id());
        }
        return false;
    }

    @Override
    public int hashCode() {
        return Objects.hash(id());
    }
}

Another method in CommonSubsequenceFinder is findAllInNList(...). This is the aforementioned recursive method. You can put following code in the main method to see its result.

Set<List<Task>> allCommonSubsequences = new CommonSubsequenceFinder().findAllInNList(List.of(case1), List.of(case2), List.of(case3));
System.out.println(allCommonSubsequences);

Result:

[[B], [B, C], [A, C], [E], [B, E], [A]]

The only problem left is how to use a common subsequence to merge N list of task. It will be basically the same as in the 3 lists case. You have to loop for each element in the subsequence, use the element to divide the lists into the form p1, m1, p2, m2, ... and put them in a ParallelTask. Please let me know if you want further help about this.

And you may want to edit the calculation in MergedTask.resources(...) to divide by N instead of 3.

Answered By - asinkxcoswt

This Answer collected from stackoverflow and tested by AndroidBugFix community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0