TrojanMap is class-project as part of EE538: Computing Principles for EE. This project requires students to apply graph theory concepts and implement various features using classic algorithms and data structures. Example of features are
- edit distance to find the distance between two text strings, a common function employed in spelling correction, string matching etc..
- finding shortest path between source and destination using dijkstra and bellman-ford (thereby allowing its implementation in graphs with -ve distances)
- topological sort with applications like dependency resolution
- classic travelling salesman that is implemented using various approaches like brute-force, bruceforce with backtracking and local search (2-opt)
The project involves showcasing the results on a map using OpenCV integration
Each point on the map is represented by the class Node shown below and defined in trojanmap.h.
// A Node is the location of one point in the map.
class Node {
public:
Node(){};
Node(const Node &n) {
id = n.id;
lat = n.lat;
lon = n.lon;
name = n.name;
neighbors = n.neighbors;
attributes = n.attributes;
};
std::string id; // A unique id assigned to each point.
double lat; // Latitude
double lon; // Longitude
std::string name; // Name of the location. E.g. "Bank of America".
std::vector<std::string>
neighbors; // List of the ids of all neighbor points.
std::unordered_set<std::string>
attributes; // List of the attributes of the location.
};
-
The nodes are given in the form of std::string. Instead of working with std::string as nodes, thereby, restricting the use of std::vector as a container (for which index are numerals,) for later half of functions, I mapped string IDs to int IDs and vice-versa and worked with ints and indexed into unordered_map<std::string, Node> data structure
-
Furthermore, other than travelling salesman, almost in every other function iterative form is used/preferred over recursive form. Because, at first go, shortest path (dijkstra) was implemented using recursion and it was taking infinite time to get results, for instance, while parsing 18000+ points, the recursion wouldn't be able to parse 500+ points in about 5 minutes, whereas changing to iterative form immediately improved the timing. Please note that recursion was giving correct results as it was tested pass on smaller total number of data points (7) by using a micro-version of carefully selected data points in the given dataset
Ubuntu OS, Visual Studio Code, Bazel, Google Tests, OpenCV, CMake
-
What is the runtime of your algorithm? O(n) as the function iterates over all the elements in the unordered map
-
(Optional) Can you do it faster than
O(n)
? TBD
Query | time spent |
---|---|
stAR | 3 ms |
holy name | 6 ms |
washington | 8 ms |
Current limitations: when I enter "school" then following errors comes:
terminate called after throwing an instance of 'std::invalid_argument'
what(): stoi
Aborted (core dumped)
- time complexity: upto linear
- method: iterate through the entire map and at each iteration compare the database name (if name exists) withe the input name
Query | time spent |
---|---|
National Schools | 2 ms |
Arco | 3 ms |
Ramen KenJo | 0 ms |
- time complexity: O(mn) where m and n are the lengths of the two given strings
- method:
- Initially recursive method was tried, but it was too slow, hence, tabulation method was chosen
- we always try to find min of three operations i.e. replacement or insertion or deletion
Query | prompt | time spent | reply |
---|---|---|---|
help | Shell | 4 ms | y |
please | Chase | 4 ms | y |
thanks | Chase | 3 ms | n |
- time complexity: O(n) where n are the number of elements in the database, please note that insertion in unordered_map is O(constant) complexity
- method:
- data structure: unordered set was used to store the categories, as unordered map is implemented in stl with a characteristics that it doesn't stores duplicates
- we always try to find min of three operations i.e. replacement or insertion or deletion
- time complexity: O(n) as we perform 2 separate instances of parsing the entire database, once to construct the categories list and second time to parse each location for matching attribute same as category
- method: we parse the entire database leading to time complexity of O(n) and for each element, we perform reg_ex search that has a complexity of O(n)
- time complexity: total time complexity is O(nm) where n is the length of the database and m is the length of the string being matched
-
time complexity: O((|E|+|V|)log|V|) where E is the number of edges and V is the number of vertices, this time complexity is applicable since priority queue (min heap) was implemented
-
method:
- this algo. calculates the
- After not finding any way to work with std::string as indexes, since, I wasn't able to peek into the unordered_map in vsocde using gdb and within debug console, i took inspiration from this code https://github.com/ee538/final-project-Eric-Zheng29 where the author has converted all the stringnodes to ints and vice-versa and therefore used a simpler dat structure like a vector to peek into individiual values at debug time as well set break-points.
- Furthermore, i looked at the code to figure out how to draw the path https://github.com/dlwsdqdws/Trojan-Map/blob/master/src/lib/trojanmap.cc
-
data structures:
- priority queue with a pair of distance (first element) and node name (std::string)
- unordered_map (acting like a vector of vectors of strings) with nodename (std::string) as the key and vector of strings as the value to store the path for every other node from the source node
- time complexity: O(V*E) as we perform relaxation for all the edges n-1 times (one less than the number of nodes)
- method:
- loop-over V-1 times into E times for each Node/Vertex where E comes from adjacency list and check the distances
- each time among the given parents, a parent is found to be the min., we update the parent of that node into parent vector
Point A to Point B | Dijkstra | Bellman Ford | Bellman Ford optimized |
---|---|---|---|
Antioch Temple Baptist Church to Jesus of Nazareth Undenominational Church | 21ms | 7839 ms | ----------------------- |
Antioch Temple Baptist Church to Divine Providence Convent | 25ms | 8879 ms | |
Foshay Learning Center to Holy Name School | 34ms | 6569 ms | |
National Schools to Normandie Elementary School | 108ms | 8185 ms | |
Saint Agnes Elementary School to Saint Cecilia School | 204ms | 7758 ms | |
Saint Patrick School to Santa Barbara Avenue School | 225ms | 9653 ms | |
Trinity Elementary School to Twenty-Eight Street Elementary School | 40ms | 8936 ms | |
Twenty-Fourth Street Elementary School to Vermont Elementary School | 50ms | 9465 ms | |
Wadsworth Elementary School to Washington Boulevard School | 253ms | 10058 ms | |
West Vernon Elementary School to Jefferson Senior High School | 158ms | 9162 ms | |
Dockweller Station Los Angeles Post Office to Saint Marks Lutheran Church | 24ms | 8075 ms | |
Saint Patricks Catholic Church to Westvern Station Los Angeles Post Office | 203ms | 9745 ms | |
Pilgrim Congregational Church to Second Baptist Church | 215 ms | 10613 ms | |
United Church of Christ Scientist to American Hungarian Baptist Church | 263ms | 10658 ms | |
Grace Lutheran Church to Korean Presbyterian Church (나성한인연합장로교회) | 105ms | 9193 ms |
- time complexity: O(V + E)
- method:
- iterative approach adopted instead of recursion
- start with any node, this algo. is agnostic of teh starting node
- mark the node as visited and for the starting node, mark its parents as -1 (indicating its the starting node)
- enlist all its neighbours
- visit each neighbor, if the neighbor is unvisited, then record its ID and its parent ID in the stack
- if the neighbor is visited, then check, if neighbor ID's match the ID of the parent, if yes, its a cycle
- algo. works for disconnected graph as well, as we traverse all the nodes
- sources: https://stackoverflow.com/questions/31532887/detecting-cycle-in-an-undirected-graph-using-iterative-dfs
Input: square = {-118.299, -118.264, 34.032, 34.011}
Output: true
Input: square = {-118.290, -118.289, 34.030, 34.020}
Output: false
Input: square = {-118.284007, -118.2819, 34.0366, 34.035}
Output: true
Input: square = {-118.290, -118.288, 34.030, 34.0285}
Output: false
Input: square = {-118.290, -118.288, 34.030, 34.0255}
Output: true
- time complexity:
- data structure:
- a special stack that stored a pair where first element was
true
for a parent andfalse
for a child
- a special stack that stored a pair where first element was
- method:
- used iterative method for implementation
- In order to construct the postOrder list you need to know the time when your algorithm has finished processing the last child of node k.
- One way to figure out when you have popped the last child off the stack is to put special marks on the stack to indicate spots where the children of a particular node are starting. You could change the type of your dfs stack to vector<pair<bool,int>>. When the bool is set to true, it indicates a parent; false indicates a child.
- When you pop a "child pair" (i.e. one with the first member of the pair set to false) off the stack, you run the code that you currently have, i.e. push all their children onto the stack with your for loop. Before entering the for loop, however, you should push make_pair(true, node) onto the stack to mark the beginning of all children of this node.
- When you pop a "parent pair" off the stack, you push the parent index onto the postOrder, and move on:
- Only Directed Acyclic Graph have a valid topological ordering.
- Time Complexity is O(V+E), where V is the no. of Nodes/vertices and E is the number of Edges
- Tarjan's strongly connected component algo. can be used to find out id the DAG has a cycle
- Every Tree has a topological ordering since trees do not have any cycles
- One way to build the topological sort is to start from the leaves and list all the leaves first followed by their parents and traverse upto the root
- For a graph with recursion,
- pick an unvisited node,
- beginning with the selected node, do a Depth First Search (DFS) exploring only unvisited node
- on the recursive callback of the DFS, add the current node to the topological ordering in the reverse order
- references: https://github.com/williamfiset/Algorithms/blob/master/slides/graphtheory/graph_theory_algorithms.pdf
Input:
location_names = {"Foshay Learning Center", "Holy Name School", "National Schools", "Normandie Elementary School", "Saint Agnes Elementary School"}
dependencies = {{"Saint Agnes Elementary School","Foshay Learning Center"}, {"Foshay Learning Center","Holy Name School"}, {"Foshay Learning Center","National Schools"}, {"Saint Agnes Elementary School","National Schools"},{"National Schools","Normandie Elementary School"}}
Output: {"Saint Agnes Elementary School", "Foshay Learning Center","Holy Name School", "National Schools", "Normandie Elementary School"}
location_names = {"Saint Cecilia School", "Saint Patrick School", "Santa Barbara Avenue School", "Trinity Elementary School", "Twenty-Eight Street Elementary School", "Vermont Elementary School", "Wadsworth Elementary School"};
dependencies = {{"Saint Cecilia School","Saint Patrick School"}, {"Santa Barbara Avenue School","Saint Patrick School"}, {"Trinity Elementary School","Saint Patrick School"}, {"Twenty-Eight Street Elementary School","Saint Patrick School"},{"Twenty-Eight Street Elementary School","Vermont Elementary School"}, {"Wadsworth Elementary School","Vermont Elementary School"}}
Output: {"Wadsworth Elementary School", "Twenty-Eight Street Elementary School", "Vermont Elementary School", "Trinity Elementary School", "Santa Barbara Avenue School", "Saint Cecilia School", "Saint Patrick School"}
- time complexity:
- outerloop: till the time we continue to find improvement, can be exponential to subquadratic (source: Tim Roughgarden, )
- innerloop: replace
each
edge * with allother
edges in a iteration (the two innerfor
loops) equalling to n*(n-3)/2 or O(n2) for inner loop
Number of Places | Bruteforce | Bruceforce with backtracking | 2-opt |
---|---|---|---|
5 | 0 ms | 0 ms | 0 ms |
6 | 1 ms | 0 ms | 1 ms |
7 | 21 ms | 5 ms | 0 ms |
8 | 788 ms | 27 ms | 2 ms |
- start with a small testing set, since its easier to debug, for instance, in the travelling salesman problem with 2-opt, the algorithm was going into infinite loop and the distance was getting negative, even though the implementation was borrowed from wikipedia, after beginning with a 4 locations, the bug was found out
inability to peek inside unordered_map, unordered_set, ordered_map, ordered_set inside debug console using gdb
- while using vscode with gdb, many attempts were made to to see the contents inside the data structures like unordered_map and ordered_map. Though the run and debug window shows the inside contents, but restricts it to 500 entries, that was insufficient at one time, after trying out various ways as listed here: https://stackoverflow.com/questions/40633787/cannot-evaluate-function-may-be-in-lined-error-in-gdb-for-stl-template-cont?noredirect=1&lq=1 and https://stackoverflow.com/questions/40173061/gdb-stl-functions-still-show-as-inlined-after-disabling-optimizations and microsoft/vscode-cpptools#69, still I couldn't peek inside from debug console. For instance, data("2613081566") would give a prompt of "Cannot evaluate function -- may be inlined", thus i have to use traditional methods like having debug variables etc..
- tried this approach https://stackoverflow.com/questions/1740858/how-to-create-conditional-breakpoint-with-stdstring to edit break point to set for a speciifc string value, but it didn't work