Commit a9687fb

Merge pull request #17 from dalager/general-improvements
General improvements
2 parents b8762dd + 9d65405 commit a9687fb

14 files changed: +473 -116 lines changed

.gitignore

+3 -1
@@ -163,4 +163,6 @@ cython_debug/
 venv/Lib/site-packages
 venv
 *.xlsx
-*.xlsx.png
+*.png
+combined_project.py
+plotsettings.json

README.md

+30 -26
@@ -58,20 +58,37 @@ pip install -e .
 ## Usage
 
 ```bash
-python -m graphedexcel <path_to_excel_file> [--verbose] [--no-visualize] [--keep-direction] [--open-image]
+python -m graphedexcel <path_to_excel_file>
 ```
 
-Depending on the size of the spreadsheet you might want to adjust the plot configuration in the code to to make the graph more readable (remove labels, decrease widths and sizes etc) - you can find the configuration in [graph_visualizer.py](src/graphedexcel/graph_visualizer.py) with settings for small, medium and large graphs. You can adjust the configuration to your needs - but this only working if you run from source.
+### Parameters from `--help`
 
-### Arguments
-
-`--verbose` will dump formula cell contents during (more noisy)
-
-`--no-visualize` will skip the visualization step and only print the summary (faster)
-
-`--keep-direction` will keep the direction of the graph as it is in the excel file, otherwise it will be simplified to an undirected graph (slower)
-
-`--open-image` will open the generated image in the default image viewer (only on Windows)
+```
+usage: graphedexcel [-h] [--remove-unconnected] [--as-directed-graph] [--no-visualize]
+                    [--layout {spring,circular,kamada_kawai,shell,spectral}] [--config CONFIG]
+                    [--output-path OUTPUT_PATH] [--open-image]
+                    path_to_excel
+
+Process an Excel file to build and visualize dependency graphs.
+
+positional arguments:
+  path_to_excel         Path to the Excel file to process.
+
+options:
+  -h, --help            show this help message and exit
+  --remove-unconnected, -r
+                        Remove unconnected nodes from the dependency graph.
+  --as-directed-graph, -d
+                        Treat the dependency graph as directed.
+  --no-visualize, -n    Skip the visualization of the dependency graph.
+  --layout,-l {spring,circular,kamada_kawai,shell,spectral}
+                        Layout algorithm for graph visualization (default: spring).
+  --config CONFIG, -c CONFIG
+                        Path to the configuration file for visualization. See README for details.
+  --output-path OUTPUT_PATH, -o OUTPUT_PATH
+                        Specify the output path for the generated graph image.
+  --open-image          Open the generated image after visualization.
+```
 
 ## Sample output
 
@@ -136,7 +153,7 @@ base_graph_settings = {
 
 # Sized-based settings for small, medium, and large graphs
 small_graph_settings = {
-    "with_labels": False,
+    "with_labels": False,
     "alpha": 0.8}
 
 medium_graph_settings = {
@@ -174,7 +191,7 @@ To override these settings, create a JSON file (e.g., graph_settings.json) with
 To use the custom configuration, pass the path to the JSON file as an argument to the script:
 
 ```bash
-python -m graphedexcel <path_to_excel_file> --config <path to grap_settings.json>
+python -m graphedexcel myexcel.xlsx --config graph_settings.json
 ```
 
 This will render the graph using the custom settings defined in the JSON file.
@@ -186,16 +203,3 @@ Just run pytest in the root folder.
 ```bash
 pytest
 ```
-
-## Contribute
-
-Feel free to contribute by opening an issue or a pull request.
-
-You can help with the following, that I have thought of so far:
-
-- Add more tests
-- Improve the code
-- Add more features
-- Improve the visualization and the ease of configuration
-- Add more examples
-- Add more documentation
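
As a rough illustration of the `--config` override described in the README hunk above, the sketch below layers a JSON file over built-in defaults. The merge strategy (a shallow dict update) and the names `load_settings` and `DEFAULTS` are assumptions made for illustration; the real logic lives in `graph_visualizer.py`, which this diff does not show. Only the keys `with_labels` and `alpha` come from the README.

```python
import json
import os
import tempfile

# Hypothetical defaults; only the keys "with_labels" and "alpha" appear in the README.
DEFAULTS = {"with_labels": False, "alpha": 0.8}


def load_settings(config_path=None):
    """Return a copy of DEFAULTS, shallowly overridden by the JSON file at config_path."""
    settings = dict(DEFAULTS)
    if config_path:
        with open(config_path, "r", encoding="utf-8") as fh:
            settings.update(json.load(fh))
    return settings


# Write a small override file and load it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "graph_settings.json")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"with_labels": True}, fh)
    merged = load_settings(path)

print(merged)
```

Keys missing from the JSON file keep their default values, so an override file only needs to list what it changes.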

docs/Packaging_notes.md

+4 -3
@@ -1,22 +1,23 @@
 # Packaging notes
 
+Notes on packaging and distributing the package.
+
 ## Test PyPi
 
 ```bash
 rimraf .\dist\; python -m build; python -m twine upload --repository pypi dist/* --verbose
-```
 
 python -m build
 
 python -m twine upload --repository testpypi dist/\* --verbose
 
-````
+```
 
 ## Installation
 
 ```bash
 pip install -i https://test.pypi.org/simple/ graphedexcel
-````
+```
 
 ## installation from local dist
 

docs/ai_chat_context_creator.py

+32
@@ -0,0 +1,32 @@
+def combine_python_files(file_list, output_file):
+    """
+    Combines multiple Python files into a single file with comments indicating
+    the start and end of each original file.
+
+    Parameters:
+    - file_list: List of Python file names to combine.
+    - output_file: Name of the output file.
+    """
+    with open(output_file, "w") as outfile:
+        for fname in file_list:
+            # Add a comment indicating the start of a file
+            outfile.write(f"# --- Start of {fname} ---\n\n")
+            with open(fname, "r") as infile:
+                outfile.write(infile.read())
+            outfile.write("\n")
+            # Add a comment indicating the end of a file
+            outfile.write(f"# --- End of {fname} ---\n\n")
+    print(f"All files have been combined into {output_file}")
+
+
+if __name__ == "__main__":
+    # Replace these with your actual file names
+    python_files = [
+        "src/graphedexcel/__main__.py",
+        "src/graphedexcel/graphbuilder.py",
+        "src/graphedexcel/graph_visualizer.py",
+        "src/graphedexcel/graph_summarizer.py",
+        "src/graphedexcel/excel_parser.py",
+    ]
+    output_filename = "combined_project.py"
+    combine_python_files(python_files, output_filename)

src/graphedexcel/__init__.py

+1 -3
@@ -1,3 +1 @@
-import sys
-from .graphbuilder import extract_formulas_and_build_dependencies
-
+# package graphedexcel

src/graphedexcel/__main__.py

+133 -23
@@ -1,36 +1,146 @@
 import os
 import sys
-from .graphbuilder import extract_formulas_and_build_dependencies
+import argparse
+import logging
+from .graphbuilder import build_graph_and_stats
 from .graph_summarizer import print_summary
 from .graph_visualizer import visualize_dependency_graph
+import src.graphedexcel.logger_config  # noqa
 
-if __name__ == "__main__":
-    if len(sys.argv) > 1:
-        path_to_excel = sys.argv[1]
-    else:
-        print("Please provide the path to the Excel file as an argument.")
-        sys.exit(1)
+logger = logging.getLogger("graphedexcel.main")
+
+
+def parse_arguments():
+    parser = argparse.ArgumentParser(
+        prog="graphedexcel",
+        description="Process an Excel file to build and visualize dependency graphs.",
+    )
+
+    # Positional argument for the path to the Excel file
+    parser.add_argument(
+        "path_to_excel", type=str, help="Path to the Excel file to process."
+    )
+
+    # Optional flags with shorthand aliases
+    parser.add_argument(
+        "--remove-unconnected",
+        "-r",
+        action="store_true",
+        help="Remove unconnected nodes from the dependency graph.",
+    )
+
+    parser.add_argument(
+        "--as-directed-graph",
+        "-d",
+        action="store_true",
+        help="Treat the dependency graph as directed.",
+    )
+
+    parser.add_argument(
+        "--no-visualize",
+        "-n",
+        action="store_true",
+        help="Skip the visualization of the dependency graph.",
+    )
+
+    parser.add_argument(
+        "--layout",
+        "-l",
+        type=str,
+        default="spring",
+        choices=["spring", "circular", "kamada_kawai", "shell", "spectral"],
+        help="Layout algorithm for graph visualization (default: spring).",
+    )
+
+    parser.add_argument(
+        "--config",
+        "-c",
+        type=str,
+        help="Path to the configuration file for visualization. See README for details.",
+    )
+
+    parser.add_argument(
+        "--output-path",
+        "-o",
+        type=str,
+        default=None,
+        help="Specify the output path for the generated graph image.",
+    )
+
+    parser.add_argument(
+        "--open-image",
+        action="store_true",
+        help="Open the generated image after visualization.",
+    )
+
+    return parser.parse_args()
 
-    # does the file exist?
+
+def main():
+    args = parse_arguments()
+
+    path_to_excel = args.path_to_excel
+
+    # Check if the file exists
     if not os.path.exists(path_to_excel):
-        print(f"File not found: {path_to_excel}")
+        logger.error(f"File not found: {path_to_excel}")
         sys.exit(1)
 
-    # Extract formulas and build the dependency graph
-    dependency_graph, functions = extract_formulas_and_build_dependencies(path_to_excel)
+    # Build the dependency graph and gather statistics
+    dependency_graph, function_stats = build_graph_and_stats(
+        path_to_excel,
+        remove_unconnected=args.remove_unconnected,
+        as_directed=args.as_directed_graph,
+    )
+
+    # Print summary of the dependency graph
+    print_summary(dependency_graph, function_stats)
+
+    if args.no_visualize:
+        logger.info("Skipping visualization as per the '--no-visualize' flag.")
+        sys.exit(0)
+
+    logger.info("Visualizing the graph of dependencies. (This might take a while...)")
+
+    # Determine layout
+    layout = args.layout
 
-    print_summary(dependency_graph, functions)
+    # Configuration path
+    config_path = args.config
 
-    if "--no-visualize" not in sys.argv:
-        print(
-            "\033[1;30;40m\nVisualizing the graph of dependencies.\nThis might take a while...\033[0;37;40m\n"  # noqa
-        )
+    # Determine output filename
+    if args.output_path:
+        filename = args.output_path
+    else:
+        # Create a default filename based on the Excel file name
+        base_name = os.path.splitext(os.path.basename(path_to_excel))[0]
+        filename = f"{base_name}_dependency_graph.png"
+
+    # Visualize the dependency graph
+    visualize_dependency_graph(dependency_graph, filename, config_path, layout)
+
+    logger.info(f"Dependency graph image saved to {filename}.")
+
+    # Open the image file if requested
+    if args.open_image:
+        try:
+            os.startfile(filename)  # Note: os.startfile is Windows-specific
+        except AttributeError:
+            # For macOS and Linux, use 'open' and 'xdg-open' respectively
+            import subprocess
+            import platform
 
-        # if commandline argument --config is provided with a path to a JSON file, pass that path to the visualizer
+            if platform.system() == "Darwin":  # macOS
+                subprocess.call(["open", filename])
+            elif platform.system() == "Linux":
+                subprocess.call(["xdg-open", filename])
+            else:
+                logger.warning("Unable to open the image automatically on this OS.")
 
-        if "--config" in sys.argv:
-            config_index = sys.argv.index("--config")
-            config_path = sys.argv[config_index + 1]
-            visualize_dependency_graph(dependency_graph, path_to_excel, config_path)
-        else:
-            visualize_dependency_graph(dependency_graph, path_to_excel)
+
+if __name__ == "__main__":
+    try:
+        main()
+    except Exception as e:
+        logger.exception("An unexpected error occurred:", e)
+        sys.exit(1)
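
To try the new flags without an Excel file at hand, a trimmed-down sketch of the parser may help. It keeps only a subset of the options from `parse_arguments()` and repeats the default-filename logic from `main()`. The helper names `build_parser` and `default_output_name` are hypothetical, and passing an explicit argument list to `parse_args` is a convenience of this sketch, not something the commit does.

```python
import argparse
import os


def build_parser():
    """Subset of the options added in this commit (hypothetical helper name)."""
    parser = argparse.ArgumentParser(prog="graphedexcel")
    parser.add_argument("path_to_excel", type=str)
    parser.add_argument("--as-directed-graph", "-d", action="store_true")
    parser.add_argument("--no-visualize", "-n", action="store_true")
    parser.add_argument(
        "--layout",
        "-l",
        type=str,
        default="spring",
        choices=["spring", "circular", "kamada_kawai", "shell", "spectral"],
    )
    parser.add_argument("--output-path", "-o", type=str, default=None)
    return parser


def default_output_name(path_to_excel):
    """Mirror of the fallback in main(): <workbook name>_dependency_graph.png."""
    base_name = os.path.splitext(os.path.basename(path_to_excel))[0]
    return f"{base_name}_dependency_graph.png"


# Parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["book.xlsx", "-d", "--layout", "circular"])
filename = args.output_path or default_output_name(args.path_to_excel)
print(args.as_directed_graph, args.layout, filename)
```

Because `--output-path` defaults to `None`, the image name falls back to one derived from the workbook name, so the output no longer has to sit next to the Excel file.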

src/graphedexcel/excel_parser.py

+6 -3
@@ -1,10 +1,14 @@
 from openpyxl.utils import get_column_letter, range_boundaries
 import re
 from typing import List, Tuple, Dict
+import logging
+
+logger = logging.getLogger(__name__)
 
 # Regex to detect cell references like A1, B2, or ranges like A1:B2
 CELL_REF_REGEX = r"('?[A-Za-z0-9_\-\[\] ]+'?![A-Z]{1,3}[0-9]+(:[A-Z]{1,3}[0-9]+)?)|([A-Z]{1,3}[0-9]+(:[A-Z]{1,3}[0-9]+)?)"  # noqa
 
+
 def extract_references(formula: str) -> Tuple[List[str], List[str], Dict[str, str]]:
     """
     Extract all referenced cells and ranges from a formula using regular expressions.
@@ -15,7 +19,7 @@ def extract_references(formula: str) -> Tuple[List[str], List[str], Dict[str, st
 
     Returns:
         Tuple[List[str], List[str], Dict[str, str]]: A tuple containing lists of direct references,
-        range references, and a dictionary of dependencies.
+        range references, and a dictionary of dependencies.
     """
     formula = formula.replace("$", "")
     matches = re.findall(CELL_REF_REGEX, formula)
@@ -42,6 +46,7 @@ def extract_references(formula: str) -> Tuple[List[str], List[str], Dict[str, st
 
     return direct_references, range_references, dependencies
 
+
 def expand_range(range_reference: str) -> List[str]:
     """
     Expand a range reference (e.g., 'A1:A3') into a list of individual cell references.
@@ -71,5 +76,3 @@ def expand_range(range_reference: str) -> List[str]:
             expanded_cells.append(cell_ref)
 
     return expanded_cells
-
-
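
To see what `CELL_REF_REGEX` picks up, the following sketch runs the pattern from the diff against a sample formula. The helper `find_references` is hypothetical and much simpler than `extract_references`: it strips `$` signs the same way, but returns only the full matches rather than the three-part tuple.

```python
import re

# Pattern copied verbatim from excel_parser.py in the diff above.
CELL_REF_REGEX = r"('?[A-Za-z0-9_\-\[\] ]+'?![A-Z]{1,3}[0-9]+(:[A-Z]{1,3}[0-9]+)?)|([A-Z]{1,3}[0-9]+(:[A-Z]{1,3}[0-9]+)?)"  # noqa


def find_references(formula):
    """Return every cell or range reference matched in the formula, in order."""
    formula = formula.replace("$", "")  # drop absolute-reference markers first
    return [m.group(0) for m in re.finditer(CELL_REF_REGEX, formula)]


refs = find_references("=SUM($A$1:A3)+Sheet2!B4")
print(refs)
```

The first alternative of the pattern handles sheet-qualified references such as `Sheet2!B4`; the second handles plain cells and ranges. Function names like `SUM` are skipped because no digit follows the letters.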

src/graphedexcel/graph_summarizer.py

+6 -2
@@ -1,16 +1,20 @@
 from collections import Counter
+import networkx as nx
 
 
-def print_summary(graph, functionsdict):
+def print_summary(graph: nx.Graph, functionsdict: dict[str, int]) -> None:
     """
-    Summarize a networkx DiGraph representing a dependency graph and print the most used functions in the formulas.
+    Summarize a networkx DiGraph representing a dependency
+    graph and print the most used functions in the formulas.
     """
     strpadsize = 28
     numpadsize = 5
 
+    print()
     print_basic_info(graph, strpadsize, numpadsize)
     print_highest_degree_nodes(graph, strpadsize, numpadsize)
     print_most_used_functions(functionsdict, strpadsize, numpadsize)
+    print()
 
 
 def print_basic_info(graph, strpadsize, numpadsize):
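
The diff does not show the body of `print_most_used_functions`, so the following is only a guess at how the `Counter` import and the two padding sizes might fit together. The function name `format_most_used`, the `top_n` parameter, and the exact column layout are assumptions; the values 28 and 5 are taken from `print_summary`.

```python
from collections import Counter

STRPADSIZE = 28  # label column width, as set in print_summary
NUMPADSIZE = 5   # number column width, as set in print_summary


def format_most_used(functionsdict, top_n=3):
    """Return one padded line per function, most frequently used first."""
    lines = []
    for name, count in Counter(functionsdict).most_common(top_n):
        # Left-align the name, right-align the count.
        lines.append(f"{name.ljust(STRPADSIZE)}{str(count).rjust(NUMPADSIZE)}")
    return lines


lines = format_most_used({"SUM": 12, "IF": 7, "VLOOKUP": 3, "MAX": 1})
print("\n".join(lines))
```

Fixed-width padding keeps the counts aligned in a terminal regardless of how long the function names are, as long as they stay under 28 characters.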
