API Reference

pykegg

class pykegg.KGML_graph(path=None, pid=None)

KGML graph object.

get_coords()

Transform the positions stored in coords into an edge DataFrame.

get_edges(add_group=False)

Get edges DataFrame of the KGML graph.

get_graph(layout='native', add_group=False)

Get igraph object of the KGML graph.

Parameters:

layout: str

layout of the graph. If native, the original layout of the KGML file is used.

get_nodes(node_x_nudge=5, node_y_nudge=5, append_pathway_name=True)

Get nodes DataFrame of the KGML graph.

pykegg.utils.append_colors(node_df, candidate, new_column_name='color', candidate_column='graphics_name', delim=',', true_color='#ff0000', false_color='#ffffff')

Append discrete colors to the node_df based on intersection with candidate ID list.

Parameters:

node_df: DataFrame

node data obtained by get_nodes().

candidate: list

list of candidate IDs.

new_column_name: str

the name of the new column.

candidate_column: str

the column in node_df specifying candidate IDs.

delim: str

the delimiter of the node IDs. Typically “,” for graphics_name, and “ ” (a single space) for name.

true_color: str

the color of the candidate nodes.

false_color: str

the color of the non-candidate nodes.
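The matching rule described above can be illustrated without pykegg. The following is a minimal sketch, assuming that a node counts as a candidate when any of its delimiter-separated IDs appears in the candidate list. `color_for_node` is a hypothetical helper, not part of the library, and the whitespace trimming and placeholder gene names are assumptions made for illustration.

```python
def color_for_node(node_ids, candidate, delim=",",
                   true_color="#ff0000", false_color="#ffffff"):
    """Pick a color for one node (hypothetical helper, not pykegg code)."""
    # Split the cell into individual IDs; trimming whitespace is an assumption.
    ids = {part.strip() for part in node_ids.split(delim)}
    # Assumed rule: any overlap with the candidate list marks the node.
    return true_color if ids & set(candidate) else false_color


# Placeholder values standing in for a graphics_name column.
graphics_names = ["GENE_A, ALIAS_A", "GENE_B, ALIAS_B", "GENE_C"]
colors = [color_for_node(name, ["GENE_B", "GENE_C"]) for name in graphics_names]
```

In this sketch the first node keeps false_color and the other two receive true_color, which mirrors the role of the new column added to node_df.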

pykegg.utils.append_colors_continuous_values(node_df, lfc_dict, node_name_column='graphics_name', new_color_column='color', delim=',', colors=None, two_slope=True, center_value='median', orig_value=None, fix_min=None, fix_max=None, fix_center=None)

Append continuous colors to the node_df based on the values in dict.

Parameters:

node_df: DataFrame

node data obtained by get_nodes().

lfc_dict: dict

dict of values.

node_name_column: str

the column in node_df specifying node IDs.

new_color_column: str

the name of the new column.

delim: str

the delimiter of the node IDs. Typically “,” for graphics_name, and “ ” (a single space) for name.

colors: list

the colors to be used. Default is [“#0000ff”, “#ffffff”, “#ff0000”].

two_slope: bool

whether to use two-slope color scheme. Default is True.

center_value: str or float

the center value of the color scheme. Default is “median”.

orig_value: str

If specified, append the values used to compute color in the DataFrame.

fix_min: float

fixed minimum value used to calculate the color.

fix_max: float

fixed maximum value used to calculate the color.

fix_center: float

fixed center value used to calculate the color.
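To make the two-slope idea concrete, here is a minimal stdlib-only sketch of a piecewise-linear color mapping around a center value, using the default blue-white-red colors listed above. It is an illustration under stated assumptions, not pykegg's implementation: `two_slope_color` and `_lerp_hex` are hypothetical names, values outside the range are assumed to be clamped, and the center must lie strictly between the minimum and maximum.

```python
def _lerp_hex(c0, c1, t):
    """Linearly interpolate between two '#rrggbb' colors (0 <= t <= 1)."""
    a = [int(c0[i:i + 2], 16) for i in (1, 3, 5)]
    b = [int(c1[i:i + 2], 16) for i in (1, 3, 5)]
    return "#" + "".join(f"{round(x + (y - x) * t):02x}" for x, y in zip(a, b))


def two_slope_color(value, vmin, vcenter, vmax,
                    colors=("#0000ff", "#ffffff", "#ff0000")):
    """Map value to a color, scaling each side of vcenter independently.

    Assumes vmin < vcenter < vmax; out-of-range values are clamped.
    """
    value = min(max(value, vmin), vmax)
    if value <= vcenter:
        # Lower slope: vmin maps to colors[0], vcenter to colors[1].
        return _lerp_hex(colors[0], colors[1], (value - vmin) / (vcenter - vmin))
    # Upper slope: vcenter maps to colors[1], vmax to colors[2].
    return _lerp_hex(colors[1], colors[2], (value - vcenter) / (vmax - vcenter))
```

With an asymmetric range such as vmin=-1, vcenter=0, vmax=4, the center still maps to the middle color, which is the point of a two-slope scheme. In this sketch, the fix_min, fix_max, and fix_center parameters would correspond to supplying vmin, vmax, and vcenter explicitly rather than deriving them from the data.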

pykegg.utils.append_legend(image, min_value=-2, max_value=2, center_value=0, two_slope=True, colors=None, width=1, height=0.6, bottom=0.8, pos='topright', label='Label')

Add specified legend to image array

Parameters:

image: numpy array

Image numpy array

min_value: float

Minimum value of the color bar

max_value: float

Maximum value of the color bar

center_value: float

Center value of the color bar

two_slope: bool

If True, use two slope color bar

colors: list

List of colors, if None, use blue-white-red color bar

width: float

Width of the color bar

height: float

Height of the color bar

bottom: float

Bottom position of the color bar

pos: str

Position specification, “topright”, “bottomright”, “bottomleft”, “topleft”

label: str

Label of the color bar

pykegg.utils.check_cache(response)

Check if response is from cache.

Parameters:

response: requests.Response

Response object.

pykegg.utils.convert_id(x, c_dic, first_only=True)

Convert an ID based on a dict.

Parameters:

x: str

node name

c_dic: dict

dictionary, typically obtained by id_to_name_dict; keys correspond to KEGG IDs and values correspond to names.

first_only: bool

return only the first string separated by space.
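The lookup described above can be sketched in a few lines. This is a hypothetical stand-in rather than the library's code: `convert_node_name` is an invented name, the IDs and names are placeholders, and leaving unknown IDs unchanged is an assumption.

```python
def convert_node_name(x, c_dic, first_only=True):
    """Translate a space-separated node name via a dict (illustrative only)."""
    # Assumption: IDs missing from the dict are passed through unchanged.
    converted = [c_dic.get(token, token) for token in x.split(" ")]
    # first_only keeps just the first converted entry.
    return converted[0] if first_only else " ".join(converted)


# Placeholder mapping in the shape id_to_name_dict is described as returning.
id_to_name = {"org:0001": "GENE_A", "org:0002": "GENE_B"}
first = convert_node_name("org:0001 org:0002", id_to_name)
joined = convert_node_name("org:0001 org:0002", id_to_name, first_only=False)
```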

pykegg.utils.deseq2_raw_map(results_df, path=None, pid=None, node_name_column='graphics_name', delim=',', color_column='log2FoldChange', highlight_sig=False, highlight_color='#ff0000', highlight_padj_thresh=0.05, colors=None, two_slope=True, center_value=0, show_legend=True, legend_label=None, legend_position='topright', legend_width=1, legend_height=0.6, legend_bottom=0.8)

Plot PyDESeq2 results on KEGG pathway map

Parameters:

results_df: pandas.DataFrame

PyDESeq2 results dataframe

path: str

Path to the KEGG pathway map

pid: str

KEGG pathway ID

node_name_column: str

Column name of the node name

delim: str

Delimiter of the node name

color_column: str

Column in results_df whose values determine the color

highlight_sig: bool

If True, highlight significant genes

highlight_color: str

Color of the highlight

highlight_padj_thresh: float

Adjusted p-value (padj) threshold for the highlight

colors: list

List of colors, if None, use blue-white-red color bar

two_slope: bool

If True, use two slope color bar

center_value: float

Center value of the color bar

show_legend: bool

If True, show legend

legend_label: str

Label of the legend

legend_position: str

Position of the legend

legend_width: float

Width of the legend

legend_height: float

Height of the legend

legend_bottom: float

Bottom position of the legend

pykegg.utils.hex2rgb(hex_str)

Convert hex string to rgb tuple.

Parameters:

hex_str: str

hex string, e.g. “#ffffff”.
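For reference, a hex-to-RGB conversion of this kind can be written with the standard library alone. The sketch below assumes a six-digit string with an optional leading “#” and 0-255 integer channels; `hex_to_rgb_tuple` is a hypothetical name and may differ from what hex2rgb does internally.

```python
def hex_to_rgb_tuple(hex_str):
    """Convert '#rrggbb' to an (r, g, b) tuple of ints in 0-255."""
    digits = hex_str.lstrip("#")
    # Read two hex digits per channel.
    return tuple(int(digits[i:i + 2], 16) for i in range(0, 6, 2))


white = hex_to_rgb_tuple("#ffffff")
red = hex_to_rgb_tuple("#ff0000")
```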

pykegg.utils.id_to_name_dict(list_id='hsa', column=3, semicolon=True, comma=True)

Get KEGG ID to name dictionary.

Parameters:

list_id: str

organism ID.

column: int

Column to use for name.

semicolon: bool

Whether to split by semicolon.

comma: bool

Whether to split by comma.
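The effect of column, semicolon, and comma can be pictured with a small parser over tab-separated text. Everything here is an assumption made for illustration: `parse_list_text` is a hypothetical helper, the sample lines are placeholders rather than real KEGG records, column is taken to be a zero-based index into the tab-separated fields, and each split is assumed to keep only the first piece.

```python
def parse_list_text(text, column=3, semicolon=True, comma=True):
    """Build an ID-to-name dict from tab-separated lines (illustrative)."""
    mapping = {}
    for line in text.strip().splitlines():
        fields = line.split("\t")
        name = fields[column]
        if semicolon:
            # Assumed: text after the first ';' is a description to drop.
            name = name.split(";")[0]
        if comma:
            # Assumed: the first comma-separated entry is the primary name.
            name = name.split(",")[0]
        mapping[fields[0]] = name.strip()
    return mapping


# Placeholder records, not real KEGG data.
sample = "\n".join([
    "org:0001\tCDS\t1:100..200\tGENE_A, ALIAS_A; example protein A",
    "org:0002\tCDS\t2:300..400\tGENE_B; example protein B",
])
names = parse_list_text(sample)
```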

pykegg.utils.overlay(rects, kegg_map)

Overlay two images with transparency.

Parameters:

rects: np.array

Image numpy array

kegg_map: np.array

Image numpy array

pykegg.utils.overlay_continuous_values_with_legend(node_df, value_dict, path=None, pid=None, node_name_column='graphics_name', delim=',', colors=None, legend_label='value', legend_position='topright', legend_width=1, legend_height=0.6, legend_bottom=0.8, transparent_colors=None, two_slope=True, center_value='median')

Obtain the raw image of pathway and color the nodes, return the overlaid image with legend.

Parameters:

node_df: DataFrame

node data obtained by get_nodes().

pykegg.utils.overlay_opencv_image(node_df, path=None, pid=None, fill_color='color', transparent_colors=None, highlight_nodes=None, highlight_color='#ff0000', highlight_expand=2)

Obtain the raw image of pathway and color the nodes.

Parameters:

node_df: DataFrame

node data obtained by get_nodes().

path: str

path to the image if already downloaded.

pid: str

KEGG pathway identifier.

fill_color: str

the column in node_df specifying color in HEX. If list is given, split the width according to the color number. Skip the node if None.

transparent_colors: list of str

specify which color to be transparent. If None, default [“#FFFFFF”, “#BFFFBF”, “#BFBFFF”] is used.

highlight_nodes: str

the column in node_df specifying which nodes to be highlighted.

highlight_color: str

the color of the highlighted nodes.

highlight_expand: int

the number of pixels to expand the highlighted nodes.

pykegg.utils.parallel_edges(df, move_param=5)

Experimental function that moves the x and y positions when multiple edges are to be plotted in plotnine. The adjustment is based on whether the y position is the same between two points.

Parameters:
  • df (pd.DataFrame) – data frame returned by return_segments

  • move_param (float or int) – parameter to control edge nudge

pykegg.utils.parallel_edges2(df, move_param=5)

Experimental function that moves the x and y positions when multiple edges are to be plotted in plotnine. The adjustment is based on the angle (in degrees) between points.

Parameters:
  • df (pd.DataFrame) – data frame returned by return_segments

  • move_param (float or int) – parameter to control edge nudge

pykegg.utils.pathway_name_to_id_dict(list_id='hsa')

Get pathway name to ID dictionary.

Parameters:

list_id: str

organism ID.

pykegg.utils.return_color_bar(width=1, height=0.6, bottom=0.8, min_value=-2, max_value=2, two_slope=True, center_value=0, colors=None, label='Label')

Return color bar as a numpy array.

Parameters:

width: float

width of the color bar.

height: float

height of the color bar.

bottom: float

bottom of the color bar.

min_value: float

minimum value of the color bar.

max_value: float

maximum value of the color bar.

two_slope: bool

if True, use two slope norm.

center_value: float

center value of the color bar.

colors: list

list of colors.

label: str

label of the color bar.

pykegg.utils.return_segments(graph, node_df=None, edge_df=None)

Return an edge DataFrame that has xend and yend columns.

Parameters:

graph: KGML_graph

KGML_graph class object

node_df: DataFrame

node data obtained by get_nodes().

edge_df: DataFrame

edge data obtained by get_edges().
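Conceptually, producing segments means looking up the coordinates of both endpoints of every edge. The sketch below shows that join with plain dicts instead of DataFrames. It is a simplified illustration: `to_segments` is a hypothetical helper, and the key names entry1, entry2, x, and y are assumptions (only xend and yend are named above).

```python
def to_segments(node_positions, edges):
    """Attach start and end coordinates to each edge (illustrative only).

    node_positions: {node_id: (x, y)}
    edges: iterable of (source_id, target_id) pairs
    """
    segments = []
    for source, target in edges:
        x, y = node_positions[source]
        xend, yend = node_positions[target]
        segments.append({"entry1": source, "entry2": target,
                         "x": x, "y": y, "xend": xend, "yend": yend})
    return segments


# Placeholder layout with three nodes and two edges.
positions = {"1": (10, 20), "2": (50, 20), "3": (50, 80)}
segments = to_segments(positions, [("1", "2"), ("2", "3")])
```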

pykegg.utils.shorten_end(row, pct=0.8, absolute=None)

Shorten segments by moving yend and xend.

Parameters:

row: pd.Series

Series of edge data frame

pct: float

scaling factor

absolute: float

absolute distance to shorten
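The geometry behind shortening a segment can be sketched as follows. This is one plausible interpretation rather than the library's exact behavior: `shorten_segment` is a hypothetical helper that takes coordinates directly instead of a row, pct is assumed to be the fraction of the original length that is kept, and absolute is assumed to take precedence when given.

```python
import math


def shorten_segment(x, y, xend, yend, pct=0.8, absolute=None):
    """Return a new (xend, yend) pulled back toward (x, y) (illustrative)."""
    dx, dy = xend - x, yend - y
    if absolute is not None:
        length = math.hypot(dx, dy)
        if length == 0:
            # Degenerate segment: nothing to shorten.
            return xend, yend
        # Remove a fixed distance from the end, never overshooting the start.
        scale = max(length - absolute, 0) / length
    else:
        # Keep pct of the original length (assumed meaning of pct).
        scale = pct
    return x + dx * scale, y + dy * scale


by_pct = shorten_segment(0, 0, 10, 0)
by_abs = shorten_segment(0, 0, 3, 4, absolute=1)
```

Shortening the end point like this is commonly used so that arrowheads stop at the border of a node instead of its center.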

pykegg.utils.visualize(pathway_name, genes, db=None, org=None, column_name='graphics_name', false_color='#707070', true_color='#FA8072', output=None)

Output pathway image based on pathway name and gene symbol list

Parameters:

pathway_name: str

pathway name (not ID)

genes: list

list of genes

db: str

database name

org: str

if db is not specified, this parameter is used to convert the pathway name to an ID

column_name: str

column name in the node data to match against

true_color: str

HEX specifying color for matched nodes

false_color: str

HEX specifying color for not matched nodes

output: str

output image file; defaults to None, in which case the Image is returned

pykegg.utils.visualize_gseapy(gsea_res, colors, pathway_name=None, pathway_id=None, org='hsa', column_name='graphics_name', false_color='#707070')

Visualize GSEApy results.

Parameters:

gsea_res: GSEApy object or list of GSEApy objects

GSEApy results.

colors: str or list of str

Colors to use for each GSEA result.

pathway_name: str

Pathway name.

pathway_id: str

Pathway ID.

org: str

KEGG organism ID.

column_name: str

Column name to use for visualization.

false_color: str

Color to use for nodes that are not matched.