pykegg package¶
Submodules¶
pykegg.KGML_graph module¶
- class pykegg.KGML_graph.KGML_graph(path=None, pid=None)¶
Bases: object
KGML graph object.
- get_coords()¶
Transform coords positions to edge DataFrame.
- get_edges(add_group=False)¶
Get edges DataFrame of the KGML graph.
- get_graph(layout='native', add_group=False)¶
Get igraph object of the KGML graph.
Parameters:¶
- layout: str
layout of the graph. If "native", the original layout of the KGML file is used.
- get_nodes(node_x_nudge=5, node_y_nudge=5, append_pathway_name=True)¶
Get nodes DataFrame of the KGML graph.
pykegg.utils module¶
- pykegg.utils.append_colors(node_df, candidate, new_column_name='color', candidate_column='graphics_name', delim=',', true_color='#ff0000', false_color='#ffffff')¶
Append discrete colors to the node_df based on intersection with candidate ID list.
Parameters:¶
- node_df: DataFrame
node data obtained by get_nodes().
- candidate: list
list of candidate IDs.
- new_column_name: str
the name of the new column.
- candidate_column: str
the column in node_df specifying candidate IDs.
- delim: str
the delimiter of the node IDs. Typically “,” for graphics_name, and “ “ for name.
- true_color: str
the color of the candidate nodes.
- false_color: str
the color of the non-candidate nodes.
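The intersection rule described for append_colors can be illustrated without pykegg. The following is a minimal sketch, not the library's implementation: plain dicts stand in for DataFrame rows, the helper name color_by_candidates is hypothetical, and stripping whitespace around each split ID is an assumption.

```python
def color_by_candidates(rows, candidate, candidate_column="graphics_name",
                        delim=",", true_color="#ff0000", false_color="#ffffff"):
    """Return one color per row: true_color if any ID in the row is a candidate."""
    wanted = set(candidate)
    colors = []
    for row in rows:
        # A node can carry several IDs in one delimited string.
        ids = {part.strip() for part in str(row[candidate_column]).split(delim)}
        # A non-empty intersection marks the node as a candidate node.
        colors.append(true_color if ids & wanted else false_color)
    return colors


# Hypothetical node rows; real rows would come from get_nodes().
rows = [
    {"graphics_name": "TP53, P53"},
    {"graphics_name": "MDM2"},
]
colors = color_by_candidates(rows, candidate=["TP53"])
print(colors)
```

The first row is colored with true_color because one of its IDs is in the candidate list; the second falls back to false_color.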
- pykegg.utils.append_colors_continuous_values(node_df, lfc_dict, node_name_column='graphics_name', new_color_column='color', delim=',', colors=None, two_slope=True, center_value='median', orig_value=None, fix_min=None, fix_max=None, fix_center=None)¶
Append continuous colors to the node_df based on the values in dict.
Parameters:¶
- node_df: DataFrame
node data obtained by get_nodes().
- lfc_dict: dict
dict of values.
- node_name_column: str
the column in node_df specifying node IDs.
- new_color_column: str
the name of the new column.
- delim: str
the delimiter of the node IDs. Typically “,” for graphics_name, and “ “ for name.
- colors: list
the colors to be used. Default is [“#0000ff”, “#ffffff”, “#ff0000”].
- two_slope: bool
whether to use two-slope color scheme. Default is True.
- center_value: str or float
the center value of the color scheme. Default is “median”.
- orig_value: str
If specified, append the values used to compute color in the DataFrame.
- fix_min: float
fixed minimum value to calculate color
- fix_max: float
fixed maximum value to calculate color
- fix_center: float
fixed center value to calculate color
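To make the two-slope idea concrete, the sketch below maps values to colors by interpolating separately on each side of a center value. It is an illustration under stated assumptions, not pykegg's implementation: the function names are hypothetical, out-of-range values are clamped, and the center is taken as the median of the input values to mirror the "median" default.

```python
import statistics


def hex_to_rgb(hex_str):
    """Convert '#rrggbb' to an (r, g, b) tuple."""
    digits = hex_str.lstrip("#")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def two_slope_color(value, vmin, vcenter, vmax,
                    colors=("#0000ff", "#ffffff", "#ff0000")):
    """Map value to a hex color using one linear ramp per side of vcenter."""
    if not vmin < vcenter < vmax:
        raise ValueError("expected vmin < vcenter < vmax")
    value = min(max(value, vmin), vmax)  # clamp (assumption)
    if value <= vcenter:
        low, high = colors[0], colors[1]
        frac = (value - vmin) / (vcenter - vmin)
    else:
        low, high = colors[1], colors[2]
        frac = (value - vcenter) / (vmax - vcenter)
    rgb = [round(a + (b - a) * frac)
           for a, b in zip(hex_to_rgb(low), hex_to_rgb(high))]
    return "#{:02x}{:02x}{:02x}".format(*rgb)


# Hypothetical values keyed by node ID, in the spirit of lfc_dict.
values = {"GENE_A": -1.0, "GENE_B": 0.5, "GENE_C": 3.0}
center = statistics.median(values.values())
node_colors = {
    node: two_slope_color(v, min(values.values()), center, max(values.values()))
    for node, v in values.items()
}
print(node_colors)
```

Because each side of the center has its own slope, the center value maps to the middle color even when the minimum and maximum are not symmetric around it.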
- pykegg.utils.append_legend(image, min_value=-2, max_value=2, center_value=0, two_slope=True, colors=None, width=1, height=0.6, bottom=0.8, pos='topright', label='Label')¶
Add specified legend to image array
Parameters:¶
- image: numpy array
Image numpy array
- min_value: float
Minimum value of the color bar
- max_value: float
Maximum value of the color bar
- center_value: float
Center value of the color bar
- two_slope: bool
If True, use two slope color bar
- colors: list
List of colors, if None, use blue-white-red color bar
- width: float
Width of the color bar
- height: float
Height of the color bar
- bottom: float
Bottom position of the color bar
- pos: str
Position specification, “topright”, “bottomright”, “bottomleft”, “topleft”
- label: str
Label of the color bar
- pykegg.utils.check_cache(response)¶
Check if response is from cache.
Parameters:¶
- response: requests.Response
Response object.
- pykegg.utils.convert_id(x, c_dic, first_only=True)¶
convert ID based on dict
Parameters:¶
- x: str
node name
- c_dic: dict
dictionary, typically obtained by id_to_name_dict. Keys correspond to KEGG IDs and values correspond to names.
- first_only: bool
If True, return only the first of the space-separated strings.
- pykegg.utils.deseq2_raw_map(results_df, path=None, pid=None, node_name_column='graphics_name', delim=',', color_column='log2FoldChange', highlight_sig=False, highlight_color='#ff0000', highlight_padj_thresh=0.05, colors=None, two_slope=True, center_value=0, show_legend=True, legend_label=None, legend_position='topright', legend_width=1, legend_height=0.6, legend_bottom=0.8)¶
Plot PyDESeq2 results on KEGG pathway map
Parameters:¶
- results_df: pandas.DataFrame
PyDESeq2 results dataframe
- path: str
Path to the KEGG pathway map
- pid: str
KEGG pathway ID
- node_name_column: str
Column name of the node name
- delim: str
Delimiter of the node name
- color_column: str
Column name of the color
- highlight_sig: bool
If True, highlight significant genes
- highlight_color: str
Color of the highlight
- highlight_padj_thresh: float
Adjusted p-value (padj) threshold for the highlight
- colors: list
List of colors, if None, use blue-white-red color bar
- two_slope: bool
If True, use two slope color bar
- center_value: float
Center value of the color bar
- show_legend: bool
If True, show legend
- legend_label: str
Label of the legend
- legend_position: str
Position of the legend
- legend_width: float
Width of the legend
- legend_height: float
Height of the legend
- legend_bottom: float
Bottom position of the legend
- pykegg.utils.hex2rgb(hex_str)¶
Convert hex string to rgb tuple.
Parameters:¶
- hex_str: str
hex string, e.g. “#ffffff”.
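For reference, a hex-to-RGB conversion can be written in a few lines. This is a minimal sketch with a hypothetical name (hex_to_rgb_sketch), not the body of hex2rgb; it assumes a six-digit hex string with an optional leading "#".

```python
def hex_to_rgb_sketch(hex_str):
    """Convert '#rrggbb' to an (r, g, b) tuple of ints in 0..255."""
    digits = hex_str.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected 6 hex digits, got {hex_str!r}")
    # Parse each pair of hex digits as one color channel.
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


white = hex_to_rgb_sketch("#ffffff")
salmon = hex_to_rgb_sketch("#FA8072")
print(white, salmon)
```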
- pykegg.utils.id_to_name_dict(list_id='hsa', column=3, semicolon=True, comma=True)¶
Get KEGG ID to name dictionary.
Parameters:¶
- list_id: str
organism ID.
- column: int
Column to use for name.
- semicolon: bool
Whether to split by semicolon.
- comma: bool
Whether to split by comma.
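The column, semicolon, and comma options suggest a simple parsing scheme for each listing line. The sketch below shows one way such options could behave; it is an assumption-laden illustration, not the library's code. The helper name parse_list_line, the tab-separated layout, the rule of keeping the text before the first separator, and the sample line are all hypothetical.

```python
def parse_list_line(line, column=3, semicolon=True, comma=True):
    """Return (kegg_id, name) from one tab-separated listing line."""
    fields = line.rstrip("\n").split("\t")
    kegg_id, name = fields[0], fields[column]
    if semicolon:
        # Keep the part before the first semicolon (assumed: symbols; description).
        name = name.split(";")[0]
    if comma:
        # Keep the first of several comma-separated aliases.
        name = name.split(",")[0]
    return kegg_id, name.strip()


# Hypothetical listing line, for illustration only.
line = "hsa:0000\tCDS\t1:100..200\tGENEA, ALIAS1; example protein"
pair = parse_list_line(line)
print(pair)
```

With both options disabled, the whole column is kept as the name.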
- pykegg.utils.overlay(rects, kegg_map)¶
overlay two images with transparency
Parameters:¶
- rects: np.array
Image numpy array
- kegg_map: np.array
Image numpy array
- pykegg.utils.overlay_continuous_values_with_legend(node_df, value_dict, path=None, pid=None, node_name_column='graphics_name', delim=',', colors=None, legend_label='value', legend_position='topright', legend_width=1, legend_height=0.6, legend_bottom=0.8, transparent_colors=None, two_slope=True, center_value='median')¶
Obtain the raw image of pathway and color the nodes, return the overlaid image with legend.
Parameters:¶
- node_df: DataFrame
node data obtained by get_nodes().
- pykegg.utils.overlay_opencv_image(node_df, path=None, pid=None, fill_color='color', transparent_colors=None, highlight_nodes=None, highlight_color='#ff0000', highlight_expand=2)¶
Obtain the raw image of pathway and color the nodes.
Parameters:¶
- node_df: DataFrame
node data obtained by get_nodes().
- path: str
path to the image if already downloaded.
- pid: str
KEGG pathway identifier.
- fill_color: str
the column in node_df specifying color in HEX. If list is given, split the width according to the color number. Skip the node if None.
- transparent_colors: list of str
specify which color to be transparent. If None, default [“#FFFFFF”, “#BFFFBF”, “#BFBFFF”] is used.
- highlight_nodes: str
the column in node_df specifying which nodes to be highlighted.
- highlight_color: str
the color of the highlighted nodes.
- highlight_expand: int
the number of pixels to expand the highlighted nodes.
- pykegg.utils.parallel_edges(df, move_param=5)¶
Experimental function that moves x and y positions when multiple edges are to be plotted in plotnine, depending on whether the two points share the same y position.
- pykegg.utils.parallel_edges2(df, move_param=5)¶
Experimental function that moves x and y positions when multiple edges are to be plotted in plotnine, based on the angle (in degrees) between the points.
- pykegg.utils.pathway_name_to_id_dict(list_id='hsa')¶
Get pathway name to ID dictionary.
Parameters:¶
- list_id: str
organism ID.
- pykegg.utils.return_color_bar(width=1, height=0.6, bottom=0.8, min_value=-2, max_value=2, two_slope=True, center_value=0, colors=None, label='Label')¶
Return color bar as a numpy array.
Parameters:¶
- width: float
width of the color bar.
- height: float
height of the color bar.
- bottom: float
bottom of the color bar.
- min_value: float
minimum value of the color bar.
- max_value: float
maximum value of the color bar.
- two_slope: bool
if True, use two slope norm.
- center_value: float
center value of the color bar.
- colors: list
list of colors.
- label: str
label of the color bar.
- pykegg.utils.return_segments(graph, node_df=None, edge_df=None)¶
Return the edge DataFrame with xend and yend columns added.
Parameters:¶
- graph: KGML_graph
KGML_graph class object
- node_df: DataFrame
node data obtained by get_nodes().
- edge_df: DataFrame
edge data obtained by get_edges().
- pykegg.utils.shorten_end(row, pct=0.8, absolute=None)¶
shorten segments by moving yend and xend
Parameters:¶
- row: pd.Series
Series of edge data frame
- pct: float
scaling factor
- absolute: float
absolute distance to shorten
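The geometry behind shortening a segment can be sketched with plain coordinates. This is an illustration, not the body of shorten_end: the helper name shorten_segment is hypothetical, it takes coordinates instead of a pd.Series row, and it assumes that absolute, when given, takes precedence over pct.

```python
import math


def shorten_segment(x, y, xend, yend, pct=0.8, absolute=None):
    """Move (xend, yend) toward (x, y) and return the new end point."""
    dx, dy = xend - x, yend - y
    if absolute is not None:
        length = math.hypot(dx, dy)
        if length == 0:
            return xend, yend  # degenerate segment, nothing to shorten
        # Convert the absolute distance into an equivalent scaling factor.
        pct = max(length - absolute, 0.0) / length
    return x + dx * pct, y + dy * pct


scaled = shorten_segment(0, 0, 10, 0)               # keep 80% of the length
trimmed = shorten_segment(0, 0, 3, 4, absolute=1)   # cut 1 unit off a length-5 segment
print(scaled, trimmed)
```

Shortening the end point this way leaves room for an arrowhead so that it does not overlap the target node when the segment is drawn.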
- pykegg.utils.visualize(pathway_name, genes, db=None, org=None, column_name='graphics_name', false_color='#707070', true_color='#FA8072', output=None)¶
Output pathway image based on pathway name and gene symbol list
Parameters:¶
- pathway_name: str
pathway name (not ID)
- genes: list
list of gene symbols
- db: str
database name
- org: str
if db is not specified, this organism ID is used to convert the pathway name to an ID
- column_name: str
column name to match for in node data
- true_color: str
HEX specifying color for matched nodes
- false_color: str
HEX specifying color for not matched nodes
- output: str
output image file. Defaults to None, in which case the image is returned
- pykegg.utils.visualize_gseapy(gsea_res, colors, pathway_name=None, pathway_id=None, org='hsa', column_name='graphics_name', false_color='#707070')¶
Visualize GSEApy results.
Parameters:¶
- gsea_res: GSEApy object or list of GSEApy objects
GSEApy results.
- colors: str or list of str
Colors to use for each gsea results.
- pathway_name: str
Pathway name.
- pathway_id: str
Pathway ID.
- org: str
KEGG organism ID.
- column_name: str
Column name to use for visualization.
- false_color: str
Color to use for false nodes.