Making the edge colour depend on the size of the circles in a scatter() plot


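I am trying to draw a scatter plot. I want the edge colour to depend on the size of each bubble, so I am passing a sequence of colours (outliner_colors) to the edgecolors parameter of scatter(). (Sorry if some of the terminology is wrong, I am still quite new to this!)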

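The problem is that I still get outlines on circles in the plot even where bubble_size_filtered is below 700,000, despite the fourth "0" in the RGBA tuple, which I believe should make the outline transparent. Interestingly, when I set the limit to 7,000,000 (no value in bubble_size_filtered is above this), none of the bubbles has an outline. So I think the (0, 0, 0, 0) tuple does remove the outline, but for some reason the selection by bubble size does not work. bubble_size_filtered is a pandas.core.series.Series (checked with type()).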

I ran

print(bubble_size_filtered)

and got:

0        120.0
1       2000.0
2       5000.0
3       3000.0
4        360.0
         ...
2042      21.0
2044      15.0
2045      85.0
2046     100.0
2047      36.0

Clearly some of the values are below 700,000... I have no idea what is going on.

Here is the (I think!) relevant code snippet:

outliner_colors = []
for size in bubble_size_filtered:
    if size > 700000:
        # Append "black" if size > 700000
        outliner_colors.append("black")
    else:
        # Append RGBA tuple (0, 0, 0, 0) if size <= 700000
        outliner_colors.append((0, 0, 0, 0))

ax.scatter(xvar_filtered, yvar_filtered, s=scaled_dot_size, c=dot_colors, edgecolors=outliner_colors, alpha=0.5-large_ghosting_scale)

Here is my full code (in case it is useful), or at least the code of the plotting function:

def plot_cdr_graph_without_mcsps(dataframe, xvar, yvar, bubble_size, removed_categories, threshold_bubble_size):
    # Convert 'bubble_size' to numeric
    dataframe[bubble_size] = pd.to_numeric(dataframe[bubble_size], errors="coerce")
    # Converting the yvar to numeric
    dataframe[yvar] = pd.to_numeric(dataframe[yvar], errors="coerce")

    # Generate data for the prices per credit, if the user has inputted the price as their yvar and tons purchased as the bubble size.
    if yvar == "price_usd":
        dataframe[yvar] = dataframe[yvar]/dataframe[bubble_size]
        # I am now replacing the inf values with a large number, before rounding them to that large number. When I say large number, I mean one with lots of digits.
        # We need to change this - I think I was wrong in thinking that the .33333333333s were being represented as infs. perhaps there is another reason we have an inf value.
        # Currently, I am just replacing the inf value with 1e10 which ruins the graph, (although it at least allows the graph to be plotted).
        # dataframe[yvar].replace([np.inf, -np.inf], 1e10, inplace=True)
        # dataframe[yvar] = dataframe[yvar].round()
    else:
        print("This function does not facilitate a scenario in which the yvar is not price!")

    # Filter out rows with 'None' values in 'xvar'
    dataframe = dataframe.dropna(subset=[xvar])
    # Filter out rows with 'None' values in 'yvar'
    dataframe = dataframe.dropna(subset=[yvar])

    # If the xvar variable is announcement date, then do this to format it correctly.
    if xvar == "announcement_date":
        dataframe[xvar] = pd.to_datetime(dataframe[xvar])  # change to datetime format.
        dataframe[xvar] = dataframe[xvar].dt.strftime('%Y-%m-%d %H:%M:%S')  # Convert datetime to string
        dataframe[xvar] = dataframe[xvar].apply(lambda x: x[:10])
        print(f"These are dates: {dataframe[xvar]}")

    # This is done to remove the nan values and replace with "Unspecified", which will be colour coded as grey in the colour dict.
    mask = pd.isnull(dataframe["method"])
    dataframe.loc[mask, "method"] = "Unspecified"

    # Reset the index to make it equal to the rows in the dataframe.
    dataframe.reset_index(drop=True, inplace=True)

    # Filter out rows with excluded ('BECCS') method and non-None values in bubble_size and non-None values in yvar. This is designed to make all the columns the same length.
    mask = (~dataframe[xvar].isin(removed_categories)) & (~dataframe[xvar].isnull()) & (~dataframe[bubble_size].isnull()) & (dataframe[bubble_size] != 0) & (dataframe[bubble_size] >= threshold_bubble_size)
    xvar_filtered = dataframe.loc[mask, xvar]
    bubble_size_filtered = dataframe.loc[mask, bubble_size]
    yvar_filtered = dataframe.loc[mask, yvar]

    # Some info on what we are doing here -----------------------------------
    # mask returns a pandas data series. This series has indices and bools. The bools correspond to True and False values. Above I have reset the index of the dataframe to make it
    # correspond to the rows of the dataframe. You will see that the mask series has roughly 320 rows (21/02/2024). This is the number of rows with info on:
    # xvar, yvar, bubble_size. AND that is not removed_category OR below threshold_bubble_size.
    # The above could seem confusing because of the use of the ~ sign. This is responsible for acting as a Boolean logical operator negator in the pandas DataFrame. So it will FLIP the True and False Boolean values.
    # -----------------------------------------------------------------------

    # Scale the size of all the dots by the number of tonnes purchased.
    scaling_factor = 0.12
    scaled_dot_size = bubble_size_filtered * scaling_factor

    # Dot colours by CDR Methodology. This is designed to always correspond to CDR methodology, and not change as the axes of the graph change.
    cdr_method = dataframe.loc[mask, "method"]
    cdr_colors = {"Biochar": "black",
                  "Enhanced Weathering": "blue",
                  "Mineralization": "#987F18",
                  "Biomass Removal": "#0a7d29",
                  "DAC": "purple",
                  "Biooil": "orange",
                  "Direct Ocean Removal": "#55B7B4",
                  "Microalgae": "#589F39",
                  "Macroalgae": "lime",
                  "Ocean Alkalinity Enhancement": "navy",
                  "BECCS": "sienna",
                  "Unspecified": "dimgrey"}  # We need this at the end, because sometimes we change xvar to announcement_date or something. Therefore the mask won't work on rows with no listed "method". I will colour this the same as Unspecified.
    dot_colors = [cdr_colors[method] for method in cdr_method]

    # This clever bit of code is designed to scale the transparency of the bubble to the size of the bubble - larger bubbles that cover others will therefore be less obstructive. We have hard-coded 3000000 as the upper limit for tons_purchased, as we know the largest Microsoft one in the database is under this. May need to change in future!
    large_ghosting_scale = 0.4*(bubble_size_filtered/3000000)
    print(f"ifdhbjsdkf: {type(large_ghosting_scale)}")

    # This bit of code is supposed to define whether a bubble has an outline or not.
    # Still more work needed here. I don't think the size number here corresponds to the bubble size really.
    #outliner_colors = bubble_size_filtered.apply(lambda size: "black" if size > 700000 else (0, 0, 0, 0))
    # This is a very obscure bit of code to find. So you need to actually use an RGBA colour code. This is a tuple of three or four numbers. (R, G, B, A). Red, Green, Blue components along with alpha for transparency. For some reason they were not letting me use "none" for no outline.

    # Assuming bubble_size_filtered is your pandas.core.series.Series
    outliner_colors = []
    # Iterate through the values in bubble_size_filtered
    for size in bubble_size_filtered:
        if size > 700000:
            # Append "black" if size > 700000
            outliner_colors.append("black")
        else:
            # Append RGBA tuple (0, 0, 0, 0) if size <= 700000
            outliner_colors.append((0, 0, 0, 0))

    # This clever little function is responsible for taking in strings which correspond to the human input (which is the columns in the csv, and changing them to labels)
    def human_input_to_labels(input):
        if input == "method":
            return "CDR Method"
        elif input == "tons_purchased":
            return "No. tCO2c in order"
        elif input == "price_usd":
            return "Price per credit (USD/tCO2c)"
        elif input == "announcement_date":
            return "Date of Purchase Order Announcement"
        else:
            pass

    # PLOTTING FUNCTION
    # Plot the graph only if both series have the same length
    if len(xvar_filtered) == len(yvar_filtered) == len(bubble_size_filtered):
        text_color = "#E5E5E5"
        background_colour = "#565656"
        chart_colour = "#C8C8C8"
        axes_widths = 1.2

        fig, ax = plt.subplots()
        ax.scatter(xvar_filtered, yvar_filtered, s=scaled_dot_size, c=dot_colors, edgecolors=outliner_colors, alpha=0.5-large_ghosting_scale)
        # Note that the xvar_filtered, yvar_filtered, scaled_dot_size are pandas.core.series.Series's while the dot_colors is a list. The large_ghosting_scale is also a pandas.core.series.Series.

        ax.set_xlabel(human_input_to_labels(xvar), fontweight="bold", fontname="Gill Sans MT", color=text_color)
        ax.set_ylabel(human_input_to_labels(yvar), fontweight="bold", fontname="Gill Sans MT", color=text_color)
        ax.set_title("CDR Graph: Market Carbon Credit Prices vs. CyanoCapture\nMinimum Credit Selling Prices", fontweight="bold", fontname="Gill Sans MT", color=text_color, fontsize=15)
        ax.tick_params(axis="x", colors=text_color, labelrotation=0, labelsize=8)  # Note that color will only change tick colour, while colors will change both tick and label colours.
        ax.tick_params(axis="y", colors=text_color)
        ax.grid(True, color="black", alpha=0.2)
        ax.spines['bottom'].set_linewidth(axes_widths)  # Set thickness of the bottom axis
        ax.spines["bottom"].set_color(text_color)
        ax.spines['left'].set_linewidth(axes_widths)  # Set thickness of the left axis
        ax.spines["left"].set_color(text_color)
        ax.spines['top'].set_linewidth(0)  # Set thickness of the top axis
        ax.spines['right'].set_linewidth(0)  # Set thickness of the right axis

        ax = plt.gca()
        for tick in ax.get_xticklabels():
            tick.set_fontweight('bold')
        for tick in ax.get_yticklabels():
            tick.set_fontweight('bold')

        ax.invert_xaxis()  # For some reason when time on x axis, wrong way round. This fixes that
        ax.axhline(y=0, color=background_colour, linestyle='--', linewidth=1)  # Adding a line at the y axis
        max_credit_price = np.nanmax(yvar_filtered)
        ax.set_ylim(-100, max_credit_price)
        ax.set_facecolor(chart_colour)
        ax.xaxis.labelpad = 55
        # This is designed to space out the x axis label from the x axis data labels:
        plt.subplots_adjust(bottom=0.4)
        fig.patch.set_facecolor(background_colour)  # HexDec code for dark grey.

        # This is designed to space out the labels along the x axis:
        if xvar == "announcement_date":
            custom_tick_positions = range(0, len(xvar_filtered), 8)  # Example: Tick every 2 units
            ax.xaxis.set_major_locator(FixedLocator(custom_tick_positions))
        else:
            pass

        # LEGEND FORMATTING -----------------------------------------------------
        for method, color in cdr_colors.items():
            plt.scatter([], [], color=color, label=method)
        # Customize legend
        plt.legend(title='CDR Method', loc='upper left', fontsize='small')
        # This is responsible for putting the legend as a horizontally inclined rectangle at the bottom of the plot. Change the second bbox argument to change the % below the plot the legend is.
        plt.legend(loc='lower right', bbox_to_anchor=(0.5, -0.60), ncol=3, fancybox=True)
        # -----------------------------------------------------------------------

        plt.show()
    else:
        print("Error: Length of 'xvar_filtered', 'bubble_size_filtered' and 'yvar_filtered' are not the same.")
    
pandas dataframe list matplotlib scatter-plot
1 Answer
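The alpha values in your edgecolors= colours are probably being overridden by your alpha= argument. Try removing the alpha= argument or setting it to None; this should let edgecolors= control the alpha.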
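One way to follow that suggestion while keeping the size-dependent fill transparency is to write the opacity into the colours themselves and leave alpha= out of the scatter() call. The following is only a minimal sketch: the arrays sizes, x, y and base_colors are made-up stand-ins for the filtered columns in the question, while the 700,000 threshold, the 0.12 scaling factor and the ghosting formula are copied from it.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba, to_rgba_array

# Hypothetical stand-in data; in the question these come from the DataFrame.
sizes = np.array([120.0, 2000.0, 5000.0, 800000.0, 2500000.0])
x = np.arange(len(sizes))
y = np.array([10.0, 20.0, 15.0, 30.0, 25.0])
base_colors = ["black", "blue", "purple", "orange", "navy"]

# An explicit alpha replaces the alpha channel of an RGBA tuple,
# so a "transparent" edge colour stops being transparent.
overridden = to_rgba((0, 0, 0, 0), alpha=0.5)
print(overridden)  # -> (0.0, 0.0, 0.0, 0.5)

# Per-point fill opacity, same formula as in the question.
fill_alpha = 0.5 - 0.4 * (sizes / 3_000_000)

# Put the opacity into the face colours instead of passing alpha=.
face_rgba = to_rgba_array(base_colors)
face_rgba[:, 3] = fill_alpha

# Edge colours: opaque black above the threshold, fully transparent otherwise.
edge_rgba = np.zeros((len(sizes), 4))
edge_rgba[sizes > 700_000, 3] = 1.0

fig, ax = plt.subplots()
# No alpha= here, so the alpha channels above are used as given.
ax.scatter(x, y, s=sizes * 0.12, c=face_rgba, edgecolors=edge_rgba)
```

With real data, sizes would be bubble_size_filtered.to_numpy() and base_colors the dot_colors list, assuming both have the same length and order.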
