In this talk we review the need to revisit performance evaluation in Machine Learning, since the existing mainstream options (accuracy, F-measure, MSE/MAE) provide too narrow a view of method performance. Some of the topics discussed in the talk relate to the interpretation and modelling of data in a dataset, including multiple ground truths, equivalence classes, and an area-based interpretation of the input population. A later part of the talk reviews alternatives to accuracy and F-measure in the literature, most of which lean towards incorporating explainability or trustworthiness.
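As a minimal illustration of why a single scalar metric can be misleading (this sketch is not taken from the talk; the constructed data and the use of scikit-learn are assumptions for the example), consider a majority-class predictor on an imbalanced dataset, where accuracy and F-measure tell opposite stories:

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical imbalanced ground truth: 90 negatives, 10 positives.
y_true = [0] * 90 + [1] * 10

# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

# Accuracy looks strong, hiding the total failure on the minority class.
print(accuracy_score(y_true, y_pred))  # 0.90

# F-measure exposes the failure: no positive is ever recovered.
print(f1_score(y_true, y_pred))        # 0.0
```

Each metric compresses behaviour into one number along a different axis, which is the narrowness the talk sets out to address.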