The seasonal formation of Martian surface frost currently provides the best global, multi-scale observational dataset for tracking the present-day volatile cycle on Mars. However, past studies of the Mars frost cycle have been limited by their complete reliance on humans to manually parse and integrate large amounts of data from multiple, disparate observational records at scales ranging from kilometers to meters. By bringing together physical and data scientists, we are developing a more systematic, unified approach that combines visible, thermal, and spectral observations to generate a first-of-its-kind global frost formation map. Our initial steps include training a machine learning classification model to detect key frost-relevant surface features in visible imagery. Generating statistically robust training and evaluation datasets was critical and required a highly iterative annotation process, which the use of Labelbox made more efficient. In this presentation, we will outline the nuances of the frost detection practices of planetary scientists, how they were captured through iterative discussion and comparison with our human-generated labels, and finally how we evaluated and then improved the efficacy of our training set. The lessons we learned apply directly to many image-based machine learning detection problems in the physical sciences that must stand up to the rigor of later inference. This work is supported under the SUDS strategic initiative, which aims to create a community of practice involving collaborations between physical and data scientists.