Staff in the Biosciences and Computing Sciences Areas are invited to participate in a weeklong workshop focusing on machine learning in data science. The goal of the workshop, to be held April 2-6, is to build bridges between Biosciences and Computing Sciences through a common foundation in statistical computing.
The course is open to all staff in Biosciences and Computing Sciences, but prerequisite training in basic Python, basic linear algebra, and basic to intermediate statistics is required. Read more about the prerequisites. Attendance is limited to 26 participants. All sessions will be held in Bldg. 59, room 4102 at Berkeley Lab.
Course description
The first three days, April 2-4, will be training in machine learning led by The Data Incubator, a company known for training scientists for data science careers in industry. Topics covered will include: K-nearest neighbors; unsupervised learning; bias, variance and overfitting; scikit-learn workflow; learning and metrics; as well as linear and logistic regression.
In the first portion of the course, students will develop a series of models to predict a venue’s star rating from various features. Working from 100 MB of real-world data, they will start with location-based models before building models based on other attributes of the venues. Finally, an ensemble model will blend the individual models into a final prediction of the venue’s popularity.
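For readers unfamiliar with the idea, the modeling sequence described above can be sketched in a few lines. This is only an illustration, not the course material: the toy venue records, the function names, and the equal blending weights are all assumptions made up for this example, and it uses plain Python rather than scikit-learn so that it runs without extra packages.

```python
from math import hypot
from statistics import mean

# Hypothetical toy records: (latitude, longitude, category, star rating).
VENUES = [
    (37.87, -122.27, "cafe", 4.5),
    (37.88, -122.26, "cafe", 4.0),
    (37.80, -122.41, "bar", 3.0),
    (37.79, -122.40, "bar", 3.5),
    (37.78, -122.42, "cafe", 4.0),
]


def knn_location_model(train, k=2):
    """Location-based model: mean rating of the k nearest venues.

    Distance is plain Euclidean distance in degrees, a rough
    approximation that is adequate only for nearby points.
    """
    def predict(lat, lon, category):
        nearest = sorted(train, key=lambda v: hypot(v[0] - lat, v[1] - lon))[:k]
        return mean(v[3] for v in nearest)
    return predict


def category_model(train):
    """Attribute-based model: mean rating of venues in the same category."""
    overall = mean(v[3] for v in train)
    by_category = {}
    for _, _, cat, stars in train:
        by_category.setdefault(cat, []).append(stars)
    means = {cat: mean(vals) for cat, vals in by_category.items()}

    def predict(lat, lon, category):
        # Fall back to the overall mean for categories not seen in training.
        return means.get(category, overall)
    return predict


def blend(models, weights=None):
    """Ensemble: weighted average of the individual model predictions."""
    weights = weights or [1.0 / len(models)] * len(models)

    def predict(lat, lon, category):
        return sum(w * m(lat, lon, category) for w, m in zip(weights, models))
    return predict


ensemble = blend([knn_location_model(VENUES), category_model(VENUES)])
print(round(ensemble(37.875, -122.265, "cafe"), 2))
```

In practice the blending weights would be fit on held-out data rather than fixed at equal values, which ties in with the bias, variance and overfitting topics listed above.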
The final two days of the week will be a hackathon in which participants apply these new skills to problems faced by Berkeley Lab staff. Projects will be proposed by the participants and should be relevant to the mission of Berkeley Lab. One week prior to the hackathon, selected participants will receive a survey with a list of potential projects. Participants will be asked to rank the projects, and the organizing committee will match participants to projects.
Participants will be required to attend the entire week, participate fully in the hackathon, and share their results with the organizing committee. They will also have the opportunity to present their results to Biosciences and Computing Sciences leadership.
How to apply
Those interested in participating should complete this survey by Friday, March 9. Successful applicants will be notified by Friday, March 16.
The survey asks for a one- to two-paragraph statement of why you would like to participate and what you expect to get out of the session, as well as a short project proposal including links to the data sets that will be used during the hackathon. Prospective participants must acknowledge that they have met the prerequisites for the course.
The course is being organized by Kjiersten Fagnan of NERSC/JGI; Andrew Wiedlea, IT Division; Hector Garcia Martin, Joint BioEnergy Institute; Mariam Kiran, ESnet; Ben Bowen, Environmental Genomics and Systems Division; and Kris Bouchard, Biological Systems and Engineering Division.
Questions? Contact Kjiersten Fagnan at kmfagnan@lbl.gov.