HugoTheAstro

Using your Computer for Science! – Distributed Computing Basics

Sep 04, 2022

Hello again! Back with another citizen science post!

Recently I was able to return more actively to what was my first entry into the world of citizen science: distributed computing! I will not expand too much on the concept itself, since that is outside the scope of this post, it’s not exactly new, and you can find all the details in a plethora of places like Wikipedia (Distributed Computing). For anybody new to this, you can imagine it as joining your computer with others over the internet so that together they work like a virtual supercomputer, processing data for a variety of science projects that would otherwise need actual massive supercomputers to get their research done in a reasonable time.

The idea of this post is to lay out the basics of what you need to participate in this kind of project and to go over a series of aspects you have to consider before you get too excited about contributing to science in this manner. I will focus on the BOINC platform and Windows; there are options for Linux and Android devices, but I will admit that I don’t have experience with those. I will do a step-by-step summary of the installation at the end.

There is a huge variety of projects where you can participate depending on your interests, from astrophysics to virology and mathematics. Depending on the type of study and its approach, we are going to see bigger or smaller work units, with estimated sizes given in GFLOPs (billions of floating-point operations) needed to complete one, and a focus on either CPU or GPU computing. As a result, different computers and graphics cards take different amounts of time to complete the same task depending on their processing power. I will skip GPUs for this post, since I recommend starting with the CPU side of things until you get used to BOINC.
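To make the link between work unit size and processing power concrete, here is a minimal Python sketch. It treats the work unit size as a total count of floating-point operations and the CPU speed as operations per second. The function name and all the numbers are illustrative assumptions, not figures from any real project, and real tasks rarely run at a CPU's theoretical peak.

```python
def estimate_runtime_hours(task_size_gflop: float, sustained_gflops: float) -> float:
    """Rough runtime: total operations divided by operations per second."""
    if sustained_gflops <= 0:
        raise ValueError("sustained_gflops must be positive")
    seconds = task_size_gflop / sustained_gflops
    return seconds / 3600.0


# Hypothetical work unit of 36,000 billion operations on two hypothetical cores.
for label, speed in [("older core", 2.0), ("newer core", 5.0)]:
    hours = estimate_runtime_hours(36_000, speed)
    print(f"{label}: about {hours:.1f} hours")
```

The same work unit takes 5 hours on the slower core and 2 hours on the faster one in this made-up case, which is the kind of spread you can expect between machines of different generations.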

Now is when things get interesting, because we have to start thinking about the hardware we have at our disposal: whether it’s going to be enough for the projects we want to be part of, and whether its current setup can handle the load, especially in the temperature and RAM departments.

As I talked about in this, this and this other post for example, one key aspect of a computer’s stability is being able to stay relatively cool under the workloads it has to handle. In that sense distributed computing can be very taxing, and depending on the TDP of your CPU things can get hot very easily. For that reason I would recommend that anybody interested in trying distributed computing first run small tests with something like Cinebench (R15, R20 or R23 depending on your machine) to get a sense of how well your system handles more intense workloads.

The higher the TDP of your CPU (and/or GPU), the easier it is to end up with an overheated system if your current cooling solution is not good enough. For example, an older CPU with a much higher TDP like the Phenom II X4 965 is going to get hot much more easily than a more modern but lower TDP CPU like the Ryzen 5600G (125 W vs 65 W). Don’t worry, I will be discussing temperatures while running these projects in more detail in a bit.

Of course, you may be thinking… “Well, if I have a smaller CPU, temperatures should not be a problem!”. Yes and no… a less powerful CPU may run cooler, but that has a huge impact on the time the computer needs to complete any given task. In some cases a CPU can in theory run the tasks of a certain project, but the time it needs to complete one is longer than the time available, or the machine needs to be online 24/7 to finish before the deadline.
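The deadline problem can be sketched with a bit of arithmetic. This is a minimal Python example under stated assumptions: the function names are made up for illustration, the task length and deadline are hypothetical, and it assumes the task only makes progress while the computer is on.

```python
import math


def days_to_finish(runtime_hours: float, hours_online_per_day: float) -> int:
    """Calendar days needed if the task only progresses while the PC is on."""
    if not 0 < hours_online_per_day <= 24:
        raise ValueError("hours_online_per_day must be in (0, 24]")
    return math.ceil(runtime_hours / hours_online_per_day)


def meets_deadline(runtime_hours: float, hours_online_per_day: float,
                   deadline_days: int) -> bool:
    """True if the task would finish within the deadline."""
    return days_to_finish(runtime_hours, hours_online_per_day) <= deadline_days


# Hypothetical 120-hour task with a 14-day deadline.
for hours_per_day in (4, 24):
    days = days_to_finish(120, hours_per_day)
    ok = meets_deadline(120, hours_per_day, 14)
    print(f"{hours_per_day} h/day -> {days} days, meets deadline: {ok}")
```

With 4 hours a day this made-up task needs 30 days and misses the deadline, while the same machine running around the clock finishes in 5.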

Now I will show a few comparisons between several machines running the same project so you can get an idea…

 

As you can see, the same project varies widely between these machines. This particular project, called Moo! Wrapper, uses small work units. I participate in it mainly for fun, since the idea of finally cracking the RC5-72 code is appealing. It is not very demanding on the RAM side of things and is mild on temperatures.

Let’s turn the dial up one notch with MilkyWay@home, which deals with, for example, creating a highly accurate three-dimensional model of the Milky Way galaxy using data gathered by the Sloan Digital Sky Survey. Currently its work units sit in a kind of “middle ground” regarding how much processing power you need to complete one task, so here you can see how much time my old A8 5600K needs for one. The same goes for World Community Grid, which deals with things like cancer research and our little friend from 2019.

Let’s turn the dial up a bit more and see what more demanding projects ask from your system…

Yeah, that sounds like a lot, especially in the case of Climateprediction.net, right? Well, yes and, again, no at the same time. We can divide the problem into two parts: the time needed to complete the task and the impact on your system.

I will deal first with the case of Einstein@Home. Their units don’t look that long, but they are quite demanding in terms of processing power and RAM, so running them on an older system could result in severe bottlenecks if you want to run their tasks alongside other things. For that reason I was unable to participate in this project for a time, once their units became chunkier than what my old CPUs could digest in a reasonable amount of time.

Climateprediction.net can throw HUGE work units at you in terms of GFLOPs compared with something like Moo! Wrapper.

If you look at the previous image to see how much time it needs, that is indeed a lot, but now is a good time to mention that the estimated times are just that, estimates! That means that with a modern CPU like the Ryzen I’m using, the actual time can be much, much lower. I started running that task on 2022/08/24 and have kept running it intermittently, and it is now close to done, as you can see in the more recent capture below. The same goes for the Einstein@Home units. Of course, it could go the other way if your CPU is much less capable; for example, I remember I once tried to run BOINC on a netbook with an Intel Atom processor… that was not a good idea lol

Time to talk about a very important detail: temperatures! You have to be sure that your system can handle the thermal load, and that is one of the reasons I really don’t recommend running distributed computing on something like a laptop. Laptops can be very practical, but their biggest weakness is temperature-related problems. A desktop motherboard has protections in place in case of overheating events, but laptops are different, and in some cases these protections can kick in way too late for them.

To use my systems as an example, my old A8 5600K uses an old but beefy Lucifer V2 cooler (shown in one of the linked posts from before) that prevents it from getting too hot even when running several tasks. My Ryzen 5600G is still using the Aero Cool cooler from before. With these setups my machines are relatively well prepared for thermal events. Acoustics can also be important in a lot of situations, so pay attention to that while testing, so at least you don’t end up with your computer sounding like a jet engine like in this Linus Tech video (minute 1:16):

Of course, another way to manage heat is to limit how much of your CPU BOINC is able to use. In theory you should never dedicate 100% of your CPU to running tasks; that would be like running Cinebench all the time… and let’s just say that is not ideal.

How much CPU usage you can cut depends on how many cores you have… using 50% of a 4-core CPU is not the same as using 50% of a 12-core one. If you can only spare a few cores for distributed computing, that reduces the number of units you can run per day and per project. That number also depends on how much time your system stays online; in my case I keep mine running most of the time, but that is not possible for everyone.
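Here is a small Python sketch of that trade-off. It assumes that a CPU percentage translates into a whole number of cores, rounded down with a minimum of one, and that each core works on one task at a time. The function names, the rounding rule, and the 3-hour task length are illustrative assumptions, so check what BOINC actually does on your machine.

```python
def usable_cores(total_cores: int, cpu_percent: float) -> int:
    """Cores available for tasks, assuming round-down with a minimum of one."""
    if total_cores < 1 or not 0 < cpu_percent <= 100:
        raise ValueError("need at least one core and a percentage in (0, 100]")
    return max(1, int(total_cores * cpu_percent / 100))


def units_per_day(total_cores: int, cpu_percent: float,
                  hours_online: float, hours_per_unit: float) -> float:
    """Approximate daily throughput with one task running per usable core."""
    return usable_cores(total_cores, cpu_percent) * hours_online / hours_per_unit


# Same 50% setting and a hypothetical 3-hour task on two different CPUs.
for cores in (4, 12):
    print(cores, "cores:", usable_cores(cores, 50), "usable,",
          units_per_day(cores, 50, 24, 3), "units/day")
```

At the same 50% setting the 4-core machine has 2 cores working and the 12-core one has 6, so the second gets through three times as many units per day.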

After all this… how do I set up BOINC?

It’s relatively simple, just follow these steps:

 

Regarding the screensaver, I only advise enabling it if your system is relatively modern. If the projects you are running have some type of graphics, they will also show up when the screensaver kicks in. Only a few have them, like Einstein@Home, but so far nothing can beat the visualization of the work units from SETI@home, how I miss that project! Below is a capture of the last work unit I was able to run before the project was shut down.

Once it’s installed and initialized you are going to see a screen similar to this one, but blank. In my case I prefer to use the “Advanced View” in the View menu to get all the details and advanced options. Remember to reduce the amount of CPU that BOINC can use in the options; I recommend starting at 50% until you get familiar with the program (if for some reason you cannot do that without any project running, remember to do it as soon as you join the first one). You can also configure a threshold for how much non-BOINC CPU usage triggers BOINC to suspend its work to avoid overwhelming the machine. Usually the default of 25% is perfectly fine. Depending on your computer you may be able to run BOINC while doing things like Zoom calls; below is also a capture of one of my Disk Detective calls while processing work units. As a side note, even though my captures of the menus are in Spanish, the location of everything is the same across languages.
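For those who prefer files over menus, the same two settings can in principle be written to a local preferences override file. The Python sketch below only builds the XML text and prints it. It assumes that BOINC reads a file named global_prefs_override.xml from its data directory and that the tag names max_ncpus_pct and suspend_cpu_usage match your BOINC version; verify both against the BOINC documentation before relying on it. The function name is made up for this example.

```python
import xml.etree.ElementTree as ET


def build_prefs_override(max_cpus_pct: float, suspend_cpu_usage: float) -> str:
    """Build override XML text; tag names are assumed from the BOINC docs."""
    root = ET.Element("global_preferences")
    # Assumed tag: share of CPU cores BOINC is allowed to use.
    ET.SubElement(root, "max_ncpus_pct").text = f"{max_cpus_pct:.1f}"
    # Assumed tag: suspend when non-BOINC CPU usage goes above this percentage.
    ET.SubElement(root, "suspend_cpu_usage").text = f"{suspend_cpu_usage:.1f}"
    return ET.tostring(root, encoding="unicode")


# The 50% starting point and 25% threshold mentioned above.
print(build_prefs_override(50, 25))
```

The sketch deliberately does not write into the BOINC folder, so nothing changes until you save the output yourself and tell BOINC to re-read its local preferences. The options window remains the simpler route for most people.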

Now it’s time to join a project. You can visit each project’s webpage to see what it is about, but also pay attention to the small icons that indicate whether it’s compatible with the operating system you are using. It is important to mention that every individual project has a separate account and password, so remember to keep track of those!

Once you start getting work units done, you should go to your account page for that project and see how the units are doing. If everything is going well you should see screens like these.

 

Extra points if you can guess which of my machines did the work units in the Moo! Wrapper page based on the execution time. Note how the credit is the same, no matter how much time was used.

From time to time you can get units that end in an error due to processing problems. Things like a power failure can also ruin a perfectly good unit because of the unexpected interruption.

Regarding keeping track of your contributions, that can be a rabbit hole, since there is a multitude of ways to track your statistics. I will keep it simple for now and point to the certificate you can get for each project. Here are a few of mine, including my most precious one, which is from SETI@home! In some cases, like Einstein@Home, you can even earn a framed certificate if you are lucky enough that your computer is the one discovering a new neutron star!

I will do a follow-up to this post later on, but I hope this got you interested in another aspect of citizen science and that maybe you want to try it out!

See you all in the next one!
