
Distributed Computing in Ukraine _ Completed WCG Projects _ Nutritious Rice For The World [complete]

Author: Rilian May 21 2008, 09:51


Nutritious Rice for the World

The project started on May 12, 2008.

Useful links:

http://itnomad.wordpress.com/2010/05/19/a-better-rice-for-the-world/

http://distributed.org.ua/forum/index.php?showtopic=890

The project studies the structure of rice proteins so that new varieties can later be bred by crossbreeding alone, without genetic modification.

Mission
The objective of this project is to predict the structure of proteins of major strains of rice. The intent is to help farmers breed better rice strains with higher crop yields, promote greater disease and pest resistance, and utilize a full range of bioavailable nutrients that can benefit people around the world, especially in regions where hunger is a critical concern.

Determining the structure of proteins is an extremely difficult and expensive process. However, it is possible to computationally predict a protein's structure from its corresponding DNA sequence. The Computational Biology Research Group at the University of Washington has developed state of the art software to accomplish this. The difficulty is, there are thousands of distinct proteins found in rice. This presents a computational challenge that a single computer cannot solve within a reasonable timeframe. Therefore, volunteers of World Community Grid are invited to assist in this daunting task. Through collaboration with agricultural researchers and farmers, the hope is to eventually improve global rice yields and quality.

Significance
Hunger and malnutrition are the top risks to health worldwide. Nearly 30 percent of the world's population suffers from some form of malnutrition [1]. Every year, 10 million people die of hunger and hunger-related diseases. In fact, more people die from hunger and malnutrition annually than from AIDS, malaria, and tuberculosis combined [2].

Rice is the main food staple of more than half the world's population. 20 percent of the total food energy intake for every man, woman, and child in the world comes from rice. In Asia alone, more than 2 billion people get up to 70 percent of their daily dietary energy from rice and its by-products [3].

Improving strains of rice to yield larger, more resilient, and nutritionally-optimized harvests will positively impact the lives of billions of people.

Approach
Making better strains of rice has traditionally been accomplished through cross breeding of strains to produce hybrids with the best features. However, this is limited to crossing strains with easily observable traits.

Complex traits (such as high yield, disease resistance, or nutrient content) come from complex biochemical interactions of individual component proteins. Identifying such proteins and understanding their properties and interactions gives farmers the opportunity to affect these traits in a refined manner by choosing more subtle candidates for cross breeding. Predicting the structure of proteins can provide insight into the roles they play in the biochemistry of these traits.

How the computation core works
A WU contains some initial parameters describing the protein state. Each new computation takes those initial parameters and generates new values within a certain range, and each such state takes about 2 minutes to compute on an average computer. After each computation the WU is saved (a checkpoint) and checks whether the 6-hour interval has been exceeded. If it has, the WU is sent to the server. Otherwise it keeps computing. As a result, powerful computers can compute about 500 states per WU and weak computers about 200. Accordingly, powerful computers earn more points for the same amount of time.
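The loop described above can be sketched roughly as follows. This is only an illustration of the "compute a state, checkpoint, check the time limit" cycle, not the project's actual client code: `run_work_unit`, `perturb`, the `spread` value and the in-memory checkpoint list are all hypothetical names and simplifications.

```python
import random
import time


def perturb(seed_params, spread=0.1):
    """Hypothetical stand-in for one folding run: jitter each seed parameter."""
    return [p + random.uniform(-spread, spread) for p in seed_params]


def run_work_unit(seed_params, time_limit_s, compute_state, clock=time.monotonic):
    """Compute states until the time limit is exceeded, saving after each one."""
    start = clock()
    checkpoint = []  # assumption: a list stands in for the on-disk checkpoint
    while True:
        # Every state is generated from the same initial parameters.
        checkpoint.append(compute_state(seed_params))
        # The limit is checked only after a full state has been saved.
        if clock() - start > time_limit_s:
            return checkpoint  # limit exceeded: the WU would go to the server
```

At about 120 seconds per state, a 6-hour limit is crossed after roughly 180 states, which is in line with the figure of about 200 quoted for slower machines; a machine that finishes a state faster simply fits more states into the same window.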


Project schedule
(image not preserved)

Author: Tamagoch May 23 2008, 16:10

For very fast processors this may pay more points than the medical WCG subprojects, since credit is awarded for the number of rice protein variants (or whatever they are) found in 8 hours of computation...

I have a 2.4 GHz AMD, and per hour of work the rice project still gave fewer points than FAAH.

Besides, the same task is sent out 19 times, and points are awarded only after at least 14 results for it have come back. The results are all expected to differ, so these are not duplicates but rather shots in the dark from every direction.
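The "19 copies sent, credit after 14 returned" rule can be sketched as below. This is a guess at the bookkeeping only: the `Task` class is hypothetical, and granting each host exactly the points it claimed is an assumption, since the thread does not say how WCG actually sets the granted amount.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    copies_sent: int = 19   # figure quoted in the post
    quorum: int = 14        # figure quoted in the post
    returned: dict = field(default_factory=dict)  # host name -> claimed points

    def report(self, host, claimed_points):
        """Record one returned result; results are expected to differ."""
        self.returned[host] = claimed_points

    def grant(self):
        """Return host -> points once the quorum is met, otherwise nothing."""
        if len(self.returned) < self.quorum:
            return {}
        # Assumption: each host receives what it claimed.
        return dict(self.returned)
```

Until the fourteenth result arrives, `grant()` returns an empty dict, which matches the observation that points show up only after most copies have come back.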

Author: Burzum May 23 2008, 22:06

(Tamagoch @ May 23 2008, 05:10 PM)

Besides, the same task is sent out 19 times, and points are awarded only after at least 14 results for it have come back. The results are all expected to differ, so these are not duplicates but rather shots in the dark from every direction.

Author: nikelong May 23 2008, 22:45

Question: why compute a task (a single one!!) a full 19 times when three times would be enough for verification?

Stupid project.

Author: (_KoDAk_) May 23 2008, 23:49

They are either distrustful or waiting for miracles.

Author: Dmitrio May 26 2008, 12:21

Actually, the project is very good for old machines:
1. It gives a big boost to your count of returned WUs.
2. Even with very little crunching time available, the machine will manage to finish a WU, because a WU always runs for 8 hours of real time.

As for the large number of copies per WU, it is better to play it safe at the start of the project than to end up with possibly wrong results later. It may also be that the computations or simulations used there do not always give the same answer (for example, if some probability is involved). In that case the correct answer can only be identified from a sufficiently large number of results.

Author: Rilian Sep 29 2008, 21:33

This is roughly what the project's graphics look like

(image not preserved)

Author: Death Sep 29 2008, 21:52

Porridge.

Author: Rilian Sep 29 2008, 22:00

The molecule spins, kind of like in Rosetta.

Author: Dmitrio Sep 30 2008, 11:53

(Death @ Sep 29 2008, 22:52)

Porridge.

Yep, and rice porridge at that!

Author: Rilian Jan 11 2009, 04:17

I have computed 100 CPU days.

Author: Rilian Jan 13 2009, 21:10

As of today, Nutritious Rice for the World WUs run for 6 hours each (they used to run for 10).

Author: Rilian Jan 14 2009, 00:05

Added a "How the computation core works" section to the first post.

Author: cosmo_vk Jan 25 2009, 11:00

How many days does it take to get the gold badge?
I already have 77 CPU days on this project and my badge is still silver, although on other projects it would definitely be gold by now.

Author: Rilian Jan 25 2009, 12:01

90 days is the standard threshold.

Author: nikelong Feb 13 2009, 17:24

http://www.boinc-af.org/content/view/947/219/

http://www.rechenkraft.net/wiki/index.php?title=Nutritious_Rice_for_the_World

Author: Rilian Mar 8 2009, 12:26

There is no official news yet, but https://secure.worldcommunitygrid.org/forums/wcg/viewthread?thread=24606 says that "phase B" of the project ("phase A" ended recently) will finish in August 2009! After that there may be a pause to develop a new version of the client that will crunch practical data.

Author: Anami Apr 27 2009, 14:04

The most useful project.

Author: Death Apr 27 2009, 22:50

Buckwheat is the most useful project.
And maybe nutritious lard too...

Author: nikelong Apr 28 2009, 07:39

Anami,
Arguments?

Author: Death Apr 28 2009, 10:50

nikelong, oh come on.

1 in 100,000 has cancer, but 1 in 1 wants to eat.

Objections?

Author: nikelong Apr 28 2009, 11:17

One in one wants to eat rice? No, no, no...

Author: Rilian Apr 28 2009, 11:41

Bring on Nutritious Potatoes For the World!

Author: Death Apr 28 2009, 12:03

Potatoes belong to the nightshade family. To hell with eating that...

Author: Rilian Oct 31 2009, 01:25

Project status update

To visualize the work of this project at the protein level, each protein is drawn below as a color dot on a 200x200 grid. The color indicates the status of each protein. Approximately 65,000 proteins are now being processed in this project. Enough work will go into the processing to generate 100,000 three dimensional structures for each protein.

Completed - Currently processing - Not yet started

Sep 15, 2009

Most of our efforts in the past few months have been spent trying to tease out more domains from the rice protein/proteome to increase the size of the project. These domains have been packaged into work units and are now crunching. So we have raised the number of protein structure predictions from roughly 40,000 initially to about 65,000 when all the larger sequences have been processed. Of these, we have roughly 35,000 completed so we still have about 30,000 to go (so it looks as though we're about halfway done now).

The logic and goal here is that the more comprehensive picture of the individual protein domains in rice we have, the more we can use that to inform us about the structures of other unknown domains in rice as well as other food crops. That is, partial information is much better than zero information. This enables us to obtain a better understanding of the pathways involved at atomic level detail.

Author: kornq Feb 2 2010, 17:21

I don't know who is crunching this project, but the finish is near (there will be WUs until about April); 92% has been computed.

Author: Rilian Feb 2 2010, 17:34

QUOTE(kornq @ Feb 2 2010, 17:21)

I don't know who is crunching this project, but the finish is near (there will be WUs until about April); 92% has been computed.

Where is this information from?

Author: kornq Feb 2 2010, 19:57

QUOTE(Rilian @ Feb 2 2010, 17:34)

QUOTE(kornq @ Feb 2 2010, 17:21)

I don't know who is crunching this project, but the finish is near (there will be WUs until about April); 92% has been computed.

Where is this information from?

Author: Rilian Feb 2 2010, 21:22

I have set up a challenge that runs until April 1.

http://www.worldcommunitygrid.org/team/challenge/viewTeamChallenge.do?challengeId=3254

Author: Arbalet Mar 3 2010, 13:53

In about 3 weeks the Nutritious Rice For The World project plans to stop sending out new tasks.

This is the official statement saying there is an estimated 30 days left of new work to be sent out. The project is expected to have an additional 3 weeks to clean up the remaining batches after that. The last batch of work units from the researchers are 00627 which has been built and loaded earlier today.[Feb 23, 2010 3:15:12 PM]

http://www.worldcommunitygrid.org/forums/wcg/viewthread_thread,28575

Author: KarLaeda Mar 5 2010, 23:38

(Rilian @ Feb 2 2010, 20:22)

I have set up a challenge that runs until April 1.

http://www.worldcommunitygrid.org/team/challenge/viewTeamChallenge.do?challengeId=3254

I don't know about first place, but it looks like we will take second from the Germans.

Author: Rilian Mar 13 2010, 03:38

The WUs from this phase of the project should run out sometime between March 25 and April 1.

QUOTE(KarLaeda @ Mar 5 2010, 23:38)

I don't know about first place, but it looks like we will take second from the Germans.

We already have.

Author: Buck Mar 13 2010, 09:20

I didn't even know there were challenges here too.
This is my 4th day crunching WCG (all of it) on all my computers... In Nutritious Rice For The World I managed to turn in 10 results totaling 10,721, so let them know who we are.
Too bad we won't catch the Brazilians in time...
Team Name Current Score
1 BRASIL - BRAZIL@GRID 4,112,954
2 Ukraine 2,713,958

Author: Rilian Mar 21 2010, 01:42

The last batch of WUs has been loaded into the GRID. The tasks from this phase will run out next week. Thanks to everyone, and bon appetit.

Buck, 7 WCG points equal 1 BOINC credit.
It is an archaism on the official site that has carried over from the move from grid.org to the BOINC platform.
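As a quick worked example of that conversion, here is a small sketch. It assumes the commonly cited ratio of 7 WCG points to 1 BOINC credit; the function name is made up for illustration.

```python
WCG_POINTS_PER_BOINC_CREDIT = 7  # assumed ratio: 7 WCG points = 1 BOINC credit


def wcg_points_to_boinc_credits(points):
    """Convert a WCG point total into the equivalent BOINC credit."""
    return points / WCG_POINTS_PER_BOINC_CREDIT


# Buck's 10,721 points from the post above, expressed as BOINC credit.
buck_credits = wcg_points_to_boinc_credits(10_721)
```

Under that assumption, 10,721 points come to a little over 1,500 BOINC credits.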

Author: Rilian Apr 1 2010, 14:52

Project status update.

News

Apr 1, 2010

We have begun to analyze the terabytes of results that have been generated through the generous efforts of the volunteers.

Now comes the difficult part of sifting through the data to find the best models. The folding algorithm is noisy and there will be many inaccurate models. We need to find the best models from the almost 7 billion models generated. This should take approximately 3-6 months using our fastest methods. After identifying the most accurate models, we then will use the information to figure out what functions these proteins perform in the rice organism. This involves comparing the structure and sequence to known proteins and is also a time consuming process. The plant genomes are not nearly as well studied as the human and mammalian genomes which makes the process all the more difficult.

We are also developing faster and more accurate technologies to examine the data. As we have mentioned in the forums, a gpu-accelerated version of the simulation process has already been developed which is several orders of magnitude faster and more accurate. We have and are extending that technology to the analyses of the model structures. We have also http://protinfo.compbio.washington.edu/mfs/.

We are applying for funding to support these and other efforts to analyze the mountain of data that has been generated during this process. We too are volunteers, and it is our hope that our combined efforts in the NRW project will help develop rice strains that will make a difference in fighting malnutrition and feeding the world’s people. Finally, as the project comes to an end, we want to thank everyone for their generous contributions to this endeavor, especially those that volunteered their computers and time to generate the data. We really appreciated it.

Tentative future plans are to resubmit an application to IBM to apply the Protinfo algorithm to proteins encoded by 1000 plant transcriptomes generated by the http://www.onekp.com/. This is a work in progress. Thus the efforts of the WCG volunteers and the results of this study will have a broader impact beyond rice proteomics.

Author: Rilian Apr 9 2010, 22:45

The project is complete!

All good things must come to an end and this is the case with the Nutritious Rice for the World project as the final results came in earlier this week.

We are analyzing the results and this will take some time. All of the scientists involved in this project are volunteers as well and we will be analyzing the data to identify proteins and genes that may be useful in breeding better rice strains. We are also applying for funding to further develop some of the technologies such as gpu acceleration of the process and sophisticated techniques that recognise structure and sequence patterns or signatures to identify the function of the protein.

There are some tentative plans to perhaps apply the software to other plant genomes.

More details are available at http://protinfo.compbio.washington.edu/rice/status.html

On behalf of the scientific team involved I'd like to thank the IBM team for their support and enthusiasm for the project. Ensuring that the software could run properly on such a diverse set of hardware is quite an amazing feat and IBM deserves recognition for the time and resources that they have devoted.

However, none of this would have been possible without the help of the thousands of volunteers who donated their computers to this project. We consider it a great honour that you have allowed us onto your home machines to run our simulations these past years.

Thank you so much for your efforts.

Hong Hung

Author: Gelo Apr 20 2010, 11:13

Ha, I only got the email from WCG about the end of this project today.

Author: Death Apr 20 2010, 11:21

I'll post the email.

World Community Grid is pleased to announce that as a result of the generous contribution of computing power from our members, the Nutritious Rice for the World project finished on April 6, 2010.

Now that the first step is finished, stay tuned to learn what insights the researchers find as they analyze the data.

For more information please go to News and Updates.

We still need your help with six (6) other ongoing projects! World Community Grid continues to run the following projects: FightAIDS@Home, Help Conquer Cancer, Help Cure Muscular Dystrophy - Phase 2, Help Fight Childhood Cancer, Human Proteome Folding - Phase 2, and Discovering Dengue Drugs - Together, Phase 2. All of these important projects need your computer time.

If you only had Nutritious Rice for the World selected for active projects, then you will start receiving work from the other active projects. To modify your project selection criteria, please go to your My Projects page.

The World Community Grid Team

Author: Rilian May 17 2010, 18:48

https://secure.worldcommunitygrid.org/forums/wcg/viewthread_thread,29075

This is a thread to update members about how the results of the Nutritious Rice for the World project are being used.

We are evaluating the model structures sent back to determine which are most likely to be representative of the real structures. These will be posted and accessible on the website. This in itself is a difficult and time consuming process. We have developed and are developing gpu-based software to allow us to do this faster and use more accurate techniques. A paper on the technical aspects of the project should be out shortly.

After the initial analysis, we will be collaborating with other rice researchers to focus on genes of interest and analyze the models in depth to attempt to ascertain the function of these proteins.

More details as they develop...

--
Hong
Nutritious Rice for the World Scientist

Author: Rilian May 20 2010, 20:25

http://itnomad.wordpress.com/2010/05/19/a-better-rice-for-the-world/

(added to the first post)

Author: Rilian Jan 31 2011, 20:06

Sorry about not updating for so long.

The lab has received some funding and I am now working full-time on NRW rather than in my spare time. I am very happy to be back in Seattle.

We are about to receive some new CPU/GPU servers to analyze the data and there should be something soon. We are also applying for funds to really upgrade our cluster in anticipation of the 1000 plant proteome (1KP) project.

To make things clear, the GPUs are being used to analyze the data internally. We do have a GPU-aware protein folding client as well. If World Community Grid is ready to go with GPUs we are ready to utilize GPUs in the 1KP project. If not, we can proceed using CPUs with fewer candidate structures per sequence.

Thanks for your patience. I will be updating this thread much more regularly as we get results.

------------

NRFW predicts the structures of rice proteins.

These structures can be compared with proteins with known function. Similarity in structure implies similarity of function. Similarity of sequence to proteins of known function also implies similarity of function to those proteins. Unfortunately with rice, most sequences are dissimilar to anything that has been studied. However combining both structural and sequence similarity information allows us to assign the function of a rice protein/gene more accurately.

Geneticists/breeders use that information to develop better strains of rice as in the projects that you mentioned. So these are not competing projects but projects that can use the information in NRFW.

IRRI is also helping us focus on examining proteins/genes that they believe are most interesting for their work.

------------------

We can run the exact same client on both GPU/CPU but ideally that's something that we might want to change. It makes more sense to leverage the capabilities of GPUs to do some pre-analysis as well as generating structures.

The ATI cards are just as capable in terms of double precision math - it's just that the early acceptance of CUDA means that ATI owners were left out. As you may know, we don't need double precision so we can gain an additional 2-5x speedup and we have tested the folding software on ATI GPUs. OpenCL is supported by both NVIDIA and ATI.

I agree wholeheartedly with your sentiments and if anything, you may be understating the amount of useful GPUs out there. Even stock and budget GPUs for movie playing are becoming quite powerful.

Hong
Nutritious Rice for the World Scientist

http://www.worldcommunitygrid.org/forums/wcg/viewthread_thread,29075_lastpage,yes#312061

Author: Death Jan 31 2011, 22:04

Hehe. Welcome 2 the GPU era?

He didn't compare the speed with, say, a Core Duo.

Author: Rilian May 6 2011, 15:13

Project status update!

http://protinfo.compbio.washington.edu/rice/status.html

Author: Rilian Jun 21 2012, 20:27

Mikey

Everything about this project is open - results, data, code.

In the case of NRW, adequate methods to analyze the data from 10 billion noisy protein models simply did not exist - so honestly, the last thing I am worried about is being scooped. This is one of the few advantages of being at the bleeding edge.

I was much more worried that the new methods being developed actually work and that the results from NRW are as useful as possible so that similar approaches in the future will be funded. It is very hard to convince funding agencies, especially these days, to fund anything that hasn't been thoroughly tested and "guaranteed" to give something useful. This is one of the many disadvantages of being at the bleeding edge.

However, this is starting to come together finally as the new methods are benchmarking well on our test sets. We've just had a paper accepted in Bioinformatics on a new GPU-optimised algorithm that we are using to choose the best structures. I should have the pipeline up and running in the next week and we will start putting up structures on our website soon.

More to come...

Hong

http://www.worldcommunitygrid.org/forums/wcg/viewthread_thread,29075

Author: Bel Jun 21 2012, 20:49

Great, that will make 2 projects on GPU. Looking forward to it..

Author: Rilian Sep 28 2012, 10:48

A paper was published in the journal Bioinformatics, which describes the use of Graphics Processing Units (GPU's) to accelerate computations comparing protein structures.

Paper Title:

“Accelerated protein structure comparison using TM-score-GPU”

Lay Person Abstract:

As part of the analysis of the computed results of the Nutritious Rice for the World project, the researchers need to be able to compare protein structures and efficiently compute a similarity score. A scoring method based on “Template Modeling”, known as TM-score, provides significantly better results than the “root-mean-square-deviation” method, but requires much more computer processing time. To solve this problem the researchers developed a version of the TM-score algorithm which makes use of Graphics Processing Units (GPU's) which are found in newer video hardware, used particularly on gaming computers to enhance the visual experience. Using GPU's they were able to run millions of protein comparisons about 70 times faster. The paper describes how they accomplished this and they offer the software freely to other scientists, who may be able to use it for their research.

Technical Abstract:

Motivation: Accurate comparisons of different protein structures play important roles in structural biology, structure prediction and functional annotation. The root-mean-square-deviation (RMSD) after optimal superposition is the predominant measure of similarity due to the ease and speed of computation. However, global RMSD is dependent on the length of the protein and can be dominated by divergent loops that can obscure local regions of similarity. A more sophisticated measure of structure similarity, Template Modeling (TM)-score, avoids these problems, and it is one of the measures used by the community-wide experiments of critical assessment of protein structure prediction to compare predicted models with experimental structures. TM-score calculations are, however, much slower than RMSD calculations. We have therefore implemented a very fast version of TM-score for Graphical Processing Units (TM-score-GPU), using a new and novel hybrid Kabsch/quaternion method for calculating the optimal superposition and RMSD that is designed for parallel applications. This acceleration in speed allows TM-score to be used efficiently in computationally intensive applications such as for clustering of protein models and genomewide comparisons of structure.

Results: TM-score-GPU was applied to six sets of models from Nutritious Rice for the World for a total of 3 million comparisons. TM-score-GPU is 68 times faster on an ATI 5870 GPU, on average, than the original CPU single-threaded implementation on an AMD Phenom II 810 quad-core processor. Availability and implementation: The complete source, including the GPU code and the hybrid RMSD subroutine, can be downloaded and used without restriction at http://software.compbio.washington.edu/misc/downloads/tmscore/ . The implementation is in C++/OpenCL.
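To make the RMSD versus TM-score contrast in the abstract concrete, here is a minimal CPU sketch. It is not the TM-score-GPU code: it uses the commonly published TM-score formula with d0 = 1.24 * (L - 15)^(1/3) - 1.8, assumes the two structures are already aligned residue by residue and already superposed, and skips the search over superpositions that the real algorithm performs. The function names and the 0.5 floor on d0 are choices made for this illustration.

```python
import math


def pair_distances(model, reference):
    """Distances between corresponding atoms of two pre-superposed structures."""
    return [math.dist(a, b) for a, b in zip(model, reference)]


def rmsd(model, reference):
    """Root-mean-square deviation over all aligned atom pairs."""
    d = pair_distances(model, reference)
    return math.sqrt(sum(x * x for x in d) / len(d))


def d0(length):
    """Length-dependent distance scale; the 0.5 floor is a guard for this sketch."""
    if length <= 15:
        return 0.5
    return max(0.5, 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_fixed(model, reference):
    """TM-score for one fixed superposition (no search for the best one)."""
    scale = d0(len(reference))
    terms = (1.0 / (1.0 + (x / scale) ** 2) for x in pair_distances(model, reference))
    return sum(terms) / len(reference)


# A 100-residue chain in which a single residue is displaced by 50 angstroms.
reference = [(3.8 * i, 0.0, 0.0) for i in range(100)]
model = reference[:-1] + [(3.8 * 99, 50.0, 0.0)]
```

For this toy pair the single outlier pushes the RMSD to about 5 angstroms, while the fixed-superposition TM-score stays around 0.99, which is the "dominated by divergent loops" effect the abstract mentions.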

Access to Paper:

To view the paper, please see http://bioinformatics.oxfordjournals.org/content/28/16/2191.full.pdf.

http://www.worldcommunitygrid.org/about_us/viewNewsArticle.do?articleId=209

Author: Rilian Sep 28 2012, 19:35

Just to let people know.

The way that science works is that publications are months if not years behind the current work. For example, the methodology described in the paper has already been incorporated into a new method for choosing the best protein structure using TM-score as a similarity measure. This has already been applied to the rice protein structures and we have already used the best structures to predict the function of the rice genes.

But to write the results up as papers, we have to do the control calculations and benchmarking. Then after writing the manuscripts, it goes to internal reviews, external reviews, revisions, proofs etc before it gets published.

However, we will be putting up the results on our website soon - right after we do the benchmarks.

Hong