Molecular biology – Portal da Doença de Chagas

Genome

The genome of Trypanosoma cruzi

Wim Degrave

Functional Genomics and Bio-informatics Laboratory, Oswaldo Cruz Institute/Fiocruz

Email: wdegrave@fiocruz.br

Preparations

The biochemistry of Trypanosoma cruzi has been studied for several decades. Research on the molecular biology of the parasite, however, began simultaneously at different centers, especially in Argentina and Brazil, coinciding in part with the development of molecular biology itself. Kinetoplastida have several very peculiar cellular processes and features: trans-splicing, which adds a mini-exon (39 nt) to the 5’ end of the mRNA; the kinetoplast, with its maxicircles and minicircles, and RNA editing mechanisms (pan-editing in the case of T. cruzi and T. brucei) that use guide RNA (gRNA); a digenetic life cycle whose stages differ in physical form, replicative capacity, infective capacity and biochemical characteristics; sexual recombination that is possible but not very frequent; chromosomes that do not condense; a karyotype that varies greatly between strains; telomeric structures; and control of gene expression that is primarily post-transcriptional, with polycistronic transcription.

One of the first targets of study was kinetoplast DNA (kDNA), both maxicircles and minicircles, and its use for typing and characterizing strains of T. cruzi (schizodeme analysis and hybridization). The cloning, sequencing and characterization of T. cruzi genes had the main goal of identifying surface antigens of the parasite. As a result, a wide range of antigens was characterized, kick-starting the development of recombinant antigens for diagnostic purposes. Several other genes and the first repetitive elements were also characterized (ORFs and repeats, for instance), especially in strains CL, Y, Dm28c and Tulahuen. However, after more than a decade of research, the number of cloned and sequenced genes was a mere 250.

With the international advent of genome projects for different organisms in the early 1990s, the scientific community working on T. cruzi, Leishmania and T. brucei began to discuss the possibility of initiating a genome project for these parasites. A prospective expressed sequence tag (EST) project had been initiated for Schistosoma mansoni, led by groups in Belo Horizonte (UFMG and CPqRR-Fiocruz), and the Plasmodium community mobilized in the same manner.

The genome project of T. cruzi was planned during various meetings in 1993 and 1994, especially at the Annual Meeting of Basic Research on Chagas Disease, in Caxambu (5-10 November, 1993), at the France – Latin America meeting at Ingebi (Buenos Aires, Argentina, 24-26 November, 1993) and at the annual meeting of the Cyted Biotechnology Program, at the University of Chile (Santiago, Chile, March 1994). The plans finally crystallized at the International Planning Meeting for the Parasite Genome Network, organized at Fiocruz, Rio de Janeiro, on 14-15 April, 1994, by doctors Carlos Morel, Patrick Winckler and Wim Degrave, with the support of the WHO/TDR and of Fiocruz. During this meeting, about 50 researchers active in research on T. cruzi, Leishmania or T. brucei got together and decided to begin their respective projects as a coordinated effort of three networks: the TriTryp.

During the meeting, the Chagas group decided to begin the project by selecting as target strain for their studies a clone of T. cruzi, CL strain (clone F11F5), from here on named CL-Brener, after Dr. Zigman Brener, who isolated the strain and the clone. This choice was based on the following arguments: it was a stable and available clone, with good biochemical and parasitological characterization; it was isolated from a human case or from a (peri)domestic cycle; it was easy to grow in vitro and in animal models; it was sensitive to the drugs used to treat Chagas disease; and it was considered representative of the universe of circulating T. cruzi strains. CL-Brener also produces a clear acute phase in mice and in accidentally infected humans, and triggers a chronic phase in mice, with tropism for skeletal and cardiac muscle. It also differentiates into metacyclic forms in vitro with reasonable efficiency.

The main goals of the project were:

To drastically increase knowledge on the (molecular) biology of these parasites, which have a lot of unique characteristics.
To rapidly identify a large number of new genes with key functions in the cell, and which could be the targets for new drugs.
To identify new antigens that could be useful for diagnostic purposes or for the development of vaccines.
To analyze the evolutionary relations between T. cruzi and other kinetoplastids, and to verify the variability between strains, isolates, and lineages of the parasite.
To analyze the relation between the structure and the function of proteins and molecules in the cell, and to clarify aspects of the interaction between the parasite and its hosts (human, reservoirs, and vectors).
To build North-South and South-South expertise and collaborations for genomic research in the fields of mapping, large-scale sequencing, bio-informatics, research on protein structure-function relation etc.
To contribute to the general knowledge of genome structure and of the comparative biology and evolution of these parasites.

In the early years of the project, several initiatives of coordination and workshops were organized, including a course called International Training Course on Parasite Genome Projects: Strategies and Methods, and the Symposium on Genome Projects, at Ingebi, Buenos Aires, Argentina, November 13-24, 1995, coordination meetings financed by the WHO/TDR and others, in Teresópolis (Brazil, April 10-12, 1996), in Rio de Janeiro (Brazil, June 9-11, 1997), and later as part of the TriTryp, in Hinxton, England.

The first steps

Back in 1994 and 1995, the biological characteristics of clone CL-Brener were studied in further detail, the clone was typed using zymodeme, schizodeme, RAPD and other molecular markers, and its genetic stability over more than 100 generations was verified. A reference laboratory (Dr. Bianca Zingales, Chemistry Department, USP, Sao Paulo) was in charge of keeping and distributing the CL-Brener clone.

Early in 1996 molecular markers were discovered (18S and later the mini-exon) that could discriminate the isolates and strains of T. cruzi in two main lineages, later renamed T. cruzi I (wild cycle) and T. cruzi II (domestic-mammal-human cycle). Later, even more subdivisions were proposed. In addition, models were proposed to correlate clinical manifestations with infecting populations and clones. CL-Brener was initially identified as T. cruzi II, and clone Dm28c was selected as a representative of T. cruzi I.

A detailed analysis of the karyotype of T. cruzi CL-Brener and of other strains was carried out, together with the analysis of linkage groups and the description of chromosomal markers. Depending on the study, a numbering system was used with 20 to 41 bands ranging from 0.45 Mb to 3.5 Mb, numbered from I to XX (in low resolution) or from 1 to 42 (in medium resolution); the total number of chromosomes seemed to be around 72 (or 36 pairs). The size of the diploid genome of CL-Brener was estimated, at the time, to be 87 Mb. Linkage groups were identified, and important size differences were observed between apparently homologous chromosomes. The karyotypes of different strains of T. cruzi are very different, and some groups have argued that CL-Brener, or T. cruzi II, would have a genome substantially larger than that of T. cruzi I strains, with additional fractions of repetitive sequences making up as much as 50% of the genome.

Different types of libraries were built to initiate the mapping and sequencing: libraries of CL-Brener YAC, BAC and cosmids, as well as of cDNA (epimastigotes, normalized and non-normalized library), and a genomic library in lambda zap.

Sequencing itself began at different centers with analysis of ESTs, cosmids of chromosome 3, and a random genomic analysis (genome survey sequence; GSS).

The data and progress of the genome project were presented on web pages, and a detailed version was kept in a winace database called TcruziDB. The main collaborators in this phase were doctors John Swindle, Bjorn Andersson, John Kelly, Ulf Pettersson, Lena Aslund, Edson Rondinelli, Turan Urmenyi, Jacqueline Bua, Andres Ruiz, Andrea Macedo, Bianca Zingales, Carlos Frasch, Denis le Paslier, Jose Luis Ramirez, Rafael Aldao, Antonio Gonzalez, Adeílton Brandão, José Maria Requena, Mariano Levin, José Franco da Silveira, Wim Degrave and many others, from many research centers in many countries. Support from the WHO/TDR (the manager of the parasite genome networks was initially Dr. Felix Kuzoe, followed by Boris Dobrokhotov and then Ayoade Oduola), especially through the financing of an annual meeting, was essential for the groups to come together and focus on the goal, and to ensure the cooperation and inclusion of groups from endemic countries.

In 1999, the consortium of laboratories sent letters of support for a funding application to the NIAID. The funding was confirmed in 2000 for a consortium (TSK-TSC) of the University of Uppsala, Karolinska Institute (Bjorn Andersson), TIGR (Najib El-Sayed) and SBRI (Ken Stuart, Peter Myler) to carry out the complete genomic sequencing in two phases.

There was much investment in physical mapping, especially by the group of Dr. José Franco da Silveira and collaborators, using the YACs library. The careful mapping of the telomeric structure of T. cruzi and of the subtelomeric region was also published.

A more detailed characterization of CL-Brener, with the comparative mapping of linkage groups (linkage mapping) and the distribution of repetitive sequences in the genome, was done by several groups.

The first reports on a likely heterozygosity of the CL-Brener clone were published in 2001 and were later confirmed by several groups. This created a big problem for the sequencing groups, as Bjorn Andersson had reported that genes on “homologous” chromosomes had up to 2% nucleotide divergence even in coding sequences. Combined with the high content of repetitive sequences, this would create extreme challenges for the genome assembly phase.
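To make the notion of nucleotide divergence between two alleles concrete, the following is a minimal sketch in Python. It assumes the two sequences are already aligned to the same length and counts mismatches over ungapped positions only; the function name and the toy sequences are hypothetical and are not taken from the genome project.

```python
def percent_divergence(seq_a, seq_b):
    """Percentage of aligned, ungapped positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip alignment gaps (insertions/deletions)
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * mismatches / compared


def substitute(seq, position, base):
    """Return a copy of seq with one base replaced (toy helper)."""
    return seq[:position] + base + seq[position + 1:]


# Hypothetical 100-nt "alleles" that differ by two substitutions.
allele_1 = "ATGGCTAAGCTTGACCGTTA" * 5
allele_2 = substitute(substitute(allele_1, 10, "C"), 60, "G")

print(percent_divergence(allele_1, allele_2))  # 2.0
```

Real comparisons between haplotypes would start from a proper alignment and would usually treat insertions and deletions separately; this sketch only illustrates how a figure such as "2% divergence" is computed.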

The full sequencing

The genomic sequence of T. cruzi CL-Brener was officially published, together with the full genomic sequences of L. major and T. brucei, in the journal Science in 2005. The genome was only partially assembled due to the many difficulties with repetitive sequences and with the heterozygosity of the clone. Because both haplotypes are represented, 22,570 proteins were predicted, of which 12,570 represent allele pairs. In addition to retrotransposons and other repetitive elements, there were many genes belonging to large families of surface proteins, such as trans-sialidases, mucins, gp63 proteases and mucin-associated surface proteins (MASPs), which represent about 18% of all coding sequences. Different biochemical and molecular aspects that are specific to the TriTryps were observed.

The genomic sequences were obtained through random (shotgun) sequencing up to a coverage of 16x, with a GC content of 53.4%. A 2.5x coverage of T. cruzi Esmeraldo was also obtained, to help with the assembly. The assembly amounts to 67 Mb in 8740 contigs, representing a good part of the non-repetitive diploid genome. There is an estimated 5.4% divergence between the two haplotypes of CL-Brener in non-coding regions, and 2.2% in coding regions. It is estimated that T. cruzi contains about 12,000 genes (haploid genome), including protein-coding genes and genes for rRNA, slRNA and snoRNA. About 50.8% of the predicted protein sequences received a putative function, based on similarity analysis. A total of 6,158 COGs (clusters of orthologous groups) common to the three TriTryps were mapped, as well as an additional 458 COGs shared only between T. cruzi and T. brucei, 482 shared only between T. cruzi and L. major, and 3,736 specific to T. cruzi.
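The COG figures above can be tallied in a few lines of Python. This is only a worked piece of arithmetic, and it assumes that the four categories quoted in the text are disjoint and together cover all T. cruzi COGs, which the text implies but does not state explicitly; the dictionary keys are descriptive labels chosen here.

```python
# COG counts for T. cruzi as quoted in the text.
cog_counts = {
    "shared by T. cruzi, T. brucei and L. major": 6158,
    "shared by T. cruzi and T. brucei only": 458,
    "shared by T. cruzi and L. major only": 482,
    "specific to T. cruzi": 3736,
}

# Assumption: the categories do not overlap, so they can simply be summed.
total_cogs = sum(cog_counts.values())
core_fraction = cog_counts["shared by T. cruzi, T. brucei and L. major"] / total_cogs

print(total_cogs)                         # 10834
print(round(100 * core_fraction, 1))      # 56.8
```

Under that assumption, a little over half of the T. cruzi COGs belong to the core shared by all three parasites.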

More than 50% of the CL-Brener genome is repetitive (repeated coding genes, micro- and mini-satellites, LTR and non-LTR repeats, etc.), with at least 1,052 clusters of paralogous genes, of which at least 46 have more than 20 members.

Genes potentially coding for the machinery of DNA repair, recombination, replication and meiosis were found, with many similarities to, and also divergences from, the corresponding set in yeast.

Many adenylate cyclases, protein phosphatases and especially kinases were identified, as well as multiple enzymes involved in the metabolism of phosphoinositides, with considerable differences from the corresponding families and domain structures in humans.

Many of the surface proteins are heavily glycosylated. T. cruzi obtains sialic acid from the host through the trans-sialidase and transfers it to mucins. The structures, functions and genetic variability of the superfamilies involved are still very poorly understood.

The comparison of the genomic structures and predicted proteins with those of L. major and T. brucei indicates more similarity between T. cruzi and T. brucei, great conservation between the respective proteomes (except for surface antigens), and very extensive synteny, in spite of the divergence that took place 200-500 million years ago.

Functional genomics

Starting from the partially assembled genomic sequence and the set of predicted proteins made available, many lines of research in functional genomics will be necessary to deepen our knowledge and to verify and confirm hypotheses. Some tools for such studies are:

A set of vectors for transient transformation (e.g. pTex) and for stable integrative transformation (pRibotex, pTrex and others).
A chromosomal fragmentation system using artificial telomeres, as well as vectors that replicate as artificial chromosomes in T. cruzi.
Gene expression analysis using microarrays.
Analysis of the proteome of T. cruzi.

The centromeric regions have recently been characterized; they contain GC-rich stretches and a transcriptional strand-switch region rich in fragments of degenerate retrotransposons.

There are many ongoing projects on comparative genomics, metabolic reconstruction, biochemical characterization in search of new drug targets, gene expression during the different life stages of the parasite, etc.

Populations

Population structure of Trypanosoma cruzi

Andrea Macedo

Department of Biochemistry and Immunology, Biological Sciences Institute, Federal University of Minas Gerais

Email: andrea@icb.ufmg.br

Trypanosoma cruzi, the etiological agent of Chagas disease, shows a high degree of intraspecies polymorphism, ranging from morphological traits (slender, broad and intermediate forms, already observed by Carlos Chagas early in the 20th century) to refined molecular markers described more recently (reviewed by Macedo and collaborators in 2004).

Among the most commonly studied molecular markers, we can highlight the electrophoretic analysis of isoenzymes, also called zymodemes or MLEE (Multilocus Enzyme Electrophoresis); the schizodemes obtained from kDNA restriction profiles; the polymorphism detected by DNA fingerprinting techniques (Macedo et al, 1992), LSSP-PCR (Low Stringency Single Primer PCR), RAPD (Random Amplified Polymorphic DNA) and karyotyping; and the variability observed in the profiles of microsatellites, rDNA genes, mini-exons and mitochondrial genes, among others.

The genetic heterogeneity of the T. cruzi taxon has profound biological significance. The set of heterogeneous characteristics of T. cruzi populations, together with the feeding habits of the vector species and the set of vertebrate hosts present in a given environment, defines two transmission cycles of T. cruzi: the wild cycle, involving mainly marsupials and small wild rodents, and the domestic cycle, which affects humans and other mammals of the peridomiciliary environment.

There is plenty of evidence showing that many populations of T. cruzi are polyclonal. This is actually to be expected in theory, considering the natural history of T. cruzi. Patients in endemic areas probably get infected multiple times through contact with different triatomines, which, in turn, feed on different individuals. This “promiscuity” leads to the formation of multiclonal populations in definitive hosts and vectors, which, when isolated and characterized in the laboratory, are designated as “isolates” and/or “strains”.

Main strains of T. cruzi

T. cruzi is a very polymorphic species, and its population structure is far from being fully understood. For many years, the prevalent idea was that populations of T. cruzi could not be grouped into discrete clusters that represented natural taxa of the species. Instead, a multiclonal structure was proposed for T. cruzi populations, in which different clones would evolve basically through clonal reproduction, from a much older common ancestor. However, the identification of a significant similarity between some populations of T. cruzi, revealed by different molecular markers, generated a consensus about the existence of at least two main phylogenetic lineages within the T. cruzi species. The initial dichotomy proposed for T. cruzi was later reinforced by a great variety of other epidemiological, biochemical, biological and molecular markers, but the designation of the main groups in the different works was very confusing. In 1999, in an effort to standardize the nomenclature adopted by the different research groups, the two main lineages were renamed T. cruzi I and T. cruzi II. It was then established that, from that date onwards, strains equivalent to Zymodeme 1, to Type III, to lineage 2 of rDNA, to Group I, to Ribodemes II/III or similar would be designated as T. cruzi I. The category T. cruzi II would include strains equivalent to Zymodeme 2, to Zymodeme A, to Type II, to lineage 1 of rDNA, to Group II, to Ribodeme I or similar. The designation of Zymodeme 3 strains and of apparently hybrid strains, such as those classified as Chilean Zymodeme 2B, Zymodeme B, Type I, Group 1/2 of rDNA, or clone 39, would be decided after additional studies.
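The 1999 equivalences described in this paragraph amount to a simple lookup table. The sketch below encodes only the designations listed in the paragraph; the dictionary, the function name and the "undecided" and "unrecognized" labels are illustrative choices made here, not an official resource.

```python
# Legacy designation -> standardized lineage, as listed in the text above.
LINEAGE_BY_LEGACY_NAME = {
    "zymodeme 1": "T. cruzi I",
    "type iii": "T. cruzi I",
    "rdna lineage 2": "T. cruzi I",
    "group i": "T. cruzi I",
    "ribodeme ii/iii": "T. cruzi I",
    "zymodeme 2": "T. cruzi II",
    "zymodeme a": "T. cruzi II",
    "type ii": "T. cruzi II",
    "rdna lineage 1": "T. cruzi II",
    "group ii": "T. cruzi II",
    "ribodeme i": "T. cruzi II",
    # Left for additional studies in 1999 (Zymodeme 3 and apparent hybrids).
    "zymodeme 3": "undecided",
    "chilean zymodeme 2b": "undecided",
    "zymodeme b": "undecided",
    "type i": "undecided",
    "rdna group 1/2": "undecided",
    "clone 39": "undecided",
}


def standard_lineage(legacy_name):
    """Return the 1999 lineage for a legacy designation, ignoring case and outer spaces."""
    return LINEAGE_BY_LEGACY_NAME.get(legacy_name.strip().lower(), "unrecognized")


print(standard_lineage("Zymodeme 1"))   # T. cruzi I
print(standard_lineage("Ribodeme I"))   # T. cruzi II
print(standard_lineage("Zymodeme 3"))   # undecided
```

Exact-match keys are used on purpose: names such as "Type I", "Type II" and "Type III" differ by a single character, so fuzzy matching would be risky.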

Subsequently, using isoenzyme and RAPD analysis, Brisse and collaborators (2000) proposed the subdivision of the T. cruzi taxon into six lineages or DTUs (Discrete Typing Units): I, IIa, IIb, IIc, IId and IIe. DTU I corresponds to lineage T. cruzi I and DTU IIb corresponds to lineage T. cruzi II. Sublineages IIa and IIc-e include the hybrid strains and those belonging to zymodeme 3. Although it has not been officially recommended, this nomenclature has been very commonly cited by the scientific community.

More recently, Freitas et al. (2006), using a set of 7 molecular markers, comprising the polymorphism of 5 microsatellite loci and of two genes, one nuclear (rDNA 24Sα) and one mitochondrial (COII, subunit II of cytochrome oxidase), proposed the existence of a third main lineage in T. cruzi, designated T. cruzi III.

The equivalence of the nomenclatures of the main groups or lineages of T. cruzi, determined through different biochemical and molecular methodologies, is presented in Table 2.

**CL Brener and hybrid T. cruzi strains**

Strains with hybrid characteristics have been drawing a lot of attention from the scientific community, especially after the elegant demonstration of the recombination capacity of T. cruzi cells in the laboratory. This work, through the artificial selection of drug-resistant recombinant parasites, demonstrated the fusion of parental nuclear genotypes, with loss of alleles and evidence of homologous recombination. Curiously, no fusion of mitochondrial DNA was observed in any case, suggesting a recombination model in which one parent is the “donor”, contributing only nuclear material, while the other is the “receptor”, which keeps its kinetoplast DNA.

It should be highlighted that, although it is capable of recombination, T. cruzi reproduces, as far as we know, predominantly through binary fission. As a consequence, its diploid nuclear genotype is transmitted en bloc to its offspring, which probably explains the high levels of linkage disequilibrium observed and the typically clonal structure of the parasite population.

Although the fusion of T. cruzi cells is apparently not a common phenomenon, studies based on isoenzyme pattern analysis, RFLP of constitutively expressed genes, RAPD, karyotype analysis, and analysis of nuclear and mitochondrial sequences have confirmed the existence of several such hybrid populations. Interestingly, Freitas and collaborators (2006) demonstrated that all natural hybrid strains analyzed so far are the result of distinct hybridization events involving populations of the T. cruzi II and T. cruzi III lineages; the latter is always the receptor population, which keeps its mitochondrial DNA.

The CL Brener clone is actually a good example of the complexity and difficulty of accurately determining the population structure and phylogeny of the T. cruzi species. Initially, based on rDNA 24Sα, mini-exon and RAPD markers, this clone was included among the strains belonging to lineage T. cruzi II. However, as polymorphism studies were extended to other regions of the genome, whether expressed or not, it became clear that CL Brener is a hybrid clone. This clone was first interpreted as the result of a hybridization event between lineages T. cruzi I and T. cruzi II, but it is currently acknowledged as a hybrid originating from the fusion of strains of lineages T. cruzi II and T. cruzi III, also known as DTU IIb and IIc, respectively.

**Clinical implications of T. cruzi genetic variability**

One of the most intriguing current issues regarding Chagas disease is the possible role played by the genetic diversity of T. cruzi populations in determining the different clinical forms of this parasitic disease. Although almost one hundred years have gone by since its discovery by Carlos Chagas, little is known about the factors that determine the clinical manifestations of the disease, and even the existence of a relevant role of the parasite in this process has been questioned. In fact, human beings can be considered a recent accident in the evolutionary history of T. cruzi. It is estimated that T. cruzi emerged as a species 150 million years ago, but the first contact with man probably took place much more recently, approximately 15,000 years ago, when humans populated the Americas. It is therefore natural to suppose that not every population of T. cruzi is able to infect humans and cause Chagas disease. Until recently, the prevalent understanding was that the T. cruzi II lineage would be more associated with the domestic transmission cycle of T. cruzi infection and, therefore, with the pathology of the disease in humans. Although this idea still holds for the classic endemic regions of the Southern Cone (Brazil and Argentina), it is increasingly evident that for countries in the north of South America (northern Brazil, Bolivia, Colombia, and Venezuela), T. cruzi I is perhaps the main lineage involved in human Chagas disease.

The mechanism through which the different clinical forms of the disease establish themselves remains obscure. There are certainly patient-related factors involved, but it is becoming more evident that genetic aspects of the parasite play a fundamental role. On the other hand, in spite of the efforts of various researchers, it has not yet been possible to correlate the genetic diversity of the parasites with the clinical characteristics of the disease.

One possible explanation for this apparent absence of correlation is that various T. cruzi strains consist of different subpopulations or clones that may have tropism for different tissues (reviewed by Macedo and collaborators, 2004). Therefore, an important factor in determining the clinical course of the disease seems to be the constellation of infecting clones and their specific tropisms. As most techniques used to genotype T. cruzi require isolating the parasite from the patient’s blood and maintaining it in culture, there is plenty of opportunity for clonal selection to occur, so that the parasite populations available for analysis may differ from those that actually cause tissue lesions and, probably, the clinical manifestations. This scenario, which constitutes the core of the clonal model of Chagas disease, implies the need to analyze the genetic diversity of the parasites directly in the infected tissues. A series of markers is now available with sufficient sensitivity to be used for genotyping T. cruzi directly in patient tissues, which certainly opens up new horizons for the molecular epidemiology of Chagas disease. However, the circumstances under which such analyses (directly in patient tissues) are ethically acceptable and appropriate must be carefully considered.

Genomic architecture

**Genomic architecture and gene composition of the Trypanosoma cruzi parasite**

Santuza M.R. Teixeira

Department of Biochemistry and Immunology, Biological Sciences Institute, Federal University of Minas Gerais


Daniella C. Bartholomeu

Department of Parasitology, Biological Sciences Institute, Federal University of Minas Gerais


The flagellated protozoan Trypanosoma cruzi is the etiologic agent of Chagas disease or American trypanosomiasis, which affects from 16 to 18 million individuals in Latin America. The clinical symptoms of Chagas disease are highly variable and can range from asymptomatic, the most common form, to severe involvement of the heart and/or the digestive tract. The factors that determine the clinical variability of Chagas disease are not well known, but the consensus is that both the genetic constitution of the parasite and that of the host must play important roles in the determination of the course of the infection.

Epidemiological, biochemical and molecular studies have shown that the T. cruzi taxon is extremely heterogeneous, genotypically as well as phenotypically. Although its reproduction is mainly clonal, evidence of rare events involving exchange of genetic material has been detected in countless studies of the population structure of the parasite. Based on various molecular markers, the T. cruzi taxon was divided into two genetically distinct lineages, named T. cruzi I and T. cruzi II. It was later proposed that lineage II be subdivided into five sub-lineages (IIa – IIe). Lineage I, on the other hand, has no subdivisions. More recently there have been reports of evidence for the existence of a third lineage, called T. cruzi III, which includes strains that had not been classified according to the T. cruzi I and T. cruzi II dichotomy. Although clonal reproduction is by far the predominant reproductive strategy in T. cruzi, the molecular characterization of hybrid strains has shown that exchanges of genetic material definitely occurred in the past. In addition, hybrids derived from strains of the T. cruzi I group were generated experimentally, confirming the capacity of T. cruzi to carry out genetic exchange.

The great genetic divergence between the T. cruzi I and II lineages is reflected in many epidemiological and pathological aspects of Chagas disease. In countries of the Southern Cone, where Chagas disease is more severe, T. cruzi I is associated with the wild cycle, infecting mainly arboreal mammals, while T. cruzi II is predominant in domestic cycles, infecting humans and other terrestrial mammals. Epidemiological evidence and the genotyping of parasites directly from infected human tissues have demonstrated that lineage T. cruzi II is the predominant causal agent of Chagas disease in this area. On the other hand, T. cruzi I is predominant in the Amazon basin and in endemic Chagas areas in Venezuela. For reviews of the population structure of T. cruzi and its relation to Chagas disease, please see Buscaglia and Di Noia, Macedo and collaborators, and Zingales and collaborators.

As the Trypanosomatidae family diverged very early from the evolutionary lineage that gave rise to higher eukaryotes, its members have very peculiar features in the organization and expression of their genes. Peculiarities in chromosome structure and organization make it difficult to visualize distinct metaphase chromosomes, and because the chromosomes do not condense during cell division, the exact number of chromosomes of the species has not yet been defined. However, PFGE analysis (pulsed-field gel electrophoresis) of various strains, followed by hybridization with telomeric sequences and probes for conserved genes, has led to the conclusion that this is a diploid genome with an estimated 20 to 40 homologous chromosomes of very different sizes. In addition to the nuclear genome, all members of the order Kinetoplastida have an extranuclear genome, called kinetoplast DNA or kDNA, which can represent 20-25% of the overall DNA content. The kDNA of T. cruzi consists of a network containing approximately 40-50 maxicircles of 22 to 28 kb, which correspond to the mitochondrial DNA of other eukaryotes, and 5,000-10,000 minicircles of approximately 1.5 kb, which have highly variable sequences and whose function is related to the editing of the mRNAs that encode mitochondrial proteins.

Another peculiar characteristic of the genome of trypanosomatids is the presence of long polycistronic transcription units, which include various genes organized in tandem; in general these genes are not functionally related. When they sequenced the first chromosome of a trypanosomatid, in 1999, Myler and collaborators showed that chromosome 1 of Leishmania major consists of two polycistronic units, one containing 29 genes in tandem encoded on one strand and the other containing 50 genes encoded on the opposite strand, with a strand-switch region between the two directional transcription clusters. This structure, containing long transcription clusters, has proven to be very similar in other trypanosomatids, as illustrated in Figure 1.

**Figure 1 –** Chromosomes 7 and 10 of *Trypanosoma brucei* aligned with the chromosomes of L. major and corresponding regions in the genome of *T. cruzi.*
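The organization into directional gene clusters can be illustrated with a short Python sketch that groups consecutive genes by coding strand and reports where the strand switches. The 29 and 50 gene counts come from the description of L. major chromosome 1 above; which cluster lies on which strand, and the function names, are assumptions made for the example.

```python
from itertools import groupby


def directional_clusters(strands):
    """Collapse a gene-by-gene list of strands into (strand, gene_count) runs."""
    return [(strand, len(list(run))) for strand, run in groupby(strands)]


def strand_switch_indices(strands):
    """Indices of genes whose strand differs from that of the preceding gene."""
    return [i for i in range(1, len(strands)) if strands[i] != strands[i - 1]]


# Assumed layout: 29 genes on the "-" strand followed by 50 on the "+" strand.
lmajor_chr1 = ["-"] * 29 + ["+"] * 50

print(directional_clusters(lmajor_chr1))   # [('-', 29), ('+', 50)]
print(strand_switch_indices(lmajor_chr1))  # [29]
```

Applied to a real annotation, the input would be the strand column of the gene table sorted by chromosomal position.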

Various groups have demonstrated that, as in prokaryotes, gene transcription in trypanosomatids is polycistronic. Surprisingly, in T. brucei, and for exogenous genes introduced into T. cruzi by electroporation, mRNAs can be generated by the action of RNA polymerase I. However, as in other eukaryotes, pre-mRNAs are processed in the nucleus to form mature, monocistronic cytoplasmic mRNAs. Two post-transcriptional events are necessary to produce mature mRNAs: the addition of a poly-A tail to the 3’ end of the mRNA, and the addition of a conserved RNA sequence of 39 nucleotides, called the spliced leader (SL) or mini-exon, to the 5’ end by means of the trans-splicing mechanism (Figure 2). The SL sequence, found only in trypanosomatids and in nematodes, is identical in all mRNAs of the same organism, but varies between species. Another characteristic of trypanosomatid genes that is also typical of prokaryotes is the scarcity of introns, which have been identified in only four genes. For a review of the gene expression mechanism in trypanosomatids, see Teixeira and DaRocha.

**Figure 2:** *T. cruzi* trans-splicing model
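The two processing events described above can be sketched as a toy model in Python: each coding region of a polycistronic precursor receives the same 39-nt leader at its 5’ end and a poly-A tail at its 3’ end. The spliced leader is represented by a placeholder string of 39 "N" characters, not the real sequence, and the coordinates, tail length and function name are hypothetical; UTRs and the signals that position the processing sites are ignored.

```python
SL_LENGTH = 39
# Placeholder for the 39-nt spliced leader; the real sequence is not reproduced here.
SPLICED_LEADER = "N" * SL_LENGTH


def process_polycistron(pre_mrna, gene_coords, poly_a_length=20):
    """Return one mature monocistronic mRNA per (start, end) gene coordinate pair."""
    mature = []
    for start, end in gene_coords:
        body = pre_mrna[start:end]
        # Trans-splicing adds the leader at the 5' end; polyadenylation adds the 3' tail.
        mature.append(SPLICED_LEADER + body + "A" * poly_a_length)
    return mature


# Toy precursor: two short "genes" separated by an intergenic spacer.
pre_mrna = "AUGAAAUAA" + "CCUCCU" + "AUGGGGUAG"
mrnas = process_polycistron(pre_mrna, [(0, 9), (15, 24)])

for mrna in mrnas:
    print(len(mrna), mrna[SL_LENGTH:-20])
```

Every mature molecule in the toy model starts with the same leader, mirroring the statement that the SL is identical in all mRNAs of one organism.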

In 2005, the full genome sequence of the CL Brener clone of T. cruzi was obtained by an international consortium, together with the genomes of two other trypanosomatids that cause important tropical diseases: T. brucei and L. major. CL Brener is a hybrid strain classified by some groups as belonging to sub-lineage IIe, and it was selected as the reference strain for the genome project because of the large body of work already published with it, including EST (expressed sequence tag) sequencing of various forms of the parasite’s life cycle. The genome of the CL Brener strain, estimated at between 106.4 and 110.7 Mb, was sequenced using the whole genome shotgun (WGS) strategy, with 14x coverage. As this is a hybrid genome, the high coverage was intended to ensure adequate representation of the two haplotypes, one from each of the parental lineages present in the CL Brener clone. In addition, the genome assembly algorithms had to be modified to avoid mixing the two haplotypes in the contigs generated by WGS. To distinguish the haplotypes, CL Brener contigs were compared after assembly with WGS sequences obtained from libraries built from the DNA of a representative of parental group IIb (the Esmeraldo strain). Pairs of alleles were thus identified for half the genes of CL Brener. The annotation of the genome sequence, which has a G+C content of 51%, indicated the existence of 22,570 protein-coding genes, of which 12,570 represent pairs of alleles. Of this total of approximately 12,000 genes, a function could be assigned to 50.8% based on the literature, on similarity to previously characterized proteins, or on the presence of characteristic functional domains.

Most of the differences between the two haplotypes present in the genome of clone CL Brener correspond to insertions and deletions in intergenic and subtelomeric regions and in regions with amplified repetitive sequences; the average divergence between the two haplotypes is 5.4%, a value that drops to 2.2% in coding regions.
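As an illustration of what a divergence percentage between two haplotypes means, the following minimal sketch counts differing columns in a pairwise alignment. It assumes the two sequences are already aligned with `-` marking gaps and that each gap column counts as one difference; the text does not say how the consortium computed its figures, and the function name and toy sequences are hypothetical.

```python
def percent_divergence(hap_a: str, hap_b: str) -> float:
    """Percentage of aligned columns in which two haplotypes differ.

    Assumption: the inputs are pre-aligned, equal-length strings and a gap
    ('-') in only one sequence counts as a single difference.
    """
    if len(hap_a) != len(hap_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns that are gaps in both sequences.
    columns = [
        (a, b)
        for a, b in zip(hap_a.upper(), hap_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    differing = sum(1 for a, b in columns if a != b)
    return 100.0 * differing / len(columns)


if __name__ == "__main__":
    # Toy alignment: one substitution and a two-base insertion/deletion.
    print(round(percent_divergence("ATGCCGTA--GCTA", "ATGACGTATTGCTA"), 2))
```

Applied separately to coding and non-coding alignments, a calculation of this kind would give region-specific values comparable in spirit to the 2.2% and 5.4% figures quoted above.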

The high percentage of tandemly repeated sequences in the genome is another marked characteristic, and it made contig assembly very difficult. For this reason, the assembly of the T. cruzi genome has not been completed. Most of the sequences not incorporated into contigs are non-coding repetitive elements or members of multigene families organized in tandem. The data on the assembled and annotated portion of the T. cruzi genome, corresponding to 838 scaffolds (containing 4008 contigs for a total of 60.4 Mb), are available on GeneDB and TriTrypDB.

As the genomes of T. brucei and L. major were sequenced simultaneously, it was possible to carry out comparative genomic analyses of the three organisms, which demonstrated high levels of synteny (conservation of gene order) between the three genomes (Figure 1). These studies also made it possible to investigate the evolutionary history of the chromosomes of these organisms. It was postulated that the current genomic architecture of T. cruzi and L. major is probably similar to the fragmented organization of the ancestral genome, which would have undergone various chromosome fusion events in the lineage that gave rise to T. brucei. These analyses also revealed various genes exclusive to the trypanosomatid family, some of which are considered excellent candidates for study in the current “post-genomic” phase of research, which may identify new targets for the development of drugs and control methods for the diseases caused by these organisms.
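Synteny, as defined above, can be illustrated with a small sketch that finds runs of orthologous genes appearing consecutively and in the same order in two genomes. This is a simplified illustration, not the method used in the comparative analyses: it assumes each gene has a unique ortholog identifier, ignores strand and inverted blocks, and uses hypothetical gene names.

```python
def syntenic_blocks(order_a, order_b, min_len=2):
    """Return runs of genes that are adjacent and collinear in both genomes.

    order_a, order_b: lists of ortholog identifiers in chromosomal order.
    Assumption: identifiers are unique within each list.
    """
    pos_b = {gene: i for i, gene in enumerate(order_b)}
    blocks, current = [], []
    for gene in order_a:
        extends = (
            gene in pos_b
            and current
            and pos_b[gene] == pos_b[current[-1]] + 1
        )
        if extends:
            current.append(gene)
        else:
            # Close the previous run and start a new one if the gene is shared.
            if len(current) >= min_len:
                blocks.append(current)
            current = [gene] if gene in pos_b else []
    if len(current) >= min_len:
        blocks.append(current)
    return blocks


if __name__ == "__main__":
    genome_a = ["g1", "g2", "g3", "x", "g7", "g8"]
    genome_b = ["g7", "g8", "y", "g1", "g2", "g3"]
    print(syntenic_blocks(genome_a, genome_b))
```

In this toy case the two blocks are conserved but sit in a different arrangement in each genome, which is the kind of pattern that chromosome fusion or fission events would leave behind.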

Almost all published data on the structure and organization of the T. cruzi genome refer to sequences from the CL Brener clone. Very few works, even among those that focus on individual genes, contain data derived from genome sequences of strains belonging to lineage T. cruzi I or III. As a consequence, little is known about the important differences that must exist between the genomes of strains belonging to distinct lineages of T. cruzi, and which would be responsible for the visible differences in biological behavior and in the epidemiology of Chagas disease. Karyotype analyses of some of these strains indicate that the average genome size of strains belonging to the T. cruzi I lineage is significantly smaller than that of strains of lineage T. cruzi II. This difference is a consequence of, among other factors, a smaller amount of repetitive sequences such as the 195 bp satellite.

Other differences already detected between strains of lineages T. cruzi I and II, which must be reflected in the structure and organization of their genomes, are related to the parasite’s DNA repair machinery. In 2003, Augusto-Pinto and colleagues, studying proteins involved in the mismatch repair (MMR) pathway, observed that strains Colombiana (T. cruzi I), JG (T. cruzi II) and CL Brener (hybrid strain) presented different levels of microsatellite instability in response to oxidative stress. In addition, strains of T. cruzi II have proven to be more resistant to treatment with cisplatin, a phenotype that indicates a repair machinery that is not very efficient. These results agree with recent data from our own group, which indicate a higher level of polymorphism in multicopy gene families in the genomes of strains of the T. cruzi II group than in sequences of group I. Based on these studies, our group proposed that a less efficient repair machinery in strains of group T. cruzi II could generate higher genomic variability in the strains of this group. Studies on the genomes of strains belonging to lineage T. cruzi I are certainly essential to obtain a full picture of the genome of the species T. cruzi.

In silico

Genome and in silico approaches

Alberto Martín Rivera Dávila

Laboratory of Computational Biology and Systems, IOC, Fiocruz

Email: davila@fiocruz.br

Under construction

Gene regulation

Regulation of gene expression in Trypanosoma cruzi

Samuel Goldenberg

Paraná Institute of Molecular Biology, Carlos Chagas Institute/Fiocruz

Email: sgoldenb@tecpar.br; sgoldenb@fiocruz.br

Like other trypanosomatids, Trypanosoma cruzi has some very peculiar biological characteristics when it comes to the organization and function of its genome. The genetic constitution of T. cruzi shows great polymorphism, and as a consequence the amount of nuclear DNA and the number of chromosomes vary significantly between parasite isolates. Unlike in most eukaryotic organisms, the genes of T. cruzi and of other trypanosomatids are not, in general, interrupted by intervening sequences (introns).

The mRNAs (messenger RNAs) of T. cruzi have a cap structure at their 5’ end and a poly(A) sequence at their 3’ end. The cap structure of trypanosomatids differs from that of the messenger RNAs of other eukaryotes, as it is highly modified and derives from another transcript, called the mini-exon or spliced leader (SL) RNA. Therefore, all mRNAs of T. cruzi have the same sequence at their 5’ end.

This common sequence (SL RNA), or mini-exon, is added after transcription of the mRNAs through a peculiar mechanism called trans-splicing (2006). The SL RNA is encoded at a separate location in the parasite’s genome and is added to each mRNA precursor that is transcribed. This processing mechanism is a consequence of the peculiar way in which mRNAs are transcribed: unlike in other eukaryotes, the mRNAs of trypanosomatids are transcribed as polycistronic units (i.e. different messages in the same transcription unit). However, unlike the polycistronic transcription described in prokaryotes, the mRNAs co-transcribed in trypanosomatids are in general not related in terms of function or of the timing of expression of their translation products (the encoded proteins). In addition, no RNA polymerase II promoters have been observed upstream of most protein-coding genes in trypanosomatids. In T. cruzi, as in other trypanosomatids studied, promoters have been found for RNA polymerase I and for the mini-exon, which is transcribed by RNA polymerase II. It is important to mention, however, that the genes encoding the three eukaryotic RNA polymerases have also been described in trypanosomatids.

Therefore, regulation of gene expression in trypanosomatids occurs mainly at the post-transcriptional level. Indeed, we imagine that genes are transcribed and processed continuously (relaxed transcription), and that their expression is regulated by selective transport into the cytoplasm, by mRNA stability, or by selection of the mRNAs that will be translated, through a mechanism of differential mobilization to polysomes. In the case of stability, a phenomenon widely studied in trypanosomatids, most studies focus on the role of the 3’ untranslated region (3’-UTR). The UTR sequence does not act on its own; regulation is exerted by the proteins that bind to it. These proteins could then modulate gene expression by taking part in the selection of the messenger RNAs to be translated, as there is plenty of evidence pointing to a selective mechanism for mobilizing mRNAs to polysomes.

There is still much to be researched and clarified regarding the regulation of gene expression in trypanosomatids. With the genomic sequences of the three trypanosomatids relevant to human health (T. cruzi, T. brucei and Leishmania major) now determined, and as genomic and post-genomic analysis tools and epigenetic studies advance, new mechanisms are bound to be understood.

K-DNA

**The kinetoplast DNA (kDNA) of Trypanosoma cruzi and its applications**

Constança Britto

Laboratory of Molecular Biology and Endemic Diseases, Oswaldo Cruz Institute/Fiocruz

Email: cbritto@ioc.fiocruz.br

kDNA genetic variability applied to molecular epidemiology studies for Chagas disease

The great sequence heterogeneity of the thousands of minicircle molecules that make up the network structure of kinetoplast DNA has made kDNA an excellent model for molecular typing studies of Trypanosoma cruzi. The first reports of DNA polymorphism in T. cruzi date back to 1980 and were based on the analysis of the sizes of restriction fragments (RFLPs, Restriction Fragment Length Polymorphisms) of kDNA minicircle molecules. The restriction patterns generated allowed different parasite clones and strains to be characterized and grouped into subpopulations according to the similarity of the profiles obtained by digesting minicircle DNA with restriction endonucleases. Populations of parasites presenting identical or similar restriction patterns were named schizodemes.
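The logic of grouping isolates by restriction pattern can be sketched in a few lines. The example below is a deliberately simplified toy: it digests a single circular sequence per isolate at one recognition site and groups isolates with identical fragment-size lists, whereas real schizodeme analysis digests the whole minicircle population and compares gel profiles that may be merely similar. The sequences, isolate names and function names are hypothetical.

```python
from collections import defaultdict


def circular_fragments(seq, site):
    """Fragment lengths from cutting a circular DNA at every occurrence of site."""
    seq, site = seq.upper(), site.upper()
    n = len(seq)
    # Extend the sequence so that sites spanning the origin are also found.
    extended = seq + seq[: len(site) - 1]
    cuts = [i for i in range(n) if extended.startswith(site, i)]
    if not cuts:
        return [n]  # uncut circle, reported as its full length
    lengths = [
        ((cuts[(k + 1) % len(cuts)] - cuts[k]) % n) or n
        for k in range(len(cuts))
    ]
    return sorted(lengths, reverse=True)


def group_by_pattern(isolates, site):
    """Group isolate names that share an identical fragment-size pattern."""
    groups = defaultdict(list)
    for name, seq in isolates.items():
        groups[tuple(circular_fragments(seq, site))].append(name)
    return dict(groups)


if __name__ == "__main__":
    toy = {
        "isolate_A": "GAATTCAAAAGAATTCAA",
        "isolate_B": "GAATTCTTTTGAATTCTT",
        "isolate_C": "GAATTCAAAAAAAAAAAA",
    }
    print(group_by_pattern(toy, "GAATTC"))
```

Here the first two toy isolates yield the same fragment sizes and fall into one group, while the third forms a group of its own.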

The analysis of kDNA digestion profiles of T. cruzi revealed extreme intraspecies genetic diversity, demonstrating, for the first time, that a single strain of the parasite could contain two or more distinct clonal genotypes, probably representing clones that exist naturally and that were genetically isolated from each other over time. The existence of these multi-clonal strains was later confirmed by various groups, using different approaches. The classification of strains of T. cruzi in schizodemes was an important epidemiological tool to monitor specific strains in nature and to understand the possible role played by genetic diversity of T. cruzi in the disease syndrome.

**Conserved and repeated sequences of kDNA minicircles as targets for the detection of Trypanosoma cruzi: use in the molecular diagnosis of Chagas disease**

In the 1980s, once molecular typing of T. cruzi based on kDNA minicircle restriction profiles had been proposed, it became of interest to determine the primary structure of these molecules, the target of schizodeme analysis. By sequencing minicircle DNA from different strains and isolates of T. cruzi, it was demonstrated that each molecule of approximately 1,400 base pairs (bp) is organized around four small regions (120 bp), arranged at 90-degree intervals from one another on the circle, which show a high level of intraspecies sequence conservation. These conserved regions alternate with regions of around 330 bp that show extreme sequence variability among the thousands of minicircles that make up the kDNA network (Figure 1).

The discovery that kDNA minicircles in T. cruzi contain repeated (conserved) and abundant segments, called “mini-repeats”, made these molecules ideal targets for the development of molecular methods for typing and detecting this parasite. The identification of “mini-repeats” in the minicircles of T. cruzi can be considered a breakthrough, because of the opportunities it opened up for new approaches in laboratory, diagnostic, clinical and epidemiological studies of Chagas disease.

With the advent, in 1988, of gene amplification by the polymerase chain reaction (PCR), it became possible to design suitable primers for the selective and specific amplification of T. cruzi kDNA, with the goal of developing a molecular strategy to complement serological tests in the diagnosis of chronic Chagas infection. Because of the low parasitemia found in the chronic stage of the disease, it was necessary to optimize a PCR-based test with higher sensitivity than the classic parasitological methods (xenodiagnosis, blood culture) and the same specificity, to replace xenodiagnosis in the direct parasitological assessment of patients with chronic Chagas disease. In this context, the choice of conserved sequences of kDNA minicircles as the target of PCR amplification was crucial for the successful application of the method, given that the parasite’s kDNA network consists of about 10,000 to 20,000 minicircle molecules, each of which has four conserved regions (for a total of about 40,000 to 80,000 copies of these sequences per parasite).

In the 1990s, several works demonstrated the extreme sensitivity and specificity of PCR for detecting T. cruzi directly in the blood of seropositive Chagas patients, revealing the high potential of this technique for evaluating parasitemia in chronic patients with a reduced number of circulating parasites. Most of these works used conserved and repeated sequences of kDNA minicircles as PCR amplification targets, making it possible to detect even a single parasite in 10 milliliters of blood. It was thus demonstrated that kDNA minicircle sequences are suitable markers at the species level as well as at the strain level, for the detection and the classification (typing) of T. cruzi, respectively.
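The copy-number argument behind the choice of minicircle targets is simple arithmetic, sketched here using only the figures quoted in the text (10,000 to 20,000 minicircles per parasite, four conserved regions per minicircle). The function name is hypothetical.

```python
def target_copies_per_parasite(minicircles, regions_per_minicircle=4):
    """Number of conserved-region copies available as PCR targets in one parasite."""
    return minicircles * regions_per_minicircle


if __name__ == "__main__":
    low = target_copies_per_parasite(10_000)
    high = target_copies_per_parasite(20_000)
    print(f"{low} to {high} target copies per parasite")
```

Under these figures, even a sample containing a single parasite carries tens of thousands of template copies, which is consistent with the high sensitivity reported for kDNA-based PCR.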

RNA

RNA processing in Trypanosoma cruzi

Trypanosoma cruzi is a member of the order Kinetoplastida, one of the most ancient eukaryotic lineages in evolutionary terms, characterized by the peculiar organization of its mitochondrial DNA. Its gene expression mechanisms show several uncommon features compared with other eukaryotes. Although most of the main types of RNA are present in this organism, as expected, the 28S rRNA is divided into six fragments, which results in an unusual stepwise assembly of the 60S ribosomal subunit. In addition, microRNAs seem to be absent, and the small RNAs in the relevant size range correspond to fragments of tRNA, rRNA, snRNA and snoRNA.

RNA processing

Almost all protein-coding genes of this organism are organized in clusters that are transcribed together into polycistronic mRNA precursors (Clayton 1992; El-Sayed et al. 2005). Because in T. cruzi, as in other eukaryotes, only monocistronic mRNAs are translated, the polycistronic precursors must be processed into individual mRNAs. This processing takes place through a combination of 5’ trans-splicing and 3’ cleavage and polyadenylation.

Trans-splicing consists of the addition, in trans, of a capped non-coding exon called the spliced leader (SL), or mini-exon, to the 5’ end of each coding region present in the pre-mRNA. This process solves the problem of creating several individually capped mRNAs from a single polycistronic transcript. Trans-splicing was proposed when it was observed that all mRNAs of Trypanosoma brucei had a common sequence of 39 nucleotides at their 5’ end that was not transcribed from the DNA adjacent to the gene encoding the rest of the mRNA. It was later demonstrated that this process is present in all kinetoplastids and that the SL sequence is species-specific. In terms of mechanism, trans-splicing is very similar to the cis-splicing of other eukaryotes, and consists of the joining of two sequences from two precursor RNA molecules by means of two transesterification reactions (Figure 1). In T. cruzi, the SL derives from the first 39 nucleotides of a 110-nucleotide RNA called SL RNA, which is transcribed from a large number of tandemly repeated genes in the genome. It receives a 5’ cap that is peculiar to trypanosomatids, with a cap 4 structure in which the first four nucleotides after the 7-methylguanosine are methylated.
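The outcome of this processing (one polycistronic precursor yielding several mRNAs that all begin with the same leader) can be illustrated with a toy model. The sketch below is not a biochemical simulation: the 39-nucleotide leader is a placeholder string rather than the real T. cruzi SL sequence, the cap is not modeled, and the coordinates of each coding unit are supplied by hand. All names are hypothetical.

```python
# Placeholder 39-nt leader; NOT the real spliced leader sequence.
PLACEHOLDER_SL = ("ACGUACGUAC" * 4)[:39]


def process_polycistron(pre_mrna, units, sl=PLACEHOLDER_SL, poly_a_len=20):
    """Turn a polycistronic precursor into monocistronic mRNAs.

    units: list of (start, end) pairs, 0-based and end-exclusive, giving the
    region kept for each gene (from the trans-splicing acceptor site to the
    polyadenylation site). Intergenic sequence outside the units is discarded.
    """
    mature = []
    for start, end in units:
        body = pre_mrna[start:end]
        # SL added at the 5' end, poly(A) tail at the 3' end.
        mature.append(sl + body + "A" * poly_a_len)
    return mature


if __name__ == "__main__":
    precursor = "CCCC" + "AUGGCC" + "UUUU" + "AUGAAA" + "CC"
    for mrna in process_polycistron(precursor, [(4, 10), (14, 20)]):
        print(mrna)
```

Each output molecule shares the same first 39 nucleotides, which mirrors the original observation that led to the proposal of trans-splicing.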

There is plenty of evidence that the trans-splicing of trypanosomatids and the cis-splicing of other eukaryotes are mechanistically similar: (i) the GU-AG consensus sequences of exon-intron and intron-exon junctions are present in the SL RNA and in the pre-mRNA of trypanosomatids; (ii) a Y-shaped RNA intermediate containing a 2’-5’ phosphodiester bond, analogous to the lariat intermediate of cis-splicing, has been identified; (iii) snRNAs U2, U4 and U6 are essential for trans-splicing; and (iv) SL RNA can functionally replace snRNA U1 in mammalian cis-splicing, while an analog of snRNA U5 has been found in T. brucei. Some splicing factors have been identified in T. cruzi, such as protein XB1, an ortholog of the Saccharomyces cerevisiae factor PRP31p that binds to SL RNA, and the ortholog of factor U2AF35.

Although kinetoplastids are the only known organisms in which apparently all mRNAs contain the SL, trans-splicing has been demonstrated in other organisms. Caenorhabditis elegans, Ascaris spp., Schistosoma mansoni, Diplonema spp., Echinococcus multilocularis, platyhelminths, cnidaria, and Ciona intestinalis add SLs to some of their pre-mRNAs, while the tobacco plant and Euglena spp. use trans-splicing to combine separate transcripts that form the coding sequence of some mitochondrial and chloroplast proteins. On the other hand, the poly(A) polymerase genes of T. cruzi and T. brucei have been described as interrupted by an intron that is removed by cis-splicing, indicating that at least some pre-mRNAs of these organisms are processed through both cis- and trans-splicing.

In trypanosomatids, trans-splicing and polyadenylation probably occur co-transcriptionally; the two processes seem to be coupled and are partly determined by pyrimidine-rich DNA segments present in the intergenic regions. The sequences that determine the 3’ acceptor site of trans-splicing are similar to those used by other organisms for cis-splicing: a polypyrimidine region followed by an AG dinucleotide. In T. cruzi, mutation analysis showed that the 3’ acceptor site is chosen by selecting the first AG dinucleotide after the branch point and the polypyrimidine region, as in other trypanosomatids. On the other hand, the mRNAs of trypanosomatids do not possess the AAUAAA polyadenylation signal found 10-30 bases upstream (5’) of the polyadenylation site in the mRNAs of higher eukaryotes. In Leishmania major, polyadenylation tends to occur at multiple sites located at an average distance of 390 bases 5’ of the 3’ acceptor trans-splicing site, suggesting that the polyadenylation site is specified by the trans-splicing site. In T. brucei, the polyadenylation site is determined by a polypyrimidine region followed by AG located approximately 100 bases 3’ of it; however, although this region closely resembles a trans-splicing acceptor site, it is located hundreds of bases 5’ of the major trans-splicing site of the next gene in the transcriptional unit. In T. cruzi, the trans-splicing and polyadenylation sites have been identified in several genes. However, apart from the characterization of a nuclear extract with 3’ cleavage and polyadenylation activity, the sequences that determine this process have not yet been investigated in detail in this organism.
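The acceptor-site rule described above (a polypyrimidine region followed by the first downstream AG) lends itself to a short sketch. This is a heuristic illustration only: it does not model the branch point, the minimum tract length of 8 is an arbitrary assumption, and the function name and test sequence are hypothetical.

```python
import re


def find_acceptor_site(intergenic, min_tract=8):
    """Predict a trans-splicing acceptor site in an intergenic sequence.

    Returns the 0-based index of the first nucleotide after the chosen AG
    (where the spliced leader would be joined), or None if no candidate exists.
    Assumption: the first run of at least min_tract pyrimidines is the
    relevant polypyrimidine region.
    """
    seq = intergenic.upper().replace("U", "T")
    tract = re.search(r"[CT]{%d,}" % min_tract, seq)
    if tract is None:
        return None
    # First AG dinucleotide downstream of the polypyrimidine region.
    ag = seq.find("AG", tract.end())
    if ag == -1:
        return None
    return ag + 2


if __name__ == "__main__":
    region = "GAGACCTTTTCCTTCTGAAGATGGCC"
    site = find_acceptor_site(region)
    print(site, region[site:])
```

Note that the AG near the start of the toy sequence is ignored because it lies upstream of the pyrimidine-rich stretch; only an AG that follows the tract is considered.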

Various examples of alternative processing have been described in T. cruzi. Alternative trans-splicing sites are used in the transcripts of the genes of ribosome protein P2 beta, histone H2A, TcRho1 and LYT1, and alternative polyadenylation sites have been described in the genes of histone H2A and HSP10. In some cases, as described for LYT1 and HSP10, RNA processing seems to be the target of gene regulation mechanisms. As gene regulation in T. cruzi and in the other trypanosomatides occurs mainly, perhaps exclusively, at the post-transcriptional level, the relevance of the study of RNA processing in these organisms becomes clear.

RNA-binding proteins

The importance of RNA-binding proteins (RBPs) for the function of RNAs is becoming increasingly evident. Various RBPs have been identified in the genome of T. cruzi, containing the well-characterized CCCH, RRM, Pumilio, and SR-related domains. Examples include TcUBP-1, which binds to the mRNAs of SMUG mucins, TcRBP40, TcRBP19, and TcPABP1, a protein that binds to poly(A). So far, the target RNAs of most of the RBPs identified have not been determined, indicating the need for further research.

Figure 1: Trans-splicing and polyadenylation of pre-mRNA in T. cruzi. Schematic representation of pre-mRNA processing. The 5’ splice donor site (5’ SS), the 3’ acceptor site (3’ SS) and the branch point (BP) are indicated. C, 5’ cap; SL, spliced leader; Py, polypyrimidine region; pA, poly(A) tail. 1, first transesterification reaction; 2, second transesterification reaction; 3, cleavage and polyadenylation; 4, debranching and degradation of the Y intermediate. Representation not to scale.

**Figure 1 –** Process RNA Tcruzi TuranUermeny.

DNA

**DNA metabolism in Trypanosoma cruzi: what do we know about the 3 Rs (replication, repair and recombination)?**

Carlos Machado

Department of Biochemistry, Biological Sciences Institute, Federal University of Minas Gerais

Email: crmachad@icb.ufmg

All organisms have a genome that controls their growth, survival and interaction with the environment. This information must be passed on faithfully to the next generations, and it must also remain accessible and free of errors. It is contained in the DNA sequence, which is exposed to a series of endogenous and exogenous factors that can damage the molecule. Therefore, for an organism to survive and multiply, it must be capable of replicating its information with high fidelity, and of preventing DNA damage from causing either a blockage that may lead to death or errors in the information that may lead to mutation.

DNA metabolism is conserved in all organisms studied so far, and can be summarized in the 3 Rs of DNA: replication, repair, and recombination. The study of this area in trypanosomatids is a promising field which, with information from the genome project, has been providing important data on the biology of these organisms. In this text, we discuss some aspects of replication, repair and recombination in Trypanosoma cruzi.

DNA replication

Parasites require a DNA replication mechanism capable of duplicating a genome containing millions of base pairs, and this process must be controlled so that it occurs only once per cell division. In T. cruzi, the S phase (the period in which DNA replication occurs) lasts 2.4 hours in a regular 24-hour cycle. Replication takes place in an orderly fashion: first the replication origin is recognized, then replication is initiated, and finally the DNA is replicated semi-conservatively.

Although the origin sequence is not well established in all eukaryotes, the proteins that recognize it are. Six ORC proteins recognize the origin and recruit Cdc6; the resulting complex can then load onto the DNA the MCM proteins (minichromosome maintenance proteins), which have helicase activity and give the replication machinery access to the DNA. The genomes of trypanosomatids reveal that in these organisms the replication initiation machinery diverges at some points from that of other eukaryotes. The main difference is that a single ORC protein has been found, and it is more similar to those of Archaea. Although the replication machinery does not include all the control proteins observed in other organisms, it has all the genes involved in DNA replication itself, such as DNA polymerase alpha, responsible for the synthesis of the primer; protein RFC, responsible for taking DNA polymerase alpha to the DNA; DNA polymerases delta and epsilon, which are the replicative polymerases; and protein PCNA, crucial for the processivity of the DNA polymerases.

Trypanosomatids have some biological properties that distinguish them from other eukaryotes. One of the most marked differences is the concatenated DNA contained in the kinetoplast. Concatenated kDNA is the most elaborate mitochondrial DNA in nature, and the complexity of this structure seems to require a replication process that is much more sophisticated than that of other eukaryotes. In all other eukaryotes studied so far, only one DNA polymerase (DNA polymerase gamma) is involved in the replication of mitochondrial DNA. In trypanosomatids, several different polymerases are located in the kinetoplast and may be associated with the replication of the genetic material in this organelle. These organisms have four DNA polymerases I (A, B, C, and D), which belong to the same family as DNA polymerase gamma and yet are much more similar to DNA polymerase I of Escherichia coli. The kinetoplast also contains DNA polymerases found in the nucleus of other eukaryotes. As in Trypanosoma brucei, studies in our laboratory have found that DNA polymerase beta, which is usually found in the nucleus, where it takes part in DNA repair pathways, is present in the kinetoplast of T. cruzi. We also observed that the protein encoded by the duplicated copy of this gene, designated DNA polymerase beta-PAK, is located in the kinetoplast as well, in T. brucei and in T. cruzi (data from our laboratory, not yet published). Curiously, another DNA polymerase, kappa, involved in translesion synthesis of nuclear DNA, was also identified in the kinetoplast of T. cruzi (data not yet published). Are all seven of these polymerases involved in kDNA replication? As for the DNA polymerases that are directed to the nucleus in other organisms, did they acquire a mitochondrial targeting signal in trypanosomatids, or did they lose this signal in other eukaryotes? Do these DNA polymerases also go to the nucleus? These are some of the many questions that still await an answer regarding DNA replication in trypanosomatids.

DNA repair and recombination

The genomic instability generated during replication and caused by the large variety of agents that damage DNA would be a serious problem for the cell, were it not for the repair system. Various repair pathways involving different proteins act on the most varied kinds of DNA damage. These repair mechanisms can be classified into distinct pathways, but they are not fully independent. Some enzymes participate exclusively in certain pathways, while others are redundant. Overlap between different pathways is also common. General aspects of each repair pathway, as well as what is known for T. cruzi, are described below.

Direct lesion repair

The direct reversal pathway involves photolyases and alkyl transferases, enzymes that act, respectively, on some pyrimidine dimers (caused by ultraviolet radiation) and on alkylated bases such as O6-methylguanine. This is the simplest repair pathway described so far. The chemically modified base is repaired directly, without being removed from the DNA. In the T. cruzi genome, no gene with similarity to photolyases has been identified, but there are genes with similarity to alkyl transferases.

Translesion synthesis

Translesion synthesis (TLS) was proposed as an alternative repair pathway by Kunkel and collaborators. This pathway involves polymerases, such as DNA polymerases kappa and eta, that have no exonuclease activity and can synthesize DNA across a lesion. The lesions processed by these polymerases include pyrimidine dimers and 8-oxoG. Data from our laboratory show that DNA polymerases kappa and eta of T. cruzi are also capable of translesion synthesis.

Recombination and NHEJ

Repair through recombination deals with double-strand breaks (DSBs) in the DNA. Two independent pathways repair DSBs. NHEJ (Non-Homologous End-Joining) is the process most commonly used in mammals. In this pathway, the broken ends of the chromosome are brought together and rejoined, with the possible loss of one or two nucleotides at the junction. In other organisms, double-strand breaks are usually repaired by homologous recombination, using the information contained in the undamaged homologous chromosome. NHEJ is a rapid repair method, but it is more error-prone than homologous recombination. Many of the genes involved in NHEJ have not been found in the T. cruzi genome, which may be a sign that this pathway is absent in this organism. On the other hand, homologous recombination has been observed to be very intense in T. cruzi, and it is related to the high resistance to ionizing radiation shown by this organism. This intense recombination may also explain the high rates of homozygosity found in T. cruzi, which are to be expected in an organism with clonal reproduction.

Base excision repair

BER (Base Excision Repair) deals with damage to individual bases, such as oxidation, methylation, depurination, and deamination. This pathway involves glycosylases that recognize specific types of altered bases and remove them by hydrolysis, generating an abasic site that is then filled with the correct base. One DNA glycosylase and one AP endonuclease of T. cruzi have already been biochemically characterized. DNA polymerase beta, which in humans is involved in BER, is located in the kinetoplast of T. cruzi, as described above. Therefore, either other DNA polymerases are involved in this process, or DNA polymerase beta is present in the nucleus in quantities not detectable by current methods.

Nucleotide excision repair

NER (Nucleotide Excision Repair) corrects lesions that distort the DNA helix, such as the pyrimidine dimers formed after exposure to UV light, by removing a DNA fragment of about 30 nucleotides. NER can be divided into two sub-pathways: TCR (Transcription-Coupled Repair) and GGR (Global Genomic Repair). TCR acts on the transcribed DNA strand once transcription by RNA polymerase II has been blocked. GGR, in contrast, repairs the transcribed strand as well as the non-transcribed one. Genes related to this pathway are present in T. cruzi, with the exception of gene XPA, an absence that is also observed in Plasmodium. An important matter to be investigated is the role played by TCR in T. cruzi, as it is believed that almost the entire genome of this organism is transcribed.

Mismatch repair

MMR (Mismatch Repair) is a repair pathway that corrects mismatched bases in the DNA. The process consists of recognizing the mismatched base, excising the DNA segment that contains the error, and resynthesizing the removed region using the parental strand as template. MMR is extremely important for maintaining genome stability after DNA replication, increasing its fidelity by about 1000 times. Defects in at least five genes involved in MMR are associated with HNPCC (Hereditary Non-Polyposis Colorectal Cancer) in humans. All MMR genes are present in T. cruzi; gene MSH2 has already been characterized. Evidence suggests that the MMR pathway has different efficiency levels in different strains of T. cruzi; strains of T. cruzi I seem to have a more efficient MMR than group II strains. This difference may be associated with the higher diversity found in strains of T. cruzi II. This mechanism for generating genetic variability finds a parallel in mechanisms already described in different bacteria.

Although studies of T. cruzi DNA metabolism are still in their infancy, the data already obtained show their importance for understanding the biology of T. cruzi, and they may help us understand how these mechanisms have evolved in different organisms.

DTUs

**Genotypes of Trypanosoma cruzi and clinical scenarios of Chagas disease**

Alejandro Gabriel Schijman, Ph.D

Laboratory of Molecular Biology of Chagas Disease, Genetic Engineering and Molecular Biology Research Institute, INGEBI-CONICET Buenos Aires, Argentina.

Email: schijman@dna.uba.ar

The clinical forms and the severity of Chagas disease have been attributed to a series of interactions between the complexity of Trypanosoma cruzi, the host, and environmental factors. As a species, T. cruzi has a clonal structure with gene recombination events, which gave rise to the heterogeneous, polyclonal, and hybrid characteristics of existing strains. Natural populations can be classified into at least seven “discrete typing units” (DTUs): TcI to TcVI, plus Tcbat. The term “discrete typing unit” describes groups of parasite stocks that are genetically closer to each other than to any other stock. These DTUs are identifiable through specific molecular markers, and they differ in geographical distribution, DNA type and dosage of different genes. Different degrees of diversity can also occur within each DTU.

This great diversity was already acknowledged by Dr. Carlos Chagas when he discovered the disease. He described different morphological variants of blood parasites, observed by microscopy, laying the foundation for grouping natural isolates within the species T. cruzi and for classifying these isolates using different typing approaches. Early attempts at taxonomy included the immunological types of Nussenzweig and collaborators (1963) and the pioneering work of Andrade (1974), which grouped strains according to specific combinations of morphological and behavioral characteristics of the parasite. Virulence, pathogenicity, immunological properties, DNA content, molecular karyotypes, genomic polymorphisms and susceptibility to trypanocidal drugs are some of the characteristics associated with the population structure of T. cruzi. Multilocus sequence typing (MLST) is the current gold standard for population studies of T. cruzi, making it possible to identify inter-DTU and intra-DTU diversity and the occurrence of recombination events.
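The definition of a discrete typing unit given above can be read as a condition on pairwise genetic distances between stocks. The following is a minimal sketch in Python under that reading. The stock names (`A1`, `B1`, ...) and the distance values are invented for illustration and do not come from any dataset, and the strict criterion used here (every within-group distance is smaller than every distance to an outside stock) is an assumption, not the formal procedure used in the typing literature.

```python
from itertools import combinations

def is_discrete_unit(group, all_stocks, dist):
    """Return True if all stocks in `group` are closer to each other
    than any of them is to a stock outside the group."""
    group = set(group)
    outside = [s for s in all_stocks if s not in group]
    if len(group) < 2 or not outside:
        return False
    # Largest distance between two members of the group
    max_within = max(dist[frozenset(p)] for p in combinations(sorted(group), 2))
    # Smallest distance from a member to a non-member
    min_between = min(dist[frozenset((g, o))] for g in group for o in outside)
    return max_within < min_between

# Hypothetical stocks and pairwise genetic distances (illustrative only)
pairs = {
    ("A1", "A2"): 0.02,
    ("B1", "B2"): 0.03,
    ("A1", "B1"): 0.20,
    ("A1", "B2"): 0.22,
    ("A2", "B1"): 0.21,
    ("A2", "B2"): 0.19,
}
dist = {frozenset(k): v for k, v in pairs.items()}
stocks = ["A1", "A2", "B1", "B2"]

print(is_discrete_unit(["A1", "A2"], stocks, dist))  # True
print(is_discrete_unit(["A1", "B1"], stocks, dist))  # False
```

In this toy example, `A1` and `A2` form a discrete unit because they are much closer to each other than to either `B` stock, whereas the mixed group `A1`, `B1` does not.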

Because of its geographical distribution, the genetic diversity of T. cruzi must be taken into account when developing diagnostic tests for worldwide use, and any new test must be validated with strains that represent all DTUs. In fact, some of the molecular targets used in diagnostic tests based on recombinant antigens and DNA amplification are polymorphic and show differential gene expression among strains belonging to different DTUs and, in some cases, among strains of the same DTU. Whole-genome sequences available in public databases cover only three of the six DTUs (TcI, TcII and TcVI), which are mostly related to infection in humans. This information is still not sufficient for a complete understanding of the evolutionary history of T. cruzi. Different research groups have proposed four main models to explain the number of hybridization events and the gene exchange that occurred during the evolution of the parasite. In general terms, these groups agree that the two hybrid DTUs, TcV and TcVI, originated from parental strains TcII and TcIII, but they disagree on which DTU is ancestral and on the number of hybridization events that gave rise to the existing strains. Most of the proposed models were based on only a few genes, and the differences between them could be biased by differences in the evolutionary rates of the genes selected for analysis. It appears that the diversification of T. cruzi into the currently existing lineages was a recent event.

A better comprehension of the relations between distinct lineages is crucial to establish effective control strategies. More than 6,300 DTU identifications were analyzed according to their geographical and host origins. TcI, with its high intra-DTU genetic polymorphism and wide geographical distribution, prevailed in the samples as a whole, in both wild and domestic cycles. TcII was absent or rare in North and Central America, and was more frequently identified in domestic cycles. This lineage has low genetic diversity and probably found shelter in certain mammal species. TcIII and TcIV were rarely found; they show substantial genetic diversity and are more associated with wild environments, but were also detected in human infections. TcV and TcVI are clearly associated with domestic transmission cycles. Tcbat is a monophyletic lineage prevalent in Brazil, Panama and Colombia. This lineage was recently found in a 5-year-old child in a forest area in the northwest of Colombia.

The diversity of hosts and environmental conditions certainly explains the maintenance of parasite diversity and the emergence of new variants through natural selection. Therefore, the distribution of DTUs reported so far is a temporary picture that will inevitably change over time, as more samples become available and as the ongoing environmental and climate changes proceed.

There is much speculation on whether this variability is associated with the prognosis of the disease. As Chagas disease is a relatively new event in the evolutionary history of T. cruzi, we should expect that different parasite populations may present different rates of infectivity, virulence, and ability to cause disease in humans. It is currently accepted that all DTUs are infective for humans and can cause Chagas disease.

In 1998, Drs. Andrea Macedo and Sergio Pena proposed a histotropic clonal model, according to which different clones of the parasite present different tissue tropisms. Based on evidence, obtained with molecular approaches, of an association between parasite persistence and tissue damage, the model leads to the assumption that the different tissue tropisms of parasite genotypes could be the key factor in the development of different clinical forms. In any case, the correlation between the clinical presentation of chronic Chagas disease and DTUs has been incidental rather than proven, and is made more difficult to establish by the occurrence of mixed infections, the sequestration of DTUs in tissues, and complex interactions with the host’s immune response. In addition, associations are difficult to demonstrate because of the proportion of asymptomatic patients who may already have organ alterations that remain cryptic at physical examination and can only be detected through more sophisticated diagnostic methods. Studies on peripheral blood samples do not necessarily reveal the full and true universe of genotypes responsible for the manifestations of the disease, as clones can be sequestered in tissues, with different proliferation rates and dormant forms, resulting in low parasite levels in the blood that prevent their detection. Studies based on in vitro cultures and animal models are also limited by the fact that only the most competitive clones, favored by a faster division rate or a greater abundance in the original sample, would be detected, while others, which could be the actual cause of the disease, might not be.

In spite of these limitations, the polymerase chain reaction (PCR) opened new possibilities as a molecular tool for identifying the DTUs of T. cruzi and for assessing the variability within each DTU directly from biological samples, using different downstream strategies, such as restriction fragment length polymorphism, high-resolution melting curves, polymorphisms in microsatellite loci, and sequencing. Multiplex real-time PCR assays based on TaqMan probes were also developed. In addition, serological assays based on the trypomastigote small surface antigen (TSSA), capable of discriminating the humoral response against TcI from that against TcII, TcV or TcVI, were also reported. The information available suggests that the strains of the parasite detected in patients, regardless of their clinical presentation, reflect the main DTU circulating in the domestic transmission cycles of a given region. This is also true for congenital Chagas disease: samples from congenitally infected newborns showed that they were infected by the DTU present in their mothers, and in the same proportion as in the general population. Even so, at the intra-DTU level, strains with different tropisms and levels of infectivity in placental tissues have also been reported.

In outbreaks of orally transmitted disease, wild strains are the main culprit. In oral outbreaks in Venezuela, Colombia and French Guiana, TcI is prevalent and causes acute cardiac manifestations. TcIV was detected in oral outbreaks in Venezuela and in the Bolivian and Brazilian Amazon, while TcII and TcVI have been detected in the southern Brazilian state of Santa Catarina. In murine models of infection by the digestive route, it was demonstrated that different strains lead to different degrees of infectivity.

Chagas cardiomyopathy and electrocardiographic abnormalities occur in all endemic regions. Studies in Argentina and Bolivia, working with 300 samples, made it possible to associate chronic infection with populations belonging to TcII, TcV or TcVI, regardless of the clinical manifestations. The same was found for TcII isolates from chronic patients in the Brazilian Southeast. It should be highlighted, however, that in patients with chronic Chagas cardiomyopathy who underwent a heart transplant, TcI could be detected in tissues of the cardiac explants. In some patients, mixtures of DTUs, subgenotypes of TcI and clones of the same DTU were observed, with different distributions between tissues and blood. Interestingly, a multivariate analysis showed that hearts infected with TcI had greater cardiac dilation than hearts infected with other DTUs. TcII was identified in heart disease patients from the states of Pernambuco, Bahia and Minas Gerais, but also in asymptomatic patients and in patients with megaesophagus in the same regions.

Digestive megasyndromes are common in the Southern Cone and rare north of the Amazon region. This form of the disease has been associated with TcII and TcV in Bolivia and with TcII and TcVI in Brazil. It therefore appears that TcI does not cause digestive megasyndromes, or that it has not yet been found in digestive tissues.

TcVI was found in asymptomatic individuals from Bahia to Rio Grande do Sul, as well as in other regions of the Southern Cone. In Chagas patients who received a heart transplant, reactivation occurs in part of the cases. In these cases, which present increased parasite load in the blood and episodes of panniculitis, different DTUs were detected (TcI, TcV and TcVI). The coexistence of parasite clones belonging to different DTUs was also observed in cases of Chagas disease with AIDS, in which different DTUs were detected in the blood and in the cerebrospinal fluid or in brain biopsy material.

Mixed infections and reinfections have the potential to worsen the progression of the disease and therefore to affect the clinical management of patients with Chagas disease.

In murine models infected with a mixture of isolates, the parasite can be found in a large number of organs and a more disseminated infection may occur, although there is no evidence that mixed infections have a greater pathological effect. Reinfections, on the other hand, have significant effects on the host’s survival and on the progression of the disease. This is consistent with research showing spontaneous deaths after reinfection in mice, and a higher severity of the disease in patients who live in endemic rural areas with active transmission.

The progression of Chagas disease varies significantly between different hosts, and is affected by multiple factors. In murine models, it was observed that the same DTU may present different tissue tropisms depending on the MHC genotype of the host. Studies with patients from Brazil, Venezuela and Mexico showed that some histocompatibility alleles differ between patients with the indeterminate form and patients with the cardiac form.

Single-nucleotide polymorphisms (SNPs) in human placental genes have been associated with susceptibility to congenital infection. Therefore, the interrelation between the genetics of the parasite and of the host must play an important role in defining the pathogenesis of Chagas disease and, as a consequence, scientific research should focus on understanding this parasite-host interaction.

T. cruzi diversity and susceptibility to drugs

In vitro tests using a panel of strains and clones that represent the genetic diversity of T. cruzi seem to be absent from most drug discovery programs for Chagas disease. After a meeting of specialists in Rio de Janeiro in 2012, it was recommended that, once promising candidate drugs are identified, they should be tested for broader activity against two or three representatives of each DTU, as a secondary screening step. Priority should be given to the DTUs most commonly associated with infection in humans (TcIDOM, TcII, TcV and TcVI), preferably using strains with different replication rates, as this parameter may affect the response to drugs, especially to analogues of benznidazole or nifurtimox. Susceptibility and natural resistance to these two drugs have been reported for a long list of strains.
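The panel recommendation above can be sketched as a simple selection procedure. This is only an illustration in Python under stated assumptions: the strain names (`s1`, `s2`, ...) and doubling times are invented, doubling time is used here as a stand-in for replication rate, and the rule of alternating between the slowest and fastest strain of each DTU is one possible way to obtain strains with different replication rates, not a procedure described by the specialists' meeting.

```python
from collections import defaultdict

# DTUs given priority in the text (most commonly associated with human infection)
PRIORITY_DTUS = ["TcIDOM", "TcII", "TcV", "TcVI"]

def build_panel(strains, per_dtu=2):
    """Pick up to `per_dtu` strains for each DTU, priority DTUs first.

    `strains` is a list of (name, dtu, doubling_time_hours) tuples.
    Within a DTU, strains are taken alternately from the fastest and
    slowest ends so that the chosen ones differ in replication rate.
    """
    by_dtu = defaultdict(list)
    for name, dtu, doubling_h in strains:
        by_dtu[dtu].append((doubling_h, name))

    others = sorted(d for d in by_dtu if d not in PRIORITY_DTUS)
    panel = []
    for dtu in PRIORITY_DTUS + others:
        ranked = sorted(by_dtu.get(dtu, []))
        low, high = 0, len(ranked) - 1
        picks = []
        while low <= high and len(picks) < per_dtu:
            picks.append(ranked[low])
            low += 1
            if low <= high and len(picks) < per_dtu:
                picks.append(ranked[high])
                high -= 1
        panel.extend((dtu, name) for _, name in picks)
    return panel

# Hypothetical strains: (name, DTU, doubling time in hours)
strains = [
    ("s1", "TcIDOM", 20.0), ("s2", "TcIDOM", 35.0), ("s3", "TcIDOM", 26.0),
    ("s4", "TcII", 18.0), ("s5", "TcII", 30.0),
    ("s6", "TcV", 22.0),
    ("s7", "TcVI", 24.0), ("s8", "TcVI", 40.0),
    ("s9", "TcIII", 28.0),
]

panel = build_panel(strains, per_dtu=2)
for dtu, name in panel:
    print(dtu, name)
```

With `per_dtu=2`, the sketch keeps the fastest and slowest hypothetical strain of each DTU, so `s3` is left out of the TcIDOM pair, and DTUs with a single strain contribute just that one.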

Although the in vitro evaluation of drugs against members of different DTUs does not ensure the success of a drug in humans, divergent activities between strains may indicate a probability of failure. Screening with various parasite lineages would therefore provide a better foundation for deciding which compounds should and should not be developed, or whether the antiparasitic activity of a compound needs further characterization before moving on to other development stages. Another advantage of in vitro assays is that they may allow a more direct assessment of the “natural” resistance of a strain or clone to a compound, as they involve fewer variables than in vivo models.

Natural resistance to benznidazole and nifurtimox, verified in vivo and in vitro for some parasite stocks, has not been associated with any DTU in particular, and does not account for the marked differences observed in the antiparasitic efficacy of both drugs in the acute and chronic phases of Chagas disease, but this might not be the case for other substances.
