EG10849-MONOMER

From BioE80 Boot

Author Information

Bryce Marion

Basic Information

  • ID: EG10849-MONOMER
  • Name: rhsD
  • Organism: E. coli
  • Description: There are five homologous rhs loci that encode hydrophilic proteins with repetitive sequence elements and divergent C-termini. RhsA, RhsB, and RhsC define one subfamily, and RhsD and RhsE define a second subfamily of Rhs elements. Additional rhs elements and subfamilies have been characterized in other strains (not K-12). RhsA, RhsB, RhsC, and RhsD include 28 copies of the core repeat element. RhsD transcription is induced by sulfate starvation. (https://ecocyc.org/gene?orgid=ECOLI&id=EG10849-MONOMER)
  • DNA Length: 4278 base pairs.
  • DNA sequence:

ATG TCA GGG AAA CCT GCT GCG CGC CAG GGC GAC ATG ACA CAA TAC GGT GGT CCC ATC GTC CAA GGT TCC GCG GGG GTT CGT ATT GGC GCT CCT ACC GGG GTC GCG TGT AGC GTC TGT CCC GGC GGG ATG ACC AGT GGC AAC CCA GTG AAT CCC TTG CTT GGG GCG AAG GTT CTT CCT GGG GAA ACA GAT TTA GCT TTA CCG GGC CCA CTG CCT TTC ATC CTT TCA CGT ACC TAC TCA TCT TAT CGT ACC AAA ACA CCC GCC CCC GTC GGA GTG TTT GGT CCC GGA TGG AAG GCA CCT AGC GAT ATC CGC CTT CAG CTG CGT GAC GAT GGC TTA ATC TTA AAT GAC AAC GGG GGG CGC TCC ATT CAT TTC GAG CCG CTG TTA CCC GGC GAA GCG GTT TAC TCA CGT AGT GAG AGC ATG TGG TTA GTT CGC GGT GGT AAG GCT GCT CAG CCT GAC GGG CAT ACG CTT GCT CGC CTG TGG GGT GCA TTG CCC CCT GAC ATC CGT CTT TCT CCG CAT TTG TAC TTA GCA ACA AAT AGC GCA CAG GGC CCA TGG TGG ATT TTA GGC TGG TCA GAG CGC GTA CCG GGA GCG GAG GAT GTG TTA CCT GCT CCC CTG CCT CCG TAC CGT GTA TTG ACA GGC ATG GCC GAC CGC TTT GGG CGT ACA TTG ACA TAC CGT CGT GAG GCC GCC GGT GAT CTG GCC GGA GAG ATT ACT GGT GTT ACC GAC GGA GCA GGC CGT GAA TTC CGT TTG GTC TTA ACC ACT CAA GCG CAA CGC GCG GAA GAA GCA CGT ACG AGT AGT TTG AGT AGT AGT GAC AGC TCC CGT CCC CTG TCA GCG AGT GCT TTT CCG GAC ACT CTT CCG GGC ACG GAG TAC GGA CCC GAT CGT GGC ATC CGC CTG TCC GCA GTG TGG CTG ATG CAC GAC CCT GCC TAC CCT GAG TCC CTT CCG GCG GCG CCC CTT GTC CGC TAT ACG TAC ACA GAG GCT GGA GAG TTG CTG GCT GTT TAT GAC CGT AGC AAT ACG CAA GTT CGT GCT TTT ACG TAC GAC GCG CAG CAC CCC GGG CGC ATG GTC GCC CAC CGC TAT GCC GGA CGT CCT GAG ATG CGC TAT CGT TAC GAT GAT ACC GGT CGT GTT GTC GAA CAA CTG AAC CCG GCT GGA TTA AGT TAT CGT TAC TTG TAC GAA CAA GAC CGC ATT ACG GTT ACC GAC AGC TTG AAT CGC CGT GAG GTA TTG CAT ACC GAG GGG GGG GCG GGC TTA AAG CGC GTG GTG AAA AAG GAG TTA GCC GAT GGC TCA GTG ACT CGT TCG GGT TAT GAT GCC GCG GGG CGT TTG ACT GCT CAA ACT GAC GCA GCC GGA CGT CGT ACT GAG TAT GGG TTA AAC GTA GTC AGT GGG GAT ATT ACG GAC ATT ACT ACT CCA GAT GGG CGC GAG ACT AAA TTT TAC TAT AAT GAC GGA AAT CAA CTT ACG GCG GTG GTA TCA CCC GAC GGA TTA GAA TCC CGC CGC GAA TAC GAT GAA CCG GGG CGT TTG GTC 
TCT GAA ACA TCA CGT TCG GGG GAA ACA GTT CGT TAT CGT TAT GAT GAC GCG CAC TCA GAA TTG CCT GCA ACC ACC ACC GAT GCA ACG GGA TCT ACC CGC CAA ATG ACT TGG TCC CGC TAT GGA CAG TTA TTG GCC TTC ACT GAC TGC TCA GGG TAT CAG ACG CGT TAT GAG TAC GAT CGT TTT GGT CAA ATG ACT GCT GTG CAT CGC GAA GAG GGC ATC TCC TTG TAT CGT CGC TAC GAT AAT CGT GGC CGC TTG ACT TCC GTC AAG GAT GCG CAG GGA CGC GAA ACC CGT TAT GAA TAC AAT GCA GCA GGG GAC CTT ACG GCG GTC ATC ACG CCA GAC GGC AAC CGT TCG GAG ACT CAA TAC GAC GCG TGG GGT AAG GCA GTT AGT ACT ACG CAA GGG GGC TTA ACA CGC TCA ATG GAG TAT GAC GCG GCC GGA CGC GTG ATT AGC CTT ACA AAC GAG AAC GGA AGC CAC AGT GTT TTT TCG TAC GAC GCG CTG GAC CGT CTT GTC CAG CAG GGG GGC TTC GAC GGC CGT ACT CAG CGC TAT CAT TAT GAC CTT ACC GGC AAG TTA ACG CAG TCA GAA GAT GAG GGT CTG GTA ATT CTT TGG TAT TAC GAC GAA TCC GAT CGT ATT ACT CAC CGT ACC GTT AAT GGC GAG CCC GCG GAG CAA TGG CAG TAT GAC GGG CAC GGG TGG TTA ACC GAC ATT TCA CAT CTG TCA GAA GGT CAT CGC GTT GCA GTA CAC TAC GGT TAC GAC GAC AAG GGC CGT TTA ACG GGC GAA TGC CAG ACG GTA GAA AAT CCG GAA ACC GGT GAG CTG TTG TGG CAG CAC GAA ACA AAA CAT GCC TAC AAC GAG CAG GGC TTA GCC AAT CGT GTG ACC CCG GAT TCC CTG CCC CCG GTA GAA TGG TTG ACT TAT GGA TCT GGC TAC TTG GCA GGG ATG AAA CTT GGA GGG ACC CCC CTT GTG GAG TAC ACT CGC GAT CGT CTT CAT CGT GAA ACT GTA CGC TCC TTC GGG TCC ATG GCT GGT AGC AAC GCT GCG TAC GAG CTT ACG TCG ACC TAT ACC CCG GCA GGG CAA CTT CAG TCT CAG CAC TTG AAC TCA CTG GTA TAT GAT CGC GAC TAC GGC TGG TCA GAC AAC GGG GAT TTA GTC CGT ATT TCG GGA CCG CGC CAA ACG CGC GAG TAC GGA TAT TCT GCA ACG GGT CGC CTG GAA TCA GTC CGC ACG TTA GCG CCT GAC CTT GAC ATT CGC ATC CCG TAC GCC ACG GAC CCT GCA GGC AAT CGC CTG CCC GAT CCA GAG CTT CAT CCC GAT TCG ACG CTG ACA GTG TGG CCG GAT AAC CGT ATC GCG GAG GAT GCT CAT TAT GTG TAT CGT CAT GAC GAA TAC GGT CGC TTG ACG GAA AAA ACC GAT CGT ATC CCG GCA GGG GTC ATT CGT ACT GAT GAC GAA CGC ACC CAT CAT TAT CAT TAC GAC TCA CAG CAT CGT CTG GTT TTC TAT ACC CGT ATC CAA CAC GGT GAA CCG TTA GTG GAA 
TCT CGC TAT CTG TAC GAC CCC TTG GGT CGT CGC ATG GCT AAG CGC GTA TGG CGC CGC GAA CGC GAC CTT ACG GGG TGG ATG TCG TTG TCT CGC AAA CCC GAA GTA ACT TGG TAC GGA TGG GAT GGG GAT CGT CTG ACT ACA GTA CAA ACT GAT ACA ACC CGT ATC CAA ACC GTA TAT GAA CCG GGA TCC TTC ACG CCC CTG ATC CGT GTC GAG ACA GAG AAC GGC GAG CGT GAA AAA GCG CAA CGC CGT AGC TTA GCC GAA ACC CTG CAG CAA GAA GGA TCA GAG AAT GGG CAC GGC GTG GTT TTT CCC GCT GAA TTA GTG CGT TTG TTA GAT CGT TTG GAA GAG GAG ATC CGT GCG GAC CGT GTG AGT TCT GAA TCG CGC GCC TGG TTG GCA CAA TGC GGC CTT ACG GTG GAA CAG CTT GCC CGT CAA GTC GAA CCG GAG TAC ACT CCC GCA CGC AAG GCC CAT TTA TAC CAT TGT GAC CAT CGT GGG CTT CCC CTG GCC TTA ATT TCA GAG GAC GGA AAC ACG GCC TGG TCC GCA GAA TAT GAC GAG TGG GGG AAC CAG CTT AAT GAG GAG AAT CCT CAT CAC GTC TAC CAA CCC TAC CGT TTA CCT GGT CAG CAA CAT GAT GAG GAG TCT GGG TTG TAC TAT AAT CGC CAT CGT TAT TAC GAT CCA TTG CAA GGG CGT TAT ATT ACA CAA GAC CCG ATG GGC CTT AAA GGT GGG TGG AAC CTT TAT CAA TAC CCC TTA AAT CCC CTG CAA CAA ATC GAT CCG ATG GGG TTG TTG CAG ACG TGG GAT GAT GCG CGT TCT GGG GCC TGT ACG GGA GGC GTA TGC GGG GTG TTG TCC CGT ATC ATC GGA CCT AGC AAA TTT GAC TCA ACT GCT GAC GCT GCA CTT GAT GCT CTG AAG GAG ACC CAG AAC CGC TCT CTG TGC AAT GAT ATG GAA TAT TCC GGA ATC GTC TGC AAG GAC ACC AAT GGG AAA TAC TTC GCT TCT AAG GCT GAA ACC GAT AAT CTT CGT AAA GAA TCA TAC CCG TTG AAG CGT AAA TGC CCT ACT GGC ACC GAT CGT GTG GCG GCA TAT CAT ACC CAC GGT GCT GAT TCT CAC GGG GAT TAT GTC GAC GAA TTT TTT TCC TCC TCC GAC AAG AAC CTG GTT CGC AGC AAG GAC AAC AAC CTT GAG GCT TTC TAT CTG GCC ACA CCA GAC GGG CGT TTC GAA GCG CTT AAT AAC AAG GGC GAA TAT ATT TTT ATT CGT AAT AGT GTT CCC GGA TTA AGT TCT GTG TGC ATC CCT TAC CAT GAC TAA

  • Amino Acid length: 1426 amino acids.
  • Amino Acid sequence:

MSGKPAARQGDMTQYGGPIVQGSAGVRIGAPTGVACSVCPGGMTSGNPVNPLLGAKVLPG ETDLALPGPLPFILSRTYSSYRTKTPAPVGVFGPGWKAPSDIRLQLRDDGLILNDNGGRS IHFEPLLPGEAVYSRSESMWLVRGGKAAQPDGHTLARLWGALPPDIRLSPHLYLATNSAQ GPWWILGWSERVPGAEDVLPAPLPPYRVLTGMADRFGRTLTYRREAAGDLAGEITGVTDG AGREFRLVLTTQAQRAEEARTSSLSSSDSSRPLSASAFPDTLPGTEYGPDRGIRLSAVWL MHDPAYPESLPAAPLVRYTYTEAGELLAVYDRSNTQVRAFTYDAQHPGRMVAHRYAGRPE MRYRYDDTGRVVEQLNPAGLSYRYLYEQDRITVTDSLNRREVLHTEGGAGLKRVVKKELA DGSVTRSGYDAAGRLTAQTDAAGRRTEYGLNVVSGDITDITTPDGRETKFYYNDGNQLTA VVSPDGLESRREYDEPGRLVSETSRSGETVRYRYDDAHSELPATTTDATGSTRQMTWSRY GQLLAFTDCSGYQTRYEYDRFGQMTAVHREEGISLYRRYDNRGRLTSVKDAQGRETRYEY NAAGDLTAVITPDGNRSETQYDAWGKAVSTTQGGLTRSMEYDAAGRVISLTNENGSHSVF SYDALDRLVQQGGFDGRTQRYHYDLTGKLTQSEDEGLVILWYYDESDRITHRTVNGEPAE QWQYDGHGWLTDISHLSEGHRVAVHYGYDDKGRLTGECQTVENPETGELLWQHETKHAYN EQGLANRVTPDSLPPVEWLTYGSGYLAGMKLGGTPLVEYTRDRLHRETVRSFGSMAGSNA AYELTSTYTPAGQLQSQHLNSLVYDRDYGWSDNGDLVRISGPRQTREYGYSATGRLESVR TLAPDLDIRIPYATDPAGNRLPDPELHPDSTLTVWPDNRIAEDAHYVYRHDEYGRLTEKT DRIPAGVIRTDDERTHHYHYDSQHRLVFYTRIQHGEPLVESRYLYDPLGRRMAKRVWRRE RDLTGWMSLSRKPEVTWYGWDGDRLTTVQTDTTRIQTVYEPGSFTPLIRVETENGEREKA QRRSLAETLQQEGSENGHGVVFPAELVRLLDRLEEEIRADRVSSESRAWLAQCGLTVEQL ARQVEPEYTPARKAHLYHCDHRGLPLALISEDGNTAWSAEYDEWGNQLNEENPHHVYQPY RLPGQQHDEESGLYYNRHRYYDPLQGRYITQDPMGLKGGWNLYQYPLNPLQQIDPMGLLQ TWDDARSGACTGGVCGVLSRIIGPSKFDSTADAALDALKETQNRSLCNDMEYSGIVCKDT NGKYFASKAETDNLRKESYPLKRKCPTGTDRVAAYHTHGADSHGDYVDEFFSSSDKNLVR SKDNNLEAFYLATPDGRFEALNNKGEYIFIRNSVPGLSSVCIPYHD
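One way to cross-check the DNA and amino acid fields against each other is to translate the coding sequence codon by codon. The sketch below is a minimal illustration. It assumes the standard genetic code (NCBI translation tables 1 and 11 assign the same amino acids and stop codons) and a sequence that starts in frame at the ATG. The names `clean`, `translate` and `summarize` are hypothetical and are not part of any tool used for this page.

```python
# Standard genetic code in TCAG order; NCBI tables 1 and 11 share these
# amino-acid and stop-codon assignments (they differ only in start codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def clean(dna: str) -> str:
    """Remove the spaces and line breaks used to group codons on the page."""
    return "".join(dna.split()).upper()


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    seq = clean(dna)
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    protein = []
    for pos in range(0, len(seq), 3):
        residue = CODON_TABLE[seq[pos:pos + 3]]  # KeyError on non-ACGT characters
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def summarize(dna: str) -> dict:
    """Report the length fields that the gene page lists."""
    seq = clean(dna)
    protein = translate(seq)
    return {
        "nucleotides": len(seq),
        "codons": len(seq) // 3,
        "starts_with_atg": seq.startswith("ATG"),
        "ends_with_stop": CODON_TABLE.get(seq[-3:]) == "*",
        "protein_length": len(protein),
        "protein": protein,
    }


# First ten codons of rhsD followed by a TAA stop codon.
example = summarize("ATG TCA GGG AAA CCT GCT GCG CGC CAG GGC TAA")
print(example["protein"], example["protein_length"])  # MSGKPAARQG 10
```

If you pass the full DNA sequence above to `summarize`, it should reproduce the listed amino acid sequence. You can then compare its `protein_length` with the Amino Acid length field. A coding sequence of N nucleotides that includes its stop codon encodes N/3 - 1 residues.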

Function and Homologs

  • Module: Protein
  • Closest homologous proteins (top three BLAST hits):
    • MULTISPECIES: RHS element protein [Escherichia], WP_000014739.1: Max Score 2927; Query Cover 100%; E-value 0.0; Identity 100%
    • protein RhsD [Escherichia coli], WP_021579203.1: Max Score 2925; Query Cover 100%; E-value 0.0; Identity 99%
    • RHS element protein [Escherichia coli], WP_047667029.1: Max Score 2924; Query Cover 100%; E-value 0.0; Identity 99%
  • Equivalent E. coli / JCVI functional protein: MMSYN1_0366
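The homolog list gives identity as a rounded percentage. The sketch below shows how that kind of percentage is computed from a pairwise alignment. It assumes that gap columns count in the denominator but never as matches, which is intended to mirror BLAST's "Identities" field. The function name `percent_identity` and the toy alignment are hypothetical and are not taken from the BLAST results above.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns where both sequences share the same residue.

    Gap characters ('-') never count as matches, but gap columns are kept in
    the denominator (assumed to mirror BLAST's 'Identities' field).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


# Toy alignment (not real BLAST output): one substitution and one gap in 10 columns.
print(percent_identity("MSGKPAARQG", "MSGKAAA-QG"))  # 80.0
```

Because the percentage is rounded, a hit listed at 99% identity can still differ from the query at several positions over a protein of this length.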

Expression

  • Expression Level: medium
  • Expression Level Hypothesis: The gene is expressed at this level because translation, the decoding of mRNA into protein, is the third and final step of the central dogma. However, the number of ribosomes must also be kept limited, so this gene is expected to be expressed at a medium level.
  • Expression Level References and Description: The expression level data were gathered from the Escherichia coli proteome dataset. The dataset has no entry for rhsD itself, so the value listed under a close relative was used.[1]
  • Expression Time: At the beginning
  • Expression Time Hypothesis: The gene is expressed at this time because it is a central dogma component. Furthermore, the cell needs properly working ribosomes in order to develop further; without the expression of this gene, the cell would have trouble producing the materials it needs to survive.
  • Expression Time References and Description: [2]
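The page does not say how the "medium" level was assigned from the proteome dataset. One simple rule, which is hypothetical here, splits the abundances of all proteins in a dataset into tertiles. Values in the lowest third are labeled "low", the middle third "medium", and the top third "high". The sketch below implements that rule with Python's `statistics.quantiles`. Both the rule and the toy numbers are assumptions, not the dataset's documented method.

```python
from statistics import quantiles
from typing import Sequence


def expression_class(value: float, all_values: Sequence[float]) -> str:
    """Label an abundance as low, medium, or high.

    HYPOTHETICAL rule: cut the distribution of all protein abundances into
    tertiles; this is not the documented method of the proteome dataset.
    """
    cut_low, cut_high = quantiles(all_values, n=3)  # two cut points for three groups
    if value <= cut_low:
        return "low"
    if value <= cut_high:
        return "medium"
    return "high"


# Toy abundances (made-up numbers, not values from the dataset).
toy = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(expression_class(5, toy))  # medium
```

Tertiles guarantee roughly equal-sized groups whatever the units of the dataset. Fixed abundance thresholds would need justification from the data source.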

Gene Context

  • Possible Dependencies: Histidine biosynthesis, because amino acids are needed to build the growing protein chains produced by ribosomes.
  • Process: Formation of the 30S ribosomal subunit.
    • Inputs: 16S ribosomal RNA and the 30S ribosomal subunit proteins S1-S21
    • Outputs: 30S ribosomal subunit

Construct

  • Synthesis Score: The synthesis score of your construct: 1, 2, 3
  • Predicted Translation Rate: Prediction of construct translation rate from the RBS calculator
  • Design Notes and Details: For example, had to use a rare codon to fix folding energy;
  • GenBank File: A link to the GenBank file. file