lumi **

Lumi is an ongoing project for Illumina microarray data analysis, including the design of a software package in R/Bioconductor and the discussion of microarray experiment design.

** an open source project at the Bioinformatics Core of the Robert H. Lurie Comprehensive Cancer Center, Northwestern University.


No Panacea (yet): NGS for Small RNA

"NGS is the future", so it was said. However, the NGS-based digital expression profiling of small RNA is reported to be biased. The problem appears not to be the NGS platforms (SOLiD or Solexa), but the methods used for preparing the small RNA library.

Here is the link to the Nature Methods (July 2009) paper



Adding nuID to BeadStudio Outputs

Here is a quick review from the BeadStudio manual.


Group Gene Profile, Sample Probe Profile, vs. Sample Gene Profile

Statisticians who are new to Illumina data analysis often ask about the difference between the following files: the Group and Sample versions of the Gene and Probe Profiles. Here I found a good screenshot from BeadStudio.



Data format for the Lumi Package

How do you generate a microarray output file that can be analyzed by the lumi package in Bioconductor? In the Illumina BeadStudio software, select "File > Export to GeneSpring > Group Probe Profile". A file called "Group Probe Profile.txt" will be generated. That's it!



Digging deeper into PubMed

You searched PubMed using the keyword "GATA". Bingo! 5,881 papers on this topic. What are you going to do with them? Read them one by one? You know that is going to ruin your postdoc life.

So, you need some bioinformatics tools for it!

Here is an analysis of the major topics of these 5,881 papers. It is clear that GATA is a transcription factor regulating cell differentiation.

An author analysis suggests the leading researchers in this field:

One step further, you ask yourself: if I get a paper on this topic, where should I publish it? Here are some thoughts:

Is "GATA" a hot topic over the years? Absolutely!

So, you ask, what is this wonderful tool? Anne O'Tate.

Another similar tool, e-LiSe, can be found at



Positive Control for Methylation Studies

Recently, methylation arrays have been introduced as an easy tool to scan the genome for methylation differences.

Without a positive control, how do you know your protocol is working?

Bock C, et al. (PMID 16520826) reported that one gene, PDE9, is highly methylated in lymphocytic DNA. It might be used as a positive control gene. As with so-called housekeeping genes, however, the methylation of this gene might be regulated under some special circumstances.

A second method is a spike-in: one can spike the samples with DNA that has been methylated by SssI methylase. SssI methylates all CpG residues with high efficiency.



QC Interpretations of the Metrics.txt file

Under the "Image Data" folder, along with the TIF, IDAT and LOCS files, there is a "Metrics.txt" file that summarizes a few QC indicators from the image scanning process.

Of interest is "RegGrn", which runs from 0 (bad) to 1 (good) and indicates whether the image has been registered properly. Note that "RegRed" is irrelevant for expression arrays, because the expression arrays are single color.

Although a value of 0 on the "RegGrn" should suggest a problem, Dr. Wei Shi (WEHI) in a recent Bioconductor post suggested that sometimes the data looks fine when RegGrn is close to zero. Dr. Shi quoted an answer from Illumina technical support on this issue:

The 0 in the metrics file may indicate that there is a problem with
registration of the stripe but there are other ways to look at this.
First, have you looked at the data from the Beadchip in BeadStudio?
Does the data appear consistent with the rest of the samples in your
experiment? In addition, do the controls look OK for this sample when
compared to the other samples?

Next, you can look at the registration visually in BeadStudio as long as
you have saved tif images when you scanned the BeadChip. To do so, go
to the Analysis menu in BeadStudio and select View Image then choose the
stripe you are interested in. Now select Overlay Cores (the icon looks
like 3 blue-green circles) and look at the image to see if the green
circles line up with the intensity spots on the image. If the BeadChip
is not registered the circles representing cores will not line up with
the spots on the array and the circles themselves will be somewhat

If you have reason to suspect that registration is affecting your data,
you can rescan the BeadChip. Gene expression BeadChips can be rescanned
as long as you do so within a few days and if the BeadChips have been
stored in the dark. If you do rescan the BeadChip, please uncheck "Save
Compressed Images"; this will allow you to save tif images.
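
To make the check above less manual, here is a minimal Python sketch that lists the rows of a Metrics.txt file whose "RegGrn" value is low. It rests on assumptions that are not stated in the post: that the file is tab-delimited with a header row containing a "RegGrn" column, and that 0.5 is a sensible cutoff (it is an arbitrary choice for illustration). As the Illumina answer points out, a flagged row is only a candidate for visual inspection, not proof of bad data.

```python
import csv


def flag_poor_registration(path, column="RegGrn", threshold=0.5):
    """Return the rows of a Metrics.txt file whose registration score is low.

    Assumptions (not confirmed by the post): the file is tab-delimited,
    its first line is a header row, and that header includes `column`.
    The default threshold of 0.5 is arbitrary.
    """
    flagged = []
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            value = row.get(column)
            if value is None or value.strip() == "":
                continue  # column missing or empty for this row
            try:
                score = float(value)
            except ValueError:
                continue  # skip values that are not numeric
            if score < threshold:
                flagged.append(row)
    return flagged
```

Calling `flag_poor_registration("Metrics.txt")` returns a list of dictionaries, one per suspicious row, so the other columns of each row remain available for identifying the stripe to inspect in BeadStudio.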



The new BGX format for annotation

Illumina started providing annotation (manifest) files in the new BGX format.

After some trial and error, I found that a BGX file is actually just a gzip-compressed version of the text file [corrected thanks to a commenter!].

After unzipping, we can see

Date 21/9/2007
ContentVersion 1.1
FormatVersion 1.0.0
Number of Probes 46643
Number of Controls 1675
Species Source Search_Key Transcript ILMN_Gene Source_Reference_ID RefSeq_ID Unigene_ID Entrez_Gene_ID GI Accession Symbol Protein_Product Probe_Id Array_Address_Id Probe_Type Probe_Start Probe_Sequence Chromosome Probe_Chr_Orientation Probe_Coordinates Cytoband Definition Ontology_Component Ontology_Process Ontology_Function Synonyms Obsolete_Probe_Id
Mus musculus Riken ri|C730035M01|PX00087M15|AK050300|1404 ILMN_204164 THRSP ri|C730035M01|PX00087M15|AK050300|1404 AK050300 Thrsp ILMN_1243094 102690609 S 1127 GCCCTGCCTGACCTGGAAACGTAGAGATTCTTCTGCCTCAGGTTCCAGAG ri|C730035M01|PX00087M15|AK050300|1404-S-1

The good news is that Illumina also provides the control probe information at the end of the file:

Probe_Id Array_Address_Id Reporter_Group_Name Reporter_Group_id Reporter_Composite_map Probe_Sequence
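
Since a BGX file is just gzip-compressed text, it can be inspected without unzipping it to disk first. The following is a minimal Python sketch, not an official parser: the function names are made up for illustration, and it assumes the fields are tab-delimited (the listing above does not show the delimiter). It reads the first few lines and collects the two-field header entries such as "Date" and "ContentVersion".

```python
import gzip


def read_bgx_lines(path, limit=10):
    """Return up to `limit` lines from a gzip-compressed BGX file."""
    lines = []
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if len(lines) >= limit:
                break
            lines.append(line.rstrip("\r\n"))
    return lines


def parse_header(lines):
    """Collect leading 'key<TAB>value' lines into a dictionary.

    Assumes tab-delimited fields. Lines with a single field (blank lines
    or any marker lines) are skipped, and parsing stops at the first line
    with more than two fields, which is taken to be the column header of
    the probe table.
    """
    header = {}
    for line in lines:
        fields = line.split("\t")
        if len(fields) == 2:
            header[fields[0]] = fields[1]
        elif len(fields) > 2:
            break
    return header
```

For example, `parse_header(read_bgx_lines("annotation.bgx"))` (with a hypothetical file name) would return the header entries as a dictionary, provided the header fits within the first ten lines.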
