Bioinformatics Resume
HOWARD J. COHEN, Ph.D.
President
Cohen Software Consulting, Inc.
3272 Cowper Street
Palo Alto, CA 94306-3004
(650) 856-8123
(650) 856-4273 (fax)
howard@cohensw.com
http://www.cohensw.com
Selected Projects
- Sequencing Image Analysis
Worked on portions of a semi-automatic DNA sequencing system:
defining extensions to the Standard Gel File format (SGF);
designing and implementing algorithms to extract a weak DNA
signal from a noisy multi-channel image; designing and implementing
algorithms for lane tracking within that image; and creating
false-color diagnostic GIF images of the original,
signal-processed, and lane-tracked data to clarify the nature
of the data and the workings of the algorithms.
Environment was Sun UltraSparc, Solaris, C (gcc and xxgdb), and the
gd library for GIF creation.
- Chromatogram Archive
Working on the Chromatogram Archive project, a suite of Perl scripts
and C programs to manage and process about 13 million chromatograms
of human, mammalian, plant, and pathogen expressed gene sequences,
as part of a larger project to reanalyze all of Incyte's proprietary
human sequences and a large number of public domain sequences
(LifeSeq Gold). This archive includes several terabytes of data.
Developed an Oracle database to manage and index this archive and
to make retrieval of specific data fast and simple. Developed
a suite of software tools to allow loading of this DB and the
assembly and delivery of both small and very large sets of the
archived data to customers. Environment is OSF1, Solaris, Oracle Pro*C.
- LifeSeq Gold
Involved in schema design for both the in-house production
DB and the DB to be released to customers. Designed and
implemented the Annotation program (using public domain databases
to understand the assembled proprietary putative genes). Also
working on the software development and release environment for
multi-platform porting. Environment is Sun Enterprise servers
and desktop workstations, DEC Alphas, SGI Octane, Linux and SCO
desktops, Perl 5.0, C (gcc and xxgdb), and Oracle, including Pro*C
and SQL*Plus. Documentation is in HTML on internal web pages.
Development environment is RCS, TCCS, gmake and various scripting
languages.
- IGP ("The Incyte Genome Project")
Involved in various aspects of this
project to ingest the entire public domain human DNA sequence set as
well as the Incyte-proprietary genome sequence data, screen it, and
apply gene finding and annotation techniques to it. This project
encompasses Incyte's LifeTools database and software with Genomic
Enhancements, the LifeSeq Gold data, and the highly efficient
distributed processing system described in the next entry.
- The Brewery and the Farm
Co-architect and implementor of a system that exploits
coarse-grained task parallelism by distributing tasks to a
farm of client machines (compute servers) of varying sizes,
capacities, speeds, and architectures. An Oracle database and a
client-side pull manager are the central features of this
load-balancing, throughput-enhancing system, essential for the
success of LifeSeq Gold and the IGP bimonthly deliveries.
- TCCS and Porting
Installed, set up, and administered TCCS (Trivial Configuration Control
System) for a complex multi-user, multi-platform development
environment. Developed GNUmakefiles and scripts, and wrote user and
internal documentation.
- Foundation Project
A next-generation database and dataflow architecture
for annotation and gene finding in the entire human genome, combining
public domain and proprietary data. Participated in the dataflow and
database architecture and design. Designed and implemented an XML
parser for genomic data and the database loader program, created for
speed and efficiency in processing millions of cDNA sequences
and gigabytes of gDNA information. Oracle/Pro*C/C, SQL*Plus, HTML
documentation.
- Mass Spectrometry
Worked on software for the analysis and display of 2-dimensional
Mass Spectrometry data. The project involved decoding and reading
the manufacturer's proprietary compressed data format, using
a known impurity mass to compute gain corrections, and applying
deisotoping and decharging algorithms. Environment was Linux with
g++/C++ and STL for algorithm development and for implementation of
a high-throughput pipeline, and Windows 2000 with C++ and STL
for the interactive and graphical version of the software.
- Patents
Co-inventor on 114 patent applications filed by Incyte Genomics and
4 filed by Nortel Networks.
Last update: 03 May 2015
Copyright © 2001 - 2015 Cohen Software Consulting, Inc.,
All Rights Reserved
For further information, please contact
howard@cohensw.com