What are the best research topics in databases, given cutting-edge technologies and recent research trends?
- According to trends on the DBWorld mailing list, the hottest research topic today is distributed databases (terascale and petascale): http://www.cs.wisc.edu/dbworld/
- Also see the VLDB 2010 accepted papers: http://www.vldb2010.org/accept.htm and the CIDR 2011 program: http://www.cidrdb.org/cidr2011/p...
- Check out Google Research papers on distributed systems (http://research.google.com/pubs/...) and information retrieval (http://research.google.com/pubs/...), and see What are the most interesting Google Research papers?
- See Microsoft Research's recent publications in the field: http://research.microsoft.com/en... and its list of major conferences: http://academic.research.microso...
- I highly recommend Daniel Abadi's research blog for emerging topics and trends: http://dbmsmusings.blogspot.com/ and his publications: http://cs-www.cs.yale.edu/homes/... Also see Jeff Hammerbacher's answer to What are the best, insightful blogs about data, including how businesses are using data?
- Industrial and research databases and data stores such as Amazon Dynamo (http://en.wikipedia.org/wiki/Dyn...), Google Bigtable (http://en.wikipedia.org/wiki/Big...), Google Percolator (http://research.google.com/pubs/...), kdb+ by Kx Systems (http://kx.com/Products/kdb+.php), C-Store, commercialized as Vertica (http://en.wikipedia.org/wiki/Mic...), and Oracle TimesTen (http://www.oracle.com/timesten/i...), as well as open-source implementations such as Redis, Cassandra, HBase, MongoDB, Riak, MonetDB, Scalaris, and H-Store (http://hstore.cs.brown.edu/), are worth exploring in depth. Also see http://en.wikipedia.org/wiki/Dis...
- The recent data explosion (http://en.wikipedia.org/wiki/Big...) has prompted a reassessment of where the relational model applies, particularly in domains where its restrictions are overly limiting (see http://en.wikipedia.org/wiki/NoSQL). For a good overview of emerging non-relational databases and key-value stores, see Ian Varley, "No Relation: The Mixed Blessings of Non-Relational Databases": http://ianvarley.com/UT/MR/Varle... ; Daniel Abadi's thesis, "Query Execution in Column-Oriented Database Systems": http://cs-www.cs.yale.edu/homes/... ; and Meijer & Bierman, "A co-Relational Model of Data for Large Shared Data Banks": http://queue.acm.org/detail.cfm?...
- Graph databases (http://scholar.google.com/schola...) are extremely important for modern online social networks and many other domains, and they are a topic of active research (e.g., see Neo4j, HyperGraphDB, InfiniteGraph). Also see http://www.graph-database.org/ and http://nosql-database.org/
- See Microsoft Trinity, a graph database over a distributed memory cloud: http://research.microsoft.com/en... and Google Pregel, a system for large-scale graph processing: http://portal.acm.org/citation.c...
- Specialized database systems and data warehousing for bioinformatics could be a good topic for applied research; e.g., see Atlas: http://www.biomedcentral.com/147... and the Bowtie ecosystem: http://bowtie-bio.sourceforge.ne...
- Check out Luis Gravano's interesting work on structured search and information extraction from the "hidden web": http://www.cs.columbia.edu/~grav...
- Since the cost of RAM keeps decreasing, main-memory databases will probably attract more and more attention (see http://en.wikipedia.org/wiki/In-...). Check out Memcached, Hazelcast, Membase, MemSQL, FastDB, SciDB, and RAMCloud: http://fiz.stanford.edu:8081/display/ramcloud/Home
- Druid: A Distributed, In-Memory OLAP Store: http://metamarketsgroup.com/blog...
- Google Snappy (http://code.google.com/p/snappy/), a compression/decompression library used in Bigtable.
- Heroku Doozer: http://xph.us/2011/04/13/introdu... and http://blog.golang.org/2011/04/g...
- LevelDB - a fast and lightweight key/value database library: http://code.google.com/p/leveldb/
- Yet Another SQL-to-MapReduce Translator: http://www.cse.ohio-state.edu/hp...
- The spread of 10 Gigabit Ethernet, InfiniBand/RDMA, and other high-performance computing technologies into the mainstream may require rethinking some basic assumptions in database design (see When will 10 gigabit Ethernet overtake 1 gigabit Ethernet in deployment? and "It's time for low latency": http://www.matt-welsh.blogspot.c... )
- RethinkDB (http://www.rethinkdb.com/blog/) is doing interesting work on databases built for solid-state drives; you may want to check it out.
- I also think a few integrated, specialized products are either missing or over-engineered, unscalable, and/or expensive: 1) a tightly integrated database + analytics engine, e.g., for EEG or financial time series; and 2) a tightly integrated messaging framework + database optimized for really fast ETL.
- As general advice, I'd avoid over-specialized topics in favor of building a database system for a specific real-world domain (e.g., see How do scientists share data and code?). Also see David Patterson, "How to Have a Bad Career in Academia": http://www.cs.berkeley.edu/~patt...
- Related: What is the best literature on the design of database platforms? Why?
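As a concrete illustration of the row-store vs. column-store distinction behind C-Store and Abadi's thesis on column-oriented query execution, here is a toy Python sketch. It assumes a hypothetical three-column table held in plain Python lists, and all names and data in it are made up for illustration. Real column stores use on-disk layouts, block-at-a-time operators, and several compression schemes, so this only shows the basic idea: a column layout lets a query read just the attributes it needs, and a run-length-encoded column can sometimes be aggregated without decompressing it.

```python
from itertools import groupby
from typing import Dict, List, Tuple

# Hypothetical toy table (symbol, day, price); the data are made up for illustration.
COLUMN_NAMES = ("symbol", "day", "price")
ROWS: List[Tuple[str, int, float]] = [
    ("AAPL", 1, 10.0),
    ("AAPL", 2, 11.0),
    ("AAPL", 3, 12.0),
    ("MSFT", 1, 20.0),
    ("MSFT", 2, 21.0),
]


def to_columns(rows: List[Tuple[str, int, float]]) -> Dict[str, list]:
    """Pivot a row-oriented table into one list per attribute (a column layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMN_NAMES)}


def rle_encode(values: list) -> List[Tuple[object, int]]:
    """Run-length encode a column as (value, run_length) pairs.

    Compresses well only when equal values are adjacent (e.g. a sorted column).
    """
    return [(value, len(list(run))) for value, run in groupby(values)]


def sum_price_rows(rows: List[Tuple[str, int, float]], symbol: str) -> float:
    # Row layout: every record is unpacked in full, including the unused 'day' field.
    return sum(price for sym, _day, price in rows if sym == symbol)


def sum_price_columns(cols: Dict[str, list], symbol: str) -> float:
    # Column layout: only the 'symbol' and 'price' columns are read.
    return sum(p for s, p in zip(cols["symbol"], cols["price"]) if s == symbol)


def count_per_value_rle(encoded: List[Tuple[object, int]]) -> Dict[object, int]:
    # Aggregate directly on the compressed runs, without expanding them back into rows.
    counts: Dict[object, int] = {}
    for value, run_length in encoded:
        counts[value] = counts.get(value, 0) + run_length
    return counts


if __name__ == "__main__":
    cols = to_columns(ROWS)
    print(sum_price_rows(ROWS, "AAPL"), sum_price_columns(cols, "AAPL"))  # 33.0 33.0
    encoded = rle_encode(cols["symbol"])
    print(encoded)                       # [('AAPL', 3), ('MSFT', 2)]
    print(count_per_value_rle(encoded))  # {'AAPL': 3, 'MSFT': 2}
```

Run-length encoding pays off only when equal values sit next to each other, which is one reason column stores often keep columns sorted; on an unsorted, high-cardinality column the (value, count) pairs can take more space than the raw values.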