Neville Franks | 16 Nov 04:01 2010

Bulk Updates in CouchDB

Hi,
I am just learning about CouchDB, so please excuse this newbie question.

I've read a lot over the past few days, including
http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API and a fair bit
of two of the online CouchDB books.

My question is how do I do some simple things like:

1) Delete all documents where key.value = xxx

2) Update all documents where key.value = xxx so value = yyy

I want the DB to do these, not for me to have to iterate through the
DB in code. From what I've read about Views, they are read-only and
therefore can't be used in update/delete operations.

I've read a lot about views, and CouchDB seems great at getting information
out in all sorts of ways; however, basic bulk update/delete operations
have so far eluded me.

My main exposure to databases at this time is SQLite, and these sorts
of things are of course easy and quick to do in SQL.

Hopefully I'm missing something obvious.

---
Neville Franks,  http://www.surfulater.com

kowsik | 16 Nov 04:06 2010

Re: Bulk Updates in CouchDB

You can't do this at the moment. At least, not that I know of.

For #1, my current trick is to generate a view with a map function
that looks like this:

map: function(doc) {
    emit(null, doc._rev);
}

This makes it easy to convert the results of the view (queried without
stale=ok) into a bulk delete by iterating over the view rows, without
fetching each document to get at its revision. This does assume that the
documents haven't been updated between the view fetch and the
bulk delete, but for my use case that works.
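A minimal JavaScript sketch of that conversion, assuming the usual view response shape (a `rows` array whose entries carry `id`, `key` and `value`). The helper name `rowsToBulkDelete` and the sample IDs and revisions are made up for illustration.

```javascript
// Hypothetical helper: turn the rows of a view whose map function is
//   function(doc) { emit(null, doc._rev); }
// into a body for POST /<db>/_bulk_docs that deletes every listed document.
// Each row already holds the document ID (row.id) and, because of the emit
// above, its revision (row.value), so no per-document GET is needed.
function rowsToBulkDelete(viewResult) {
  return {
    docs: viewResult.rows.map(function (row) {
      return { _id: row.id, _rev: row.value, _deleted: true };
    }),
  };
}

// Made-up view response, for illustration only.
const exampleView = {
  total_rows: 2,
  offset: 0,
  rows: [
    { id: "doc-a", key: null, value: "1-abc" },
    { id: "doc-b", key: null, value: "3-def" },
  ],
};

console.log(JSON.stringify(rowsToBulkDelete(exampleView)));
```

If a document changed after the view was read, its revision no longer matches, and its entry in the `_bulk_docs` response should come back as a conflict rather than a deletion; that is the caveat above.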

For #2, you have to iterate from the client side, unless someone
else has another idea.
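Even when the change has to be made on the client, the number of HTTP requests need not grow with the number of documents. A minimal sketch, assuming the matching rows were fetched from a view with `include_docs=true` (so each row carries the full document in `row.doc`); the helper name and the sample data are hypothetical.

```javascript
// Hypothetical helper: client-side equivalent of
//   UPDATE ... SET field = newValue WHERE field = oldValue
// `rows` come from a view queried with include_docs=true. Every matching
// document keeps its _id and _rev, so POSTing the result to /<db>/_bulk_docs
// saves all of them in one request.
function buildBulkUpdate(rows, field, oldValue, newValue) {
  const docs = [];
  for (const row of rows) {
    const doc = row.doc;
    if (doc && doc[field] === oldValue) {
      // Shallow copy, so the caller's objects are not modified.
      const changed = Object.assign({}, doc);
      changed[field] = newValue;
      docs.push(changed);
    }
  }
  return { docs: docs };
}

// Made-up rows, for illustration only.
const rows = [
  { id: "a", doc: { _id: "a", _rev: "1-x", value: "xxx" } },
  { id: "b", doc: { _id: "b", _rev: "2-y", value: "zzz" } },
];
console.log(JSON.stringify(buildBulkUpdate(rows, "value", "xxx", "yyy")));
```

That is one view query plus one `_bulk_docs` POST, rather than a GET and a PUT per document.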

K.
---
http://www.pcapr.net
http://twitter.com/pcapr
http://labs.mudynamics.com

On Mon, Nov 15, 2010 at 7:01 PM, Neville Franks <subs@...> wrote:
> Hi,
> I am just learning about CouchDB so please excuse this nooby question.
>
> I've read lots the past few days including
> http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API and a fair bit
(Continue reading)

Neville Franks | 16 Nov 10:44 2010

Re: Bulk Updates in CouchDB

Thanks for the prompt response. I have to say that I am very, very
surprised that what seems to me are such basic operations aren't
available natively within CouchDB.

This is probably a deal breaker for my use, and I would have thought for
many others. My concern is iterating over a large number of documents
on a remote server just to do simple updates. It means I need to do
several HTTP requests (GET/PUT/DELETE) for each document in a set of
possibly thousands or tens of thousands. I'm in Australia and the
server is in the US, and I imagine this would make the application
unusable.

I am getting the feeling that CouchDB is great for storing lots of
information and getting it back in lots of interesting ways but not a
good fit for typical CRUD stuff that's done in SQL all the time.
Please correct me if I'm wrong.

Tuesday, November 16, 2010, 2:06:18 PM, you wrote:

k> You can't do this at the moment. At least, not that I know of.

k> For #1, my current trick is to generate a view with a map function
k> that looks like this:

k> map: function(doc) {
k>     emit(null, doc._rev);
k> }

k> This makes it easy to convert the results of the view (without
k> stale=ok) into a bulk-delete by iterating over the view, but not
(Continue reading)

Sebastian Cohnen | 16 Nov 11:05 2010

Re: Bulk Updates in CouchDB

You can update documents in batches. But in order to update them (deletion is also an update in this case), you
need to provide the document or, in the case of deletion, at least the correct _rev token. So you would need to
collect the _rev tokens and IDs of the docs you want to delete, batch them up and send one request to the server.
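For very large sets, that one request could also be split into several smaller ones. A sketch in JavaScript; the helper name and the batch size are arbitrary assumptions, not CouchDB limits.

```javascript
// Hypothetical helper: split a list of { _id, _rev } pairs into several
// _bulk_docs request bodies of at most `batchSize` deletions each.
function batchDeletes(idRevPairs, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const bodies = [];
  for (let i = 0; i < idRevPairs.length; i += batchSize) {
    bodies.push({
      docs: idRevPairs.slice(i, i + batchSize).map(function (p) {
        return { _id: p._id, _rev: p._rev, _deleted: true };
      }),
    });
  }
  return bodies;
}

// Five made-up documents in batches of two: three request bodies.
const pairs = ["a", "b", "c", "d", "e"].map(function (id) {
  return { _id: id, _rev: "1-" + id };
});
console.log(batchDeletes(pairs, 2).map(function (b) { return b.docs.length; }));
```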

On 16.11.2010, at 10:44, Neville Franks wrote:

> Thanks for the prompt response. I have to say that I am very, very
> surprised that what seems to me are such basic operations aren't
> available natively within CouchDB.
> 
> This is probably a deal breaker for my use and I would have thought
> many others. My concern is iterating over a large number of documents
> on a remote server just to do simple updates. It means I need to do
> several HTTP requests (GET/PUT/DELETE) for each document in a set of
> of possibly thousands or tens of thousands. I'm in Australia and the
> server is in the US and I would imagine this making an application
> unusable.
> 
> I am getting the feeling that CouchDB is great for storing lots of
> information and getting it back in lots of interesting ways but not a
> good fit for typical CRUD stuff that's done in SQL all the time.
> Please correct me if I'm wrong.
> 
> 
> Tuesday, November 16, 2010, 2:06:18 PM, you wrote:
> 
> k> You can't do this at the moment. At least, not that I know of.
> 
> k> For #1, my current trick is to generate a view with a map function
> k> that looks like this:
(Continue reading)

Jan Lehnardt | 16 Nov 11:48 2010

Re: Bulk Updates in CouchDB

Hi Neville,

On 16 Nov 2010, at 10:44, Neville Franks wrote:

> Thanks for the prompt response. I have to say that I am very, very
> surprised that what seems to me are such basic operations aren't
> available natively within CouchDB.

This is less a case of a basic operation not being supported, and
more a reflection of the difference in philosophy between CouchDB and,
say, SQLite.

> This is probably a deal breaker for my use and I would have thought
> many others. My concern is iterating over a large number of documents
> on a remote server just to do simple updates. It means I need to do
> several HTTP requests (GET/PUT/DELETE) for each document in a set of
> possibly thousands or tens of thousands. I'm in Australia and the
> server is in the US and I would imagine this making an application
> unusable.

A couple of thoughts:

 - How often does that run? Of course, the operation will be slower
   than telling the server to update a bunch of fields*, but if it is a
   rare occurrence, it may not be that big a deal.

    * CouchDB doesn't have a notion of "fields", hence this operation
      proves a little tricky.

 - CouchDB could handle the bulk updating for you, but it'd essentially

Jan Lehnardt | 16 Nov 11:52 2010

Re: Bulk Updates in CouchDB


On 16 Nov 2010, at 11:48, Jan Lehnardt wrote:

> Hi Neville,
> 
> On 16 Nov 2010, at 10:44, Neville Franks wrote:
> 
>> Thanks for the prompt response. I have to say that I am very, very
>> surprised that what seems to me are such basic operations aren't
>> available natively within CouchDB.
> 
> It is less that this is a basic operation that isn't supported and
> more shows the difference in philosophy between CouchDB and, say,
> SQLite.
> 
> 
>> This is probably a deal breaker for my use and I would have thought
>> many others. My concern is iterating over a large number of documents
>> on a remote server just to do simple updates. It means I need to do
>> several HTTP requests (GET/PUT/DELETE) for each document in a set of
>> of possibly thousands or tens of thousands. I'm in Australia and the
>> server is in the US and I would imagine this making an application
>> unusable.
> 
> A couple of thoughts:
> 
> - How often does that run? — Of course, the operation will be slower
>   than telling the server to update a bunch of fields*, but if it is
>   rare occurrence, it may not be that big a deal.
> 
(Continue reading)

Neville Franks | 16 Nov 23:02 2010

Re: Bulk Updates in CouchDB

Hi Jan,
Thanks for taking the time to respond in detail. I imagine most people
coming from SQL land will face various brick walls while trying to
learn the new paradigms of document-oriented DBs.

I think it is time for me to stop reading and dig in with a
"proof of concept" sample app. No doubt this will be challenging,
but I'm sure I'll learn a lot. Hopefully the batch update methods
you discuss will be satisfactory from both a coding and a performance
perspective.

I'm heartened to know that someone else feels that having bulk
editing on the server is a great idea, and not some stupid newbie
comment on my part.

My overriding interest in CouchDB is its replication capabilities and
offline/online use case. I have not found any other database that does
this as easily, and hopefully as effectively, as CouchDB. My plan was to
implement my own replication capability using SQLite, which I already
use, but this is a complex task, one which I'll happily leave to
others.

I'm sure more questions will follow. The SQLite community is very
active and helpful, and from what I've seen, so is CouchDB.

Tuesday, November 16, 2010, 9:48:21 PM, you wrote:

JL> Hi Neville,

JL> On 16 Nov 2010, at 10:44, Neville Franks wrote:
(Continue reading)

Jan Lehnardt | 17 Nov 10:56 2010

Re: Bulk Updates in CouchDB

Hi Neville,

The CouchDB Book has a chapter on solving common tasks in CouchDB if you
have an RDBMS background:

  http://guide.couchdb.org/editions/1/en/cookbook.html

It doesn't cover your case, but I intend to add it.

Cheers
Jan

Neville Franks | 17 Nov 20:25 2010

Re: Bulk Updates in CouchDB

Hi Jan,
Thanks, very useful, however it has a focus on retrieval and doesn't
discuss update/delete operations. I've found that to largely be the
case with CouchDB articles etc. ie. They discuss retrieval using views
(map/reduce) and inserts but not update/delete so much and not in
terms of the bulk SQL operations I raised initially.

I'd certainly welcome more on this.

Re. "CouchDB: The Definitive Guide": is that actually available in
print yet, and does/will it have more content than the online
version?

--
Best regards,
  Neville Franks, http://www.surfulater.com http://blog.surfulater.com

Jan Lehnardt | 17 Nov 21:46 2010

Re: Bulk Updates in CouchDB


On 17 Nov 2010, at 20:25, Neville Franks wrote:

> Re. "CouchDB The Definitive Guide" - is that actually available in
> print yet and does/will it have more content than the on-line
> version?

It has been in print since February, see http://guide.couchdb.org/

It will always have the same content. This is an open source book.

Cheers
Jan

Neville Franks | 17 Nov 21:53 2010

Re: Bulk Updates in CouchDB

Thursday, November 18, 2010, 7:46:11 AM, you wrote:

JL> It has been in print since February, see http://guide.couchdb.org/

JL> It will always have the same content. This is an open source book.

I thought so, but wasn't 100% sure. Are you continuing to add/update
content in the online version, which will then appear in a future
print update?

Do you still want tickets submitted for book corrections etc.?

--
Best regards,
  Neville Franks, http://www.surfulater.com http://blog.surfulater.com

Jan Lehnardt | 17 Nov 22:18 2010

Re: Bulk Updates in CouchDB


On 17 Nov 2010, at 21:53, Neville Franks wrote:

> I thought so, but wasn't 100% sure. Are you continuing to add / update
> content to the on-line version which will then appear in a future
> print version update?

Yes, we're working on the second edition* with the open source community.

Karel Minařík | 16 Nov 14:59 2010

Re: Bulk Updates in CouchDB

> This is probably a deal breaker for my use and I would have thought
> many others. My concern is iterating over a large number of documents
> on a remote server just to do simple updates. It means I need to do
> several HTTP requests (GET/PUT/DELETE) for each document in a set of
> possibly thousands or tens of thousands.

I don't think that's the case, if I understand your situation. You
want to delete or update multiple documents based on some criteria.
You retrieve your docs' IDs and revision IDs via a map/reduce view or
a fulltext query, and then issue a bulk request
[http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API#Modify_Multiple_Documents_With_a_Single_Request].
This bulk request could easily delete/modify tens of thousands of
documents, depending on your hardware. It could (and possibly should)
run in the background via a job-scheduling system (e.g. Resque).
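If the bulk request runs unattended in the background, the job will want to check what actually happened. Without `all_or_nothing`, the `_bulk_docs` response is an array with one entry per document: `{ id, rev }` on success, or `{ id, error, reason }` on failure (for example `"conflict"` when the `_rev` sent is no longer current). A minimal sketch; the helper name and sample data are hypothetical.

```javascript
// Hypothetical helper: split a _bulk_docs response into saved documents
// and failures, e.g. so that conflicting ones can be re-read and retried.
function summarizeBulkResult(results) {
  const saved = [];
  const failed = [];
  for (const entry of results) {
    if (entry.error) {
      failed.push(entry);
    } else {
      saved.push(entry);
    }
  }
  return { saved: saved, failed: failed };
}

// Made-up response, for illustration only.
const response = [
  { id: "a", rev: "2-abc" },
  { id: "b", error: "conflict", reason: "Document update conflict." },
];
const summary = summarizeBulkResult(response);
console.log(summary.saved.length + " saved, " + summary.failed.length + " failed");
```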

> I'm in Australia and the server is in the US (...)

As Jan pointed out, it may be more appropriate to do those updates
locally and replicate.

> I am getting the feeling that CouchDB is great for storing lots of
> information and getting it back in lots of interesting ways but not a
> good fit for typical CRUD stuff that's done in SQL all the time.

I would not put it this way, based on my experience (and many  
grievances with Couch and Ruby). The "typical CRUD" operations in the  
sense of "let's make a blog in 15 minutes" are 100% supported by  
Couch. Depending on the interface for your programming language,
there's little difference from using, say, SQLite.

Karel Minařík | 16 Nov 14:37 2010

Re: Bulk Updates in CouchDB

>> My question is how do I do some simple things like:
>> 1) Delete all documents where key.value = xxx
>
> For #1, my current trick is to generate a view with a map function
> that looks like this:
>
> map: function(doc) {
>    emit(null, doc._rev);
> }
>
> This makes it easy to convert the results of the view (without
> stale=ok) into a bulk-delete by iterating over the view, but not
> fetching each document to get at the revision.

Hmm, could you explain a bit how it should work? (Because the way I  
understand it, you have to iterate over _all_ the docs and check `if  
doc.value == xxx` in the client code? That'd be impossible for large  
data sets.)

As for question 1), I'd either create a view that emits `[doc.value, 1]`
or index it with fulltext, then get the results back (querying either
with `key="xxx"` or `?q=key:xxx`) in batches of ~10,000, get the IDs
and revisions, and then issue a bulk delete.
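Fetching the IDs and revisions in batches can use CouchDB's usual view paging parameters: `startkey`/`endkey` to restrict the view to one value, `limit` for the page size, and `startkey_docid` plus `skip=1` to resume after the last row already seen. A sketch that only builds the query string; the helper name is hypothetical and the page size is up to you.

```javascript
// Hypothetical helper: query string for the next page of a view that emits
// doc.value as its key. Pass lastRow = null for the first page.
function nextPageQuery(value, pageSize, lastRow) {
  const params = new URLSearchParams();
  // View keys are JSON, so a string key must be JSON-encoded ("xxx" with quotes).
  params.set("startkey", JSON.stringify(value));
  params.set("endkey", JSON.stringify(value));
  params.set("limit", String(pageSize));
  if (lastRow) {
    // All rows share the same key and are ordered by doc ID within it,
    // so resume at the last doc ID seen and skip that one row.
    params.set("startkey_docid", lastRow.id);
    params.set("skip", "1");
  }
  return params.toString();
}

console.log(nextPageQuery("xxx", 10000, null));
console.log(nextPageQuery("xxx", 10000, { id: "doc-123", key: "xxx" }));
```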

> 2) Update all documents where key.value = xxx so value = yyy

There is no way to do 2), AFAIK. This is not necessarily entirely bad.

Karel

Mark J. Reed | 16 Nov 15:17 2010

Re: Bulk Updates in CouchDB

On Tue, Nov 16, 2010 at 8:37 AM, Karel Minařík <karel.minarik <at> gmail.com> wrote:
> Hmm, could you explain a bit how it should work? (Because the way I
> understand it, you have to iterate over _all_ the docs and check `if
> doc.value == xxx` in the client code? That'd be impossible for large data
> sets.)

Given /mydb/_design/mydd with the following (I'm not bothering with
quotation marks, so this isn't copy/pasteable):

views: { foo: { map: function(doc) { emit(doc.value, doc._rev) } } }
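For reference, a copy/pasteable version of that design document might look like this. In a real design document the map function is stored as a string; `mydd` and `foo` are just Mark's placeholder names.

```javascript
// The design document described above, as a JavaScript object that
// serializes to valid JSON. Map functions are stored as strings.
const designDoc = {
  _id: "_design/mydd",
  language: "javascript",
  views: {
    foo: {
      map: "function(doc) { emit(doc.value, doc._rev); }",
    },
  },
};

// This JSON would be the body of PUT /mydb/_design/mydd
console.log(JSON.stringify(designDoc, null, 2));
```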

Then you fetch this:

/mydb/_design/mydd/_view/foo?key="xxx"

Which will include the doc id's and revisions of all the matching
docs. Then you use that result to build up a request like this:

{ docs: [ { _id: someid, _rev: itsrev, _deleted: true },
          { _id: someotherid, ...} ] }

Which you then post to /mydb/_bulk_docs, and then all the matching
docs are deleted, with only two HTTP round-trips.
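The two round trips could be sketched as follows. It assumes a CouchDB reachable at `baseUrl` (for example `http://localhost:5984/mydb`), the `mydd/foo` view above, and a `fetch` implementation (built into browsers and Node 18+); `deleteByValue` is a hypothetical name, and the `fetchFn` parameter exists only so the function can be tried without a server.

```javascript
// Round trip 1: ask the view for the IDs and revisions of matching docs.
// Round trip 2: POST deletion stubs for all of them to _bulk_docs.
async function deleteByValue(baseUrl, value, fetchFn = fetch) {
  const viewUrl = baseUrl + "/_design/mydd/_view/foo?key=" +
    encodeURIComponent(JSON.stringify(value));
  const viewRes = await fetchFn(viewUrl);
  if (!viewRes.ok) throw new Error("view query failed: " + viewRes.status);
  const view = await viewRes.json();

  const docs = view.rows.map(function (row) {
    return { _id: row.id, _rev: row.value, _deleted: true };
  });
  if (docs.length === 0) {
    return []; // nothing matched, so skip the second request
  }

  const bulkRes = await fetchFn(baseUrl + "/_bulk_docs", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ docs: docs }),
  });
  if (!bulkRes.ok) throw new Error("_bulk_docs failed: " + bulkRes.status);
  // One entry per document: { id, rev } or { id, error, reason }.
  return bulkRes.json();
}

// Usage (against a real server):
// deleteByValue("http://localhost:5984/mydb", "xxx").then(console.log);
```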

For selective updating, you can use an update handler to modify a
document "in-place" without first fetching and then storing the whole
thing, but then you can't do that in bulk.
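A sketch of such an update handler, meant for the `updates` section of a design document. CouchDB calls it with the stored document (or `null` if none exists) and a request object, and saves whatever document is returned as the first element of the pair. The handler name `setValue` and the `value` query parameter are hypothetical; it would be invoked once per document, e.g. `PUT /mydb/_design/mydd/_update/setValue/<docid>?value=yyy`.

```javascript
// Update handler: set doc.value from the ?value=... query parameter.
// Returning [null, ...] means "save nothing"; the second element is the
// response body. In the design document this function would be stored as
// a string under updates.setValue.
function setValue(doc, req) {
  if (!doc) {
    return [null, "no such document\n"];
  }
  if (req.query.value === undefined) {
    return [null, "missing ?value= parameter\n"];
  }
  doc.value = req.query.value;
  return [doc, "updated\n"];
}
```

This saves the client from sending the whole document, but it is still one request per document.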

-- 
Mark J. Reed <markjreed@...>

Karel Minařík | 16 Nov 15:35 2010

Re: Bulk Updates in CouchDB

>> Hmm, could you explain a bit how it should work? (Because the way I
>> understand it, you have to iterate over _all_ the docs and check `if
>> doc.value == xxx` in the client code? That'd be impossible for large
>> data sets.)
>
> Given /mydb/_design/mydd with the following
> views: { foo: { map: function(doc) { emit(doc.value, doc._rev) } } }
>
> Then you fetch this:
>
> /mydb/_design/mydd/_view/foo?key="xxx"

Sure, but only for the doc attribute specified in `foo`. Then you'd  
have to define views for all the attributes you want to query this  
way, which is "impossible" for large and/or fast growing data sets. Or  
what am I missing?

Karel

Mark J. Reed | 16 Nov 18:13 2010

Re: Bulk Updates in CouchDB

On Tue, Nov 16, 2010 at 9:35 AM, Karel Minařík <karel.minarik <at> gmail.com> wrote:
> Sure, but only for the doc attribute specified in `foo`. Then you'd have to
> define views for all the attributes you want to query this way, which is
> "impossible" for large and/or fast growing data sets. Or what am I missing?

You could use a temporary view, where the query specifies the view
instead of having it predefined, but then you take a serious
performance hit as it populates the view at query-time instead of in
advance.
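In the CouchDB releases of that time, a temporary view was a POST to `/<db>/_temp_view` whose JSON body carries the map function as a string (later major versions removed temporary views altogether). A sketch that builds such a body for an arbitrary attribute; the helper name is hypothetical. Because nothing is indexed in advance, the map function has to run over the documents when the query is made, which is the performance hit described above.

```javascript
// Hypothetical helper: body for POST /<db>/_temp_view?key=...
// The attribute name is JSON-encoded so odd characters cannot break the source.
function tempViewBody(field) {
  const f = JSON.stringify(field);
  return {
    map: "function(doc) { if (doc[" + f + "] !== undefined) emit(doc[" + f + "], doc._rev); }",
  };
}

console.log(JSON.stringify(tempViewBody("value")));
```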

CouchDB is optimized for applications where you know ahead of time
exactly what your queries will look like, and in such cases it
executes those queries blindingly fast.  But it's not really designed
for ad-hoc querying.  If you want that sort of flexibility, you may
want to:

1) look at adding couchdb-solr to your deployment

2) use a different data store, either

  a) a more flexible NoSQL solution, like MongoDB, which has very
powerful ad-hoc query and update capabilities, or even

  b) an RDBMS, reports of whose death have been somewhat exaggerated
by the NoSQL community.

-- 
Mark J. Reed <markjreed@...>

Neville Franks | 16 Nov 23:15 2010

Re: Bulk Updates in CouchDB

Hi Mark,
Thanks for your reply. My queries and update/delete operations are all
well defined, so there are no ad-hoc query concerns.

As I just replied to Jan Lehnardt, my overriding interest in CouchDB is
its replication capabilities and offline/online use case. I have not
found any other database that does this as easily, and hopefully as
effectively, as CouchDB.

I have looked at MongoDB, and it does have more of the update/delete
etc. capabilities I'm used to; however, it doesn't have the
replication that CouchDB does. Instead it has master/slave, with
the slaves read-only.

Mark J. Reed | 18 Nov 20:17 2010

Re: Bulk Updates in CouchDB

On Tue, Nov 16, 2010 at 5:15 PM, Neville Franks <subs@...> wrote:
> Thanks for your reply. My queries and update/delete operations are all
> well defined, so there is no ad-hoc query concerns.

Ah, OK. I inferred that there might be such concerns based on your
objection to creating a view for each attribute you want to query. If
you know ahead of time what those attributes are, there's no reason
you can't pre-create all those views. If it's a large number, it's not
too hard to automate the view creation...
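Automating that could be as simple as generating the design document from a list of attribute names. A sketch; the `by_<field>` naming scheme and the helper name are hypothetical choices.

```javascript
// Hypothetical helper: one view per attribute, each emitting
// (attribute value, revision), so the IDs and revisions of matching
// documents can be fetched for bulk updates or deletes.
function makeDesignDoc(name, fields) {
  const views = {};
  for (const field of fields) {
    const f = JSON.stringify(field);
    views["by_" + field] = {
      map: "function(doc) { if (doc[" + f + "] !== undefined) emit(doc[" + f + "], doc._rev); }",
    };
  }
  return { _id: "_design/" + name, language: "javascript", views: views };
}

// The result would be PUT to /<db>/_design/<name>.
console.log(JSON.stringify(makeDesignDoc("bulk", ["value", "status"]), null, 2));
```

Each extra view is another index CouchDB has to keep up to date, so this trades some write and disk overhead for fast queries.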

> I have looked at MongoDB, and it does have more of the update/delete
> etc. capabilities I'm used to, however it doesn't have the
> replication that CouchDB does, instead it has master/slave and
> read-only on the slaves.

That has been true, but I'm given to understand that master/master is
either in the latest release or coming soon.

-- 
Mark J. Reed <markjreed@...>
