7 Jun 2012 16:26

## Second largest component

Hi all,

I continue with the probably silly questions..

I have a graph that is relatively large (1.6 million vertices, 2.4 million arcs). I extract the largest strong component, but it is still quite large (~400,000 vertices) and while the "analysis" algos work smoothly, when it comes to representing it, it's very sparsely connected, so essentially a huge cloud of points.

So I am trying to extract the second largest strong component, which is way smaller. I do the following:

comp, hist = label_components(g)
idx = argsort(hist)
idx_second_largest = idx[-2]
idx_vertices_second_largest = find(comp.a == idx_second_largest)
u = GraphView(g, vfilt=idx_vertices_second_largest)

To make things easier, I'll paraphrase the example from the doc.
>>> g2=random_graph(100, lambda: (1,1))
>>> comp,hist=label_components(g2)
>>> print hist
[11 28 41 17  3]

So my second largest strong component should be the one with elements labelled 1.

>>> print comp.a
[0 1 1 2 3 2 2 0 0 2 1 3 3 0 3 1 4 0 2 2 1 3 2 1 2 2 2 2 2 1 2 1 3 1 2 2 1
2 3 2 2 0 2 2 4 2 3 2 2 3 1 1 2 1 1 2 1 1 2 4 2 1 0 1 2 1 1 3 2 2 2 1 3 3
1 2 1 1 0 0 2 2 2 3 3 3 2 2 2 2 1 2 3 0 1 1 2 1 0 3]

Using the code above gives, finally

>>> print idx_vertices_second_largest
[ 1  2 10 15 20 23 29 31 33 36 50 51 53 54 56 57 61 63 65 66 71 74 76 77 90
94 95 97]

But when I use this as u = GraphView(g2, vfilt=idx_vertices_second_largest), I get an error that ends with

File "/usr/local/lib/python2.7/dist-packages/graph_tool/__init__.py", line 527, in __get_set_f_array
a[:] = v
ValueError: operands could not be broadcast together with shapes (100) (4)

So as far as picking up the indices, I seem to be right, but after that, problem.. Where am I being silly? (Coming from classic C, my first reaction is "what should I typecast to?".)

Thanks!
--

J.

_______________________________________________
graph-tool mailing list
graph-tool <at> skewed.de
http://lists.skewed.de/mailman/listinfo/graph-tool
7 Jun 2012 16:44

### Re: Second largest component

On 06/07/2012 04:26 PM, Julien Arino wrote:
> So I am trying to extract the second largest strong component, which is way smaller. I do the following:
>
> comp, hist = label_components(g)
> idx = argsort(hist)
> idx_second_largest = idx[-2]
> idx_vertices_second_largest = find(comp.a == idx_second_largest)
> u = GraphView(g, vfilt=idx_vertices_second_largest)
>
> To make things easier, I'll paraphrase the example from the doc.
>>>> g2=random_graph(100, lambda: (1,1))
>>>> comp,hist=label_components(g2)
>>>> print hist
> [11 28 41 17  3]
>
> So my second largest strong component should be the one with elements labelled 1.
>
>>>> print comp.a
> [0 1 1 2 3 2 2 0 0 2 1 3 3 0 3 1 4 0 2 2 1 3 2 1 2 2 2 2 2 1 2 1 3 1 2 2 1
>  2 3 2 2 0 2 2 4 2 3 2 2 3 1 1 2 1 1 2 1 1 2 4 2 1 0 1 2 1 1 3 2 2 2 1 3 3
>  1 2 1 1 0 0 2 2 2 3 3 3 2 2 2 2 1 2 3 0 1 1 2 1 0 3]
>
> Using the code above gives, finally
>
>>>> print idx_vertices_second_largest
> [ 1  2 10 15 20 23 29 31 33 36 50 51 53 54 56 57 61 63 65 66 71 74 76 77 90 <tel:71%2074%2076%2077%2090>
>  94 95 97]
>
> But when I use this as u = GraphView(g2, vfilt=idx_vertices_second_largest), I get an error that ends with
>
> File "/usr/local/lib/python2.7/dist-packages/graph_tool/__init__.py", line 527, in __get_set_f_array
>     a[:] = v
> ValueError: operands could not be broadcast together with shapes (100) (4)
>
> So as far as picking up the indices, I seem to be right, but after that, problem.. Where am I being silly?
(Coming from classic C, my first reaction is "what should I typecast to?".)

You're not passing the correct information as the vertex filter. The
vfilt parameter from GraphView expects either a function or a vertex
property map or an array. The vertex property map must be of type
"boolean", where the vertices with value 'True' are kept in the
graph. Similarly, if you pass an array, it should also be boolean (or
integer) valued, and it should have the same size as the number of
vertices. You cannot pass a list of vertices you want to be kept...
However, if you pass a function, it will be called on each vertex, and
the result determines if the vertex is filtered. Thus, you have
essentially two options:

u = GraphView(g2, vfilt=lambda v: v in idx_vertices_second_largest)

or

comp, hist = label_components(g)
idx = argsort(hist)
idx = idx[-2]
filter = comp.a == idx
u = GraphView(g2, vfilt=filter)

The second option should be the fastest.

Cheers,
Tiago

--

--
Tiago de Paula Peixoto <tiago <at> skewed.de>

_______________________________________________
graph-tool mailing list
graph-tool <at> skewed.de
http://lists.skewed.de/mailman/listinfo/graph-tool

Gmane