Luke Miner | 15 Mar 2012 16:05
Picon

Converting XML to .csv format

I am trying to convert the bulk whois data into csv format so that I can use it in a statistical package.


For each IP address I'd like to get the list of associated POCs plus the date of assignment. Something like:
-startAddress, endAddress, orgHandle1, registrationDate1, orgHandle2, registrationDate2, etc.

Right now I'm importing the NET XML Elements data into Microsoft Access, and exporting the net table as a csv file. Unfortunately, there are errors during the import, so, for example, the POC field is empty. Also the CSV file that I'm left with is disaggregated at the network level, with overlapping ranges. It's not clear how to get from this to data providing all the networks associated with a given block of IP addresses.

Any help would be most appreciated,

Thanks,
Luke
<div>
<p>I am trying to convert the bulk whois data into csv format so that I can use it in a statistical package.</p>
<div><br></div>
<div>For each IP address I'd like to get the list of associated POCs plus the date of assignment. Something like:<br><div>-startAddress, endAddress, orgHandle1, registrationDate1, orgHandle2, registrationDate2, etc.</div>
</div>
<div><br></div>
<div>Right now I'm importing the NET XML Elements data into Microsoft Access, and exporting the net table as a csv file. Unfortunately, there are errors during the import, so, for example, the POC field is empty. Also the CSV file that I'm left with is&nbsp;disaggregated&nbsp;at the network level, with overlapping ranges. It's not clear how to get from this to data providing all the networks associated with a given block of IP addresses.</div>

<div><br></div>
<div>Any help would be most appreciated,</div>
<div><br></div>
<div>Thanks,</div>
<div>Luke</div>
</div>
David Huberman | 15 Mar 2012 16:42

Re: Converting XML to .csv format

Hello Luke,

1) Working with NETS:

I believe you want to look for the parentNetHandle data as the key to
connect the dots. For example, there are three NETs in our database that
contain the IP address 24.0.0.0:

24.0.0.0/8 for OrgID: ARIN.  It has netHandle NET-24-0-0-0-0
24.0.0.0/12 for OrgID: CMCS. It has netHandle NET-24-0-0-0-1 and is
parented by NET-24-0-0-0-0.
24.0.0.0/16 for Customer Org: C01508994. It has a netHandle NET-24-0-0-0-2
and is parented by NET-24-0-0-0-1.

NB: the very top of the hierarchy (the blocks issued by IANA to ARIN for
issuance to our customers, like /8s in IPv4 and /12s and /23s in IPv6)
won't have parentNetHandles, so you may also want to account for that.

2) Associating POCs with NETs:

Writing about NETs, you've indicated: "I'd like to get the list of
associated POCs". I think there are two concepts that you need to know to
accomplish this goal.

- NETs can have POCs. We call them "resource POCs".  A NET can have three
kinds of resources POCs: a TechPOC, an AbusePOC, and/or a NOC POC. But all
these are optional.  In most cases, a NET will not have any resource POC.
So you should see a lot of blank POC fields.

- NETs can be registered to either a customer org handle (which look like
C01508994), or a "real" OrgID (like OrgID: CMCS for Comcast or OrgID:
CITYO-121 for the City of Rockford). The real OrgIDs all have one admin
POC, and at least one tech POC and abuse POC.  They may also have at least
one optional NOC POC.  In contrast, customer orgs do not have POCs at all.

In the easy case, a NET is registered to an OrgID, and we consider the
OrgID's POCs to be the POCs for their NETs.

For example, 216.64.0.0/17 is registered to OrgID: PAET.  The /17 has no
resource POCs.  OrgID: PAET has an admin POC, a tech POC, and an abuse
POC.  By reference, therefore, you can say that 216.64.0.0/17 has the
following associated POCs:
- IP43-ARIN (the admin and tech POC for OrgID: PAET); and
- ABUSE741-ARIN (the abuse POC for OrgID: PAET).

In the harder case, a NET is registered to a customer Org record.  To
associate any POCs with it, you have to go up a level of hierarchy to the
parentNetHandle, identify the OrgID the parentNetHandle is registered to,
and identify its POCs.

For example, 24.0.0.0/16 is registered to customer org record C01508994.
To compile a list of POCs associated with 24.0.0.0/16, you'd go to the
parentNetHandle, NET-24-0-0-0-1, cross over to the OrgID: CMCS, and find
its POCs (NAPO-ARIN and IC161-ARIN).

So that's how ARIN views it, anyway. It's how we construct Whois query
results, in hopes of providing the most useful data possible.

I hope all that helps!

Regards,
David

---
David R Huberman
Principal Technical Analyst, ARIN
703-227-9866

On 3/15/12 11:05 AM, "Luke Miner" <lminer@...> wrote:

I am trying to convert the bulk whois data into csv format so that I can
use it in a statistical package.
For each IP address I'd like to get the list of associated POCs plus the
date of assignment. Something like:
-startAddress, endAddress, orgHandle1, registrationDate1, orgHandle2,
registrationDate2, etc.

Right now I'm importing the NET XML Elements data into Microsoft Access,
and exporting the net table as a csv file. Unfortunately, there are errors
during the import, so, for example, the POC field is empty. Also the CSV
file that I'm left with is disaggregated at the network level, with
overlapping ranges. It's not clear how to get from this to data providing
all the networks associated with a given block of IP addresses.

Any help would be most appreciated,

Thanks,
Luke
_______________________________________________
arin-tech-discuss mailing list
arin-tech-discuss@...
http://lists.arin.net/mailman/listinfo/arin-tech-discuss

Luke Miner | 15 Mar 2012 17:08
Picon

Re: Converting XML to .csv format

Hi David,


Thank you very much for this; it really helps to clear things up! 

I think what I'm looking for is orgID for the parent net handle. 

So just to be clear with the example below, ARIN is the top level network just in the sense that it is in charge of administering this IP block. A subset of these IPs have been assigned to Comcast, and within comcast's network, a subset is assigned to Comcast Customer Org?

If whois says that 24.0.0.0/8 has been allocated to ARIN in 2001, but Comcast doesn't register 24.0.0.0/12 until 2003, does this mean that 24.0.0.0/12 probably wasn't in use at least until the date it was assigned to Comcast?

Thanks
Luke

On Thu, Mar 15, 2012 at 3:42 PM, David Huberman <dhuberma-VVaJSnPt7Kc@public.gmane.org> wrote:
Hello Luke,

1) Working with NETS:

I believe you want to look for the parentNetHandle data as the key to
connect the dots. For example, there are three NETs in our database that
contain the IP address 24.0.0.0:

24.0.0.0/8 for OrgID: ARIN.  It has netHandle NET-24-0-0-0-0
24.0.0.0/12 for OrgID: CMCS. It has netHandle NET-24-0-0-0-1 and is
parented by NET-24-0-0-0-0.
24.0.0.0/16 for Customer Org: C01508994. It has a netHandle NET-24-0-0-0-2
and is parented by NET-24-0-0-0-1.

NB: the very top of the hierarchy (the blocks issued by IANA to ARIN for
issuance to our customers, like /8s in IPv4 and /12s and /23s in IPv6)
won't have parentNetHandles, so you may also want to account for that.

2) Associating POCs with NETs:

Writing about NETs, you've indicated: "I'd like to get the list of
associated POCs". I think there are two concepts that you need to know to
accomplish this goal.

- NETs can have POCs. We call them "resource POCs".  A NET can have three
kinds of resources POCs: a TechPOC, an AbusePOC, and/or a NOC POC. But all
these are optional.  In most cases, a NET will not have any resource POC.
So you should see a lot of blank POC fields.

- NETs can be registered to either a customer org handle (which look like
C01508994), or a "real" OrgID (like OrgID: CMCS for Comcast or OrgID:
CITYO-121 for the City of Rockford). The real OrgIDs all have one admin
POC, and at least one tech POC and abuse POC.  They may also have at least
one optional NOC POC.  In contrast, customer orgs do not have POCs at all.

In the easy case, a NET is registered to an OrgID, and we consider the
OrgID's POCs to be the POCs for their NETs.

For example, 216.64.0.0/17 is registered to OrgID: PAET.  The /17 has no
resource POCs.  OrgID: PAET has an admin POC, a tech POC, and an abuse
POC.  By reference, therefore, you can say that 216.64.0.0/17 has the
following associated POCs:
- IP43-ARIN (the admin and tech POC for OrgID: PAET); and
- ABUSE741-ARIN (the abuse POC for OrgID: PAET).

In the harder case, a NET is registered to a customer Org record.  To
associate any POCs with it, you have to go up a level of hierarchy to the
parentNetHandle, identify the OrgID the parentNetHandle is registered to,
and identify its POCs.

For example, 24.0.0.0/16 is registered to customer org record C01508994.
To compile a list of POCs associated with 24.0.0.0/16, you'd go to the
parentNetHandle, NET-24-0-0-0-1, cross over to the OrgID: CMCS, and find
its POCs (NAPO-ARIN and IC161-ARIN).

So that's how ARIN views it, anyway. It's how we construct Whois query
results, in hopes of providing the most useful data possible.

I hope all that helps!

Regards,
David

---
David R Huberman
Principal Technical Analyst, ARIN
703-227-9866







On 3/15/12 11:05 AM, "Luke Miner" <lminer-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

I am trying to convert the bulk whois data into csv format so that I can
use it in a statistical package.
For each IP address I'd like to get the list of associated POCs plus the
date of assignment. Something like:
-startAddress, endAddress, orgHandle1, registrationDate1, orgHandle2,
registrationDate2, etc.


Right now I'm importing the NET XML Elements data into Microsoft Access,
and exporting the net table as a csv file. Unfortunately, there are errors
during the import, so, for example, the POC field is empty. Also the CSV
file that I'm left with is disaggregated at the network level, with
overlapping ranges. It's not clear how to get from this to data providing
all the networks associated with a given block of IP addresses.

Any help would be most appreciated,

Thanks,
Luke
_______________________________________________
arin-tech-discuss mailing list
arin-tech-discuss-VVaJSnPt7Kc@public.gmane.org
http://lists.arin.net/mailman/listinfo/arin-tech-discuss


<div>
<p>Hi David,</p>
<div><br></div>
<div>Thank you very much for this; it really helps to clear things up!&nbsp;</div>
<div><br></div>
<div>I think what I'm looking for is orgID for the parent net handle.&nbsp;</div>
<div><br></div>
<div>So just to be clear with the example below, ARIN is the top level network just in the sense that it is in charge of administering this IP block. A subset of these IPs have been assigned to Comcast, and within comcast's network, a subset is assigned to Comcast Customer Org?</div>

<div><br></div>
<div>If whois says that&nbsp;<a href="http://24.0.0.0/8">24.0.0.0/8</a>&nbsp;has been allocated to ARIN in 2001, but Comcast doesn't register <a href="http://24.0.0.0/12">24.0.0.0/12</a> until 2003, does this mean that&nbsp;<a href="http://24.0.0.0/12">24.0.0.0/12</a> probably wasn't in use at least until the date it was assigned to Comcast?</div>

<div><br></div>
<div>Thanks</div>
<div>Luke</div>
<div>
<br><div class="gmail_quote">On Thu, Mar 15, 2012 at 3:42 PM, David Huberman <span dir="ltr">&lt;<a href="mailto:dhuberma@...">dhuberma@...</a>&gt;</span> wrote:<br><blockquote class="gmail_quote">Hello Luke,<br><br>
1) Working with NETS:<br><br>
I believe you want to look for the parentNetHandle data as the key to<br>
connect the dots. For example, there are three NETs in our database that<br>
contain the IP address <a href="http://24.0.0.0" target="_blank">24.0.0.0</a>:<br><br><a href="http://24.0.0.0/8" target="_blank">24.0.0.0/8</a> for OrgID: ARIN. &nbsp;It has netHandle NET-24-0-0-0-0<br><a href="http://24.0.0.0/12" target="_blank">24.0.0.0/12</a> for OrgID: CMCS. It has netHandle NET-24-0-0-0-1 and is<br>
parented by NET-24-0-0-0-0.<br><a href="http://24.0.0.0/16" target="_blank">24.0.0.0/16</a> for Customer Org: C01508994. It has a netHandle NET-24-0-0-0-2<br>
and is parented by NET-24-0-0-0-1.<br><br>
NB: the very top of the hierarchy (the blocks issued by IANA to ARIN for<br>
issuance to our customers, like /8s in IPv4 and /12s and /23s in IPv6)<br>
won't have parentNetHandles, so you may also want to account for that.<br><br>
2) Associating POCs with NETs:<br><br>
Writing about NETs, you've indicated: "I'd like to get the list of<br>
associated POCs". I think there are two concepts that you need to know to<br>
accomplish this goal.<br><br>
- NETs can have POCs. We call them "resource POCs". &nbsp;A NET can have three<br>
kinds of resources POCs: a TechPOC, an AbusePOC, and/or a NOC POC. But all<br>
these are optional. &nbsp;In most cases, a NET will not have any resource POC.<br>
So you should see a lot of blank POC fields.<br><br>
- NETs can be registered to either a customer org handle (which look like<br>
C01508994), or a "real" OrgID (like OrgID: CMCS for Comcast or OrgID:<br>
CITYO-121 for the City of Rockford). The real OrgIDs all have one admin<br>
POC, and at least one tech POC and abuse POC. &nbsp;They may also have at least<br>
one optional NOC POC. &nbsp;In contrast, customer orgs do not have POCs at all.<br><br>
In the easy case, a NET is registered to an OrgID, and we consider the<br>
OrgID's POCs to be the POCs for their NETs.<br><br>
For example, <a href="http://216.64.0.0/17" target="_blank">216.64.0.0/17</a> is registered to OrgID: PAET. &nbsp;The /17 has no<br>
resource POCs. &nbsp;OrgID: PAET has an admin POC, a tech POC, and an abuse<br>
POC. &nbsp;By reference, therefore, you can say that <a href="http://216.64.0.0/17" target="_blank">216.64.0.0/17</a> has the<br>
following associated POCs:<br>
- IP43-ARIN (the admin and tech POC for OrgID: PAET); and<br>
- ABUSE741-ARIN (the abuse POC for OrgID: PAET).<br><br>
In the harder case, a NET is registered to a customer Org record. &nbsp;To<br>
associate any POCs with it, you have to go up a level of hierarchy to the<br>
parentNetHandle, identify the OrgID the parentNetHandle is registered to,<br>
and identify its POCs.<br><br>
For example, <a href="http://24.0.0.0/16" target="_blank">24.0.0.0/16</a> is registered to customer org record C01508994.<br>
To compile a list of POCs associated with <a href="http://24.0.0.0/16" target="_blank">24.0.0.0/16</a>, you'd go to the<br>
parentNetHandle, NET-24-0-0-0-1, cross over to the OrgID: CMCS, and find<br>
its POCs (NAPO-ARIN and IC161-ARIN).<br><br>
So that's how ARIN views it, anyway. It's how we construct Whois query<br>
results, in hopes of providing the most useful data possible.<br><br>
I hope all that helps!<br><br>
Regards,<br>
David<br><br>
---<br>
David R Huberman<br>
Principal Technical Analyst, ARIN<br><a href="tel:703-227-9866" value="+17032279866">703-227-9866</a><br><div><div class="h5">
<br><br><br><br><br><br><br>
On 3/15/12 11:05 AM, "Luke Miner" &lt;<a href="mailto:lminer <at> gmail.com">lminer@...</a>&gt; wrote:<br><br>
I am trying to convert the bulk whois data into csv format so that I can<br>
use it in a statistical package.<br>
For each IP address I'd like to get the list of associated POCs plus the<br>
date of assignment. Something like:<br>
-startAddress, endAddress, orgHandle1, registrationDate1, orgHandle2,<br>
registrationDate2, etc.<br><br><br>
Right now I'm importing the NET XML Elements data into Microsoft Access,<br>
and exporting the net table as a csv file. Unfortunately, there are errors<br>
during the import, so, for example, the POC field is empty. Also the CSV<br>
file that I'm left with is disaggregated at the network level, with<br>
overlapping ranges. It's not clear how to get from this to data providing<br>
all the networks associated with a given block of IP addresses.<br><br>
Any help would be most appreciated,<br><br>
Thanks,<br>
Luke<br>
</div></div>_______________________________________________<br>
arin-tech-discuss mailing list<br><a href="mailto:arin-tech-discuss@...">arin-tech-discuss@...</a><br><a href="http://lists.arin.net/mailman/listinfo/arin-tech-discuss" target="_blank">http://lists.arin.net/mailman/listinfo/arin-tech-discuss</a><br><br>
</blockquote>
</div>
<br>
</div>
</div>
David Huberman | 15 Mar 2012 17:17

Re: Converting XML to .csv format

Hello Luke,

Glad we could help! You asked:

> So just to be clear with the example below, ARIN is the
> top level network just in the sense that it is in charge
> of administering this IP block. A subset of these IPs
> have been assigned to Comcast, and within comcast's
> network, a subset is assigned to Comcast Customer Org?

Exactly.  And in other examples, it will be ARIN -> OrgID1 -> OrgID2 where
OrgID1 is ARIN's customer (Comcast) and OrgID2 is Comcast's customer
(David & Luke Consulting Co.).  ISPs who sub-delegate IP address blocks to
their customers in ARIN's Whois will sometimes use customer org records,
and sometimes use OrgIDs for their customers.

You wrote:

> If whois says that 24.0.0.0/8 has been allocated to ARIN
> in 2001, but Comcast doesn't register 24.0.0.0/12 until
> 2003, does this mean that 24.0.0.0/12 probably wasn't in
> use at least until the date it was assigned to Comcast?

It depends. 

Sometimes, ARIN assigns our customer an IP address block that has never
previously been assigned. Right now, for example, we're issuing some
blocks out of 100.0.0.0/8.  That's a new /8, and the registrants of space
in that /8 are the very first registrants.

But ARIN also recycles space.  Folks return space to us when they're done
with it, and we re-issue it to a new customer.  Folks also go out of
business, and we revoke the space when they don't pay their bills, and we
re-issue the space to a new customer.  Sometimes we reclaim space due to
fraud being committed. That space also gets re-issed to new customers.

In the example, Comcast was not the original registrant of 24.0.0.0/12.
The beginning of the block was first issued in December 1995 to  <at> Home
Network, Milo Medin's cable internet company from way back in the day.
The bulk Whois data does not provide this data, however.  Bulk Whois is a
snapshot of how Whois looks on the day the data is downloaded.

Regards,
David

---
David R Huberman
Principal Technical Analyst, ARIN
703-227-9866


Gmane